[ovs-dev] [PATCH v2 2/2] netdev-dpdk: Support user-defined socket attribs

Thu Jul 7 01:42:44 UTC 2016

On 06/07/2016 11:26, "Aaron Conole" <aconole at redhat.com> wrote:

>Ansis Atteka <aatteka at nicira.com> writes:
>
>> On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <aconole at redhat.com> wrote:
>>> Aaron Conole <aconole at redhat.com> writes:
>>>
>>>> Daniele Di Proietto <diproiettod at vmware.com> writes:
>>>>
>>>>> On 10/06/2016 10:51, "Aaron Conole" <aconole at redhat.com> wrote:
>>>>>
>>>>>>Aaron Conole <aconole at redhat.com> writes:
>>>>>>
>>>>>>> Christian Ehrhardt <christian.ehrhardt at canonical.com> writes:
>>>>>>>
>>>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Daniele Di Proietto <diproiettod at vmware.com> writes:
>>>>>>>>>
>>>>>>>>> > Hi Aaron,
>>>>>>>>> >
>>>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>>>>> > user controlled file name.
>>>>>>>>>
>>>>>>>>> I agree, that always seems scary.
>>>>>>>>>
>>>>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>>>>> >
>>>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>>>>> > guest execution.
>>>>>>>>>
>>>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>>>>> can expound on it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>>>>>> most involved people needed something to get it working like "now".
>>>>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>>>>
>>>>>>>> You have to be aware that I brought up the discussion on dev at dpdk.org - see
>>>>>>>> [1] and [2]:
>>>>>>>> But this will take time and eventually still be the applications task to
>>>>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>>>>> So Aaron is trying to get something that works now until the long term
>>>>>>>> things are in place, which I appreciate.
>>>>>>>>
>>>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>>>>> this in time I run with [3] for now.
>>>>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>>>>
>>>>>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>>>>>> interim solutions, never the less we need it for now.
>>>>>>>>
>>>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>>>>> [3]:
>>>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>
>>>>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>>>>> hopefully I captured them correctly):
>>>>>>>>>
>>>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>>>>    pollute the ovs database.
>>>>>>>>>
>>>>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>>>>    original context had ovs running as root - but as I described above
>>>>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>>>>>
>>>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>>>>    not as great about this solution, TBH.
>>>>>>>>>
>>>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>>>>    layering between them.
>>>>>>>>>
>>>>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>>>>> always would end up with huge effort and a high risk not being accepted.
>>>>>>>> Probably that is what you refer to with "implications on other projects".
>>>>>>>>
>>>>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>>>>>> OWNS that path".
>>>>>>>
>>>>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>>>>> now.  Is it possible to enumerate them point by point so that I can
>>>>>>> understand them?  I think there are two outstanding concerns right now:
>>>>>>>
>>>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>>>>
>>>>>>> 2. the proposed approach would be better implemented in the utility
>>>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>>>>
>>>>>>> Am I understanding the concerns correctly?
>>>>>>
>>>>>>Ping?
>>>>>
>>>>> I found another theoretical problem with the chmod approach, let me try to
>>>>> explain:
>>>>>
>>>>> There's an extremely small race window between the socket creation and the
>>>>> chmod which could theoretically be exploited to change the owner of a socket
>>>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>>>>
>>>>> 1. There's no southbound database running, because it's not yet been
>>>>>    started or because it's being restarted.
>>>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>>>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>>>>    system.
>>>>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>>>>>    it.
>>>>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>>>>    vhost-user socket.
>>>>>
>>>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>>>>> window, and it's unlikely that an attacker can control when the southbound
>>>>> database is restarted.
>>>>>
>>>>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>>>>> impact of what I'm describing.
>>>>>
>>>>> I suggested an alternative approach, and I've tried implementing a quick
>>>>> POC on top on your patch:
>>>>>
>>>>> ---8<---
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>>> index 24ebb41..d7adc66 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -30,6 +30,7 @@
>>>>>  #include <sys/types.h>
>>>>>  #include <sys/stat.h>
>>>>>  #include <getopt.h>
>>>>> +#include <sys/fsuid.h>
>>>>>
>>>>>  #include "chutil.h"
>>>>>  #include "dirs.h"
>>>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>>       */
>>>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>>>>               vhost_sock_dir, name);
>>>>> +    uid_t orig_u = geteuid();
>>>>> +    gid_t orig_g = getegid();
>>>>> +    if (vhost_sock_def_owner) {
>>>>> +        uid_t u;
>>>>> +        gid_t g;
>>>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>>>>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>>>>> +            setfsuid(u);
>>>>> +            setfsgid(g);
>>>>> +        }
>>>>> +    }
>>>>>
>>>>>      err = rte_vhost_driver_register(dev->vhost_id);
>>>>>      if (err) {
>>>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>>          err = vhost_construct_helper(netdev);
>>>>>      }
>>>>>
>>>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>>>> -    if (!err && vhost_sock_def_owner &&
>>>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>>>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>>>>> +    if (vhost_sock_def_owner) {
>>>>> +        setfsuid(orig_u);
>>>>> +        setfsgid(orig_g);
>>>>>      }
>>>>>
>>>>> -    if (!err && vhost_sock_def_perms &&
>>>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>>>>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>>>>> -    }
>>>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>>>
>>>>>      return err;
>>>>>  }
>>>>>
>>>>> ---8<---
>>>>>
>>>>> Compared to the chmod approach this has some limitation:
>>>>>
>>>>> 1. It doesn't support changing permissions, only the owner.  This
>>>>>    could be done with umask, but I couldn't find any system call
>>>>>    to change the umask for a single thread.
>>>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>>>>    cannot be created.  I'm not sure whether this is a reasonable
>>>>>    limitation for the use cases you have in mind.
>>>>> 3. setfsuid() is Linux specific and somehow deprecated according
>>>>>    to the manpage:
>>>>>
>>>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>>>>    in new applications"
>>>>>
>>>>>    I haven't used seteuid, because it changes the euid of the whole
>>>>>    process and that may interfere with other operations on OVS.
>>>>
>>>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
>>>> like to work on trying to solve the problem.
>>>>
>>>>> If these limitations are unacceptable, I can see how we can use
>>>>> chmod.  After all, as you point out, it's probably better to do it
>>>>> in OVS than in some script.
>>>>
>>>> I think fchmod and fchown may actually be the correct calls to have, and
>>>> will refactor these chown/chmod utils functions as such, which (I
>>>> believe) avoids the race as you describe.
>>>>
>>>>> Thanks for your patience in solving this problem,
>>>>
>>>> Thanks for your reviews!
>>>>
>>>>> Daniele
>>>
>>> I've been trying to find a way to solve this in a portable fashion, but
>>> it appears to either require changes in DPDK to let the application
>>> create the unix server socket, or a linux-only guarantee with a
>>> non-linux racy fallback.  My attempt at the former change was NAK'd by
>>> upstream DPDK.  If the latter is unacceptable, then we can drop these
>>> patches and hope that client mode will be a workable solution.
>>>
>>> Thoughts?
>> Do you mean as if Open vSwitch would be the one connecting to DPDK
>> socket (the client and server roles swapped)?
>
>Exactly that.  The DPDK library calls from open vswitch would be for a
>vhost-user client socket to connect to qemu (or whatever is serving the
>vhost user socket).  The problem has to do with chmod/chown, and race
>conditions therein; fchmod/fchown only work due to linux specific
>behavior (and I've confirmed this).  There are a number of methods to
>try and catch when directories have been modified, but the issue is
>there's no way to close pandora's box (once you've modified the
>permissions, you have no clue which inode you touched).  Since there's
>no way, that I can tell, to build a transaction for this, it seems to me
>either do a linux specific hack, or don't do anything.  I think Daniele
>would prefer the don't do anything approach, but I'd like to get
>confirmation.

The code in netdev-dpdk is already Linux specific, I wouldn't object to
a Linux specific solution.

How would that work?  Does the latest DPDK expose the file descriptor?

Thanks,

Daniele