[ovs-dev] [PATCH v2 2/2] netdev-dpdk: Support user-defined socket attribs

Mon Jun 13 21:36:34 UTC 2016

Daniele Di Proietto <diproiettod at vmware.com> writes:

> On 10/06/2016 10:51, "Aaron Conole" <aconole at redhat.com> wrote:
>
>>Aaron Conole <aconole at redhat.com> writes:
>>
>>> Christian Ehrhardt <christian.ehrhardt at canonical.com> writes:
>>>
>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole at redhat.com> wrote:
>>>>
>>>>> Daniele Di Proietto <diproiettod at vmware.com> writes:
>>>>>
>>>>> > Hi Aaron,
>>>>> >
>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>> > user controlled file name.
>>>>>
>>>>> I agree, that always seems scary.
>>>>>
>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>> >
>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>> > guest execution.
>>>>>
>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>> can expound on it.
>>>>>
>>>>
>>>> Hi,
>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>> most involved people needed something to get it working like "now".
>>>> Since they are complementary (other than the fix removing a bit of the
>>>> urgency for more MAC) it was kind of the least bad option.
>>>>
>>>> You have to be aware that I brought up the discussion on dev at dpdk.org - see
>>>> [1] and [2]:
>>>> But this will take time and eventually still be the applications task to
>>>> "do something" - no matter if via API or via the chmod's right now.
>>>> So Aaron is trying to get something that works now until the long term
>>>> things are in place, which I appreciate.
>>>>
>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>> this in time I run with [3] for now.
>>>> I never intended to suggest that, but with the discussion in place, one
>>>> could ask if you (Aaron) want to pick up that instead.
>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>> socket ownership control and this then could be implemented in OVS?
>>>>
>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>> interim solutions, never the less we need it for now.
>>>>
>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>> [3]:
>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>
>>>> [...]
>>>>
>>>>
>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>> hopefully I captured them correctly):
>>>>>
>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>    pollute the ovs database.
>>>>>
>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>    via chown/chmod where permissions come from the database (the
>>>>>    original context had ovs running as root - but as I described above
>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>
>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>    not as great about this solution, TBH.
>>>>>
>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>    layering between them.
>>>>>
>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>> other, and can be used to a complementary effect (if implemented
>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>
>>>>
>>>> I also remember several complex changes needed for the #1 and #3 that
>>>> always would end up with huge effort and a high risk not being accepted.
>>>> Probably that is what you refer to with "implications on other projects".
>>>>
>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>> OWNS that path".
>>>
>>> I'd like to continue on, but I am not sure what the concerns are right
>>> now.  Is it possible to enumerate them point by point so that I can
>>> understand them?  I think there are two outstanding concerns right now:
>>>
>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>
>>> 2. the proposed approach would be better implemented in the utility
>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>
>>> Am I understanding the concerns correctly?
>>
>>Ping?
>
> I found another theoretical problem with the chmod approach, let me try to
> explain:
>
> There's an extremely small race window between the socket creation and the
> chmod which could theoretically be exploited to change the owner of a socket
> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>
> 1. There's no southbound database running, because it's not yet been
>    started or because it's being restarted.
> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>    rte_vhost_driver_register() succeeds and creates a socket in the file
>    system.
> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>    it.
> 4. Now OVS changes the owner and the permission of what it thinks is a
>    vhost-user socket.
>
> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
> window, and it's unlikely that an attacker can control when the southbound
> database is restarted.
>
> I feel like I'm nitpicking, but I'm not sure how serious is the security
> impact of what I'm describing.
>
> I suggested an alternative approach, and I've tried implementing a quick
> POC on top on your patch:
>
> ---8<---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 24ebb41..d7adc66 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -30,6 +30,7 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>  #include <getopt.h>
> +#include <sys/fsuid.h>
>  
>  #include "chutil.h"
>  #include "dirs.h"
> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>       */
>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>               vhost_sock_dir, name);
> +    uid_t orig_u = geteuid();
> +    gid_t orig_g = getegid();
> +    if (vhost_sock_def_owner) {
> +        uid_t u;
> +        gid_t g;
> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
> +            VLOG_INFO("UID: %d GID: %d", u, g);
> +            setfsuid(u);
> +            setfsgid(g);
> +        }
> +    }
>  
>      err = rte_vhost_driver_register(dev->vhost_id);
>      if (err) {
> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>          err = vhost_construct_helper(netdev);
>      }
>  
> -    ovs_mutex_unlock(&dpdk_mutex);
> -    if (!err && vhost_sock_def_owner &&
> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
> -        VLOG_ERR("vhost-user socket device ownership change failed.");
> +    if (vhost_sock_def_owner) {
> +        setfsuid(orig_u);
> +        setfsgid(orig_g);
>      }
>  
> -    if (!err && vhost_sock_def_perms &&
> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
> -        VLOG_ERR("vhost-user socket device permission change failed.");
> -    }
> +    ovs_mutex_unlock(&dpdk_mutex);
>  
>      return err;
>  }
>
> ---8<---
>
> Compared to the chmod approach this has some limitation:
>
> 1. It doesn't support changing permissions, only the owner.  This
>    could be done with umask, but I couldn't find any system call
>    to change the umask for a single thread.
> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>    cannot be created.  I'm not sure whether this is a reasonable
>    limitation for the use cases you have in mind.
> 3. setfsuid() is Linux specific and somehow deprecated according
>    to the manpage:
>    
>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>    in new applications"
>
>    I haven't used seteuid, because it changes the euid of the whole
>    process and that may interfere with other operations on OVS.

Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
like to work on trying to solve the problem.

> If these limitations are unacceptable, I can see how we can use
> chmod.  After all, as you point out, it's probably better to do it
> in OVS than in some script.

I think fchmod and fchown may actually be the correct calls to have, and
will refactor these chown/chmod utils functions as such, which (I
believe) avoids the race as you describe.

> Thanks for your patience in solving this problem,

Thanks for your reviews!

> Daniele

Here's an elephant of a question, though.  Would it make sense to try
and work towards some kind of scheme whereby OvS is aware of the various
unix sockets it creates, and allows setting the permissions, ownership,
etc. in a common way?  I'm not committed to finding / solving that
problem, but would it even be acceptable / appropriate?

-Aaron