[ovs-dev] [PATCH v2 2/2] netdev-dpdk: Support user-defined socket attribs

Fri Jun 10 20:50:23 UTC 2016

On 10/06/2016 10:51, "Aaron Conole" <aconole at redhat.com> wrote:

>Aaron Conole <aconole at redhat.com> writes:
>
>> Christian Ehrhardt <christian.ehrhardt at canonical.com> writes:
>>
>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole at redhat.com> wrote:
>>>
>>>> Daniele Di Proietto <diproiettod at vmware.com> writes:
>>>>
>>>> > Hi Aaron,
>>>> >
>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>> > user controlled file name.
>>>>
>>>> I agree, that always seems scary.
>>>>
>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>> >
>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>> > qemu can open the socket as root and drop privileges before starting
>>>> > guest execution.
>>>>
>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>> users.  Additionally, I recall there being something of a "don't even
>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>> can expound on it.
>>>>
>>>
>>> Hi,
>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>> most involved people needed something to get it working like "now".
>>> Since they are complementary (other than the fix removing a bit of the
>>> urgency for more MAC) it was kind of the least bad option.
>>>
>>> You have to be aware that I brought up the discussion on dev at dpdk.org - see
>>> [1] and [2]:
>>> But this will take time and eventually still be the applications task to
>>> "do something" - no matter if via API or via the chmod's right now.
>>> So Aaron is trying to get something that works now until the long term
>>> things are in place, which I appreciate.
>>>
>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>> this in time I run with [3] for now.
>>> I never intended to suggest that, but with the discussion in place, one
>>> could ask if you (Aaron) want to pick up that instead.
>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>> socket ownership control and this then could be implemented in OVS?
>>>
>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>> interim solutions, never the less we need it for now.
>>>
>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>> [3]:
>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>
>>> [...]
>>>
>>>
>>>> I think originally we quickly discussed 4 possible solutions (and
>>>> hopefully I captured them correctly):
>>>>
>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>    group.  I don't actually like this solution because kvm could then
>>>>    pollute the ovs database.
>>>>
>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>    via chown/chmod where permissions come from the database (the
>>>>    original context had ovs running as root - but as I described above
>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>
>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>    implications on other projects.  I don't remember exactly what was
>>>>    not as great about this solution, TBH.
>>>>
>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>    layering between them.
>>>>
>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>> other, and can be used to a complementary effect (if implemented
>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>
>>>
>>> I also remember several complex changes needed for the #1 and #3 that
>>> always would end up with huge effort and a high risk not being accepted.
>>> Probably that is what you refer to with "implications on other projects".
>>>
>>> Also keep in mind the position of dpdk out of the last few discussions
>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>> OWNS that path".
>>
>> I'd like to continue on, but I am not sure what the concerns are right
>> now.  Is it possible to enumerate them point by point so that I can
>> understand them?  I think there are two outstanding concerns right now:
>>
>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>
>> 2. the proposed approach would be better implemented in the utility
>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>
>> Am I understanding the concerns correctly?
>
>Ping?

I found another theoretical problem with the chmod approach, let me try to
explain:

There's an extremely small race window between the socket creation and the
chmod which could theoretically be exploited to change the owner of a socket
(e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:

1. There's no southbound database running, because it's not yet been
   started or because it's being restarted.
2. The user creates a vhost port, naming it ovnsb_db.sock.
   rte_vhost_driver_register() succeeds and creates a socket in the file
   system.
3. The southbound database is started, it removes ovnsb_db.sock and recreates
   it.
4. Now OVS changes the owner and the permission of what it thinks is a
   vhost-user socket.

If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
window, and it's unlikely that an attacker can control when the southbound
database is restarted.

I feel like I'm nitpicking, but I'm not sure how serious is the security
impact of what I'm describing.

I suggested an alternative approach, and I've tried implementing a quick
POC on top on your patch:

---8<---

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 24ebb41..d7adc66 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -30,6 +30,7 @@
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <getopt.h>
+#include <sys/fsuid.h>
 
 #include "chutil.h"
 #include "dirs.h"
@@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
      */
     snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
              vhost_sock_dir, name);
+    uid_t orig_u = geteuid();
+    gid_t orig_g = getegid();
+    if (vhost_sock_def_owner) {
+        uid_t u;
+        gid_t g;
+        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
+            VLOG_INFO("UID: %d GID: %d", u, g);
+            setfsuid(u);
+            setfsgid(g);
+        }
+    }
 
     err = rte_vhost_driver_register(dev->vhost_id);
     if (err) {
@@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
         err = vhost_construct_helper(netdev);
     }
 
-    ovs_mutex_unlock(&dpdk_mutex);
-    if (!err && vhost_sock_def_owner &&
-        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
-        VLOG_ERR("vhost-user socket device ownership change failed.");
+    if (vhost_sock_def_owner) {
+        setfsuid(orig_u);
+        setfsgid(orig_g);
     }
 
-    if (!err && vhost_sock_def_perms &&
-        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
-        VLOG_ERR("vhost-user socket device permission change failed.");
-    }
+    ovs_mutex_unlock(&dpdk_mutex);
 
     return err;
 }

---8<---

Compared to the chmod approach this has some limitation:

1. It doesn't support changing permissions, only the owner.  This
   could be done with umask, but I couldn't find any system call
   to change the umask for a single thread.
2. Unless vhost-sock-dir is owned by the target owner, the socket
   cannot be created.  I'm not sure whether this is a reasonable
   limitation for the use cases you have in mind.
3. setfsuid() is Linux specific and somehow deprecated according
   to the manpage:
   
   "Thus, setfsuid() is nowadays unneeded and should be avoided
   in new applications"

   I haven't used seteuid, because it changes the euid of the whole
   process and that may interfere with other operations on OVS.

If these limitations are unacceptable, I can see how we can use
chmod.  After all, as you point out, it's probably better to do it
in OVS than in some script.

Thanks for your patience in solving this problem,

Daniele