[ovs-dev] [ovs-discuss] Somebody making --user and dpdk compatible again?

Fri Feb 5 03:27:48 UTC 2016

On 2 February 2016 at 23:33, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:

> Quoting Ansis Atteka (ansisatteka at gmail.com):
> > On 29 January 2016 at 12:10, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > > Sorry I've not really had anything to add here, I'm just not familiar
> > > enough with the ovs codebase.  Absolutely running as non-root as much as
> > > possible would be far preferred, so that the MAC policy can be just another
> > > layer in our defense in depth rather than our only card.  It sounds from
> > > the above as though you don't just create one socket but create them on
> > > the fly, one per started VM?  I realize this becomes more work for ovs,
> > > but perhaps ovs could fork off a privileged child which just creates
> > > and chowns sockets on demand and hands them back over a socketpair to
> > > the parent, which drops privs immediately after the fork?  (I would have
> > > thought ovs needed privs for a lot more of its work than that, but if
> > > not...)
> >
> >
> > Hi Serge,
> >
> > I think that there is a catch-22 paradox with Discretionary Access Control.
> > Here is why:
> > 1. Processes need access control because domain logic does not implement
> > proper input and error code validation (e.g. allows shell escape, path
> > escape ...).
> > 2. Discretionary Access Control is intertwined with Domain Logic (i.e. access
> > control at least partially resides in the same process space as domain
> > logic, at least the "chown" invocations). This means that developers are
> > tempted to reuse the same input validation for access control and hence
> > have the same set of bugs.
>
> Sorry I'm ignorant here - can you explain what goes on over this socket
> which we would be chowning?
>

This is the DPDK user socket. Open vSwitch calls into the Intel DPDK
library, and the DPDK library then creates the socket.

The KVM process then attempts to connect to this UNIX domain socket. I
believe this is where the Permission Denied error occurs (see my proposed
solution below, which I have not tried yet).
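
For anyone who wants to confirm this on their own setup, here is a minimal
diagnostic sketch. It assumes the denial comes from plain ownership and mode
bits on the socket file (on Linux, connecting to a UNIX domain socket
requires write permission on it) rather than from a MAC policy.
`socket_access_info` is a hypothetical helper name, not something from OVS or
DPDK, and the demo socket path is made up.

```python
import os
import socket
import stat
import tempfile


def socket_access_info(path):
    """Return owner, group and permission bits of a UNIX domain socket file."""
    st = os.stat(path)
    if not stat.S_ISSOCK(st.st_mode):
        raise ValueError(f"{path} is not a socket")
    return {
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.S_IMODE(st.st_mode),
        # Coarse hint only: os.access() checks the real uid/gid and cannot
        # replace the kernel's own decision at connect() time.
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Demo with a throwaway socket; replace the path with the real one.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.sock")
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(demo_path)
        print(socket_access_info(demo_path))
        srv.close()
```

Running this as the user that starts KVM against the real socket path should
show whether the owner and mode alone explain the error.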

> > I think the patch at
> > [http://openvswitch.org/pipermail/dev/2016-January/065351.html]
> > demonstrates this Catch-22 paradox very well: with an `ovs-vsctl
> > add-port "../../../ ...` command it became possible to chown arbitrary
> > files owned by root, because the domain logic had a "path escape" flaw
> > that was inherited by the access control too.
> >
> > Of course this could be solved by having a separate "Access Control Policy
> > Engine" in the OVS user-space that runs as root and has some restricting
> > whitelist rules. However, if we were to go this far, then wouldn't we
> > be reinventing SELinux and AppArmor? It would be just like an
> > SELinux/AppArmor Linux Security Module that sits between the confined
> > user space part and the actual system call.
>
> Well, not really.  Just like policykit, pam, 'login', and sudo are not
> reinventing selinux/apparmor.  The point of selinux/apparmor is to have
> a configurable policy which applies regardless of the user, i.e. so
> that admins cannot violate certain rules like 'don't leak that top
> secret data over there'.
>

What you describe as "don't leak that top secret data over there" is called
Multilevel Security. Yes, SELinux can do Multilevel Security, but that does
not disqualify SELinux as a candidate for Type Enforcement. In fact, Type
Enforcement is exactly how Fedora, RHEL and CentOS configure SELinux.

I imagine that Multilevel Security is mostly used by organizations that
have formal criteria for classifying the sensitivity of information.

> There's absolutely nothing inappropriate about having access control
> rules built into a openvswitch layer which is mediating user input.
>

It depends on the use case and on the definition of "appropriate":
1. If you can start the process under a confined user from the beginning,
then I agree that DAC is a reasonable way to confine it.
2. If you have to start the process as root and then downgrade, then
developers need to commit a lot of effort to auditing the C code that is
executed before the downgrade.
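
To illustrate the kind of check that would have to be audited in case 2,
here is a minimal sketch of confining a user-supplied port name to one
directory before anything is chown'ed, which is the "path escape" case from
the patch mentioned above. This is not how OVS does it; `resolve_inside` and
the directory in the demo are assumptions made up for the example, and it is
written in Python only for brevity.

```python
import os


def resolve_inside(base_dir, name):
    """Return the absolute path for name, or raise if it escapes base_dir."""
    base = os.path.realpath(base_dir)
    # realpath() collapses ".." components and resolves existing symlinks.
    candidate = os.path.realpath(os.path.join(base, name))
    inside = os.path.commonpath([base, candidate]) == base
    if not inside or candidate == base:
        raise ValueError(f"{name!r} escapes {base_dir!r}")
    # Caveat: this is a check at one point in time. A symlink created after
    # the check but before the chown would still defeat it.
    return candidate


if __name__ == "__main__":
    base = "/var/run/openvswitch"  # hypothetical socket directory
    print(resolve_inside(base, "port0"))
    try:
        resolve_inside(base, "../../../etc/passwd")
    except ValueError as err:
        print("rejected:", err)
```

Even a check this small has a remaining time-of-check/time-of-use gap, which
is the point: every such line runs as root and needs auditing.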

> However, that doesn't mean that you have time to implement that, and
> it's certainly not something that should be rushed.

> So with this socket, if chowning it to the unprivileged user would be
> like opening a root shell over a pty and then chowning the pty to
> an unpriv user, well that's pretty convincing.  On the other hand,
> perhaps then the solution is a small separate interlocutor which
> filters the traffic between qemu and the root-owned socket.  (which
> may not be feasible, it's just a thought)

> > Note that dnsmasq has a similar issue where you can create Trojan files in
> > the file system even when using the "--user nobody" flag. Try `dnsmasq
> > --log-file=/etc/rsyslog.d/trojan.conf --user=nobody`. Yes, if the input is
> > hardcoded this does not sound like a big deal, but OVS receives most of
> > its configuration at run time from OVSDB, and I believe most of the bugs
> > would be of the kind where input is not properly validated.
> >
> > Also, I agree that there are cases when Discretionary Access Control makes
> > total sense, for example, when one is running a webserver on a port >1024
> > and can start the process under the restricted user from the very beginning.
> >
> > The arguments I have heard so far in favor of DAC are:
> > 1. Hard to disable (but only because DAC is hard-coded in the Application
> > itself and admin would have to edit systemd or binaries themselves to
> > switch back to root).
> > 2. More consistent across different Linux distributions.
> > 3. A lot of processes have already been ported to use some sort of DAC,
> > hence this is the path of least resistance when we look at the distribution
> > as a whole, where processes need to talk with each other.
>
> It's not just least resistance.  We'll have to completely remove the DAC
> layer protecting qemu+libvirt for such deployments, leaving us with just
> MAC rather than DAC+MAC.
>

That is exactly my definition of the path of least resistance. :)

Aaron, Christian:

Perhaps I am missing the big picture, but isn't the problem here actually
that libvirt, or whoever starts kvm, starts it in too confined a way? For
example, on my workstation, where I run libvirt, I see:

root     28043     1  0 Feb01 ?        00:00:00 /usr/sbin/libvirtd -d
libvirt+ 13100     1 82 18:51 ?        00:00:18 qemu-system-x86_64
-enable-kvm -name Console -S -machine pc-i440fx-utopic,accel=kvm,usb=off
-cpu Nehalem -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1
-uuid 4e7a714a-8588-4b41-ad89-ab3f9508f9ce -no-user-config -nodefaults ...

However, nowhere in the qemu command line do I see the -runas <user> flag
being used. Would KVM be able to open the socket and then downgrade the user
by using this flag? Perhaps we should pull in the libvirt developers to
assess the feasibility of this?

Nevertheless, I am still fine with the approach suggested by Aaron and
Christian, where the Unix domain socket is chown'ed by OVS from "root" to
the "libvirt+" user. However, there is a small race window between the
creation of the Unix domain socket and the chown call, during which the KVM
process could get a "Permission Denied" error if OVS has not yet chown'ed
the socket. Would the KVM process retry the connect if this ever happens?
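
To make the question concrete, here is a minimal sketch of what a
client-side retry could look like. I do not know whether QEMU does anything
of this kind; `connect_with_retry`, its parameters and its defaults are
hypothetical and only illustrate the idea of retrying while the socket still
has its pre-chown ownership.

```python
import socket
import time


def _unix_connect(path):
    """Open a stream connection to a UNIX domain socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_retry(path, attempts=5, delay=0.2, connect=_unix_connect):
    """Retry connect() while it fails with "Permission denied" (EACCES)."""
    for attempt in range(1, attempts + 1):
        try:
            return connect(path)
        except PermissionError:
            # The socket exists but may not have been chown'ed yet.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Only `PermissionError` is retried here; any other failure, such as a missing
socket file, is raised immediately. The `connect` parameter exists so that
the retry logic can be exercised without a real socket.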