[ovs-dev] [PATCH 5/5] dpif-linux: Prevent a single port from monopolizing upcalls.

Fri Sep 23 16:16:32 UTC 2011

On Thu, Sep 22, 2011 at 03:59:30PM -0700, Jesse Gross wrote:
> On Thu, Sep 22, 2011 at 1:11 PM, Ben Pfaff <blp at nicira.com> wrote:
> > On Mon, Sep 19, 2011 at 03:00:08PM -0700, Jesse Gross wrote:
> >> Currently it is possible for a client on a single port to generate
> >> a huge number of packets that miss in the kernel flow table and
> >> monopolize the userspace/kernel communication path. ??This
> >> effectively DoS's the machine because no new flow setups can take
> >> place. ??This adds some additional fairness by separating each upcall
> >> type for each object in the datapath onto a separate socket, each
> >> with its own queue. ??Userspace then reads round-robin from each
> >> socket so other flow setups can still succeed.
> >>
> >> Since the number of objects can potentially be large, we don't always
> >> have a unique socket for each. ??Instead, we create 16 sockets and
> >> spread the load around them in a round robin fashion. ??It's theoretically
> >> possible to do better than this with some kind of active load balancing
> >> scheme but this seems like a good place to start.
> >>
> >> Feature #6485
> >
> > I'm not sure that we should increment last_assigned_upcall from
> > dpif_linux_execute__() as well as from dpif_flow_put(). ??Most
> > dpif_flow_put() calls are right after a dpif_linux_execute__() for the
> > same flow (manually sending the first packet), so this will tend to
> > use only the even-numbered upcall sockets.
> >
> > Also, manually sending the first packet of a flow with one
> > upcall_sock, then setting up a kernel flow that uses a different
> > upcall_sock could mean that userspace sees packet reordering, if it
> > checks the upcall_socks in the "wrong" order.
> >
> > One way to avoid these problems would be to choose the upcall_sock
> > using a hash of the flow instead of a counter.
> 
> I think that probably makes sense and at the same time it addresses
> Pravin's concerns about assigning ports to sockets in a purely
> round-robin fashion.
> 
> This is easier to do correctly if we have "struct flow" here instead
> of the Netlink attributes.  It actually seems more correct to do the
> conversion in dpif-linux anyways.  In theory, it could result in some
> unnecessary conversions for things that are bouncing back and forth
> between userspace and kernel but I think in practice we end up doing
> the conversions for most operations anyways.  Is there a reason to
> layer it the way it is?

It's layered this way to support decoupling the kernel and user flow
keys, as described in commit 856081f683:

    datapath: Report kernel's flow key when passing packets up to userspace.

    One of the goals for Open vSwitch is to decouple kernel and userspace
    software, so that either one can be upgraded or rolled back independent of
    the other.  To do this in full generality, it must be possible to change
    the kernel's idea of the flow key separately from the userspace version.

    This commit takes one step in that direction by making the kernel report
    its idea of the flow that a packet belongs to whenever it passes a packet
    up to userspace.  This means that userspace can intelligently figure out
    what to do:

       - If userspace's notion of the flow for the packet matches the kernel's,
         then nothing special is necessary.

       - If the kernel has a more specific notion for the flow than userspace,
         for example if the kernel decoded IPv6 headers but userspace stopped
         at the Ethernet type (because it does not understand IPv6), then again
         nothing special is necessary: userspace can still set up the flow in
         the usual way.

       - If userspace has a more specific notion for the flow than the kernel,
         for example if userspace decoded an IPv6 header but the kernel
         stopped at the Ethernet type, then userspace can forward the packet
         manually, without setting up a flow in the kernel.  (This case is
         bad from a performance point of view, but at least it is correct.)

    This commit does not actually make userspace flexible enough to handle
    changes in the kernel flow key structure, although userspace does now
    have enough information to do that intelligently.  This will have to wait
    for later commits.

At the time I thought that we'd want to implement the userspace half
of this quickly, but now it's become clear that it's a low priority,
so it's fine with me if you want to change the layering for now.