[ovs-dev] upstreaming datapath
shemminger at vyatta.com
Mon Oct 19 22:05:06 UTC 2009
> Removing compatibility code should be easy, because we've been
> conscientiously following a model where we always update our
> kernel code to the latest kernel version and insert compatibility
> code into compatibility headers that make old kernels look like
> newer ones. There are a few places where this didn't quite work,
> but by and large just "rm -r compat-2.6" should be sufficient to
> delete compatibility code.
> Oh, and we'd delete the brcompat module entirely, too, of course.
> Its initial purpose is obsolete, so we no longer need it.
> The main place where the openvswitch module is actively
> incompatible with anything in the upstream kernel is the bridge
> hook (br_handle_frame_hook). My thought there is that this hook
> should become per-net_device, so that the existing Linux bridge
> and openvswitch can coexist in a single system (which is useful,
> and yet OVS can't support it right now). Does that make sense to
There is active discussion on netdev about providing a better hook
for incoming packet redirection. PF_RING needs it, and the existing
open-coded stack of hooks is awkward.
My plan is to replace this with a new hook chain similar to existing
protocol hooks, and make bridge/bond/macvlan and OVS and PF_RING
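As a rough illustration of what such a hook chain could look like, here is a minimal userspace model. It is only a sketch: the `Device` class, the `PASS`/`CONSUMED` verdicts, the priority ordering, and the choice to keep one chain per device are all assumptions made for the example, not an existing kernel interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical verdicts for this sketch; these are not kernel constants.
PASS = "pass"          # let the next handler (or the normal stack) see the packet
CONSUMED = "consumed"  # the handler took the packet; stop walking the chain

Handler = Callable[[bytes], str]


@dataclass
class Device:
    """Toy network device with an ordered chain of receive handlers."""
    name: str
    rx_chain: List[Tuple[int, str, Handler]] = field(default_factory=list)

    def register_rx_handler(self, priority: int, label: str, handler: Handler) -> None:
        # Lower priority values run first (an assumption of this sketch).
        self.rx_chain.append((priority, label, handler))
        self.rx_chain.sort(key=lambda entry: entry[0])

    def receive(self, packet: bytes) -> str:
        """Walk the chain; return the label of whoever consumed the packet."""
        for _priority, label, handler in self.rx_chain:
            if handler(packet) == CONSUMED:
                return label
        return "stack"  # nobody claimed it, so normal protocol processing applies


# One device is claimed by a bridge-like handler, another by an OVS-like one.
eth0 = Device("eth0")
eth1 = Device("eth1")
eth0.register_rx_handler(10, "bridge", lambda pkt: CONSUMED)
eth1.register_rx_handler(0, "tap", lambda pkt: PASS)   # observes, never consumes
eth1.register_rx_handler(10, "ovs", lambda pkt: CONSUMED)
```

With handlers registered per device, a bridge-like consumer on one interface and an OVS-like consumer on another do not compete for a single global hook, which is the coexistence problem raised in the quoted text.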
> We'd also need to decide on a sysfs interface for openvswitch.
> Currently the code emulates the existing bridge's sysfs
> interface, because we needed compatibility, but clearly it's not
> completely suitable and we should design something better.
I prefer the netlink API used by VLANs, with some additions,
for the control interface. sysfs is handy as a side-door interface
for shell scripts and parameter tweaking.
> What kind of unified interface do you have in mind? I can
> imagine using the same netlink calls for, say, adding and
> removing bridges and ports. But both the existing bridge and the
> openvswitch also have functionality that the other does not. It
> would not make sense to try to shoehorn both into exactly the
> same interface.
There is a netlink interface used for macvlan, vlan, gre and veth.
It is missing support for bridge and bonding. The idea is to
add types for adding, deleting and modifying slaves and parameters.
> Initially (I think that this was so long ago that it is not in
> our current Git tree), Open vSwitch used Netlink entirely for
> communication with userspace (whereas now it uses character
> devices). But this proved not to work well for transactional
> operations that are not idempotent, because responses to Netlink
> messages can get lost. For example, Open vSwitch has a datapath
> operation to delete a flow and return its statistics. When this
> was implemented as a Netlink request and response, it was
> possible for the response to get lost (because a kernel memory
> allocation failed). But re-sending the request would not work,
> because the first command had deleted the flow. And breaking it
> into two separate commands (get flow stats, delete flow)
> introduces a race where statistics on packets that arrive between
> the commands are lost. This is the main reason that we are not
> using Netlink now. I think there were other reasons, too, but
> that is the one that comes to mind first.
Netlink will not drop responses to messages; the only case where
messages can get lost is when it is used for monitoring. The normal
request/response usage (even for dumping large tables) is supposed
to be guaranteed.
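To make the race from the quoted paragraph concrete, here is a minimal userspace sketch of why "delete flow and return its statistics" has to be a single operation. Everything in it (the `FlowTable` class, its method names, the `run` helper) is hypothetical and only models the accounting; it is not OVS or Netlink code.

```python
import threading


class FlowTable:
    """Toy flow table; names are illustrative, not the OVS datapath API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._packets = {}  # flow key -> packet count

    def add_flow(self, key):
        with self._lock:
            self._packets[key] = 0

    def count_packet(self, key):
        """Account one packet; return False if the flow no longer exists."""
        with self._lock:
            if key not in self._packets:
                return False
            self._packets[key] += 1
            return True

    def get_stats(self, key):
        with self._lock:
            return self._packets[key]

    def delete(self, key):
        with self._lock:
            del self._packets[key]

    def delete_and_get_stats(self, key):
        """Remove the flow and return its final count in one locked step."""
        with self._lock:
            return self._packets.pop(key)


def run(split):
    """Return (count reported to userspace, count the flow actually saw)."""
    table = FlowTable()
    table.add_flow("flow-a")
    for _ in range(3):
        table.count_packet("flow-a")
    if split:
        reported = table.get_stats("flow-a")
        late = table.count_packet("flow-a")  # packet lands between the two commands
        table.delete("flow-a")
    else:
        reported = table.delete_and_get_stats("flow-a")
        late = table.count_packet("flow-a")  # flow is already gone, so no match
    seen_by_flow = 3 + (1 if late else 0)
    return reported, seen_by_flow
```

In the split case the flow saw one more packet than was reported; in the combined case the two numbers agree. The sketch says nothing about whether the reply to the combined request can be lost in transit, which is the separate point being argued above.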
> But the biggest reason that we have not already submitted OVS for
> inclusion is this one: currently the interface is not flexible
> and not extensible. In particular, beyond the L2 Ethernet
> header, it can only match IPv4 packets. I have some thoughts on
> how to make it more flexible and extensible, but I have not had
> time to work any of it out in detail or to start writing code for
A wild idea would be to build on the nftables state machine engine.
It is better not to have to build every protocol possibility into the
kernel; that is why Patrick is working on nftables as a long-term
replacement for iptables.