[ovs-dev] Merging datapath into the upstream kernel tree

Arnd Bergmann arnd at arndb.de
Thu Aug 5 09:31:00 UTC 2010


On Thursday 05 August 2010, Simon Horman wrote:

> On Wed, Aug 04, 2010 at 06:51:26PM -0700, Justin Pettit wrote:
> > > I am wondering what the thoughts on merging datapath into
> > > the upstream kernel tree are. I would be more than willing
> > > to volunteer for the task.
> > 
> > That would be fantastic.  We've been discussing how to move forward with
> > this.  Here's a quick list of items that we think need to be addressed
> > before it would be considered:
> 
> Great. I'm pretty excited about this.

I think we should easily be able to get it into drivers/staging, though I
suppose it's too late for the 2.6.36 cycle.
 
> > 	- The kernel module is currently a character device that is
> > 	controlled by ioctls.  This should be changed to a module that just
> > 	uses netlink.  This is more in line with Linux network
> > 	configuration, and it will be more flexible if new features are
> > 	added, so a consistent userspace application can be used across
> > 	kernels.
> 
> 	I was pondering that the code really should use netlink :-)

Netlink sounds good here, but it does come with a performance penalty
compared with ioctl. How performance critical is the control path?

For drivers/staging, ioctl is ok, and we can migrate the interface there.
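
To illustrate the flexibility argument quoted above: netlink messages carry
their payload as typed, length-prefixed attributes, so a parser can step over
attribute types it does not know about instead of depending on a fixed ioctl
struct layout. Below is a minimal userspace sketch of that idea. It is only a
model: struct model_attr, the MODEL_ATTR_* types and both helper functions are
made up for this example and are not the kernel's netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header, loosely modeled on a length/type
 * layout; this is NOT the kernel's struct nlattr. */
struct model_attr {
    uint16_t len;   /* header plus payload, in bytes */
    uint16_t type;  /* attribute type, meaning defined by the protocol */
};

/* Made-up attribute types for the example. */
enum { MODEL_ATTR_PORT = 1, MODEL_ATTR_MTU = 2 };

/* Round a length up to the next multiple of 4 bytes. */
static size_t model_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Append one 32-bit attribute at offset 'off' in a buffer of 'cap' bytes.
 * Returns the new write offset, or 0 if the attribute does not fit. */
size_t model_put_u32(uint8_t *buf, size_t off, size_t cap,
                     uint16_t type, uint32_t value)
{
    struct model_attr hdr = { (uint16_t)(sizeof hdr + sizeof value), type };

    assert(buf != NULL);
    if (off > cap || cap - off < model_align(hdr.len))
        return 0;
    memcpy(buf + off, &hdr, sizeof hdr);
    memcpy(buf + off + sizeof hdr, &value, sizeof value);
    return off + model_align(hdr.len);
}

/* Look up the first 32-bit attribute of the given type in 'len' bytes.
 * Attributes of any other type are skipped using their length field.
 * Returns 1 and stores the value in *out if found, 0 otherwise. */
int model_find_u32(const uint8_t *buf, size_t len,
                   uint16_t type, uint32_t *out)
{
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    while (off <= len && len - off >= sizeof(struct model_attr)) {
        struct model_attr hdr;

        memcpy(&hdr, buf + off, sizeof hdr);
        if (hdr.len < sizeof hdr || hdr.len > len - off)
            return 0;   /* malformed attribute, stop parsing */
        if (hdr.type == type && hdr.len == sizeof hdr + sizeof *out) {
            memcpy(out, buf + off + sizeof hdr, sizeof *out);
            return 1;
        }
        off += model_align(hdr.len);    /* skip an attribute we don't want */
    }
    return 0;
}
```

A message built by newer code with an extra attribute type in front can still
be read by a parser that only asks for MODEL_ATTR_MTU, because the unknown
attribute is skipped by its length. That is the property that lets one
userspace tool work across kernels with different feature sets.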

> > 	- Don't steal the bridge hooks.  This should be trivial with the
> > 	new hooks that should appear in 2.6.36.
> 
> 	Could you give me a pointer to the new hooks?

Current kernels have the br_handle_frame_hook and macvlan_handle_frame_hook
in net/core/dev.c. 2.6.36 (I actually thought this was in .35, but I was
mistaken) has unified the two and made them more generic as
netdev_rx_handler_register, which openvswitch can use.
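
As a rough illustration of why a per-device handler avoids "stealing" the
bridge hook: each device has a single receive-handler slot, and a second
registration on a device that already has one is refused with -EBUSY, as far
as I understand the real interface. The sketch below is a userspace model of
that behaviour only. All model_* names and types are hypothetical stand-ins
and do not match the kernel's actual signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real types. */
struct model_packet { int id; };
struct model_netdev;

/* A handler returns 1 if it consumed the packet, 0 to let it continue. */
typedef int (*model_rx_handler_t)(struct model_netdev *dev,
                                  struct model_packet *pkt);

struct model_netdev {
    const char *name;
    model_rx_handler_t rx_handler;  /* NULL means no handler attached */
    int handled;                    /* packets consumed by the handler */
};

/* Attach a handler to one device; fail with -EBUSY if the slot is taken. */
int model_rx_handler_register(struct model_netdev *dev, model_rx_handler_t h)
{
    assert(dev != NULL && h != NULL);
    if (dev->rx_handler != NULL)
        return -EBUSY;
    dev->rx_handler = h;
    return 0;
}

/* Detach whatever handler is attached to the device. */
void model_rx_handler_unregister(struct model_netdev *dev)
{
    assert(dev != NULL);
    dev->rx_handler = NULL;
}

/* Receive path: give the packet to the attached handler, if any.
 * Returns 1 if a handler consumed it, 0 if it goes on to the normal stack. */
int model_receive(struct model_netdev *dev, struct model_packet *pkt)
{
    assert(dev != NULL && pkt != NULL);
    if (dev->rx_handler != NULL)
        return dev->rx_handler(dev, pkt);
    return 0;
}

/* Two example handlers standing in for a bridge and for openvswitch. */
int model_bridge_rx(struct model_netdev *dev, struct model_packet *pkt)
{
    (void)pkt;
    dev->handled++;
    return 1;
}

int model_ovs_rx(struct model_netdev *dev, struct model_packet *pkt)
{
    (void)pkt;
    dev->handled++;
    return 1;
}
```

In this model, once model_ovs_rx is attached to a device, attaching
model_bridge_rx to the same device fails until the first handler is
unregistered, while other devices are unaffected. The two users coexist in
one kernel instead of competing for a single global hook.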

> > 	- Break out vports.  OVS introduces the concept of a vport, which
> > 	is an interface abstraction.  It's fairly monolithic at the moment,
> > 	so it will likely need to be more modular.
> 
> 	This I am the least familiar with.

Me too.

> > 	- Support network namespaces.
> 
> 	That sounds more like post-merge material to me.

Yes, for drivers/staging, it's fine to make the code depend on
!CONFIG_NET_NS, but I think it's important enough to fix this
before the code can move to drivers/net.
 
> > 	- Rip out bridge compatibility code.  This is for backwards
> > 	compatibility with some hypervisors, but shouldn't be needed on new
> > 	kernels.
> 
> 	Could you explain in a little more detail why
> 	the compatibility code won't be needed (or will be needed less)
> 	moving forwards? Certainly I'm very much in favour of removing
> 	compatibility code if it isn't needed. Not least because
> 	it's likely to help with getting the code accepted.

IMHO it's actually more harmful than good, so ripping it out
is a requirement for upstream inclusion. The compatibility code
implements the same user API as the bridge and vlan modules,
which means that it cannot coexist with those modules in the
same kernel.

I believe that the existence of the compatibility code has been the
most significant problem for upstream acceptance so far, because
it makes people see openvswitch as a more complex implementation of
existing features, rather than an independent feature that has
value of its own.

> > 	- Add sysfs information.  Currently we only support items that
> > 	allow us to impersonate the bridge.
> 
> 	That sounds reasonable, I should be in a better position
> 	to know what to put in there once I'm a bit more familiar
> 	with the code.

We shouldn't introduce sysfs interfaces just for the sake of it.
If there is some information that the kernel should expose, sysfs
may be the right choice for that, but it's not a requirement.

The code to impersonate a bridge in sysfs should probably get
removed in the upstream version, though IIRC it is actually implemented
by the bridge compat module mentioned above, so it won't be there
anyway.

> > 	- Possibly call hooks for netfilter/iptables, if necessary.
> 
> 	That also sounds like post-merge work.

Yes. Let's not add new features until the code is fully merged.

> > We think reworking the kernel module is the greatest single amount of
> > work, and it's already on Ben Pfaff's to-do list.  We would love to hear
> > your input on areas that we may have missed and suggestions you may have
> > for smoothing out the process.  If there are any parts you'd like to work
> > on, that would be great, too.  Let us know how you'd like to contribute.
>
> I'm still getting to grips with the code, and I expect to have more
> ideas as that progresses. Looking over the code the things that
> struck me were fairly simple - removing compatibility with older kernels
> and the like, as that's not appropriate for the upstream kernel tree.

I think the requirements for drivers/staging are actually very low. All we
need is a patch or a set of patches that

- applies cleanly to the upstream git tree
- adds files into a single directory under drivers/staging
- compiles without errors
- has a Signed-off-by line from someone
- adds a TODO file in the same directory with a list of things to
  work on, before the code gets moved to drivers/net.

> Certainly I think that I can handle the netlink work. I may as
> well jump in at the deep-end.

Sounds good.
 
> Unfortunately/fortunately I will be more or less off-line next week
> for LinuxCon in Boston. So it will be difficult for me to make a concrete
> start on anything before that is finished. If any Open vSwitch people
> will be in Boston next week, it would be a good chance to meet.

I'll also be in Boston. A few weeks ago, I tried getting the module
code into a state where it could be sent as a patch to the staging tree,
but before I had it complete, I realized that there is no Signed-off-by:
in the git tree, which means I don't want to send it myself.

	Arnd