[ovs-dev] [PATCH 05/16] datapath: Add generic virtual port layer.

Jesse Gross jesse at nicira.com
Thu Apr 15 20:35:40 UTC 2010

On Wed, Apr 14, 2010 at 6:39 PM, Ben Pfaff <blp at nicira.com> wrote:

> On Tue, Apr 13, 2010 at 10:41:07AM -0400, Jesse Gross wrote:
> > Currently the datapath directly accesses devices through their
> > Linux functions.  Obviously this doesn't work for virtual devices
> > that are not backed by an actual Linux device.  This creates a
> > new virtual port layer which handles all interaction with devices.
> >
> > The existing support for Linux devices was then implemented on top
> > of this layer as two device types.  It splits out and renames dp_dev
> > to internal_dev.  There were several places where datapath devices
> > had to handled in a special manner and this cleans that up by putting
> > all the special casing in a single location.
> Wow, this is major infrastructure.
> I see a few lines that are indented with four spaces instead of one tab.

I think I got all of them now.

> I think that the relationship between vport and net_bridge_port is this:
> a vport may be attached to 0 or 1 net_bridge_ports; a net_bridge_port is
> always attached to exactly one vport.  Is that right?  If it is, I would
> be tempted to embed the net_bridge_port in the vport structure or to
> otherwise combine them.  Is that feasible/sensible?

Yes, that's the correct relationship between them.

First of all, I realized that net_bridge_port is overloaded into two now
unrelated things: the first is representing a port on a datapath and
contains datapath specific state: a pointer to the datapath, port number,
etc.  The second is to use to set dev->br_port so that packets will be
received for Linux system devices.  Now that all the bridge hook stuff is in
vport-netdev.c there is no reason that these should be the same structure.
 I separated them so there is now a struct dp_port which contains all of the
thing net_bridge_port previously contained and dev->br_port now points
directly to the vport.  Now vport implementors don't have to care about
dp_port at all, which is nice.

Back to the point of embedding net_bridge_port (now dp_port) into a vport: I
don't think that it really makes sense.  As a practical matter, dp_port can
change during the lifetime of a vport (during attach/detach).  This means
that we need to do RCU, which means that we need another structure, which
puts us back where we are now.  However, I also don't think that they belong
together - struct dp_port is managed by the datapath (and protected by its
locks) and struct vport is managed by vport (and protected by its locks).
 It seems much easier to reason about if they aren't tightly bound.

> Kernel abstraction layers are usually very thin and transparent.  This
> one is a little "thicker".  vport_init() seems more elaborate than usual
> for kernel code.  I don't usually see a separation between client and
> implementor headers (vport.h, vport-provider.h).  Trivial accessor
> functions are usually omitted in favor of using members directly
> (e.g. vport_get_class(), vport_get_nbp()).

The layer started out much thinner.  Part of the problem with having a thin
layer is that the thickness is not consistent between functions and tends to
grow over time. It's much nicer to work with a consistent abstraction and
much easier to keep it that way as new features are added if the layer
already inserts itself consistently between the participants.

vport_init() is a little more complicated than it currently needs to be but
I'm planning on supporting dynamically loading vport providers at which
point it will need that extra complexity.  I think it's nicer if the client
and implementor headers are separate but you are right they are usually
together, so I merged those two files.  Using vport_get_class() is probably
being a little overzealous, so I removed it and vport_get_nbp() is now no
longer just a pass through.  However, there are still some other functions
which directly call through that I left.  Having these give us more
flexibility in the future to make changes without having to update lots of
other code paths.  In general the layers in Linux are too thin - it's a
large part of the reason why we need all the compatibility code.

> The vport_class flags bother me a little.

Is it the existence of the flags or the flags that are currently defined?
 If it's the existence, it's not too different from dev->features.

>  VPORT_F_REQUIRED seems to be
> a very special case, only there for mutual exclusion with the bridge
> module.

Bridge exclusion is a special case and assuming that we want to keep the
current behavior this seems the cleanest way to expose it.  Actually the
only truly required type is the internal port but it has no init function so
it can't fail.  However, you could reverse the importance of the netdev and
GRE providers.  In this case you would only load if there was no GRE handler
registered but could have a datapath with only GRE devices that would work
even if the Linux bridge is in use.

>  My "feeling" is that something analogous to VPORT_F_GEN_STATS
> would usually be implemented in the kernel as helper functions or
> "generic" functions, e.g. like how many Linux file systems define their
> aio functions:
>        .aio_read       = generic_file_aio_read,
>        .aio_write      = generic_file_aio_write,

I don't think that these are quite analogous because aio can be implemented
on top of existing synchronous operations while stats requires interposing
on the send and receive operations.  Of course you could check whether the
get_stats function points to the generic version but that seems kind of
hacky.  In fact:

const struct net_device_stats *dev_get_stats(struct net_device *dev)
	const struct net_device_ops *ops = dev->netdev_ops;

	if (ops->ndo_get_stats)
		return ops->ndo_get_stats(dev);

	dev_txq_stats_fold(dev, &dev->stats);
	return &dev->stats;

It looks fairly similar to mine though I think mine is a little more flexible.

> VPORT_F_TUN_ID seems like maybe it should be replaced by a comment above
> vport_receive() that says "Caller must initialize OVS_CB(skb)->tun_id."
> Or perhaps vport_receive() could take a tun_id argument and initialize
> OVS_CB(skb)->tun_id from that itself, which would really be foolproof.
I think expecting the caller to set tun_id is asking for trouble.  Really
setting a flag if you support a feature and allow the generic layer to
handle it seems a lot more convenient to me and similar to how most device
features are handled (GSO, checksum offloading, etc).

> Usually kernel "class" structures are named something_ops instead of
> something_class.

Yes, that's a bit nicer.

> I see some uses of "sizeof" on objects instead of on types, and uses
> without parentheses.  Kernel style discourages both.

This just seems like a step backwards but that is the more common style, so

> In vport_locate(), WARN_ON_ONCE would avoid spewing lots of errors if
> something did go wrong.

vport_locate() isn't called that frequently (not for every packet) plus it
is nice to get stack traces of different paths that might cause the problem.
 It's also a significant bug if it triggers (ASSERT_RTNL has the same

> Kernel style would call for __vport_add() to use ERR_PTR to report both
> successful and error return values.  Similarly for the "create" member
> function of vport_class.


> Could vport_record_error() be made simpler, getting rid of VPORT_E_*
> entirely, by passing in a pointer to the counter to increment?  (It's
> hard to tell, because this commit does not add any callers.)  If
> rx_errors and tx_errors are considered as the sum of their own values
> plus the other errors that contribute to them, then there would only be
> one value to increment each time.

I now sum the error values when stats are requested instead of when the
errors occur.  However, while it is possible to directly pass in a counter,
this is the same issue with thiner vs. thicker layers.  I already changed
the layout of the stats once and this design meant that I didn't have to
changed anything else.

> I kept thinking that some functions don't have comments but need them,
> but them I realized that the comments were above the prototypes in the
> header files.  I don't think that the kernel uses this convention; I
> don't remember seeing them there before.

OK, I moved the comments above the implementation and documented all the
public interfaces.

> It's probably not an appropriate change for this patch, but forcing the
> internal device MTU to the minimum MTU of the other ports has always
> bothered me.  Only userspace, or possibly even the OpenFlow controller,
> can know whether a packet received on a given port might ever be sent
> out another given port.  For example, a single datapath could be broken
> up into two entirely independent bridges.  So it seems to me that
> userspace should be allowed to set appropriate MTUs on internal
> devices.  (dp_min_mtu() seems like a reasonable default though.)

It really depends on whether you see a datapath as a single bridge or not.
 There also isn't currently anyway for userspace to set the MTU (it's not
exposed in the netdev library).  In any case, it is definitely orthogonal to
this patch.

> The following:
>   struct sk_buff *dummy_skb;
>  int err;
>   BUILD_BUG_ON(sizeof(struct ovs_skb_cb) > sizeof(dummy_skb->cb));
> may be written as:
>  BUILD_BUG_ON(sizeof(struct ovs_skb_cb) > sizeof(((struct sk_buff
> *)0)->cb));
> Which you like better is up to you :-)

Hmm, they are both pretty ugly but I think I'll stick with my version - it's
also the way it is done in other places.

> There are no dpif functions to call any of the new ODP_VPORT_* ioctls.
> I guess those will be added later.

Yes, they're coming...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20100415/239868ae/attachment-0003.html>

More information about the dev mailing list