[ovs-dev] [PATCH v2] openvswitch: Userspace tunneling.

Thu Nov 6 00:44:08 UTC 2014

On Sun, Nov 02, 2014 at 09:29:28PM -0800, Pravin B Shelar wrote:
> Following patch adds support for userspace tunneling. Tunneling
> needs three more component first is routing table which is configured by
> caching kernel routes and second is ARP cache which build automatically
> by snooping arp. And third is tunnel protocol table which list all
> listening protocols which is populated by vswitchd as tunnel ports
> are added. GRE and VXLAN protocol support is added in this patch.
> 
> Tunneling works as follows:
> On packet receive vswitchd check if this packet is targeted to tunnel
> port. If it is then vswitchd inserts tunnel pop action which pops
> header and sends packet to tunnel port.
> On packet xmit rather than generating Set tunnel action it generate
> tunnel push action which has tunnel header data. datapath can use
> tunnel-push action data to generate header for each packet and
> forward this packet to output port. Since tunnel-push action
> contains most of packet header vswitchd needs to lookup routing
> table and arp table to build this action.
> 
> Signed-off-by: Pravin B Shelar <pshelar at nicira.com>
> Acked-by: Jarno Rajahalme <jrajahalme at nicira.com>
> ---
> Fixed according to comments from Jarno and Ben.
> Added test cases.

This must have been a huge amount of work.  Kudos.

I have a lot of comments but most of them are trivial.  I will divide
the trivial stuff from the stuff that might be bugs.

Potentially Important
---------------------

In netdev_gre_push_header__(), I would use ovs_16aligned_be32 instead
of ovs_be32, to make this code safer on RISC machines.  Ditto for
netdev_gre_build_header__() and for struct vxlanhdr.

I would rate-limit the VLOG_ERR in vxlan_extract_md().

I don't understand why struct udp_header is now marked as packed, or
why struct gre_base_hdr is marked as aligned on a 4-byte boundary, or
why struct vxlanhdr is marked as packed.

I don't think that tnl_arp_snoop() needs to set dl_type in wc, because
dl_type is never wildcarded anyway.

tnl_arp_snoop() doesn't look at nw_dst or arp_tha, so I don't think it
needs to un-wildcard them.

I am a little surprised that tnl_arp_snoop() doesn't need to call
seq_change() when it changes something.

I think that all of the flows added to the tnl-ports classifier have
the same wildcard pattern.  I don't know why a classifier is the best
choice; wouldn't an hmap work?

I am surprised that the user has the ability to enable and disable
tunnel push-pop, in ofproto-dpif.c.  I would think that ofproto-dpif
should find out whether the datapath supports it.  (It is easy to find
out, currently, since only dpif-netdev supports it.)

Trivial
-------

netdev_dummy_ip4addr() uses a %n scan format but does not check the
value scanned.  It would make more sense either to check that the
value scanned is the length of the string, or to omit the %n format.

The new functions in struct netdev_class deserve comments.

We often abbreviate tunnel to "tun" or "tnl".  That is probably one
too many abbreviations already.  I would rename reset_tunl_md() to
follow one of those conventions (or just spell out tunnel).

Do you think that a unixctl command for setting the port range is
better than setting it through options on the Interface in OVSDB?  (It
might be a good idea to document the unixctl command in
ovs-vswitchd.8.)

In format_odp_tunnel_header(), I see a few places that use hexadecimal
formatting without writing 0x or using the # flag, e.g. the "frag" here:
+    ds_put_format(ds, "ipv4(src="IP_FMT",dst="IP_FMT",proto=%"PRIu8
+                  ",tos=%#"PRIx8",ttl=%"PRIu8",frag=%"PRIx16"),",
both fields here:
+        ds_put_format(ds, "vxlan(flags=%"PRIx32",vni=%"PRIx32")",
+                      vxh->vx_flags, vxh->vx_vni);
and all of the fields for GRE.  I prefer to always indicate hexadecimal
explicitly, because it will make it much easier to add support for a
more uniform method of parsing someday in the future (which I do think
will come).

If for some reason that is not fixed (though I prefer to fix it), then
ovs_parse_tnl_push() needs to use %x instead of %i in the same places.
(That tells me that tests/odp.at didn't get updated to test parsing
the new fields, because an update would have caught this discrepancy.
Please update the test.)

In userspace, it is better not to begin identifiers with __, because
those identifiers are reserved to the implementation.  This is why I
usually use a __ suffix instead.  I would rename __tnl_arp_lookup().

In __tnl_arp_lookup(), if arp->br_name and brname are both
null-terminated, then I would suggest using strcmp() instead of
strncmp().  Otherwise, if br_name happens to be longer than IFNAMSIZ-1
bytes, it will still compare equal to arp->br_name; I doubt that that
is desired.  (Maybe it is?  I don't know your goal here.)

The ARP cache unixctl commands,, and the one for tunnel ports, should
probably be documented in ovs-vswitchd(8).

tnl_arp_cache_show() could use ds_put_char_multiple() instead of a
loop (did you really mean <= instead of < in the loop condition?).

I don't like the conditional here in tnl_port_cast() that much:
    if (offsetof(struct tunnel_port, cr) == 0) {
        return CONTAINER_OF(cr, struct tunnel_port, cr);
    } else {
        return cr ? CONTAINER_OF(cr, struct tunnel_port, cr) : NULL;
    }
I think it would be better to build-assert on offset 0 and then just return
a cast to tunnel_port *.  That is less risky standards-wise than using
CONTAINER_OF on a null pointer (eventually, some compiler will get its
revenge for that).

tnl-ports.c includes fat-rwlock.h, but it does not use fat_rwlock.
tnl-ports.c should include <errno.h>, not "errno.h" (actually I am not
sure it uses anything from <errno.h>.

I think that tnl-port.h includes a number of unnecessary headers, too.

Is tnl_port_build_header() on a fast-path?  If so, then maybe the
'rwlock' in that file should become a fatlock.

Thanks a lot!

Ben.