[ovs-dev] proposed flow key compatibility rules

Ben Pfaff blp at nicira.com
Fri Nov 4 19:22:08 UTC 2011


I'm working on a file that would go in Documentation/networking in the
kernel tree and probably in datapath/README in the OVS tree.  It
describes OVS in general just a little but it's mostly about flow key
compatibility rules.  It actually proposes a change to how we do VLANs
in flow keys (which I haven't implemented yet) so at the very least I
need feedback on that.  It probably needs to be extended slightly too.

I'm also thinking about changing the flow key format by dropping the
ordering restrictions.  There's no real benefit to them unless
something is actually sensitive to ordering (e.g. we allow duplicate
attributes, which my proposal below would avoid).  I've already
implemented the userspace half of this as part of another change that
I'm working on.

Thanks,

Ben.

----------------------------------------------------------------------

Open vSwitch datapath developer documentation
=============================================

The Open vSwitch kernel module allows flexible userspace control over
flow-level packet processing on selected network devices.  It can be
used to implement a plain Ethernet switch, an OpenFlow-enabled switch,
network device bonding, VLAN processing, network access control, and
so on.

The kernel module implements multiple "datapaths" (analogous to
bridges), each of which can have multiple "vports" (analogous to ports
within a bridge).  Each datapath also has associated with it a "flow
table" that userspace populates with "flows" that map from keys based
on packet headers and metadata to sets of actions.  The most common
action forwards the packet to another vport; other actions are also
implemented.

When a packet arrives on a vport, the kernel module processes it by
extracting its flow key and looking it up in the flow table.  If there
is a matching flow, it executes the associated actions.  If there is
no match, it queues the packet to userspace for processing (as part of
its processing, userspace will likely set up a flow to handle further
packets of the same type entirely in-kernel).
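
The lookup path described above can be sketched as a toy model.  This is
only an illustration: the Datapath class, its method names, and the
three-field key are hypothetical and do not correspond to the kernel
module's actual data structures.

```python
class Datapath:
    """Toy model of one datapath: a flow table plus a queue to userspace."""

    def __init__(self):
        self.flow_table = {}   # flow key (tuple) -> list of actions
        self.upcalls = []      # (key, packet) pairs queued for userspace

    def extract_key(self, in_port, packet):
        # Hypothetical key: one metadata field plus two header fields.
        return (("in_port", in_port),
                ("eth_dst", packet["eth_dst"]),
                ("eth_type", packet["eth_type"]))

    def receive(self, in_port, packet):
        key = self.extract_key(in_port, packet)
        actions = self.flow_table.get(key)
        if actions is None:
            # No matching flow: hand the packet and its key to userspace.
            self.upcalls.append((key, packet))
            return None
        return actions


dp = Datapath()
pkt = {"eth_dst": "00:02:e3:0f:80:a4", "eth_type": 0x0800}
first = dp.receive(1, pkt)             # miss: queued for userspace
key, _ = dp.upcalls[0]
dp.flow_table[key] = ["output:2"]      # userspace installs a flow
second = dp.receive(1, pkt)            # hit: handled without an upcall
```

After userspace installs the flow, the second packet with the same key
is matched in the table and no further upcall is queued.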


Flow key compatibility
----------------------

Network protocols evolve over time.  New protocols become important
and existing protocols lose their prominence.  For the Open vSwitch
kernel module to remain relevant, it must be possible for newer
versions to parse additional protocols as part of the flow key.  It
might even be desirable, someday, to drop support for parsing
protocols that have become obsolete.  Therefore, the Netlink interface
to Open vSwitch is designed to allow carefully written userspace
applications to work with any version of the flow key, past or future.

To support this forward and backward compatibility, whenever the
kernel module passes a packet to userspace, it also passes along the
flow key that it parsed from the packet.  Userspace then extracts its
own notion of a flow key from the packet and compares it against the
kernel-provided version:

    - If userspace's notion of the flow key for the packet matches the
      kernel's, then nothing special is necessary.

    - If the kernel's flow key includes more fields than the userspace
      version of the flow key, for example if the kernel decoded IPv6
      headers but userspace stopped at the Ethernet type (because it
      does not understand IPv6), then again nothing special is
      necessary.  Userspace can still set up a flow in the usual way,
      as long as it uses the kernel-provided flow key to do it.

    - If the userspace flow key includes more fields than the
      kernel's, for example if userspace decoded an IPv6 header but
      the kernel stopped at the Ethernet type, then userspace can
      forward the packet manually, without setting up a flow in the
      kernel.  This case is bad for performance because every packet
      must go to userspace, but the forwarding behavior is correct.
      (If userspace can determine that the values of the extra fields
      would not affect forwarding behavior, then it could set up a
      flow anyway.)
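
The three cases above might be expressed roughly as follows.  This is a
sketch under stated assumptions: flow keys are modeled as dicts from
attribute name to value, and handle_upcall and its
extras_affect_forwarding callback are hypothetical names, not part of
any Open vSwitch interface.

```python
def handle_upcall(kernel_key, user_key,
                  extras_affect_forwarding=lambda names: True):
    """Decide how to treat a packet that the kernel passed to userspace.

    Keys are dicts from attribute name to value (an assumption made for
    illustration; real keys are sequences of Netlink attributes).
    Returns (decision, key_to_install_or_None).
    """
    extra = set(user_key) - set(kernel_key)
    if not extra:
        # Keys match, or the kernel parsed more than userspace did:
        # install a flow using the kernel-provided key.
        return ("install_flow", kernel_key)
    if not extras_affect_forwarding(extra):
        # Userspace parsed more, but the extra fields do not matter.
        return ("install_flow", kernel_key)
    # Userspace parsed more and the extra fields matter.
    return ("forward_in_userspace", None)


eth_only = {"eth": "...", "eth_type": "0x86dd"}
with_ipv6 = dict(eth_only, ipv6="...")
```

By default the sketch assumes the extra fields always matter, which is
the conservative choice: the packet is then forwarded in userspace
rather than risking an overly broad kernel flow.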

How flow keys evolve over time is important to making this work, so
the following sections go into detail.


Flow key format
---------------

A flow key is passed over a Netlink socket as a sequence of Netlink
attributes.  Some attributes represent packet metadata, defined as any
information about a packet that cannot be extracted from the packet
itself, e.g. the vport on which the packet was received.  Most
attributes, however, are extracted from headers within the packet,
e.g. source and destination addresses from Ethernet, IP, or TCP
headers.

The <linux/openvswitch.h> header file defines the exact format of the
flow key attributes.  For informal explanatory purposes here, we write
them as comma-separated strings, with parentheses indicating arguments
and nesting.  For example, the following could represent a flow key
corresponding to a TCP packet that arrived on vport 1:

    in_port(1),eth(src=e0:91:f5:21:d0:b2,dst=00:02:e3:0f:80:a4),eth_type(0x0800),ipv4(src=172.16.0.20,dst=172.18.0.52,proto=6,tos=0,frag=no),tcp(src=49163,dst=80)

Often we ellipsize arguments not important to the discussion, e.g.:

    in_port(1),eth(...),eth_type(0x0800),ipv4(...),tcp(...)
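
For readers who want to experiment with this informal notation, a small
parser for it could look like the sketch below.  It handles only the
string form used in this document, not the Netlink encoding, and the
function names split_top_level and parse_key are made up for this
example.

```python
def split_top_level(text):
    """Split text on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    if text[start:]:
        parts.append(text[start:])
    return parts


def parse_key(text):
    """Return (name, argument string) pairs for the top-level attributes."""
    attrs = []
    for part in split_top_level(text):
        name, _, rest = part.partition("(")
        attrs.append((name, rest[:-1]))   # drop the closing ")"
    return attrs


key = parse_key("in_port(1),eth(...),eth_type(0x0800),ipv4(...),tcp(...)")
names = [name for name, _ in key]
```

Because the splitter tracks parenthesis depth, commas inside an
attribute's arguments stay with that attribute instead of starting a new
one.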


Rules for evolving flow keys
----------------------------

Some care is needed to really maintain forward and backward
compatibility for applications that follow the rules listed under
"Flow key compatibility" above.

The basic rule is obvious:

    ------------------------------------------------------------------
    New network protocol support must only supplement existing flow
    key attributes.  It must not change the meaning of already defined
    flow key attributes.
    ------------------------------------------------------------------

This rule does have less-obvious consequences so it is worth working
through a few examples.  Suppose, for example, that the kernel module
did not already implement VLAN parsing.  Instead, it just interpreted
the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
packet.  The flow key for any packet with an 802.1Q header would look
essentially like this, ignoring metadata:

    eth(...),eth_type(0x8100)

Naively, to add VLAN support, it makes sense to add a new "vlan" flow
key attribute to contain the VLAN tag, then continue to decode the
encapsulated headers beyond the VLAN tag using the existing field
definitions.  With this change, a TCP packet in VLAN 10 would have a
flow key much like this:

    eth(...),vlan(vid=10,pcp=0),eth_type(0x0800),ip(proto=6,...),tcp(...)

But this change would negatively affect a userspace application that
has not been updated to understand the new "vlan" flow key attribute.
The application could, following the flow compatibility rules above,
ignore the "vlan" attribute that it does not understand and therefore
assume that the flow contained IP packets.  This is a bad assumption
(the flow only contains IP packets if one parses and skips over the
802.1Q header) and it could cause the application's behavior to change
across kernel versions even though it follows the compatibility rules.

The solution is to use a set of nested attributes.  This is, for
example, why 802.1Q support uses nested attributes.  A TCP packet in
VLAN 10 is actually expressed as:

    eth(...),eth_type(0x8100),vlan(vid=10,pcp=0,eth_type(0x0800),ip(proto=6,...),tcp(...))

Notice how the encapsulated "eth_type", "ip", and "tcp" flow key
attributes are nested inside the "vlan" attribute.  Thus, an
application that does not understand the "vlan" key will not see
any of those attributes and therefore will not misinterpret them.
(Also, the outer eth_type is still 0x8100, not changed to 0x0800.)
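
The difference between the flat and nested encodings can be checked with
a toy model of an application that predates VLAN support.  The model is
an assumption made for illustration: keys are lists of (name, value)
pairs, nested attributes are lists inside the value, and old_app_view
and looks_like_ip are hypothetical helpers.

```python
# Attributes understood by a hypothetical application that predates
# VLAN support.  Note that "vlan" is not in the set.
KNOWN_TO_OLD_APP = {"eth", "eth_type", "ip", "tcp"}


def old_app_view(key):
    """Drop top-level attributes the old application does not understand."""
    return [(name, value) for name, value in key if name in KNOWN_TO_OLD_APP]


def looks_like_ip(view):
    """True if the application would conclude that the flow carries IP."""
    return ("eth_type", "0x0800") in view


# Naive flat encoding: "vlan" sits next to the encapsulated attributes.
flat = [("eth", "..."), ("vlan", "vid=10,pcp=0"),
        ("eth_type", "0x0800"), ("ip", "proto=6,..."), ("tcp", "...")]

# Nested encoding: the encapsulated attributes live inside "vlan".
nested = [("eth", "..."), ("eth_type", "0x8100"),
          ("vlan", [("vid", "10"), ("pcp", "0"), ("eth_type", "0x0800"),
                    ("ip", "proto=6,..."), ("tcp", "...")])]

flat_misread = looks_like_ip(old_app_view(flat))
nested_misread = looks_like_ip(old_app_view(nested))
```

With the flat encoding the old application wrongly concludes that it is
looking at an untagged IP flow.  With the nested encoding it sees only
eth and eth_type(0x8100), the same key it would have seen before VLAN
parsing was added.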


