[ovs-dev] plan to integrate Hyper-V support

Alessandro Pilotti apilotti at cloudbasesolutions.com
Fri Jul 25 13:47:50 UTC 2014


Hi Ben,

Currently, both the Cloudbase and VMware implementations initially proposed
on the ML need non-trivial amounts of rework before being resubmitted.
On the Cloudbase side we have already addressed most of the points you noted
and we are about to submit the new patches, but that is not the topic I’d
like to discuss here.

IMO the real architectural difference between the two initial proposals was
the choice between Netlink and a separate-component approach. Now that we
have agreed on the Netlink option, what remains is mostly minor differences
and implementation details.
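
Just to make the agreed direction a bit more concrete: below is roughly what
a vport creation request already looks like on the userspace side with the
existing nl_msg helpers from lib/netlink.h. The generic Netlink family id and
the transport used to hand the buffer to the Windows kernel extension are
still open points, so please treat this purely as an illustrative sketch, not
as the final Windows interface:

/* Illustrative only: builds an OVS_VPORT_CMD_NEW request with the
 * userspace Netlink helpers.  How the buffer reaches the Hyper-V
 * extension is deliberately left out. */
#include "netlink.h"
#include "ofpbuf.h"
#include <linux/openvswitch.h>  /* OVS_VPORT_* (or its Windows equivalent) */

static void
build_vport_new_request(struct ofpbuf *request, int ovs_vport_family,
                        int dp_ifindex, const char *port_name)
{
    struct ovs_header *ovs_header;

    ofpbuf_init(request, 0);
    nl_msg_put_genlmsghdr(request, 0, ovs_vport_family,
                          NLM_F_REQUEST | NLM_F_ECHO,
                          OVS_VPORT_CMD_NEW, OVS_VPORT_VERSION);

    ovs_header = ofpbuf_put_uninit(request, sizeof *ovs_header);
    ovs_header->dp_ifindex = dp_ifindex;

    nl_msg_put_string(request, OVS_VPORT_ATTR_NAME, port_name);
    nl_msg_put_u32(request, OVS_VPORT_ATTR_TYPE, OVS_VPORT_TYPE_NETDEV);
    /* Upcall PID, tunnel options, etc. omitted for brevity. */
}

The nice property is that the kernel side only needs a generic attribute
walker instead of a fixed struct per command, which is where the forward and
backward compatibility comes from and should also keep the Windows
message-handling code free of the redundant boilerplate mentioned in the
review below.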


I suggest that at this point we stop working on separate implementations and
join forces.


Besides using the ML, we could simply hold meetings on #openvswitch and
distribute the work across the Cloudbase and VMware teams (and any other
community contributors, of course). This way we can produce a better
implementation, reach a stable version faster and spare precious review
bandwidth.

As already pointed out, there is also a blocker in the current VMware
implementation (and to some extent the Cloudbase one) related to the
licensing of the Microsoft forwarding extension sample, due to the
incompatibility between MS-PL and Apache 2. We have already asked Microsoft
to re-license the sample under Apache 2 (or an Apache 2-compatible license)
and we are currently waiting for a reply.

One point which was not discussed is performance. Besides having logarithmic
or at least sub-linear access times to the relevant data structures and
avoiding unnecessary packet copies (where possible), as already pointed out,
we need to make sure that we can properly leverage native features like VMQ
and avoid design bottlenecks. Allowing encapsulation offloading (VXLAN, GRE)
for third parties should also be taken into consideration.
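
To illustrate what I mean by sub-linear access times, here is a rough,
self-contained sketch of a hash-bucket flow table lookup (all names are
hypothetical and the code is not taken from either implementation): lookups
walk a single bucket chain instead of scanning every installed flow, which is
what a purely linear megaflow search ends up doing:

/* Hypothetical sketch, not code from either port: a fixed-size
 * hash-bucket flow table.  Masks (megaflows), locking and NUMA/VMQ
 * affinity are deliberately ignored. */
#include <stdint.h>
#include <string.h>

#define FLOW_KEY_LEN        64      /* placeholder size of a masked key */
#define FLOW_TABLE_BUCKETS  1024

struct flow_entry {
    struct flow_entry *next;        /* bucket chain */
    uint8_t key[FLOW_KEY_LEN];      /* masked flow key */
    /* actions, stats, ... */
};

struct flow_table {
    struct flow_entry *buckets[FLOW_TABLE_BUCKETS];
};

/* FNV-1a; any reasonable hash over the masked key will do. */
static uint32_t
flow_key_hash(const uint8_t key[FLOW_KEY_LEN])
{
    uint32_t hash = 2166136261u;
    size_t i;

    for (i = 0; i < FLOW_KEY_LEN; i++) {
        hash = (hash ^ key[i]) * 16777619u;
    }
    return hash;
}

/* Expected O(1) lookup: only one bucket chain is walked, instead of
 * every installed flow as in a linear search. */
static struct flow_entry *
flow_table_lookup(const struct flow_table *table,
                  const uint8_t key[FLOW_KEY_LEN])
{
    uint32_t idx = flow_key_hash(key) % FLOW_TABLE_BUCKETS;
    struct flow_entry *e;

    for (e = table->buckets[idx]; e != NULL; e = e->next) {
        if (!memcmp(e->key, key, FLOW_KEY_LEN)) {
            return e;
        }
    }
    return NULL;
}

With megaflows the packet key additionally has to be masked once per
installed mask before hashing (tuple-space search), so the cost becomes one
hash probe per mask rather than one comparison per flow, which is still
sub-linear in the number of installed flows.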

Going forward, we should also think ASAP about continuous integration (CI)
testing, as otherwise the upstream code will become very brittle once this
effort merges. We have gained a lot of experience in developing and running
the Hyper-V CI in OpenStack together with Microsoft (including Nova and
Neutron), so IMHO we could provide some knowledgeable input on how to do OVS
testing on Hyper-V. I am also adding Peter from Microsoft in copy on this
topic.

It would also be very important to target reference hardware with
state-of-the-art NICs (e.g. 40GbE) to make sure that bottlenecks are properly
identified.

Thanks,

Alessandro

On 25 Jul 2014, at 03:03, Ben Pfaff <blp at nicira.com> wrote:

> After thinking about the two Hyper-V ports of Open vSwitch, I have a
> proposal to make progress on upstreaming.
> 
> After reviewing code, following the on-list discussions, and
> discussing both implementations with various people, I believe that
> the following are important points of comparison between the VMware
> and Cloudbase ports to Hyper-V (which I'll just call "VMware" and
> "Cloudbase" for brevity):
> 
>    * Cloudbase uses the forward- and backward-compatible Netlink
>      protocol whereas VMware uses an ad hoc protocol that is not as
>      flexible.
> 
>      However: compared to the OVS userspace implementation (or the
>      Linux kernel implementation), the Cloudbase Netlink
>      implementation requires much more error-prone and redundant code
>      to implement messages in common cases.  (I mentioned a specific
>      case in OFPort_New() in my earlier review.)
> 
>    * Cloudbase supports megaflows (but the search is linear).  VMware
>      has a more efficient hash-based classifier (but no megaflows).
> 
>    * Cloudbase appears to deep-copy packets in many circumstances.
>      VMware includes a buffer management library and uses it to avoid
>      unneeded packet copies.
> 
>    * VXLAN:
> 
>      - Cloudbase VXLAN implementation appears to be complicated by
> 	support for features not used by userspace, e.g. multiple
> 	VXLAN ports for specific tunnel outer IP addresses (userspace
> 	only ever sets up one VXLAN port per VXLAN UDP port and
> 	multiplexes all traffic through that).
> 
>      - Cloudbase VXLAN code doesn't reassemble fragmented outer
> 	packets.  (The VMware implementation does.)
> 
>      - VMware VXLAN implementation supports flow lookup for outer
> 	packet, enabling the common use case where both the inner and
> 	outer packets go through an OVS bridge.
> 
>      - VMware VXLAN implementation uses Windows builtin features to
> 	the extent possible.  The Cloudbase implementation implements
> 	its own ARP stack and seems to aim toward ICMP support.
> 
>      - VMware VXLAN implementation can't configure the UDP port.
> 
>    * GRE: Cloudbase includes support, VMware does not.
> 
>    * Cloudbase lacks a mechanism to notify userspace of changes
>      (e.g. new ports).  VMware has one.
> 
>    * Cloudbase coding style is inconsistent enough to make reading
>      some of the code difficult.  VMware style is more consistent
>      (with a style guide!).
> 
>    * Cloudbase code contains much cut-and-pasted code duplication.
> 
>    * I've heard conflicting stability and performance test results
>      for both implementations.  It seems likely that in practice both
>      are still works in progress on these fronts.
> 
>    * Cloudbase has an installer.
> 
>    * VMware integration requires an extra daemon that is not yet
>      written.
> 
>    * Cloudbase supports checksum offloading.  However, it appears
>      that the kernel does not complete offloads (e.g. checksum, TSO)
>      before sending packets to userspace, though it should.
>      Similarly, it looks like the kernel does not handle TSO packets
>      from virtual interfaces.
> 
>    * Cloudbase has a PowerShell extension to persistently name a port
>      (and other PowerShell extensions that might also be useful).
> 
>    * Cloudbase has the ability to set AllowManagementIP to false.  (I
>      don't understand what this means.)
> 
>    * It looks like the Cloudbase "execute" operation creates a flow.
>      It should not.
> 
>    * Cloudbase may have more thorough RX parsing in some cases.
>      However, the code appears to validate the header format for many
>      protocols, although it should not.
> 
> In short, both implementations have strengths and weaknesses.  Each of
> them could evolve by eliminating their weaknesses and adding the
> strengths of the other, converging to meet somewhere in the middle,
> but this would duplicate work and waste developer time.  A better
> approach would be for both teams to collaborate on a common codebase.
> 
> Here is what I want to do:
> 
>    * Apply the next version of Alin's series to support Netlink
>      userspace for Windows (as long as it doesn't break other
>      platforms) to master.
> 
>    * Apply the next version of the VMware patch series to add Windows
>      kernel support, again assuming that it doesn't break other
>      platforms.  It is my judgment that, on balance, this is a better
>      place to start.
> 
>      At this point we will have Windows userspace and kernel support,
>      but they will be incompatible because they do not understand the
>      same protocol.
> 
>    * Take granular patches to improve what we have.  The present
>      difficulties have arisen because of duplicated work, which
>      happened because of large units of work and lack of visibility.
>      If work continues to happen in large chunks, and visibility into
>      what is happening is poor, then this may happen again.  On the
>      other hand, if work is done in small chunks, at most a small
>      amount of work is duplicated at one time, and if developers
>      communicate about what they are working on, then they naturally
>      avoid duplicating work.  I want to encourage all OVS developers
>      to adopt both strategies: work on one logical change at a time,
>      then post it to ovs-dev, as well as communicating on ovs-dev
>      about what you are working on.
> 
>      By the way: applying a patch to OVS master does not normally
>      need time-consuming formal testing and QA cycles.  In the OVS
>      development workflow, we stabilize release branches, not the
>      master branch.  The master branch usually does remain quite
>      stable due to the quantity and quality of unit tests, and we fix
>      bugs as we notice them (or users report them), but bugs on
>      master are not critical problems.
> 
> I hope that initial patches will focus on adding Netlink support to
> the kernel module, based on the userspace Netlink code or the
> Cloudbase kernel implementation or some combination.  Here is a list
> of other features that I'd like to see added, based on the list of
> differences above.  To the extent that it makes sense, I hope that
> this could reuse code from the Cloudbase implementation rather than
> duplicating work further:
> 
>    * The Windows support daemon required for full integration.
> 
>    * Megaflow classifier.
> 
>    * GRE support.
> 
>    * Installer.
> 
>    * Support for configurable VXLAN UDP port (I think the Linux
>      kernel module can support VXLAN on multiple UDP ports; that's
>      nice to have but less important than a configurable port
>      number).
> 
>    * Support for checksum offloading and any other offloads that make
>      sense.
> 
>    * Persistent port naming.
> 
> I think that when we're done, we'll end up with a flexible and
> well-integrated Hyper-V port.
> 
> Thanks,
> 
> Ben.



