[ovs-dev] plan to integrate Hyper-V support

Eitan Eliahu eliahue at vmware.com
Fri Jul 25 15:52:20 UTC 2014

Hi Alessandro,
We definitely need to have a tight coordination at least until we have a working version of the integrated software.
On the multiple instances of the datapath: the issue is less related to the implementation (most of the data is per-device object data) than to a design decision we made earlier. We would be glad to discuss this issue.
Please let us get back to you on the meetings; we definitely need to coordinate our efforts for the next phase.

-----Original Message-----
From: Alessandro Pilotti [mailto:apilotti at cloudbasesolutions.com] 
Sent: Friday, July 25, 2014 8:26 AM
To: Eitan Eliahu
Cc: Ben Pfaff; dev at openvswitch.org; Alin Serdean; Saurabh Shah; Nithin Raju; Guolin Yang; Rajiv Krishnamurthy; Peter Pouliot
Subject: Re: plan to integrate Hyper-V support

Hi Eitan,

On 25 Jul 2014, at 18:08, Eitan Eliahu <eliahue at vmware.com> wrote:

> Hi Alessandro,
> I will let Ben respond to your full message, but I wanted to comment on a few potential issues you raised:
> [1] Yesterday we posted a new patch which includes a complete rewrite of the base code. This resolves the licensing issue.
> [2] We leveraged the NDIS NdisAllocateFragmentNetBufferList service, so we basically avoid any copying of data (with the exception of the ~50-byte headers, which must span a single contiguous physical memory region).
> [3] I currently don't see an issue with VMQ (we take care of "safe" memory) or RSS, but it would be great if someone from MS could take another look into it.
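For clarity on [2], here is a rough user-mode model of the split Eitan describes: copy only the small header region into contiguous memory and reference the payload in place. This is illustrative C only, not NDIS code; the `frag_view` name and the 50-byte threshold are assumptions based on the figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model only (NOT NDIS code): the first ~50 bytes of
 * headers are copied into contiguous memory, while the payload is
 * referenced in place, so the bulk of the packet is never copied. */

#define HDR_COPY_MAX 50

struct frag_view {
    unsigned char hdr[HDR_COPY_MAX]; /* contiguous copy of the headers */
    size_t hdr_len;
    const unsigned char *payload;    /* points into the original buffer */
    size_t payload_len;
};

/* Build a zero-copy view of 'pkt': copy up to HDR_COPY_MAX header bytes,
 * reference the rest.  Returns 0 on success, -1 if hdr_len is too large. */
static int frag_view_init(struct frag_view *v, const unsigned char *pkt,
                          size_t pkt_len, size_t hdr_len)
{
    if (hdr_len > HDR_COPY_MAX || hdr_len > pkt_len) {
        return -1;
    }
    memcpy(v->hdr, pkt, hdr_len);
    v->hdr_len = hdr_len;
    v->payload = pkt + hdr_len;      /* payload is referenced, not copied */
    v->payload_len = pkt_len - hdr_len;
    return 0;
}
```

In the real driver the payload reference would of course be a NET_BUFFER_LIST fragment rather than a raw pointer; the sketch only shows where the copy boundary sits.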

Thanks, we're currently reviewing your new patches.

> [4] We would like to discuss the OpenStack provisions. We spent some time looking into it, and it seems there is a difference in approaches (we don't create multiple instances of the datapath or Hyper-V switches for each virtual network).

OpenStack is our main use case. Support for multiple datapaths and multiple vswitches is definitely necessary, and it is not too difficult if properly designed from the beginning.
The expected changes in the Neutron OVS agent code are fairly minimal.

> [5] Encapsulation offloading is important for GENEVE as well. I would expect NDIS support for it. We might need special intermediate support, though.

Agreed. I limited the list to VXLAN and GRE because that's what we had at the moment, but GENEVE offloading support definitely has to be included.

> [6] On the Netlink implementation, it would be nice if we could look at issues like scaling across cores and kernel-initiated, asynchronous completion of events and upcalls.

Definitely. What about scheduling a weekly IRC meeting to discuss this and other topics?
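On [6], to seed that discussion: one common pattern for scaling upcalls across cores is a per-core queue selected by flow hash, so a given flow stays ordered while distinct flows spread out. The sketch below is purely illustrative; the names, ring count, and sizes are hypothetical and come from neither codebase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: one fixed-size ring per core, with a flow hash
 * selecting the ring.  Packets of the same flow keep their order;
 * different flows scale across cores. */

#define N_RINGS   4     /* e.g. one per core */
#define RING_SIZE 8     /* entries per ring; power of two */

struct upcall_ring {
    const void *pkts[RING_SIZE];
    unsigned int head;  /* next slot to dequeue */
    unsigned int tail;  /* next slot to enqueue */
};

static struct upcall_ring rings[N_RINGS];

static unsigned int pick_ring(unsigned int flow_hash)
{
    return flow_hash % N_RINGS;
}

/* Enqueue an upcall; returns 0 on success, -1 if the ring is full
 * (the caller drops the packet or applies backpressure). */
static int ring_push(struct upcall_ring *r, const void *pkt)
{
    if (r->tail - r->head == RING_SIZE) {
        return -1;
    }
    r->pkts[r->tail % RING_SIZE] = pkt;
    r->tail++;
    return 0;
}

/* Dequeue the oldest upcall; returns NULL when the ring is empty. */
static const void *ring_pop(struct upcall_ring *r)
{
    if (r->head == r->tail) {
        return NULL;
    }
    return r->pkts[r->head++ % RING_SIZE];
}
```

The interesting questions for the real driver are how userspace drains these rings asynchronously and how completions are signalled back, which is exactly the part worth hashing out in a meeting.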

> We are looking forward to working with you and your team.

Same here! 



> Thank you,
> Eitan
> -----Original Message-----
> From: Alessandro Pilotti [mailto:apilotti at cloudbasesolutions.com]
> Sent: Friday, July 25, 2014 6:48 AM
> To: Ben Pfaff
> Cc: dev at openvswitch.org; Alin Serdean; Saurabh Shah; Nithin Raju; 
> Guolin Yang; Eitan Eliahu; Rajiv Krishnamurthy; Peter Pouliot
> Subject: Re: plan to integrate Hyper-V support
> Hi Ben,
> Currently, both the Cloudbase and VMware implementations initially proposed on the ML need non-trivial amounts of rework before being submitted.
> On the Cloudbase side we already addressed most of the points that you noted and we are about to submit the new patches, but that is not the topic I'd like to discuss here. 
> IMO the real architectural difference between the two initial proposals was the Netlink vs. separate-component approach.
> Now that we agreed on the Netlink option, the rest are mostly minor differences and implementation details.
> I suggest that at this point we stop working on separate implementations and join forces.
> Besides using the ML, we could simply hold meetings on #openvswitch and distribute the work across the Cloudbase and VMware teams (and any other community contributors, of course).
> This way we can surely produce a better implementation, reach a stable version faster and spare precious review bandwidth.
> As already pointed out, there is also a blocker in the current VMware implementation, and to some extent the Cloudbase one, related to the licensing of the Microsoft forwarding extension sample, due to the incompatibility between MS-PL and Apache 2.
> We already asked Microsoft to re-license the sample as Apache 2 (or an Apache 2 compatible license) and we are currently waiting for a reply.
> One point which was not discussed is performance. Besides having logarithmic, or at any rate sub-linear, access times to the relevant data structures and avoiding unnecessary packet copies (where possible) as pointed out, we need to make sure that we can properly leverage native features like VMQ and avoid design bottlenecks. Allowing encapsulation offloading (VXLAN, GRE) for third parties should also be taken into consideration.
> Going forward, we should also think ASAP about continuous integration testing (CI), as otherwise the upstream code will become very brittle once this effort merges. We gained a lot of experience in developing and running the Hyper-V CI in OpenStack together with Microsoft (including Nova and Neutron) so we could provide IMHO some knowledgeable input on how to do OVS testing on Hyper-V.
> I am also adding Peter from Microsoft in copy for this topic.
> It would also be very important to target reference hardware with state of the art NICs (e.g. 40GbE) to make sure that bottlenecks are properly identified.
> Thanks,
> Alessandro
> On 25 Jul 2014, at 03:03, Ben Pfaff <blp at nicira.com> wrote:
>> After thinking about the two Hyper-V ports of Open vSwitch, I have a 
>> proposal to make progress on upstreaming.
>> After reviewing code, following the on-list discussions, and 
>> discussing both implementations with various people, I believe that 
>> the following are important points of comparison between the VMware 
>> and Cloudbase ports to Hyper-V (which I'll just call "VMware" and 
>> "Cloudbase" for brevity):
>>   * Cloudbase uses the forward- and backward-compatible Netlink
>>     protocol whereas VMware uses an ad hoc protocol that is not as
>>     flexible.
>>     However: compared to the OVS userspace implementation (or the
>>     Linux kernel implementation), the Cloudbase Netlink
>>     implementation requires much more error-prone and redundant code
>>     to implement messages in common cases.  (I mentioned a specific
>>     case in OFPort_New() in my earlier review.)
>>   * Cloudbase supports megaflows (but the search is linear).  VMware
>>     has a more efficient hash-based classifier (but no megaflows).
>>   * Cloudbase appears to deep-copy packets in many circumstances.
>>     VMware includes a buffer management library and uses it to avoid
>>     unneeded packet copies.
>>   * VXLAN:
>>     - Cloudbase VXLAN implementation appears to be complicated by
>> 	support for features not used by userspace, e.g. multiple
>> 	VXLAN ports for specific tunnel outer IP addresses (userspace
>> 	only ever sets up one VXLAN port per VXLAN UDP port and
>> 	multiplexes all traffic through that).
>>     - Cloudbase VXLAN code doesn't reassemble fragmented outer
>> 	packets.  (The VMware implementation does.)
>>     - VMware VXLAN implementation supports flow lookup for outer
>> 	packet, enabling the common use case where both the inner and
>> 	outer packets go through an OVS bridge.
>>     - VMware VXLAN implementation uses Windows builtin features to
>> 	the extent possible.  The Cloudbase implementation implements
>> 	its own ARP stack and seems to aim toward ICMP support.
>>     - VMware VXLAN implementation can't configure the UDP port.
>>   * GRE: Cloudbase includes support; VMware does not.
>>   * Cloudbase lacks a mechanism to notify userspace of changes
>>     (e.g. new ports).  VMware has one.
>>   * Cloudbase coding style is inconsistent enough to make reading
>>     some of the code difficult.  VMware style is more consistent
>>     (with a style guide!).
>>   * Cloudbase code contains much cut-and-pasted code duplication.
>>   * I've heard conflicting stability and performance test results
>>     for both implementations.  It seems likely that in practice both
>>     are still works in progress on these fronts.
>>   * Cloudbase has an installer.
>>   * VMware integration requires an extra daemon that is not yet
>>     written.
>>   * Cloudbase supports checksum offloading.  However, it appears
>>     that the kernel does not complete offloads (e.g. checksum, TSO)
>>     before sending packets to userspace, though it should.
>>     Similarly, it looks like the kernel does not handle TSO
>>     packets from virtual interfaces.
>>   * Cloudbase has a powershell extension to persistently name a port
>>     (and other powershell extensions that might also be useful).
>>   * Cloudbase has the ability to set AllowManagementIP to false.  (I
>>     don't understand what this means.)
>>   * It looks like the Cloudbase "execute" operation creates a flow.
>>     It should not.
>>   * Cloudbase may have more thorough RX parsing in some cases.
>>     However, the code appears to validate the header format for many
>>     protocols, although it should not.
>> In short, both implementations have strengths and weaknesses.  Each 
>> of them could evolve by eliminating their weaknesses and adding the 
>> strengths of the other, converging to meet somewhere in the middle, 
>> but this would duplicate work and waste developer time.  A better 
>> approach would be for both teams to collaborate on a common codebase.
>> Here is what I want to do:
>>   * Apply the next version of Alin's series to support Netlink
>>     userspace for Windows (as long as it doesn't break other
>>     platforms) to master.
>>   * Apply the next version of the VMware patch series to add Windows
>>     kernel support, again assuming that it doesn't break other
>>     platforms.  It is my judgment that, on balance, this is a better
>>     place to start.
>>     At this point we will have Windows userspace and kernel support,
>>     but they will be incompatible because they do not understand the
>>     same protocol.
>>   * Take granular patches to improve what we have.  The present
>>     difficulties have arisen because of duplicated work, which
>>     happened because of large units of work and lack of visibility.
>>     If work continues to happen in large chunks, and visibility into
>>     what is happening is poor, then this may happen again.  On the
>>     other hand, if work is done in small chunks, at most a small
>>     amount of work is duplicated at one time, and if developers
>>     communicate about what they are working on, then they naturally
>>     avoid duplicating work.  I want to encourage all OVS developers
>>     to adopt both strategies: work on one logical change at a time,
>>     then post it to ovs-dev, as well as communicating on ovs-dev
>>     about what you are working on.
>>     By the way: applying a patch to OVS master does not normally
>>     need time-consuming formal testing and QA cycles.  In the OVS
>>     development workflow, we stabilize release branches, not the
>>     master branch.  The master branch usually does remain quite
>>     stable due to the quantity and quality of unit tests, and we fix
>>     bugs as we notice them (or users report them), but bugs on
>>     master are not critical problems.
>> I hope that initial patches will focus on adding Netlink support to 
>> the kernel module, based on the userspace Netlink code or the 
>> Cloudbase kernel implementation or some combination.  Here is a list 
>> of other features that I'd like to see added, based on the list of 
>> differences above.  To the extent that it makes sense, I hope that 
>> this could reuse code from the Cloudbase implementation rather than 
>> duplicating work further:
>>   * The Windows support daemon required for full integration.
>>   * Megaflow classifier.
>>   * GRE support.
>>   * Installer.
>>   * Support for configurable VXLAN UDP port (I think the Linux
>>     kernel module can support VXLAN on multiple UDP ports; that's
>>     nice to have but less important than a configurable port
>>     number).
>>   * Support for checksum offloading and any other offloads that make
>>     sense.
>>   * Persistent port naming.
>> I think that when we're done, we'll end up with a flexible and 
>> well-integrated Hyper-V port.
>> Thanks,
>> Ben.
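Ben's point about error-prone, redundant Netlink encoding (the OFPort_New() case) is worth illustrating. The fix is usually to centralize attribute encoding in small helpers, in the spirit of the nl_msg_put_*() functions in OVS userspace's lib/netlink.c. The sketch below is a minimal stand-alone version, assuming only the standard Netlink attribute wire format (16-bit length, 16-bit type, payload padded to 4 bytes); the nla_buf type and sizes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal sketch of centralized Netlink attribute encoding.  Putting the
 * length/type/padding logic in one place is what removes the redundant,
 * error-prone per-message code.  Wire format: 16-bit length (header +
 * payload, excluding padding), 16-bit type, payload, pad to 4 bytes. */

#define NLA_HDRLEN 4
#define NLA_ALIGN(len) (((len) + 3) & ~3u)

struct nla_buf {
    uint8_t data[256];
    size_t used;
};

/* Append one attribute; returns 0 on success, -1 on overflow. */
static int nla_put(struct nla_buf *b, uint16_t type,
                   const void *payload, uint16_t payload_len)
{
    uint16_t nla_len = NLA_HDRLEN + payload_len;
    size_t total = NLA_ALIGN(nla_len);

    if (b->used + total > sizeof b->data) {
        return -1;
    }
    memcpy(b->data + b->used, &nla_len, 2);
    memcpy(b->data + b->used + 2, &type, 2);
    memcpy(b->data + b->used + NLA_HDRLEN, payload, payload_len);
    memset(b->data + b->used + NLA_HDRLEN + payload_len, 0,
           total - nla_len);           /* zero the alignment padding */
    b->used += total;
    return 0;
}

/* Typed wrapper: every message then composes calls like this instead of
 * hand-rolling offsets, which is where the copy-paste bugs creep in. */
static int nla_put_u32(struct nla_buf *b, uint16_t type, uint32_t value)
{
    return nla_put(b, type, &value, sizeof value);
}
```

A kernel-side equivalent of these helpers, shared by all message handlers, would let the Windows datapath match the userspace and Linux implementations in robustness.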
