[ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

Paul Blakey paulb at mellanox.com
Wed Oct 19 15:24:28 UTC 2016

On 12/10/2016 23:36, Pravin Shelar wrote:
> Sorry for jumping in a bit late. I have a couple of high-level comments below.
> On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ronye at mellanox.com> wrote:
>> From: Joe Stringer [mailto:joe at ovn.org]  Sent: Thursday, October 06, 2016 5:06 AM
>>> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for
>>> openvswitch
>>> On 27 September 2016 at 21:45, Paul Blakey <paulb at mellanox.com> wrote:
>>>> Openvswitch currently configures the kernel datapath via netlink over an
>>> internal ovs protocol.
>>>> This patch series offers a new provider: dpif-netlink-tc that uses the
>>>> tc flower protocol to offload ovs rules into HW data-path through netdevices
>>> that e.g. represent NIC e-switch ports.
>>>> The user can create a bridge with type: datapath_type=dpif-hw-netlink in
>>> order to use this provider.
>>>> This provider can be used to pass the tc flower rules to the HW for HW
>>> offloads.
>>>> Also introducing in this patch series a policy module in which the
>>>> user can program a HW-offload policy. The policy module accepts an OVS
>>>> flow and returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY --
>>> currently the policy is to HW offload all rules.
>>>> If the HW_OFFLOAD rule assignment fails the provider will fall back to the
>>> system datapath.
>>>> Flower was chosen because it's sort of natural to state OVS DP rules for
>>>> this classifier. However, the code can be extended to support other
>>>> classifiers such as U32, eBPF, etc which have HW offloads as well.
>>>> The use-case we are currently addressing is the newly introduced SRIOV
>>>> switchdev mode in the Linux kernel which is introduced in version 4.8
>>>> [1][2]. This series was tested against SRIOV VFs vports representors of the
>>> Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
>>>> Paul and Shahar.
>>>> [1]
>>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
>>>> 13334e18a74f70c0be58c2eb73af1715325b870
>>>> [2]
>>>> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
>>>> 3d94892e27409bb2b48140207c0273b2ba65f61
>>> Thanks for submitting the series. Clearly this is a topic of interest for multiple
>>> parties, and it's a good starting point to discuss.
>>> A few of us also discussed this topic today at netdev, so I'll list a few points that
>>> we talked about and hopefully others can fill in the bits I miss.
>> Thanks for summarizing our meeting today.
>> Attached is a link to the PDF picture that shows the idea (picture <= 1,000 words)
>> https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing
>>> Positives
>>> * Hardware offload decision is made in a module in userspace
>>> * Layered dpif approach means that the tc-based hardware offload could sit in
>>> front of kernel or userspace datapaths
>>> * Separate dpif means that if you don't enable it, it doesn't affect you. Doesn't
>>> litter another dpif implementation with offload logic.
> Because of better modularity and usage of existing kernel interfaces
> for flow offload, I like this approach.
>>> Drawbacks
>>> * Additional dpif to maintain. Another implementation to change when
>>> modifying dpif interface. Maybe this doesn't change too often, but there has
>>> been some discussions recently about whether the flow_{put,get,del} should be
>>> converted to use internal flow structures rather than OVS netlink
>>> representation. This is one example of potential impact on development.
>> [RONY] You are right, but I don't think we can add it any other way. I think the approach of using dpif_netlink will save us a lot of maintenance.
>>> * Fairly limited support for OVS matches and actions. For instance, it is not yet
>>> useful for OVN-style pipeline. But that's not a limitation of the design, just the
>>> current implementation.
>> [RONY] Sure, we intend to support OVN and connection tracking; we are starting with the simple case.
>>> Other considerations
>>> * Is tc flower filter setup rate and stats dump fast enough? How does it
>>> compare to existing kernel datapath flow setup rate? Multiple threads inserting
>>> at once? How many filters can be dumped per second?
>>> etc.
>> [RONY] We will test it and will try to improve TC if needed.
> I think there are two parts in flow offloading.
> 1. Time spent to Add the flow to TC.
> 2. Time spent on pushing the flow to hardware.
> It would be interesting to know which one is dominant in this case.

We achieve about 1K rule insertions per second; we will be looking into
the time distribution.
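
As a rough illustration of how the two parts listed above could be
compared, here is a minimal Python sketch. It assumes the two phases can
be timed separately; add_to_tc and push_to_hw are hypothetical callables
standing in for whatever instrumentation is available, not existing OVS
or TC functions, and the clock is injectable only so the sketch can be
exercised without real hardware.

```python
import time


def profile_inserts(rules, add_to_tc, push_to_hw, clock=time.perf_counter):
    """Time two hypothetical insertion phases and report which dominates.

    add_to_tc and push_to_hw are placeholders (assumptions), each called
    once per rule. clock is any zero-argument function returning seconds.
    """
    totals = {"tc": 0.0, "hw": 0.0}
    for rule in rules:
        start = clock()
        add_to_tc(rule)
        middle = clock()
        push_to_hw(rule)
        end = clock()
        totals["tc"] += middle - start
        totals["hw"] += end - middle

    elapsed = totals["tc"] + totals["hw"]
    return {
        # Insertions per second over the measured time only.
        "rate_per_sec": len(rules) / elapsed if elapsed else float("inf"),
        # Name of the phase that consumed the most time.
        "dominant": max(totals, key=totals.get),
        # Fraction of the measured time spent in each phase.
        "share": {k: (v / elapsed if elapsed else 0.0)
                  for k, v in totals.items()},
    }
```

At roughly 1K insertions per second the combined budget is about 1 ms per
rule, so the share figures would show how that millisecond is split.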

>>> * Currently for a given flow, it will exist in either the offloaded implementation
>>> or the kernel datapath. Statistics are only drawn from one location. This is
>>> consistent with how ofproto-dpif-upcall will insert flows - one flow_put
>>> operation and one flow is inserted into the datapath. Correspondingly there is
>>> one udpif_key which reflects the most recently used stats for this datapath
>>> flow. There may be situations where flows need to be in both datapaths, in
>>> which case there needs to be either one udpif_key per datapath
>>> representation of the flow, or the dpif must hide the second flow and aggregate
>>> stats.
>> [RONY] As you wrote, the dpif is responsible for hiding it; if the flow is offloaded to the HW, this traffic won't come to the datapath.
>> We will handle it when we support this combination.
>>> Extra, not previously discussed
>>> * Testing - we may want a mode where tc flower is used in software mode, to
>>> test the tc netlink interface. It would be good to see extension of kernel module
>>> testsuite to at least test some basics of the interface, perhaps also the flower
>>> behaviour (though that may be out of scope of the testsuite in the OVS tree).
> I have a question about hardware offload capability. How are the
> capabilities checked in the dpif module for accelerating a
> particular flow? Or does it try to offload and fall back to the software
> datapath in case of an error?

For now there is no kernel API to check offload capabilities, so our dpif
provider will try to offload any flow that can be entirely
translated from OVS attributes to TC/flower terms, as the dummy offload
policy always returns HW offload. The offload policy could check
HW capabilities at a later date if such an API is exposed. If an unsupported
OVS attribute is masked out (wildcard/zeroed mask), it is ignored,
and if an attribute is supported but TC doesn't support masking it, an exact
match is taken. If TC/flower/HW fails to offload the flow, it will be directed to
dpif-netlink for the software datapath.
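
The decision flow described above can be sketched roughly as follows.
This is a simplified Python illustration, not the code from the series:
the field names, the TC_FIELDS and TC_MASKABLE tables, and the
hw_install/sw_install callables are all hypothetical. Skipping a
supported attribute whose mask is zero is also an assumption.

```python
from enum import Enum


class Policy(Enum):
    NO_OFFLOAD = "NO_OFFLOAD"
    HW_ONLY = "HW_ONLY"


# Hypothetical capability tables, for illustration only.
TC_FIELDS = {"eth_type", "eth_src", "eth_dst", "ip_proto", "ipv4_src"}
TC_MASKABLE = {"eth_src", "eth_dst", "ipv4_src"}


class Untranslatable(Exception):
    """The flow uses an attribute that cannot be expressed in TC terms."""


def dummy_policy(flow):
    # The current policy: always ask for HW offload.
    return Policy.HW_ONLY


def translate(flow):
    """Map {field: (value, mask)} to a TC-style match; mask 0 = wildcard."""
    tc_match = {}
    for field, (value, mask) in flow.items():
        if mask == 0:
            continue  # masked out: ignored, whether supported or not
        if field not in TC_FIELDS:
            raise Untranslatable(field)
        if field in TC_MASKABLE:
            tc_match[field] = (value, mask)
        else:
            tc_match[field] = (value, None)  # no masking: exact match
    return tc_match


def flow_put(flow, hw_install, sw_install, policy=dummy_policy):
    """Try the HW path first; fall back to the software datapath."""
    if policy(flow) is Policy.HW_ONLY:
        try:
            hw_install(translate(flow))
            return "hw"
        except (Untranslatable, OSError):
            pass  # translation or TC/flower/HW failure: fall back
    sw_install(flow)
    return "sw"
```

Here OSError stands in for any error reported while installing the
filter; a real provider would inspect the netlink error instead.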
