[ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

Pravin Shelar pshelar at ovn.org
Wed Oct 12 20:36:44 UTC 2016


Sorry for jumping in a bit late. I have a couple of high-level comments below.

On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ronye at mellanox.com> wrote:
> From: Joe Stringer [mailto:joe at ovn.org]  Sent: Thursday, October 06, 2016 5:06 AM
>>
>> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for
>> openvswitch
>>
>> On 27 September 2016 at 21:45, Paul Blakey <paulb at mellanox.com> wrote:
>> > Open vSwitch currently configures the kernel datapath via netlink over an
>> > internal OVS protocol.
>> >
>> > This patch series offers a new provider, dpif-netlink-tc, that uses the
>> > tc flower protocol to offload OVS rules into the HW datapath through
>> > netdevices that e.g. represent NIC e-switch ports.
>> >
>> > The user can create a bridge with datapath_type=dpif-hw-netlink in
>> > order to use this provider.
>> > This provider can be used to pass tc flower rules to the HW for HW
>> > offloads.
>> >
>> > This patch series also introduces a policy module in which the
>> > user can program a HW-offload policy. The policy module accepts an OVS
>> > flow and returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY --
>> > currently the policy is to HW offload all rules.
>> >
>> > If the HW offload rule assignment fails, the provider will fall back to the
>> > system datapath.
>> >
>> > Flower was chosen because it is fairly natural to state OVS DP rules for
>> > this classifier. However, the code can be extended to support other
>> > classifiers such as U32, eBPF, etc., which have HW offloads as well.
>> >
>> > The use case we are currently addressing is the newly introduced SRIOV
>> > switchdev mode in the Linux kernel, introduced in version 4.8
>> > [1][2]. This series was tested against SRIOV VF vport representors of the
>> > Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
>> >
>> > Paul and Shahar.
>> >
>> > [1] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
>> > [2] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61
>>
>> Thanks for submitting the series. Clearly this is a topic of interest for multiple
>> parties, and it's a good starting point to discuss.
>>
>> A few of us also discussed this topic today at netdev, so I'll list a few points that
>> we talked about and hopefully others can fill in the bits I miss.
> Thanks for summarizing our meeting today.
> Attached is a link to a PDF picture that shows the idea (a picture is worth <= 1,000 words):
> https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing
>
>>
>> Positives
>> * Hardware offload decision is made in a module in userspace
>> * Layered dpif approach means that the tc-based hardware offload could sit in
>> front of kernel or userspace datapaths
>> * Separate dpif means that if you don't enable it, it doesn't affect you. Doesn't
>> litter another dpif implementation with offload logic.
>>
I like this approach because of its better modularity and its use of existing
kernel interfaces for flow offload.

>> Drawbacks
>> * Additional dpif to maintain. Another implementation to change when
>> modifying the dpif interface. Maybe this doesn't change too often, but there have
>> been some discussions recently about whether flow_{put,get,del} should be
>> converted to use internal flow structures rather than the OVS netlink
>> representation. This is one example of potential impact on development.
> [RONY] You are right, but I don't think we can add it any other way. I think that the approach of reusing dpif_netlink will save us a lot of maintenance.
>> * Fairly limited support for OVS matches and actions. For instance, it is not yet
>> useful for OVN-style pipeline. But that's not a limitation of the design, just the
>> current implementation.
> [RONY] Sure, we intend to support OVN and connection tracking; we are starting with the simple case.
>>
>> Other considerations
>> * Are the tc flower filter setup rate and stats dump rate fast enough? How do they
>> compare to the existing kernel datapath flow setup rate? What about multiple threads
>> inserting at once? How many filters can be dumped per second?
>> etc.
> [RONY] We will test it, and will try to improve TC if needed.
>
I think there are two parts to flow offloading:
1. Time spent adding the flow to TC.
2. Time spent pushing the flow to hardware.

It would be interesting to know which one is dominant in this case.
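
One possible way to approximate that split: insert the same batch of flower
filters once with skip_hw set (software only, so only cost (1) is paid) and
once normally, and compare. A rough harness sketch, where
insert_flower_filter() is a hypothetical stand-in for whatever code sends the
RTM_NEWTFILTER request:

/* Rough measurement sketch; insert_flower_filter() is hypothetical.
 * 'skip_hw' is meant to map to the flower TCA_CLS_FLAGS_SKIP_HW flag,
 * i.e. software-only insertion. */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

static int
insert_flower_filter(int prio, bool skip_hw)
{
    (void) prio;
    (void) skip_hw;
    return 0;             /* stub: real code would talk rtnetlink here */
}

static double
time_insertions(int n, bool skip_hw)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        insert_flower_filter(i + 1, skip_hw);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int
main(void)
{
    int n = 10000;
    double sw_only = time_insertions(n, true);   /* cost (1) only */
    double with_hw = time_insertions(n, false);  /* cost (1) + (2) */

    printf("sw-only: %.1f flows/s, with hw: %.1f flows/s\n",
           n / sw_only, n / with_hw);
    printf("approx. hw-push share: %.1f%%\n",
           100.0 * (with_hw - sw_only) / with_hw);
    return 0;
}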

>> * Currently for a given flow, it will exist in either the offloaded implementation
>> or the kernel datapath. Statistics are only drawn from one location. This is
>> consistent with how ofproto-dpif-upcall will insert flows - one flow_put
>> operation and one flow is inserted into the datapath. Correspondingly there is
>> one udpif_key which reflects the most recently used stats for this datapath
>> flow. There may be situations where flows need to be in both datapaths, in
>> which case there needs to be either one udpif_key per datapath
>> representation of the flow, or the dpif must hide the second flow and aggregate
>> stats.
> [RONY] As you wrote, the dpif is responsible for hiding it; if the flow is offloaded to the HW, this traffic won't come to the kernel datapath.
> We will handle it when we support this combination.
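
For the case where the same flow does end up in both datapaths, the aggregation
Joe mentions could stay entirely inside the dpif before stats are reported
upward. A minimal sketch, with a stats struct modelled loosely on OVS's
dpif_flow_stats (field names from memory, so treat them as an approximation):

/* Sketch of stats aggregation across the two datapath representations
 * of the same flow, so that ofproto-dpif-upcall still sees a single
 * flow with one set of counters. */
#include <stdint.h>

struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
    long long int used;      /* last-used time, msec */
    uint16_t tcp_flags;
};

void
flow_stats_aggregate(const struct flow_stats *hw,   /* tc/NIC counters */
                     const struct flow_stats *sw,   /* kernel DP counters */
                     struct flow_stats *out)
{
    out->n_packets = hw->n_packets + sw->n_packets;
    out->n_bytes   = hw->n_bytes + sw->n_bytes;
    out->used      = hw->used > sw->used ? hw->used : sw->used;
    out->tcp_flags = hw->tcp_flags | sw->tcp_flags;
}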
>>
>> Extra, not previously discussed
>> * Testing - we may want a mode where tc flower is used in software mode, to
>> test the tc netlink interface. It would be good to see an extension of the kernel
>> module testsuite that at least tests some basics of the interface, and perhaps also
>> the flower behaviour (though that may be out of scope of the testsuite in the OVS tree).
>>
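
On the software-mode idea: flower filters already carry skip_sw/skip_hw flags
in the uapi, so a test mode could force skip_hw so that filters never touch a
NIC and the whole netlink/flower path can be exercised on any machine.
Roughly (the test-mode knob itself is hypothetical):

/* Sketch: picking the flower flags for a "software only" test mode.
 * TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW are the uapi flag bits
 * that the flower code reads from the TCA_FLOWER_FLAGS attribute;
 * the software_only_test_mode knob is made up for this example. */
#include <stdint.h>
#include <linux/pkt_cls.h>

uint32_t
flower_flags(int software_only_test_mode)
{
    /* skip_hw: install the filter in software only, never in the NIC;
     * the default (0) lets the kernel try both. */
    return software_only_test_mode ? TCA_CLS_FLAGS_SKIP_HW : 0;
}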

I have a question about hardware offload capability. How are the
capabilities checked in the dpif module before accelerating a
particular flow? Or does it try to offload and fall back to the software
datapath in case of an error?


