[ovs-dev] [PATCH ovs V2 00/21] Introducing HW offload support for openvswitch

Chandran, Sugesh sugesh.chandran at intel.com
Mon Feb 6 16:50:25 UTC 2017


Hi Roi,
Thank you for your replay, I was on vacation and couldn't get back to you on this before.
Please find my reply inline


Regards
_Sugesh


> -----Original Message-----
> From: Roi Dayan [mailto:roid at mellanox.com]
> Sent: Monday, January 16, 2017 12:57 PM
> To: Paul Blakey <paulb at mellanox.com>; Chandran, Sugesh
> <sugesh.chandran at intel.com>; Joe Stringer <joe at ovn.org>
> Cc: roid at mellanox.com; ovs dev <dev at openvswitch.org>; Shahar Klein
> <shahark at mellanox.com>; Mark Bloch <markb at mellanox.com>; Simon
> Horman <simon.horman at netronome.com>; Hadar Hen Zion
> <hadarh at mellanox.com>; Rony Efraim <ronye at mellanox.com>; Jiri Pirko
> <jiri at mellanox.com>; Eric Garver <e at erig.me>; Marcelo Ricardo Leitner
> <mleitner at redhat.com>; Or Gerlitz <ogerlitz at mellanox.com>; Andy
> Gospodarek <andy at greyhouse.net>
> Subject: Re: [ovs-dev] [PATCH ovs V2 00/21] Introducing HW offload support
> for openvswitch
> 
> 
> 
> On 16/01/2017 11:04, Paul Blakey wrote:
> >
> >
> > On 16/01/2017 11:03, Paul Blakey wrote:
> >>
> >> On 13/01/2017 02:42, Chandran, Sugesh wrote:
> >>> Hi Paul,
> >>>
> >>> Thank you for releasing the patch set. I have few high level
> >>> comments on the patch set as below.
> >>>
> >>>  My first concern is not providing the device (for eg: A FPGA)
> >>> specific initialization support. This has to be done based on the
> >>> user requirement/use case. Consider the P4 based OVS switch which
> >>> wanted to define a FPGA behavior/profile at the time of OVS
> >>> initialization. This profile defines set of features/number of flow
> >>> fields to be enabled on the hardware for the specific application. I
> >>> feel there must be some option for control-path to init the entire
> hardware module.
> >>> Some of the FPGA can offer dynamic feature initialization based on
> >>> application need. This is something like loading different firmware
> >>> on the fly based on what application profile is. Each firmware
> >>> varies in performance and scalability figures hence its worth to
> >>> define the profile on the need basis.  Do you think its possible to
> >>> offer this support with current implementation model?
> 
> Hi Sugesh,
> 
> What about netdev_initialize?
> We didn't need it currently but can be added later when needed.
[Sugesh] I feel that would be fine.
> 
> >>>
> >>> According to my understanding, OVS try to install the flows in
> >>> hardware first if it is enabled. The flows are fallback to software
> >>> only when hw insert is failed. I would like to have flexibility of
> >>> defining the flow installation behavior from the control path than
> >>> just based on netdev capacity.  Consider a multi-tenant deployment
> >>> with limited hardware offload support (Say hardware supports only
> >>> limited number of flows), OVS should install only priority flows in
> >>> hardware to offer better performance. The current implementation
> >>> operates in first come first serve model, which may not be optimal.
> 
> This can be added later as a feature. here, as you understood, we added the
> basic flow which is if the user enabled hw-offload then try to offload and
> fallback to software.
[Sugesh] Sure.
> 
> >>>
> >>> From an orchestration point of view, how system would know the port
> >>> can do hardware offload/not. More specifically how system could know
> >>> the capabilities of hardware offload. In many cases this information
> >>> is sufficient enough to identify a flow is hardware ready/not, which
> >>> avoid the cost of hardware interaction at the time of flow insert.
> >>> May be port can be added as software port, which will populate the
> >>> OVSDB with hardware capability information. The orchestration can
> >>> use the exposed hardware capability matrix to determine whether or
> >>> not to enable the hardware offload on that port.
> 
> We definitely thought about it but didn't do it yet. currently we just let the
> user enable or disable the hw-offload which by default is disabled.
[Sugesh] ok
> 
> >>>
> >>> Just to confirm, the provided APIs should be implemented for virtual
> >>> ports(virtio-vhost ports) as well. Is that the case?
> 
> you mean netdev providers? then yes. can be NULLs though.
[Sugesh] Ok
> 
> >>>
> >>> I assume the flow between hardware ports to software only ports
> >>> always programmed in the software. The packets from hardware port
> >>> are arrived as normal packet in OVS-DP.  Is that correct?
> 
> can you give an example?
> do you mean like veth? then if the TC policy is skip_sw then yes.
> we can't add HW offload rule for veth so we'll fallback to OVS datapath.
[Sugesh]Ok. And this is true for the virtio ports, where the vhost backend is
in software.?
For eg: the way how the north-south traffic is treated when the VM port is a software only port.
> 
> >>>
> >>> The hardware flow API are not defined for the dpdk netdev. It may
> >>> fail for OVS-DPDK compilation.
> 
> Right. we'll add the missing api (NULLs currently) to make sure we pass
> compilation. we'll make sure to test ovs-dpdk compilation for next
> submission.
[Sugesh] Thank you for that.
> 
> Thanks,
> Roi
> 
> >>>
> >>>
> >>> Regards
> >>> _Sugesh
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> >>>> bounces at openvswitch.org] On Behalf Of Joe Stringer
> >>>> Sent: Tuesday, January 10, 2017 6:09 PM
> >>>> To: Paul Blakey <paulb at mellanox.com>
> >>>> Cc: ovs dev <dev at openvswitch.org>; Shahar Klein
> >>>> <shahark at mellanox.com>; Mark Bloch <markb at mellanox.com>;
> Simon
> >>>> Horman <simon.horman at netronome.com>; Hadar Hen Zion
> >>>> <hadarh at mellanox.com>; Rony Efraim <ronye at mellanox.com>; Jiri
> Pirko
> >>>> <jiri at mellanox.com>; Eric Garver <e at erig.me>; Marcelo Ricardo
> >>>> Leitner <mleitner at redhat.com>; Or Gerlitz <ogerlitz at mellanox.com>;
> >>>> Andy Gospodarek <andy at greyhouse.net>
> >>>> Subject: Re: [ovs-dev] [PATCH ovs V2 00/21] Introducing HW offload
> >>>> support for openvswitch
> >>>>
> >>>> On 10 January 2017 at 10:07, Joe Stringer <joe at ovn.org> wrote:
> >>>>> On 8 January 2017 at 07:18, Paul Blakey <paulb at mellanox.com>
> wrote:
> >>>>>>
> >>>>>> On 05/01/2017 02:11, Joe Stringer wrote:
> >>>>>>
> >>>>>>> On 25 December 2016 at 03:39, Paul Blakey <paulb at mellanox.com>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> This patch series introduces rule offload functionality to
> >>>>>>>> dpif-netlink via netdev ports new flow offloading API. The user
> >>>>>>>> can specify whether to enable rule offloading or not via OVS
> >>>>>>>> configuration. Netdev providers are able to implement netdev
> >>>>>>>> flow
> >>>> offload API in order to offload rules.
> >>>>>>>>
> >>>>>>>> This patch series also implements one offload scheme for
> >>>>>>>> netdev-linux, using TC flower classifier, which was chosen
> >>>>>>>> because its sort of natrual to state OVS DP rules for this classifier.
> >>>>>>>> However, the code can be extended to support other classifiers
> >>>>>>>> such as U32, eBPF, etc which support offload as well.
> >>>>>>>>
> >>>>>>>> The use-case we are currently addressing is the newly sriov
> >>>>>>>> switchdev mode in the linux kernel which was introduced in
> >>>>>>>> version
> >>>>>>>> 4.8 [1][2]. this series was tested against sriov vfs vports
> >>>>>>>> representors of the Mellanox 100G ConnectX-4 series exposed by
> >>>>>>>> the
> >>>>>>>> mlx5 kernel driver.
> >>>>>>>>
> >>>>>>>> changes from V1
> >>>>>>>>      - Added generic netdev flow offloads API.
> >>>>>>>>      - Implemented relevant flow API in netdev-linux (and
> >>>>>>>> netdev-vport).
> >>>>>>>>      - Added a other_config hw-offload option to enable
> >>>>>>>> offloading (defaults to false).
> >>>>>>>>      - Fixed coding style to conform with OVS.
> >>>>>>>>      - Policy removed for now. (Will be discussed how best
> >>>>>>>> implemented later).
> >>>>>>>
> >>>>>>> Hi Paul,
> >>>>>>>
> >>>>>>> Happy holidays to you too ;)
> >>>>>>
> >>>>>> Hi :)
> >>>>>> Thanks for your thorough review, We are working on resubmitting
> >>>>>> the next patch set and I'll try and reply as much as I can right
> >>>>>> now, but this is a big review, so I'll split this to several replies.
> >>>>>>>
> >>>>>>>
> >>>>>>> A few high level comments:
> >>>>>>> * Overall the patchset is looking in better state than previously.
> >>>>>>> * That said, on platforms other than your specific test
> >>>>>>> environment OVS fails to build.
> >>>>>>> * netdev-vport seems to have acquired platform-specific code
> >>>>>>
> >>>>>> Thanks, We've seen your links for travis-ci and app veyor, we'll
> >>>>>> create an account and fix it accordingly.
> >>>>>>>
> >>>>>>> * I don't think that ovs-dpctl should read the database. One of
> >>>>>>> the nice things about ovs-dpctl is that it is currently
> >>>>>>> standalone, ie doesn't require anything else to be running to be
> >>>>>>> able to query the state of the kernel datapath. What are you
> >>>>>>> trying to achieve with this? Perhaps you can extend the
> >>>>>>> ovs-appctl dpctl/... commands instead to provide the functionality
> you need?
> >>>>>>
> >>>>>> We have two config options that isn't saved with the kernel
> >>>>>> datapath (and dumped back on dpif open):
> >>>>>> if offload is enabled to know if we even need to dump from
> >>>>>> netdevs or not for dump-flows cmd (I guess we could just dump the
> >>>>>> netdev anyway) and for add-flow, and tc flower flag
> >>>>>> skip_hw/skip_sw that is also needed for add-flow cmd, so we can
> >>>>>> know what flag we be used while inserting to flower if offloading is
> enabled and possible.
> >>>>>
> >>>>> I discussed this with a few people offline and it seems that the
> >>>>> most reasonable way to handle it for dump is just to dump from
> >>>>> netdevs anyway. You can fetch the list of relevant ports from OVS
> >>>>> datapath in kernel, then only dump those ports.
> >>>>>
> >>>>> For adding and deleting flows, using ovs-dpctl is not really
> >>>>> supported, it's primarily for debug. ovs-vswitchd is going to
> >>>>> override that behaviour anyway - if you add a flow, it may delete
> >>>>> it; if you delete a flow, it may re-add it. I suggest that for
> >>>>> debugging purposes the default behaviour is to install into OVS,
> >>>>> but if you want to put it into TC sw/hw, provide an additional flag on
> the commandline.
> >>>>>
> >>>>> If you want ovs-vswitchd to decide how to install the flow, you
> >>>>> can always use "ovs-appctl dpctl/add-flow ..." and the command
> >>>>> will go via ovs-vswitchd, which will make the offloading decision
> >>>>> based on the same logic as it would when handling regular upcalls.
> >>>>>
> >>>>>>> * It would be good to see at least some basic system tests ala
> >>>>>>> tests/system-traffic.at or tests/system-ovn.at. Maybe there
> >>>>>>> could be tests/system-tc.at that uses SKIP_HW to show that flows
> >>>>>>> can be installed and that they work.
> >>>>>>
> >>>>>> Sure.
> >>>>>>>
> >>>>>>> * I think it would be great to get some ability to see what's
> >>>>>>> happening with flow offloads; eg "ovs-appctl dpctl/show-hw-
> offloads"
> >>>>>>> that could give you some idea of how many flows are being
> >>>>>>> offloaded
> >>>>>>
> >>>>>> You mean a number/stats or actually dump only offloaded flows?
> >>>>>
> >>>>> At the time, I was thinking just stats.
> >>>>>
> >>>>> However I think it would also be useful to have some flag for
> >>>>> ovs-dpctl that chooses which flows to fetch.
> >>>>>
> >>>>> Something like:
> >>>>> ovs-dpctl dump-flows: Dump all flows from both OVS and TC (flower
> >>>>> only).
> >>>>> ovs-dpctl dump-flows --only-flower ovs-dpctl dump-flows --only-ovs
> >>>>>
> >>>>> Each flow when printed should also indicate in some way where it
> >>>>> came
> >>>> from.
> >>>>
> >>>> Could be something like ovs-dpctl dump-flows --type=flower instead,
> >>>> to be a bit more generic.
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> dev at openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >> +cc Roi that was removed from cc somehow, he can answer some of this
> >> questions for you.


More information about the dev mailing list