[ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

Roni Bar Yanai roniba at mellanox.com
Fri May 24 20:51:09 UTC 2019


Hi Ilya,

See inline.

>-----Original Message-----
>From: Ilya Maximets <i.maximets at samsung.com>
>Sent: Friday, May 24, 2019 3:21 PM
>To: Simon Horman <simon.horman at netronome.com>; Roni Bar Yanai
><roniba at mellanox.com>
>Cc: ovs-dev at openvswitch.org; Ian Stokes <ian.stokes at intel.com>; Kevin Traynor
><ktraynor at redhat.com>; Oz Shlomo <ozsh at mellanox.com>; Eli Britstein
><elibr at mellanox.com>; Eyal Lavee <elavee at mellanox.com>; Rony Efraim
><ronye at mellanox.com>; Ben Pfaff <blp at ovn.org>
>Subject: Re: [ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>
>On 22.05.2019 15:10, Simon Horman wrote:
>> Hi,
>>
>> On Thu, May 16, 2019 at 08:44:31AM +0000, Roni Bar Yanai wrote:
>>>> -----Original Message-----
>>>> From: Ilya Maximets <i.maximets at samsung.com>
>>>> Sent: Wednesday, May 15, 2019 4:37 PM
>>>> To: Roni Bar Yanai <roniba at mellanox.com>; ovs-dev at openvswitch.org; Ian
>>>> Stokes <ian.stokes at intel.com>; Kevin Traynor <ktraynor at redhat.com>
>>>> Cc: Eyal Lavee <elavee at mellanox.com>; Oz Shlomo <ozsh at mellanox.com>;
>Eli
>>>> Britstein <elibr at mellanox.com>; Rony Efraim <ronye at mellanox.com>; Asaf
>>>> Penso <asafp at mellanox.com>
>>>> Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>>>>
>>>> On 15.05.2019 16:01, Roni Bar Yanai wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Thanks for the comment.
>>>>>
>>>>> I think the suggested arch is very good and has many advantages, and
>>>>> in fact I had something very similar as my internally first approach.
>>>>>
>>>>> However, I had one problem: it doesn't solve the kernel case. It makes
>>>>> sense to do the forwarding with DPDK even when OVS runs the kernel
>>>>> datapath (port representors and rule offloads are done by kernel OVS),
>>>>> both because we can have one solution and because DPDK has better
>>>>> performance.
>>>>
>>>> I'm not sure it makes practical sense to run a separate userspace
>>>> datapath just to pass packets between vhost and a VF. This actually
>>>> matches some of your own disadvantages of separate DPDK apps. A
>>>> separate userspace datapath will need its own complex start-up,
>>>> configuration and maintenance. It will also consume additional CPU
>>>> cores which will not be shared with kernel packet processing. I think
>>>> that just moving everything to userspace in this case would be much
>>>> simpler for the user than maintaining such a configuration.
>>>
>>> Maybe it doesn't make sense for OVS-DPDK, but for OVS users it does. When
>>> you run offload with OVS-kernel (for some vendors this is the current
>>> status) and virtio is a requirement, you now have millions of packets
>>> that must be forwarded. Basically you have two options:
>>>
>>> 1. Use an external application (we discussed that).
>>>
>>> 2. Create a userspace data plane and configure the forwarding (OVS), but
>>> then you have performance issues, as OVS is not optimized for this. And
>>> with a kernel data plane it is much worse, of course.
>>>
>>> Regarding burning a core: in the case of HW offload you will do it either
>>> way, and there is no benefit in adding forwarder functionality to the
>>> kernel data path, mainly because of kernel performance limitations.
>>>
>>> I agree that in such a case moving to userspace is a solution for some,
>>> but keep in mind that some users don't have such DPDK support, and others
>>> have their own OVS-based data path with their own adjustments, so it
>>> would be a hard transition.
>>>
>>> While the arch is good for the two DPDK use cases, it leaves the kernel
>>> one out. Any thoughts on how we can cover this use case as well and
>>> still keep the suggested arch?
>>
>> ...
>>
>> At Netronome we have an Open Source standalone application,
>> called virtio-forwarder (https://github.com/Netronome/virtio-forwarder).
>> The reason that we provide this solution is that we see this as a
>> requirement for some customers. This includes customers using OVS
>> with the kernel based HW offload (OVS-TC).
>>
>> In general I agree that integration with OVS has some advantages and
>> I'm happy to see this issue being discussed. But as we see demand
>> for use of virtio-forwarder in conjunction with OVS-TC I see that
>> as a requirement for a solution that is integrated with OVS, which leads
>> me to lean towards the proposal put forward by Roni.
>>
>> I also feel that the proposal put forward by Roni is likely to prove more
>> flexible than a port-based approach, as proposed by Ilya. For one thing
>> such a design ought to allow for arbitrary combinations of port types.
>> In fact, it would be entirely feasible to run this in conjunction with a
>> non-OVS offload aware NIC (SR-IOV in VEB mode).
>>
>> Returning to the stand-alone Netronome implementation, I would welcome
>> discussion of how any useful portions of this could be reused.
>>
>
>Hi Simon. Thanks for the link. It's very interesting.
>
>My key point about the proposal put forward by Roni is that Open vSwitch
>is an *OF switch* first of all and *not a multitool*. This proposal adds

Let's not forget that performance is a major factor for a switch, and currently
there is a gap between market demand plus HW capabilities on one side and OVS
performance on the other. The idea is not to turn OVS into a multitool; the idea
is to improve OVS performance and make it a complete solution: a standalone OF
switch with great performance.

Currently we have a technical gap with forwarding into virtio that we must
close in SW. I've suggested splitting the code into a separate module so it can
be configured independently and requires minimal changes in the OVS non-offload
code path. In addition to what Simon mentioned, a separate module would also
allow other performance improvements such as TSO (today it is always disabled
in SW): it could be enabled on the forwarder, and OVS would see the traffic
after the HW has segmented it.
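To make the intent a bit more concrete, a forwarder pair in this model could be
configured outside the OpenFlow pipeline, roughly along these lines. This is
for illustration only: the port type name and option keys below are just an
example, not necessarily the final interface:

    # Illustrative sketch only: the "virtio-forwarder" type and these option
    # keys are hypothetical. The idea is to bind a VF to a vhost-user socket
    # and let the offload module forward between them, bypassing the
    # OpenFlow pipeline.
    ovs-vsctl add-port br0 fwd0 -- set Interface fwd0 type=virtio-forwarder \
        options:vf-devargs=0000:82:00.2 \
        options:vhost-server-path=/tmp/vhost-user-1.sock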

>some parasitic work to the main OVS workflow which isn't connected

I can agree with you that it is not ideal, but it is a matter of how you look
at it. You still want performance, and OVS's role is still to forward the
packets, not just to take decisions. Maybe we can think about how to minimize
this even further.

>with its main purpose. If you really want this implemented, this should
>probably be done inside DPDK. You may implement a virtual device in DPDK
>(like bonding) that will forward traffic between subports while calling
>receive function. Adding this vdev to OVS as a usual DPDK port you will
>be able to achieve your goal. DPDK as a development kit (an actual multitool)
>is much more appropriate place for such solutions.

I don't agree with this point. What about the other forwarding use cases? What
would you do for vDPA? There is still control work to do (I guess you could
create another type of port). And what about the kernel? You would still need a
standalone forwarder. I think the use case for this special port is when you
want to offload to virtio (not just DPDK), and that exists mainly in
virtualization, where you have a vswitch. I don't see a DPDK application that
includes HW offload using it unless it is a switch; the ability to use it
depends heavily on the switch architecture.
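Just so we are talking about the same thing, I read the vdev suggestion as
plugging a forwarding vdev into OVS through the usual dpdk-devargs mechanism,
roughly like this (the "net_vhostfwd" vdev name is invented for the example;
only the dpdk-devargs plumbing itself is the standard mechanism):

    # Hypothetical forwarding vdev -- no such PMD exists today; only the
    # attachment of a vdev via dpdk-devargs is standard OVS-DPDK usage.
    ovs-vsctl add-port br0 fwd0 -- set Interface fwd0 type=dpdk \
        options:dpdk-devargs=net_vhostfwd0,vf=0000:82:00.2,iface=/tmp/vhost1.sock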

>
>BTW, the root cause of this approach is the slow packet forwarding in OVS
>in comparison with direct rx + tx without parsing.

This is only part of it. Those are not switch ports; they are HW offload ports.
You don't want forwarding rules on them. In fact, you don't even want the user
to see them as part of the switch, in dpif/show for example.

>OVS performance improvement is probably the right direction where we
>can move to achieve reasonably effective packet forwarding. I prepared
>a patch that should allow much faster packet forwarding for direct
>output flows like "in_port=1,actions=output:2". Take a look here:
>
>https://patchwork.ozlabs.org/patch/1104878/
>It still will be slower than "no parsing at all", but could be suitable
>in practice for some use cases.
>

Thanks, impressive. We should test the performance.
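For reference, the direct-output case mentioned above is just a single flow,
e.g.:

    ovs-ofctl add-flow br0 in_port=1,actions=output:2

so it should be easy to reproduce the numbers on our setup.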
BR,
Roni

>Best regards, Ilya Maximets.

