[ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

Ilya Maximets i.maximets at samsung.com
Wed May 15 13:36:34 UTC 2019


On 15.05.2019 16:01, Roni Bar Yanai wrote:
> Hi Ilya,
> 
> Thanks for the comment.
> I think the suggested arch is very good and has many advantages; in fact, I had something very similar as my first internal approach.
> However, I had one problem: it doesn't solve the kernel case. It makes sense to do the forwarding with DPDK also when OVS uses the kernel datapath (port representors and rule offloads are done with kernel OVS), both because we can have one solution and because DPDK has better performance.

I'm not sure if it makes practical sense to run a separate userspace
datapath just to pass packets between vhost and a VF. This actually
matches some of the disadvantages of separate DPDK apps that you
listed yourself. A separate userspace datapath will need its own
complex startup, configuration and maintenance. It will also consume
additional CPU cores that cannot be shared with kernel packet
processing. I think that just moving everything to userspace in this
case would be much simpler for the user than maintaining such a
configuration.

> Do you see a way we could also support just the FW (forwarding) case when there are no port representors?
> 
> Thanks,
> Roni
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets at samsung.com>
>> Sent: Wednesday, May 15, 2019 2:25 PM
>> To: Roni Bar Yanai <roniba at mellanox.com>; ovs-dev at openvswitch.org; Ian
>> Stokes <ian.stokes at intel.com>; Kevin Traynor <ktraynor at redhat.com>
>> Cc: Eyal Lavee <elavee at mellanox.com>; Oz Shlomo <ozsh at mellanox.com>; Eli
>> Britstein <elibr at mellanox.com>; Rony Efraim <ronye at mellanox.com>; Asaf
>> Penso <asafp at mellanox.com>
>> Subject: Re: [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
>>
>> On 06.05.2019 13:43, Roni Bar Yanai wrote:
>>> Background
>>> ==========
>>> The OVS HW offload solution consists of forwarding and control. The HW
>>> implements an embedded switch that connects SR-IOV VFs and forwards packets
>>> according to the dynamically configured HW rules (packets can be altered by
>>> HW rules). Packets that have no forwarding rule, called exception packets,
>>> are sent to the control path (OVS SW). OVS SW handles the exception packet
>>> just like in SW-only mode, namely issuing an upcall if no DP flow exists.
>>> OVS SW uses a port representor to represent the VF. See:
>>>
>>>     (https://doc.dpdk.org/guides/prog_guide/switch_representation.html)
>>>
>>> Packets sent from the VF will reach the port representor, and packets
>>> sent to the port representor will reach the VF. Once OVS SW generates a
>>> data plane flow, a new HW rule is configured in the embedded switch.
>>> Following packets on the same flow are handled by HW only: they go
>>> directly from VF (or uplink) to VF without reaching SW.
>>>
>>> For some HW architectures, only the SR-IOV HW offload architecture briefly
>>> presented above is supported. The SR-IOV architecture requires the guest to
>>> install a driver specific to the underlying HW. A HW-specific driver
>>> introduces two main problems for virtualization:
>>>
>>> 1. It breaks virtualization in some sense (the VM is aware of the HW).
>>> 2. Less natural support for live migration.
>>>
>>> Using a virtio interface solves both problems (at the expense of some loss
>>> in functionality and performance). However, some HW offloads cannot work
>>> directly with virtio.
>>>
>>> HW offload for virtio architecture
>>> ====================================
>>> We suggest an architecture for HW offload of the virtio interface that
>>> adds another component, called virtio-forwarder, on top of the current
>>> architecture. The forwarder is a software or hardware (for vDPA) component
>>> that connects the VF with a matching virtio interface, as shown below:
>>>
>>>        | PR1              -----------
>>>      --|--               |           |
>>>     |     |              | forwarder |
>>>     | OVS |              |           |
>>>     |     |              -------------         ---------
>>>      -----                | VF1   | virtio1   |         |
>>>        | uplink           |       |           |  guest  |
>>>   -----------------       |       \ ----------|         |
>>> |                  |----- /                    ---------
>>> |      e-switch    |
>>> |                  |
>>>  ------------------
>>>
>>> The forwarder's role is to function as a wire between the VF and the virtio
>>> interface. The forwarder reads packets from the rx-queue and sends them to
>>> the peer tx-queue (and vice versa). Since the function in this case is
>>> reduced to forwarding packets without inspecting them, a single core can
>>> push a very high number of PPS (near DPDK forwarding performance).
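>>>
>>> As an illustration only, a minimal sketch of such a wire loop, assuming both
>>> the VF and the vhost side are exposed as DPDK ethdev ports ('vf_port',
>>> 'virtio_port' and 'queue' are placeholder identifiers):
>>>
>>>     #include <rte_ethdev.h>
>>>     #include <rte_mbuf.h>
>>>
>>>     #define FWD_BURST_SIZE 32
>>>
>>>     /* Forward one burst from 'src' to 'dst'; free what cannot be sent. */
>>>     static inline void
>>>     fwd_burst(uint16_t src, uint16_t dst, uint16_t queue)
>>>     {
>>>         struct rte_mbuf *pkts[FWD_BURST_SIZE];
>>>         uint16_t nb_rx, nb_tx;
>>>
>>>         nb_rx = rte_eth_rx_burst(src, queue, pkts, FWD_BURST_SIZE);
>>>         if (!nb_rx) {
>>>             return;
>>>         }
>>>         nb_tx = rte_eth_tx_burst(dst, queue, pkts, nb_rx);
>>>         while (nb_tx < nb_rx) {
>>>             rte_pktmbuf_free(pkts[nb_tx++]);
>>>         }
>>>     }
>>>
>>>     /* One forwarder iteration: VF -> virtio and virtio -> VF. */
>>>     static void
>>>     forwarder_run_once(uint16_t vf_port, uint16_t virtio_port, uint16_t queue)
>>>     {
>>>         fwd_burst(vf_port, virtio_port, queue);
>>>         fwd_burst(virtio_port, vf_port, queue);
>>>     }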
>>>
>>> There are 3 sub use cases.
>>>
>>> OVS-dpdk
>>> --------
>>> This is the basic use case that was just described. In this use case we have
>>> a port representor, a VF and a virtio interface (forwarding should be done
>>> between the VF and the virtio interface).
>>>
>>> Vdpa
>>> -----
>>> vDPA enables the HW to place packets directly into the VM's virtio queues.
>>> In this case the forwarding is done in HW, but some SW is still needed to
>>> handle the control path: configuring the queues and adjusting the
>>> configuration according to vhost updates.
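>>>
>>> As a rough illustration of that control hook (not part of this RFC's code),
>>> the SW side could register for vhost events with the DPDK vhost callback
>>> API; the callback body that actually (re)programs the HW queues is omitted:
>>>
>>>     #include <rte_vhost.h>
>>>
>>>     /* Called when the guest enables or disables a virtqueue. */
>>>     static int
>>>     vring_state_changed(int vid, uint16_t queue_id, int enable)
>>>     {
>>>         /* (Re)program the HW queue backing 'queue_id' here. */
>>>         (void) vid; (void) queue_id; (void) enable;
>>>         return 0;
>>>     }
>>>
>>>     static const struct vhost_device_ops vdpa_ctrl_ops = {
>>>         .vring_state_changed = vring_state_changed,
>>>     };
>>>
>>>     /* Register the control callbacks for one vhost socket path. */
>>>     static int
>>>     register_vdpa_control(const char *path)
>>>     {
>>>         return rte_vhost_driver_callback_register(path, &vdpa_ctrl_ops);
>>>     }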
>>>
>>> OVS-kernel
>>> ----------
>>> OVS-kernel HW offload has the same limitation. However, for the case of
>>> just forwarding packets, DPDK has a great performance advantage over the
>>> kernel. It would be good to also cover this use case, considering the
>>> implementation effort and the performance gain.
>>>
>>> Why not just use a standalone (DPDK testpmd)?
>>> ---------------------------------------------
>>> 1. When HW offload is running, we expect most of the traffic to be handled
>>>    by HW, so the PMD thread will be mostly idle. We don't want to burn
>>>    another core for the forwarding.
>>>
>>> 2. A standalone application comes with all the additional overheads:
>>>    startup, configuration, monitoring, etc., besides being another project,
>>>    which means another dependency.
>>>
>>> 3. It reuses the existing OVS load balancing and NUMA awareness. Forwarding
>>>    shows exactly the same unbalanced-workload symptoms as a regular
>>>    rx-queue.
>>>
>>> 4. We might need some prioritization: exception packets are more important
>>>    than forwarded ones. Being in the same domain makes it possible to add
>>>    such prioritization while keeping the CPU requirement to a minimum.
>>>
>>> OVS virtio-forwarder
>>> ====================
>>> The suggestion is to put the wire and control functionality in the hw-offload
>>> module. Looking at the forwarder functionality, we have control and data.
>>> The control is the configuration: virtio/VF matching (and type), queue
>>> configuration (defined when the VM is initialized, and can change), etc.
>>> The data is the actual forwarding, which needs a context to run in. As
>>> explained, forwarding is reduced to a simple rx-burst and tx-burst where
>>> everything can be predefined after the configuration.
>>>
>>> We add the forwarding layer to the hw offload module and we configure it
>>> separately. For example:
>>>
>>> ovs-appctl hw-offload/set-fw
>>>            vhost-server-path=/tmp/dpdkvhostvm1:rxq=2 dpdk-devargs=0000:08:00.0
>>>            type=pr:[1]
>>>
>>> Once configured, we attach the context according to the user configuration.
>>> In the basic use case, we hook into the port representor scheduling, so we
>>> can use the OVS scheduler. When the port representor rx-queue is polled, we
>>> forward the packets for it and account the cycles on the port representor
>>> (rx-queue), so OVS can rebalance if needed. This way we use the PMD thread's
>>> empty cycles. If no port representor is added, we hook into the scheduler as
>>> a generic call: every scheduling cycle we call the HW virtio-forwarder, with
>>> a limited quota to avoid starving rx-queues. Although we cannot use the OVS
>>> scheduling features in this case, we still reuse most of the forwarder code
>>> and solve the problem for kernel-OVS with minor additional effort.
>>>
>>> From the OVS perspective this is HW offload functionality; no ports are
>>> added to OVS. The functionality and statistics can only be accessed through
>>> the hw offload module, and only a minimal code change is needed in OVS,
>>> mainly for hooking the calling context.
>>
>> Can we just create a new netdev type like dpdkvdpa ?
>>
>> Let me explain.
>> IIUC, in order to make vhost acceleration work we need 3 components:
>>
>> 1. vhost-user socket
>> 2. vdpa device: real vdpa device or a SmartNIC VF.
>> 3. representor of "vdpa device".
>>
>> So, let's create a new OVS netdev 'dpdkvdpa'. It will have 3 mandatory
>> arguments:
>>
>> 1. vhost-server-path (ex.: /tmp/dpdkvhostvm1)
>> 2. vhost-accelerator-devargs (ex.: "<vdpa pci id>,vdpa=1", or "<VF pci id>")
>> 3. dpdk-devargs (ex.: "<vdpa pci id>,representor=[id]", or
>>    "<PF pci id>,representor=[id]")
>>
>> And one optional config "accelerator-type=(hw|sw)".
>>
>> In case of real VDPA device:
>> ----------------------------
>> ovs-vsctl add-port br0 vdpa0 -- \
>>      set interface vdpa0 type=dpdkvdpa \
>>                          vhost-server-path=/tmp/dpdkvhostvm1 \
>>                          vhost-accelerator-devargs="<vdpa pci id>,vdpa=1" \
>>                          dpdk-devargs="<vdpa pci id>,representor=[id]" \
>>                          accelerator-type=hw
>>
>> On this command OVS will create a new netdev:
>> 1. Register vhost-user-client device with rte_vhost_driver_register().
>> 2. Attach VDPA device to vhost-user with
>> rte_vhost_driver_attach_vdpa_device().
>> 3. Open and configure representor dpdk port.
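>>
>> A rough sketch of steps 1-2, assuming the vDPA device id 'did' was already
>> looked up from vhost-accelerator-devargs (error handling, callbacks and the
>> representor setup from step 3 are omitted):
>>
>>     #include <rte_vhost.h>
>>
>>     static int
>>     dpdkvdpa_register_vhost(const char *vhost_server_path, int did)
>>     {
>>         /* vhost-user client: connect to the socket created by QEMU. */
>>         if (rte_vhost_driver_register(vhost_server_path,
>>                                       RTE_VHOST_USER_CLIENT) < 0) {
>>             return -1;
>>         }
>>         /* Bind this vhost-user device to the vDPA backend. */
>>         if (rte_vhost_driver_attach_vdpa_device(vhost_server_path, did) < 0) {
>>             return -1;
>>         }
>>         /* Start handling the vhost-user connection. */
>>         return rte_vhost_driver_start(vhost_server_path);
>>     }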
>>
>> netdev_rxq_recv() will just receive packets from the representor.
>> netdev_send() will just send packets to the representor.
>> HW offloading will install flows to the representor.
>>
>>
>> In case of VF pretending to be VDPA device:
>> -------------------------------------------
>> ovs-vsctl add-port br0 vdpa0 -- \
>>      set interface vdpa0 type=dpdkvdpa \
>>                          vhost-server-path=/tmp/dpdkvhostvm1 \
>>                          vhost-accelerator-devargs="<VF pci id>" \
>>                          dpdk-devargs="<VF pci id>,representor=[id]" \
>>                          accelerator-type=sw
>>
>> On this command OVS will create a new netdev:
>> 1. Register vhost-user-client device with rte_vhost_driver_register().
>> 2. Open and configure VF dpdk port.
>> 3. Open and configure representor dpdk port.
>>
>> netdev_rxq_recv() will:
>> * Receive packets from VF and push to vhost-user.
>> * Receive packets from vhost-user and push to VF.
>> * Receive packets from representor and return them to the caller.
>>
>> netdev_send() will just send packets to the representor.
>> HW offloading will install flows to the representor.
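>>
>> A minimal sketch of that receive path, reusing the fwd_burst() helper from
>> the RFC sketch above and assuming 'vf', 'vhost' and 'repr' are the DPDK port
>> ids of the VF, the vhost-user port and the representor (the real netdev
>> would use the OVS dp_packet batch API rather than raw mbuf arrays):
>>
>>     /* Per-poll logic of a 'dpdkvdpa' rx queue in the SW (VF) case. */
>>     static uint16_t
>>     dpdkvdpa_rxq_recv_sw(uint16_t vf, uint16_t vhost, uint16_t repr,
>>                          uint16_t queue, struct rte_mbuf **out,
>>                          uint16_t max_out)
>>     {
>>         fwd_burst(vf, vhost, queue);    /* VF -> guest virtio. */
>>         fwd_burst(vhost, vf, queue);    /* guest virtio -> VF. */
>>
>>         /* Exception packets from the representor go back to the caller,
>>          * i.e. into the regular OVS datapath. */
>>         return rte_eth_rx_burst(repr, queue, out, max_out);
>>     }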
>>
>>
>> The above approach will allow us to avoid any hooks/dirty hacks/separate
>> appctls. Also, there will be no resources dangling in OVS that need separate
>> management. All the code will be localized inside netdev-dpdk.c and will
>> hopefully mostly reuse existing code.
>>
>> What do you think?
>>
>> Best regards, Ilya Maximets.

