[ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder
Ilya Maximets
i.maximets at samsung.com
Wed May 15 11:24:46 UTC 2019
On 06.05.2019 13:43, Roni Bar Yanai wrote:
> Background
> ==========
> The OVS HW offload solution consists of forwarding and control. The HW implements
> an embedded switch that connects SR-IOV VFs and forwards packets according to
> dynamically configured HW rules (packets can be altered by HW rules). Packets
> that have no forwarding rule, called exception packets, are sent to the control
> path (OVS SW). OVS SW handles the exception packet just like in SW-only
> mode, namely calling an up-call if no DP flow exists. OVS SW uses a port
> representor for representing the VF. See:
>
> https://doc.dpdk.org/guides/prog_guide/switch_representation.html
>
> Packets sent from the VF arrive at the port representor, and packets
> sent to the port representor arrive at the VF. Once OVS SW generates a
> data plane flow, a new HW rule is configured in the embedded switch.
> Subsequent packets on the same flow are then directed by HW only: they
> go directly from VF (or uplink) to VF without reaching SW.
>
> For some HW architectures, only the SR-IOV HW offload architecture presented
> above is supported. The SR-IOV architecture requires that the guest install
> a driver which is specific to the underlying HW. A HW-specific driver
> introduces two main problems for virtualization:
>
> 1. It breaks virtualization in some sense (the VM is aware of the HW).
> 2. Less natural support for live migration.
>
> Using a virtio interface solves both problems (at the expense of some loss in
> functionality and performance). However, some HW offloads cannot work
> directly with virtio.
>
> HW offload for virtio architecture
> ==================================
> We suggest an architecture for HW offload of the virtio interface that
> adds another component, called a virtio-forwarder, on top of the current
> architecture. The forwarder is a software or hardware (for vDPA) component that
> connects the VF with a matching virtio interface, as shown below:
>
>    -------          -------------
>    |     |          |           |
>    | OVS |          | forwarder |
>    |     |          |           |
>    -------          -------------
>     |   |            VF1 | virtio1
>    PR1  uplink        |       |        ---------
>     |   |             |       +--------| guest |
>  ---+---+-------------+----            ---------
>  |                         |
>  |        e-switch         |
>  ---------------------------
>
> The forwarder's role is to function as a wire between the VF and the virtio
> interface: it reads packets from one side's rx-queue and sends them to the
> peer's tx-queue (and vice versa). Since the function in this case is reduced to
> forwarding packets without inspecting them, a single core can push a very high
> number of PPS (near DPDK forwarding performance).
>
> There are three sub use cases.
>
> OVS-dpdk
> --------
> This is the basic use case described above. Here we have a port
> representor, a VF and a virtio interface (forwarding is done between VF and virtio).
>
> vDPA
> ----
> vDPA enables the HW to place packets directly in the VM's virtio queues. In this
> case the forwarding is done in HW, but it still requires SW to handle the
> control path: configuring the queues and adjusting the configuration according
> to VHOST updates.
>
> OVS-kernel
> ----------
> OVS-kernel HW offload has the same limitation. However, for the job of
> just forwarding packets, DPDK has a great performance advantage over the kernel,
> so it would be good to add this use case as well, weighing implementation
> effort against performance gain.
>
> Why not just use a standalone application (DPDK testpmd)?
> ---------------------------------------------------------
> 1. When HW offload is running, we expect that most of the traffic will be
> handled by HW, so the PMD thread will be mostly idle. We don't want to burn
> another core for the forwarding.
>
> 2. A standalone application is another application with all the additional
> overheads: start-up, configuration, monitoring, etc., besides being another
> project, which means another dependency.
>
> 3. It reuses the existing OVS load balancing and NUMA awareness. Forwarding
> should show the exact same symptoms of unbalanced workload as a regular rx-queue.
>
> 4. We might need some prioritization: exception packets are more
> important than forwarding. Being in the same domain makes it possible
> to add such prioritization while reducing the CPU requirement to a minimum.
>
> OVS virtio-forwarder
> ====================
> The suggestion is to put the wire and control functionality in the hw-offload
> module. Looking at the forwarder functionality, we have control and data.
> The control is the configuration: virtio/VF matching (and type), queue
> configuration (defined when the VM is initialized, and can change), etc. The data
> is the actual forwarding, which needs a context to run in. As explained, forwarding
> is reduced to a simple rx-burst and tx-burst where everything can be predefined
> after the configuration.
>
> We add the forwarding layer to the hw-offload module and configure it
> separately. For example:
>
> ovs-appctl hw-offload/set-fw
> vhost-server-path=/tmp/dpdkvhostvm1:rxq=2 dpdk-devargs=0000:08:00.0
> type=pr:[1]
>
> Once configured, we attach the context according to the user configuration. In the
> basic use case, we hook into the port representor scheduling. This way we can use
> the OVS scheduler: when the port representor rx-queue is polled, we forward the
> packets for it and account the cycles on the port representor (rx-queue), so OVS
> can rebalance if needed. This way we use the PMD thread's empty cycles.
> If no port representor is added, we hook into the scheduler as a generic call:
> every scheduling cycle we call the HW virtio-forwarder, limiting the quota
> to avoid starvation of rx-queues. Although we cannot use the OVS scheduling
> features in this case, we still reuse most of the forwarder code and
> solve the problem for kernel-OVS with minor additional effort.
>
> From an OVS perspective this is HW offload functionality; no ports are added
> to OVS. The functionality and statistics can only be accessed through the
> hw-offload module, and minimal code change is needed in OVS, mainly for
> hooking the calling context.
Can we just create a new netdev type like dpdkvdpa?
Let me explain.
IIUC, in order to make vhost acceleration work we need 3 components:
1. vhost-user socket
2. vdpa device: real vdpa device or a SmartNIC VF.
3. representor of "vdpa device".
So, let's create a new OVS netdev 'dpdkvdpa'. It will have 3 mandatory
arguments:
1. vhost-server-path (ex.: /tmp/dpdkvhostvm1)
2. vhost-accelerator-devargs (ex.: "<vdpa pci id>,vdpa=1", or "<VF pci id>")
3. dpdk-devargs (ex.: "<vdpa pci id>,representor=[id]", or "<PF pci id>,representor=[id]")
And one optional config "accelerator-type=(hw|sw)".
In case of real VDPA device:
----------------------------
ovs-vsctl add-port br0 vdpa0 -- \
set interface vdpa0 type=dpdkvdpa \
vhost-server-path=/tmp/dpdkvhostvm1 \
vhost-accelerator-devargs="<vdpa pci id>,vdpa=1" \
dpdk-devargs="<vdpa pci id>,representor=[id]" \
accelerator-type=hw
On this command OVS will create a new netdev:
1. Register vhost-user-client device with rte_vhost_driver_register().
2. Attach VDPA device to vhost-user with rte_vhost_driver_attach_vdpa_device().
3. Open and configure representor dpdk port.
netdev_rxq_recv() will just receive packets from the representor.
netdev_send() will just send packets to the representor.
HW offloading will install flows to the representor.
In case of VF pretending to be VDPA device:
-------------------------------------------
ovs-vsctl add-port br0 vdpa0 -- \
set interface vdpa0 type=dpdkvdpa \
vhost-server-path=/tmp/dpdkvhostvm1 \
vhost-accelerator-devargs="<VF pci id>" \
dpdk-devargs="<VF pci id>,representor=[id]" \
accelerator-type=sw
On this command OVS will create a new netdev:
1. Register vhost-user-client device with rte_vhost_driver_register().
2. Open and configure VF dpdk port.
3. Open and configure representor dpdk port.
netdev_rxq_recv() will:
* Receive packets from VF and push to vhost-user.
* Receive packets from vhost-user and push to VF.
* Receive packets from representor and return them to the caller.
netdev_send() will just send packets to the representor.
HW offloading will install flows to the representor.
The above approach will allow us to avoid any hooks/dirty hacks/separate appctls.
Also, there will be no resources dangling in OVS that need separate management.
All the code will be localized inside netdev-dpdk.c and will hopefully mostly
reuse existing code.
What do you think?
Best regards, Ilya Maximets.