[ovs-dev] [RFC V2] netdev-rte-offloads: HW offload virtio-forwarder

Roni Bar Yanai roniba at mellanox.com
Mon May 6 10:43:18 UTC 2019


Background
==========

The OVS HW offload solution consists of forwarding and control. The HW
implements an embedded switch that connects SRIOV VFs and forwards packets
according to dynamically configured HW rules (packets can also be altered by
HW rules). Packets that match no forwarding rule, called exception packets,
are sent to the control path (OVS SW). OVS SW handles the exception packet
just like in SW-only mode, namely issuing an up-call if no DP flow exists.
OVS SW uses a port representor to represent the VF, see:
    (https://doc.dpdk.org/guides/prog_guide/switch_representation.html).
Packets sent from the VF arrive at the port representor, and packets sent to
the port representor arrive at the VF. Once OVS SW generates a data plane
flow, a new HW rule is configured in the embedded switch. Subsequent packets
on the same flow are then handled by HW only: they go directly from VF (or
uplink) to VF without reaching SW.

For some HW architectures, only the SRIOV HW offload architecture presented
above is supported. The SRIOV architecture requires the guest to install a
driver that is specific to the underlying HW. A HW-specific driver
introduces two main problems for virtualization:
1. It breaks virtualization in some sense (the VM is aware of the HW).
2. Live migration is less naturally supported.
Using a virtio interface solves both problems (at the expense of some loss
in functionality and performance). However, some HW offloads cannot work
directly with virtio.

HW offload for virtio architecture
====================================

We suggest an architecture for HW offload of the virtio interface that
adds another component, called virtio-forwarder, on top of the current
architecture. The forwarder is a software or hardware (for vDPA) component
that connects the VF with a matching virtio interface as shown below:


       | PR1             -------------
     --|--              |             |
    |     |             |  forwarder  |
    | OVS |             |             |
    |     |              -------------          ---------
     -----                | VF1   | virtio1    |         |
       | uplink           |       |            |  guest  |
  -----------------       |       \------------|         |
 |                 |------/                     ---------
 |     e-switch    |
 |                 |
  -----------------

The forwarder's role is to function as a wire between the VF and the virtio
interface. The forwarder reads packets from one rx-queue and sends them to
the peer tx-queue (and vice versa). Since the function in this case is
reduced to forwarding packets without inspecting them, a single core can
push a very high number of PPS (near DPDK forwarding performance).
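
As an illustration only, below is a minimal sketch of one forwarding
iteration, assuming both the VF and the vhost/virtio side are exposed as
DPDK ethdev ports; the port ids, queue ids and function names are
placeholders for this example rather than existing OVS code:

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define FWD_BURST_SIZE 32

    /* Forward one burst from a source rx-queue to its peer tx-queue. */
    static inline void
    fwd_one_direction(uint16_t src_port, uint16_t src_queue,
                      uint16_t dst_port, uint16_t dst_queue)
    {
        struct rte_mbuf *pkts[FWD_BURST_SIZE];
        uint16_t nb_rx, nb_tx;

        /* Pull a burst from the source rx-queue... */
        nb_rx = rte_eth_rx_burst(src_port, src_queue, pkts, FWD_BURST_SIZE);
        if (!nb_rx) {
            return;
        }

        /* ...and push it, unmodified, to the peer tx-queue. */
        nb_tx = rte_eth_tx_burst(dst_port, dst_queue, pkts, nb_rx);

        /* Drop whatever the peer could not accept. */
        while (nb_tx < nb_rx) {
            rte_pktmbuf_free(pkts[nb_tx++]);
        }
    }

    /* One forwarding iteration for a single VF <-> virtio queue pair. */
    static void
    fwd_vf_virtio_iteration(uint16_t vf_port, uint16_t virtio_port,
                            uint16_t qid)
    {
        fwd_one_direction(vf_port, qid, virtio_port, qid);  /* VF -> guest */
        fwd_one_direction(virtio_port, qid, vf_port, qid);  /* guest -> VF */
    }

This is essentially the whole per-queue data path; everything else (port
ids, queue counts, pairing) is decided once at configuration time.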

There are three sub use cases.

OVS-dpdk
--------
This is the basic use case that was just described. In this use case we
have a port representor, a VF and a virtio interface (forwarding should be
done between the VF and the virtio interface).

vDPA
----
vDPA enables the HW to place packets directly into the VM's virtio queues.
In this case the forwarding is done in HW, but some SW still has to handle
the control: configuring the queues and adjusting the configuration
according to VHOST updates.

OVS-kernel
----------
OVS-kernel HW offload has the same limitation. However, when it comes to
just forwarding packets, DPDK has a great performance advantage over the
kernel. It would be good to also cover this use case, weighing the
implementation effort against the performance gain.

Why not just use a standalone application (DPDK testpmd)?
----------------------------------------------------------

1. When HW offload is running, we expect most of the traffic to be handled
   by HW, so the PMD thread will be mostly idle. We don't want to burn
   another core for the forwarding.
2. A standalone application is another application with all the additional
   overheads: start-up, configuration, monitoring, etc., besides being
   another project, which means another dependency.
3. We reuse the already existing OVS load balancing and NUMA awareness.
   Forwarding should show the exact same symptoms of an unbalanced workload
   as a regular rx-queue.
4. We might need some prioritization; exception packets are more important
   than forwarding. Being in the same domain makes it possible to add such
   prioritization while keeping the CPU requirement to a minimum.

OVS virtio-forwarder
====================

The suggestion is to put the wire and control functionality in the
hw-offload module. Looking at the forwarder functionality, we have control
and data. The control is the configuration: virtio/VF matching (and type),
queue configuration (defined when the VM is initialized, and can change),
etc. The data is the actual forwarding, which needs a context to run in. As
explained, forwarding is reduced to a simple rx-burst and tx-burst where
everything can be predefined after the configuration.

We add the forwarding layer to the hw-offload module and configure it
separately. For example:

ovs-appctl hw-offload/set-fw
        vhost-server-path=/tmp/dpdkvhostvm1:rxq=2 dpdk-devargs=0000:08:00.0
        type=pr:[1]

Once configured, we attach the forwarding context according to the user
configuration. In the basic use case, we hook into the port representor
scheduling. This way we can use the OVS scheduler: when a port representor
rx-queue is polled, we also forward the packets for its VF/virtio pair and
account the cycles on that port representor rx-queue, so OVS can rebalance
if needed. This way we use the PMD thread's empty cycles.
If no port representor is added, we hook into the scheduler as a generic
call: every scheduling cycle we call the HW virtio-forwarder, with a
limited quota to avoid starvation of rx-queues. Although we cannot use the
OVS scheduling features in this case, we still reuse most of the forwarder
code and solve the problem for kernel-OVS with minor additional effort.
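
For illustration, a rough sketch of the port representor hook follows. The
structures and function names here are made up for the example (they are
not existing OVS symbols); the point is only that the forwarder pair is
driven from the rx-queue poll and that its cycles are charged to that
rx-queue, so the existing PMD rebalancing keeps working:

    #include <stdint.h>

    struct fwd_pair {               /* one VF <-> virtio wiring           */
        uint16_t vf_port, virtio_port;
        uint16_t qid;
    };

    struct pr_rxq {                 /* a port representor rx-queue        */
        struct fwd_pair *fwd;       /* forwarder on this queue, or NULL   */
        uint64_t cycles;            /* cycles charged, used for rebalance */
    };

    /* Stand-ins for existing mechanisms: a cycle counter, the normal
     * rx-queue poll, and the forwarding iteration from the sketch above. */
    uint64_t cycles_counter(void);
    int pr_rxq_poll_packets(struct pr_rxq *rxq);
    void fwd_vf_virtio_iteration(uint16_t vf_port, uint16_t virtio_port,
                                 uint16_t qid);

    int
    pr_rxq_poll(struct pr_rxq *rxq)
    {
        uint64_t start = cycles_counter();
        int batches;

        /* Drive the attached VF <-> virtio pair first, then poll the
         * representor itself; both are accounted on this rx-queue. */
        if (rxq->fwd) {
            fwd_vf_virtio_iteration(rxq->fwd->vf_port,
                                    rxq->fwd->virtio_port, rxq->fwd->qid);
        }
        batches = pr_rxq_poll_packets(rxq);

        rxq->cycles += cycles_counter() - start;
        return batches;
    }

In the generic-call variant (no port representor), the same
fwd_vf_virtio_iteration() would simply be invoked once per scheduling
cycle with a bounded packet quota.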

From the OVS perspective this is HW offload functionality; no ports are
added to OVS. The functionality and statistics can only be accessed through
the hw-offload module, and only a minimal code change is needed in OVS,
mainly for hooking the calling context.


