[ovs-dev] [PATCH ovs v1 2/2] netdev-dpdk: Add dpdkvdpa port

Noa Ezra noae at mellanox.com
Thu Apr 2 11:13:47 UTC 2020


A dpdkvdpa netdev works with three components: a vhost-user socket, a
vDPA device (either a real vDPA device or a VF), and a representor of
the vDPA device.

To add a new vDPA port, add a port to an existing bridge with type
dpdkvdpa and the vDPA options:
ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
   options:vdpa-socket-path=<sock path>
   options:vdpa-accelerator-devargs=<VF pci id>
   options:dpdk-devargs=<vdpa pci id>,representor=[id]

In response to this command, OVS creates a new netdev that will:
1. Register vhost-user-client device.
2. Open and configure VF dpdk port.
3. Open and configure representor dpdk port.

The new netdev uses the netdev_rxq_recv() function to receive packets
from the VF and push them to the vhost-user device, and to receive
packets from the vhost-user device and push them to the VF.
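
For illustration, here is a minimal, self-contained sketch of the relay
idea (not the code in netdev-dpdk-vdpa.c, which is not added by this
patch): forwarding one burst of packets between two already-configured
DPDK ethdev ports, e.g. the VF and the vhost side, using standard DPDK
burst APIs. RELAY_BURST and relay_burst() are hypothetical names used
only for this sketch.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define RELAY_BURST 32

    /* Relay one burst of packets from src_port to dst_port on the same
     * queue id.  Both ports must already be configured and started. */
    static uint16_t
    relay_burst(uint16_t src_port, uint16_t dst_port, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[RELAY_BURST];
        uint16_t nb_rx, nb_tx, i;

        nb_rx = rte_eth_rx_burst(src_port, queue_id, pkts, RELAY_BURST);
        if (!nb_rx) {
            return 0;
        }

        nb_tx = rte_eth_tx_burst(dst_port, queue_id, pkts, nb_rx);

        /* Free whatever the destination queue did not accept. */
        for (i = nb_tx; i < nb_rx; i++) {
            rte_pktmbuf_free(pkts[i]);
        }
        return nb_tx;
    }

In this patch the equivalent per-queue forwarding is done by
netdev_dpdk_vdpa_rxq_recv_impl(), which netdev_dpdk_vdpa_rxq_recv()
calls before running the regular DPDK receive path.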

Signed-off-by: Noa Ezra <noae at mellanox.com>
Reviewed-by: Oz Shlomo <ozsh at mellanox.com>
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   | 164 +++++++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                |  25 ++++++
 6 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/topics/dpdk/vdpa.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index f85c432..7caf6e7 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -41,6 +41,7 @@ DOC_SOURCE = \
 	Documentation/topics/dpdk/qos.rst \
 	Documentation/topics/dpdk/vdev.rst \
 	Documentation/topics/dpdk/vhost-user.rst \
+	Documentation/topics/dpdk/vdpa.rst \
 	Documentation/topics/fuzzing/index.rst \
 	Documentation/topics/fuzzing/what-is-fuzzing.rst \
 	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index a5be5e3..e8595c3 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -39,3 +39,4 @@ DPDK Support
    /topics/dpdk/qos
    /topics/dpdk/jumbo-frames
    /topics/dpdk/memory
+   /topics/dpdk/vdpa
diff --git a/Documentation/topics/dpdk/vdpa.rst b/Documentation/topics/dpdk/vdpa.rst
new file mode 100644
index 0000000..34c5300
--- /dev/null
+++ b/Documentation/topics/dpdk/vdpa.rst
@@ -0,0 +1,90 @@
+..
+      Copyright (c) 2019 Mellanox Technologies, Ltd.
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at:
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+
+===============
+DPDK vDPA Ports
+===============
+
+In user space there are two main approaches to communicating with a guest (VM):
+virtIO ports (e.g. netdev type=dpdkvhostuser/dpdkvhostuserclient) or SR-IOV
+using phy ports (e.g. netdev type=dpdk).
+With phy ports, a port representor is attached to OVS and the matching VF is
+passed through to the guest.
+HW rules can process packets from the uplink and direct them to the VF without
+going through SW (OVS), and therefore using phy ports gives the best
+performance.
+However, the SR-IOV architecture requires the guest to use a driver specific
+to the underlying HW. A HW-specific driver has two main drawbacks:
+
+1. It breaks virtualization in some sense (the guest is aware of the HW) and
+   can also limit the type of images supported.
+2. Live migration is less naturally supported.
+
+Using a virtIO port solves both problems, but reduces performance and loses
+some functionality; for example, some HW offloads cannot be supported when
+working directly with virtIO.
+
+The new netdev type dpdkvdpa resolves this conflict. A dpdkvdpa port is very
+similar to a regular dpdk netdev but has some additional functionality:
+it translates between a phy port and a virtIO port, taking packets from an
+rx queue and sending them to the suitable tx queue, which allows packets to be
+transferred between a virtIO guest (VM) and a VF, benefiting from both SR-IOV
+and virtIO.
+
+Quick Example
+-------------
+
+Configure OVS bridge and ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You must first create a bridge and add ports to the switch.
+Since the dpdkvdpa port is configured as a client, the vdpa-socket-path must be
+configured by the user, for example
+VHOST_USER_SOCKET_PATH=/path/to/socket::
+
+    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
+    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
+    type=dpdk options:dpdk-devargs=<pf pci id>
+    $ ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
+    options:vdpa-socket-path=$VHOST_USER_SOCKET_PATH \
+    options:vdpa-accelerator-devargs=<vf pci id> \
+    options:dpdk-devargs=<pf pci id>,representor=[id]
+
+Once the ports have been added to the switch, they must be added to the guest.
+
+Adding vhost-user ports to the guest (QEMU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Attach the vhost-user device sockets to the guest. To do this, you must pass
+the following parameters to QEMU::
+
+    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
+    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+
+QEMU will wait until the port has been created successfully in OVS before
+booting the VM. In this mode, if the switch crashes, the vHost ports will
+reconnect automatically once it is brought back up.
diff --git a/NEWS b/NEWS
index 70bd175..79ed080 100644
--- a/NEWS
+++ b/NEWS
@@ -45,6 +45,7 @@ v2.13.0 - 14 Feb 2020
      * Add hardware offload support for output, drop, set of MAC, IPv4 and
        TCP/UDP ports actions (experimental).
      * Add experimental support for TSO.
+     * Add support for a new 'dpdkvdpa' port type.
    - RSTP:
      * The rstp_statistics column in Port table will only be updated every
        stats-update-interval configured in Open_vSwitch table.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96..ce7ed7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -54,6 +54,7 @@
 #include "fatal-signal.h"
 #include "if-notifier.h"
 #include "netdev-provider.h"
+#include "netdev-dpdk-vdpa.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
 #include "openvswitch/dynamic-string.h"
@@ -532,6 +533,8 @@ struct netdev_dpdk {
         int rte_xstats_ids_size;
         uint64_t *rte_xstats_ids;
     );
+
+    struct netdev_dpdk_vdpa_relay *relay;
 };
 
 struct netdev_rxq_dpdk {
@@ -541,6 +544,7 @@ struct netdev_rxq_dpdk {
 
 static void netdev_dpdk_destruct(struct netdev *netdev);
 static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
+static void netdev_dpdk_vdpa_destruct(struct netdev *netdev);
 
 static int netdev_dpdk_get_sw_custom_stats(const struct netdev *,
                                            struct netdev_custom_stats *);
@@ -555,7 +559,8 @@ static bool
 is_dpdk_class(const struct netdev_class *class)
 {
     return class->destruct == netdev_dpdk_destruct
-           || class->destruct == netdev_dpdk_vhost_destruct;
+           || class->destruct == netdev_dpdk_vhost_destruct
+           || class->destruct == netdev_dpdk_vdpa_destruct;
 }
 
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
@@ -1432,6 +1437,30 @@ netdev_dpdk_construct(struct netdev *netdev)
     return err;
 }
 
+static int
+netdev_dpdk_vdpa_construct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev;
+    int err;
+
+    err = netdev_dpdk_construct(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dpdk_mutex);
+    dev = netdev_dpdk_cast(netdev);
+    dev->relay = netdev_dpdk_vdpa_alloc_relay();
+    if (!dev->relay) {
+        err = ENOMEM;
+    }
+
+    ovs_mutex_unlock(&dpdk_mutex);
+out:
+    return err;
+}
+
 static void
 common_destruct(struct netdev_dpdk *dev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1515,6 +1544,19 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
 }
 
 static void
+netdev_dpdk_vdpa_destruct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dpdk_mutex);
+    netdev_dpdk_vdpa_destruct_impl(dev->relay);
+    rte_free(dev->relay);
+    ovs_mutex_unlock(&dpdk_mutex);
+
+    netdev_dpdk_destruct(netdev);
+}
+
+static void
 netdev_dpdk_vhost_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -2018,6 +2060,50 @@ out:
 }
 
 static int
+netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
+                            char **errp)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    const char *vdpa_accelerator_devargs =
+                smap_get(args, "vdpa-accelerator-devargs");
+    const char *vdpa_socket_path =
+                smap_get(args, "vdpa-socket-path");
+    int vdpa_max_queues = smap_get_int(args, "vdpa-max-queues", -1);
+    int err = 0;
+
+    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
+        VLOG_ERR("netdev_dpdk_vdpa_set_config failed. "
+                 "Required arguments are missing for vDPA port %s",
+                 netdev->name);
+        err = EINVAL;
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_set_config(netdev, args, errp);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
+                                       vdpa_socket_path,
+                                       vdpa_accelerator_devargs,
+                                       vdpa_max_queues);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    goto out;
+
+free_relay:
+    rte_free(dev->relay);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
                                     char **errp OVS_UNUSED)
@@ -2479,6 +2565,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     return 0;
 }
 
+static int
+netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
+                          struct dp_packet_batch *batch,
+                          int *qfill)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
+    int fwd_rx;
+    int ret;
+
+    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
+    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
+    if ((ret == EAGAIN) && fwd_rx) {
+        return 0;
+    }
+    return ret;
+}
+
 static inline int
 netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
                     int cnt, bool should_steal)
@@ -3244,6 +3347,26 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
 }
 
 static int
+netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
+                                  struct netdev_custom_stats *custom_stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
+                                                 custom_stats);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
+                 "Port %s\n", netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+    return err;
+}
+
+static int
 netdev_dpdk_get_features(const struct netdev *netdev,
                          enum netdev_features *current,
                          enum netdev_features *advertised,
@@ -5022,6 +5145,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 }
 
 static int
+netdev_dpdk_vdpa_reconfigure(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err;
+
+    err = netdev_dpdk_reconfigure(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dev->mutex);
+    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
+                                        dev->up.n_rxq);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
+                 netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -5310,10 +5458,24 @@ static const struct netdev_class dpdk_vhost_client_class = {
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
+static const struct netdev_class dpdk_vdpa_class = {
+    .type = "dpdkvdpa",
+    NETDEV_DPDK_CLASS_COMMON,
+    .construct = netdev_dpdk_vdpa_construct,
+    .destruct = netdev_dpdk_vdpa_destruct,
+    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
+    .set_config = netdev_dpdk_vdpa_set_config,
+    .reconfigure = netdev_dpdk_vdpa_reconfigure,
+    .get_stats = netdev_dpdk_get_stats,
+    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
+    .send = netdev_dpdk_eth_send,
+};
+
 void
 netdev_dpdk_register(void)
 {
     netdev_register_provider(&dpdk_class);
     netdev_register_provider(&dpdk_vhost_class);
     netdev_register_provider(&dpdk_vhost_client_class);
+    netdev_register_provider(&dpdk_vdpa_class);
 }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index f9339af..e7715f5 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2671,6 +2671,13 @@
             </p>
           </dd>
 
+          <dt><code>dpdkvdpa</code></dt>
+          <dd>
+            The DPDK vDPA port allows forwarding bidirectional traffic
+            between SR-IOV virtual functions (VFs) and VirtIO devices in
+            virtual machines (VMs).
+          </dd>
+
         </dl>
       </column>
     </group>
@@ -3219,6 +3226,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
+      <column name="options" key="vdpa-socket-path"
+              type='{"type": "string"}'>
+        <p>
+          The value specifies the path to the socket associated with a vDPA
+          port that will be created by QEMU.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
+      <column name="options" key="vdpa-accelerator-devargs"
+              type='{"type": "string"}'>
+        <p>
+          The value specifies the PCI address associated with the virtual
+          function.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
       <column name="options" key="dq-zero-copy"
               type='{"type": "boolean"}'>
         <p>
-- 
1.8.3.1


