[ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port

Noa Levy noae at mellanox.com
Sun Oct 27 09:24:17 UTC 2019


> -----Original Message-----
> From: William Tu [mailto:u9012063 at gmail.com]
> Sent: Thursday, October 24, 2019 2:00 AM
> To: Noa Levy <noae at mellanox.com>
> Cc: ovs-dev at openvswitch.org; Oz Shlomo <ozsh at mellanox.com>; Majd
> Dibbiny <majd at mellanox.com>; Ameer Mahagneh
> <ameerm at mellanox.com>; Eli Britstein <elibr at mellanox.com>
> Subject: Re: [ovs-dev] [PATCH ovs v3 2/2] netdev-dpdk: Add dpdkvdpa port
> 
> Hi Noa,
> 
> I have a couple more questions. I'm still at the learning stage of this new
> feature, thanks in advance for your patience.
> 
> On Thu, Oct 17, 2019 at 02:16:56PM +0300, Noa Ezra wrote:
> > dpdkvdpa netdev works with 3 components:
> > vhost-user socket, vdpa device: real vdpa device or a VF and
> > representor of "vdpa device".
> 
> What NIC card support this feature?
> I don't have real vdpa device, can I use Intel X540 VF feature?
> 

This feature will have two modes, SW and HW.
The SW mode doesn't depend on a real vdpa device, so you can use this feature even if your NIC doesn't support vdpa.
The HW mode will be implemented in the future and will use a real vdpa device. If your NIC supports it, the HW mode will be the better choice.

For now, only the SW mode is supported. Once vdpa is supported in dpdk, we will add the HW mode to OVS.

> >
> > In order to add a new vDPA port, add a new port to existing bridge
> > with type dpdkvdpa and vDPA options:
> > ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
> >    options:vdpa-socket-path=<sock path>
> >    options:vdpa-accelerator-devargs=<VF pci id>
> >    options:dpdk-devargs=<vdpa pci id>,representor=[id]
> >
> > On this command OVS will create a new netdev:
> > 1. Register vhost-user-client device.
> > 2. Open and configure VF dpdk port.
> > 3. Open and configure representor dpdk port.
> >
> > The new netdev will use netdev_rxq_recv() function in order to receive
> > packets from VF and push to vhost-user and receive packets from
> > vhost-user and push to VF.
> >
> > Signed-off-by: Noa Ezra <noae at mellanox.com>
> > Reviewed-by: Oz Shlomo <ozsh at mellanox.com>
> > ---
> >  Documentation/automake.mk           |   1 +
> >  Documentation/topics/dpdk/index.rst |   1 +
> >  Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
> >  NEWS                                |   1 +
> >  lib/netdev-dpdk.c                   | 162 ++++++++++++++++++++++++++++++++++++
> >  vswitchd/vswitch.xml                |  25 ++++++
> >  6 files changed, 280 insertions(+)
> >  create mode 100644 Documentation/topics/dpdk/vdpa.rst
> >
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index cd68f3b..ee574bc 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -43,6 +43,7 @@ DOC_SOURCE = \
> >  	Documentation/topics/dpdk/ring.rst \
> >  	Documentation/topics/dpdk/vdev.rst \
> >  	Documentation/topics/dpdk/vhost-user.rst \
> > +	Documentation/topics/dpdk/vdpa.rst \
> >  	Documentation/topics/fuzzing/index.rst \
> >  	Documentation/topics/fuzzing/what-is-fuzzing.rst \
> >  	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
> > diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
> > index cf24a7b..c1d4ea7 100644
> > --- a/Documentation/topics/dpdk/index.rst
> > +++ b/Documentation/topics/dpdk/index.rst
> > @@ -41,3 +41,4 @@ The DPDK Datapath
> >     /topics/dpdk/pdump
> >     /topics/dpdk/jumbo-frames
> >     /topics/dpdk/memory
> > +   /topics/dpdk/vdpa
> > diff --git a/Documentation/topics/dpdk/vdpa.rst
> > b/Documentation/topics/dpdk/vdpa.rst
> > new file mode 100644
> > index 0000000..34c5300
> > --- /dev/null
> > +++ b/Documentation/topics/dpdk/vdpa.rst
> > @@ -0,0 +1,90 @@
> > +..
> > +      Copyright (c) 2019 Mellanox Technologies, Ltd.
> > +
> > +      Licensed under the Apache License, Version 2.0 (the "License");
> > +      you may not use this file except in compliance with the License.
> > +      You may obtain a copy of the License at:
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing, software
> > +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> > +      License for the specific language governing permissions and limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +
> > +===============
> > +DPDK VDPA Ports
> > +===============
> > +
> > +In user space there are two main approaches to communicate with a
> > +guest (VM), using virtIO ports (e.g. netdev
> > +type=dpdkvhoshuser/dpdkvhostuserclient) or SR-IOV using phy ports (e.g. netdev type = dpdk).
> > +Phy ports allow working with port representor which is attached to
> > +the OVS and a matching VF is given with pass-through to the guest.
> > +HW rules can process packets from up-link and direct them to the VF
> > +without going through SW (OVS) and therefore using phy ports gives
> > +the best performance.
> > +However, SR-IOV architecture requires that the guest will use a
> > +driver which is specific to the underlying HW. Specific HW driver has two main drawbacks:
> > +1. Breaks virtualization in some sense (guest aware of the HW), can
> > +also limit the type of images supported.
> > +2. Less natural support for live migration.
> > +
> > +Using virtIO port solves both problems, but reduces performance and
> > +causes losing of some functionality, for example, for some HW
> > +offload, working directly with virtIO cannot be supported.
> > +
> > +We created a new netdev type- dpdkvdpa. dpdkvdpa port solves this conflict.
> > +The new netdev is basically very similar to regular dpdk netdev but
> > +it has some additional functionally.
> > +This port translates between phy port to virtIO port, it takes
> > +packets from rx-queue and send them to the suitable tx-queue and
> > +allows to transfer packets from virtIO guest (VM) to a VF and vice
> > +versa and benefit both SR-IOV and virtIO.
> > +
> > +Quick Example
> > +-------------
> > +
> > +Configure OVS bridge and ports
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +you must first create a bridge and add ports to the switch.
> > +Since the dpdkvdpa port is configured as a client, the
> > +vdpa-socket-path must be configured by the user.
> > +VHOST_USER_SOCKET_PATH=/path/to/socket
> > +
> > +    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
> > +    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
> > +    type=dpdk options:dpdk-devargs=<pf pci id>
> 
> Is adding pf port to br0 necessary?
> 
> > +    $ ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
> > +    options:vdpa-socket-path=VHOST_USER_SOCKET_PATH \
> > +    options:vdpa-accelerator-devargs=<vf pci id> \
> > +    options:dpdk-devargs=<pf pci id>,representor=[id]
> > +
> > +Once the ports have been added to the switch, they must be added to the guest.
> > +
> > +Adding vhost-user ports to the guest (QEMU)
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Attach the vhost-user device sockets to the guest. To do this, you
> > +must pass the following parameters to QEMU:
> > +
> > +    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
> > +    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > +    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > +
> > +QEMU will wait until the port is created successfully in OVS to boot the VM.
> > +In this mode, in case the switch will crash, the vHost ports will
> > +reconnect automatically once it is brought back.
> > diff --git a/NEWS b/NEWS
> > index f5a0b8f..6f315c6 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -542,6 +542,7 @@ v2.6.0 - 27 Sep 2016
> >       * Remove dpdkvhostcuse port type.
> >       * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
> >       * 'dpdkvhostuserclient' port type.
> > +     * 'dpdkvdpa' port type.
> >     - Increase number of registers to 16.
> >     - ovs-benchmark: This utility has been removed due to lack of use and
> >       bitrot.
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index bc20d68..16ddf58 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -47,6 +47,7 @@
> >  #include "dpif-netdev.h"
> >  #include "fatal-signal.h"
> >  #include "netdev-provider.h"
> > +#include "netdev-dpdk-vdpa.h"
> >  #include "netdev-vport.h"
> >  #include "odp-util.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -137,6 +138,9 @@ typedef uint16_t dpdk_port_t;
> >  /* Legacy default value for vhost tx retries. */
> >  #define VHOST_ENQ_RETRY_DEF 8
> >
> > +/* Size of VDPA custom stats. */
> > +#define VDPA_CUSTOM_STATS_SIZE          4
> > +
> >  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
> >
> >  static const struct rte_eth_conf port_conf = { @@ -461,6 +465,8 @@
> > struct netdev_dpdk {
> >          int rte_xstats_ids_size;
> >          uint64_t *rte_xstats_ids;
> >      );
> > +
> > +    struct netdev_dpdk_vdpa_relay *relay;
> >  };
> >
> >  struct netdev_rxq_dpdk {
> > @@ -1346,6 +1352,30 @@ netdev_dpdk_construct(struct netdev *netdev)
> >      return err;
> >  }
> >
> > +static int
> > +netdev_dpdk_vdpa_construct(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev;
> > +    int err;
> > +
> > +    err = netdev_dpdk_construct(netdev);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev->name);
> > +        goto out;
> > +    }
> > +
> > +    ovs_mutex_lock(&dpdk_mutex);
> > +    dev = netdev_dpdk_cast(netdev);
> > +    dev->relay = netdev_dpdk_vdpa_alloc_relay();
> > +    if (!dev->relay) {
> > +        err = ENOMEM;
> > +    }
> > +
> > +    ovs_mutex_unlock(&dpdk_mutex);
> > +out:
> > +    return err;
> > +}
> > +
> >  static void
> >  common_destruct(struct netdev_dpdk *dev)
> >      OVS_REQUIRES(dpdk_mutex)
> > @@ -1428,6 +1458,19 @@ dpdk_vhost_driver_unregister(struct
> netdev_dpdk
> > *dev OVS_UNUSED,  }
> >
> >  static void
> > +netdev_dpdk_vdpa_destruct(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +
> > +    ovs_mutex_lock(&dpdk_mutex);
> > +    netdev_dpdk_vdpa_destruct_impl(dev->relay);
> > +    rte_free(dev->relay);
> > +    ovs_mutex_unlock(&dpdk_mutex);
> > +
> > +    netdev_dpdk_destruct(netdev);
> > +}
> > +
> > +static void
> >  netdev_dpdk_vhost_destruct(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -1878,6
> > +1921,47 @@ out:
> >  }
> >
> >  static int
> > +netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
> > +                            char **errp)
> > +{
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    const char *vdpa_accelerator_devargs =
> > +                smap_get(args, "vdpa-accelerator-devargs");
> > +    const char *vdpa_socket_path =
> > +                smap_get(args, "vdpa-socket-path");
> > +    int err = 0;
> > +
> > +    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_set_config failed."
> > +                 "Required arguments are missing for VDPA port %s",
> > +                 netdev->name);
> > +        goto free_relay;
> > +    }
> > +
> > +    err = netdev_dpdk_set_config(netdev, args, errp);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
> > +        goto free_relay;
> > +    }
> > +
> > +    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
> > +                                       vdpa_socket_path,
> > +                                       vdpa_accelerator_devargs);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
> > +                 netdev->name);
> > +        goto free_relay;
> > +    }
> > +
> > +    goto out;
> > +
> > +free_relay:
> > +    rte_free(dev->relay);
> > +out:
> > +    return err;
> > +}
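
To make the required-options check in the quoted netdev_dpdk_vdpa_set_config() easier to follow, here is a minimal standalone sketch. The helper name vdpa_check_required_args() is hypothetical and not part of the patch. Returning EINVAL is also an assumption made for illustration: in the quoted code err is still 0 when the function jumps to free_relay on this path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch.
 * Both "vdpa-accelerator-devargs" and "vdpa-socket-path" are required
 * for a dpdkvdpa port.  Returns 0 when both are present and EINVAL
 * (an assumed error code) when either one is missing. */
int
vdpa_check_required_args(const char *accelerator_devargs,
                         const char *socket_path)
{
    if (accelerator_devargs == NULL || socket_path == NULL) {
        return EINVAL;
    }
    return 0;
}

/* Small self-check; the argument values are placeholders. */
void
vdpa_check_required_args_selfcheck(void)
{
    assert(vdpa_check_required_args("<vf pci id>", "/path/to/socket") == 0);
    assert(vdpa_check_required_args(NULL, "/path/to/socket") == EINVAL);
    assert(vdpa_check_required_args("<vf pci id>", NULL) == EINVAL);
}
```

A caller could use the return value directly as err before jumping to the cleanup label.
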
> > +
> > +static int
> >  netdev_dpdk_ring_set_config(struct netdev *netdev, const struct smap
> *args,
> >                              char **errp OVS_UNUSED)  { @@ -2273,6
> > +2357,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> dp_packet_batch *batch,
> >      return 0;
> >  }
> >
> > +static int
> > +netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
> > +                          struct dp_packet_batch *batch,
> > +                          int *qfill) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> > +    int fwd_rx;
> > +    int ret;
> > +
> > +    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
> I'm still not clear about the above function.
> So netdev_dpdk_vdpa_recv_impl()
>     netdev_dpdk_vdpa_forward_traffic(), with a queue pair as parameter
>         ...
>         rte_eth_rx_burst(qpair->port_id_rx...)
>         ...
>         rte_eth_tx_burst(qpair->port_id_tx...)
> 
> So looks like forwarding between vf to vhostuser and vice versa is done in
> this function.
> 
> > +    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
> 
> Then why do we call netdev_dpdk_rxq_recv() above again?
> Are packets received above the same packets as rte_eth_rx_burst()
> previously called in netdev_dpdk_vdpa_forward_traffic()?
> 
> 
> Thanks
> William
> 
> > +    if ((ret == EAGAIN) && fwd_rx) {
> > +        return 0;
> > +    }
> > +    return ret;
> > +}
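
The return-value handling at the end of the quoted netdev_dpdk_vdpa_rxq_recv() can be sketched on its own as below. The helper name vdpa_recv_status() is hypothetical and not part of the patch; it only mirrors the final if/return of the quoted function, where fwd_rx is the value returned by the relay and ret is the value returned by netdev_dpdk_rxq_recv().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper, not part of the patch.
 * If the regular receive had nothing to deliver (EAGAIN) but the relay
 * reported forwarded traffic (fwd_rx != 0), report success.  In every
 * other case the result of the regular receive is passed through. */
int
vdpa_recv_status(int fwd_rx, int ret)
{
    if (ret == EAGAIN && fwd_rx) {
        return 0;
    }
    return ret;
}

/* Small self-check covering the four combinations. */
void
vdpa_recv_status_selfcheck(void)
{
    assert(vdpa_recv_status(1, EAGAIN) == 0);      /* relay was busy */
    assert(vdpa_recv_status(0, EAGAIN) == EAGAIN); /* nothing happened */
    assert(vdpa_recv_status(1, 0) == 0);           /* both had packets */
    assert(vdpa_recv_status(0, 0) == 0);           /* only regular rx */
}
```
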
> > +
> >  static inline int
> >  netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
> >                      int cnt, bool should_steal) @@ -2854,6 +2955,29
> > @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev,
> }
> >
> >  static int
> > +netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
> > +                                  struct netdev_custom_stats
> > +*custom_stats) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    int err = 0;
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +
> > +    custom_stats->size = VDPA_CUSTOM_STATS_SIZE;
> > +    custom_stats->counters = xcalloc(custom_stats->size,
> > +                                     sizeof *custom_stats->counters);
> > +    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
> > +                                                 custom_stats);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed."
> > +                 "Port %s\n", netdev->name);
> > +    }
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +    return err;
> > +}
> > +
> > +static int
> >  netdev_dpdk_get_features(const struct netdev *netdev,
> >                           enum netdev_features *current,
> >                           enum netdev_features *advertised, @@ -4237,6
> > +4361,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)  }
> >
> >  static int
> > +netdev_dpdk_vdpa_reconfigure(struct netdev *netdev) {
> > +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > +    int err;
> > +
> > +    err = netdev_dpdk_reconfigure(netdev);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
> > +        goto out;
> > +    }
> > +
> > +    ovs_mutex_lock(&dev->mutex);
> > +    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
> > +                                        dev->up.n_rxq);
> > +    if (err) {
> > +        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
> > +                 netdev->name);
> > +    }
> > +
> > +    ovs_mutex_unlock(&dev->mutex);
> > +out:
> > +    return err;
> > +}
> > +
> > +static int
> >  netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); @@ -4456,6
> > +4605,18 @@ static const struct netdev_class dpdk_vhost_client_class = {
> >      .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,  };
> >
> > +static const struct netdev_class dpdk_vdpa_class = {
> > +    .type = "dpdkvdpa",
> > +    NETDEV_DPDK_CLASS_COMMON,
> > +    .construct = netdev_dpdk_vdpa_construct,
> > +    .destruct = netdev_dpdk_vdpa_destruct,
> > +    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
> > +    .set_config = netdev_dpdk_vdpa_set_config,
> > +    .reconfigure = netdev_dpdk_vdpa_reconfigure,
> > +    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
> > +    .send = netdev_dpdk_eth_send
> > +};
> > +
> >  void
> >  netdev_dpdk_register(void)
> >  {
> > @@ -4463,4 +4624,5 @@ netdev_dpdk_register(void)
> >      netdev_register_provider(&dpdk_ring_class);
> >      netdev_register_provider(&dpdk_vhost_class);
> >      netdev_register_provider(&dpdk_vhost_client_class);
> > +    netdev_register_provider(&dpdk_vdpa_class);
> >  }
> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> > index 9a743c0..9e94950 100644
> > --- a/vswitchd/vswitch.xml
> > +++ b/vswitchd/vswitch.xml
> > @@ -2640,6 +2640,13 @@
> >            <dd>
> >              A pair of virtual devices that act as a patch cable.
> >            </dd>
> > +
> > +          <dt><code>dpdkvdpa</code></dt>
> > +          <dd>
> > +            The dpdk vDPA port allows forwarding bi-directional traffic between
> > +            SR-IOV virtual functions (VFs) and VirtIO devices in virtual
> > +            machines (VMs).
> > +          </dd>
> >          </dl>
> >        </column>
> >      </group>
> > @@ -3156,6 +3163,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> type=patch options:peer=p1 \
> >          </p>
> >        </column>
> >
> > +      <column name="options" key="vdpa-socket-path"
> > +              type='{"type": "string"}'>
> > +        <p>
> > +          The value specifies the path to the socket associated with a VDPA
> > +          port that will be created by QEMU.
> > +          Only supported by dpdkvdpa interfaces.
> > +        </p>
> > +      </column>
> > +
> > +      <column name="options" key="vdpa-accelerator-devargs"
> > +              type='{"type": "string"}'>
> > +        <p>
> > +          The value specifies the PCI address associated with the virtual
> > +          function.
> > +          Only supported by dpdkvdpa interfaces.
> > +        </p>
> > +      </column>
> > +
> >        <column name="options" key="dq-zero-copy"
> >                type='{"type": "boolean"}'>
> >          <p>
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> >
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev

