[ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD

Traynor, Kevin kevin.traynor at intel.com
Wed May 18 16:57:42 UTC 2016


> -----Original Message-----
> From: Loftus, Ciara
> Sent: Wednesday, May 18, 2016 3:43 PM
> To: Traynor, Kevin <kevin.traynor at intel.com>; dev at openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v3 2/3] netdev-dpdk: Add vHost User PMD
> 
> > >
> > > DPDK 16.04 introduces the vHost PMD which allows 'dpdkvhostuser' ports
> > > to be controlled by the librte_ether API, like physical 'dpdk' ports.
> > > The commit integrates this functionality into OVS, and refactors some
> > > of the existing vhost code such that it is vhost-cuse specific.
> > > Similarly, there is now some overlap between dpdk and vhost-user port
> > > code.
> > >
> > > Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >
> > Hi, a few minor comments below. I didn't review the cuse specific code
> > this time around.
> Thanks Kevin for the feedback, my responses are inline.
> 
> Ciara
> 
> >
> > Kevin.
> >
> >
> > > ---
> > >  INSTALL.DPDK.md   |  12 ++
> > >  NEWS              |   2 +
> > >  lib/netdev-dpdk.c | 628 +++++++++++++++++++++++++++++++++---------------------
> > >  3 files changed, 396 insertions(+), 246 deletions(-)
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > index 93f92e4..db7153a 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -990,6 +990,18 @@ Restrictions:
> > >      increased to the desired number of queues. Both DPDK and OVS must be
> > >      recompiled for this change to take effect.
> > >
> > > +  DPDK 'eth' type ports:
> > > +  - dpdk, dpdkr and dpdkvhostuser ports are 'eth' type ports in the context of
> > > +    DPDK as they are all managed by the rte_ether API. This means that they
> > > +    adhere to the DPDK configuration option CONFIG_RTE_MAX_ETHPORTS which by
> > > +    default is set to 32. This means by default the combined total number of
> > > +    dpdk, dpdkr and dpdkvhostuser ports allowable in OVS with DPDK is 32. This
> > > +    value can be changed if desired by modifying the configuration file in
> > > +    DPDK, or by overriding the default value on the command line when building
> > > +    DPDK. eg.
> > > +
> > > +        `make install CONFIG_RTE_MAX_ETHPORTS=64`
> > > +
> > >  Bug Reporting:
> > >  --------------
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 4e81cad..841314b 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -32,6 +32,8 @@ Post-v2.5.0
> > >       * DB entries have been added for many of the DPDK EAL command line
> > >         arguments. Additional arguments can be passed via the dpdk-extra
> > >         entry.
> > > +     * vHost PMD integration brings vhost-user ports under control of the
> > > +       rte_ether DPDK API.
> > >     - ovs-benchmark: This utility has been removed due to lack of use and
> > >       bitrot.
> > >     - ovs-appctl:
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > > index 89d783a..814ef83 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -55,6 +55,7 @@
> > >  #include "unixctl.h"
> > >
> > >  #include "rte_config.h"
> > > +#include "rte_eth_vhost.h"
> > >  #include "rte_mbuf.h"
> > >  #include "rte_meter.h"
> > >  #include "rte_virtio_net.h"
> > > @@ -139,6 +140,11 @@ static char *cuse_dev_name = NULL;    /* Character device cuse_dev_name. */
> > >  #endif
> > >  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
> > >
> > > +/* Array that tracks the used & unused vHost user driver IDs */
> > > +static unsigned int vhost_user_drv_ids[RTE_MAX_ETHPORTS];
> >
> > I think you can replace this array with a counter. You don't need a
> > unique id - just that you are < MAX.
> 
> I considered at first using a counter, but what if the counter reaches
> the MAX but we still have space for more vHost ports?
> eg. We add RTE_MAX_ETHPORTS vHost ports, delete all the ports, then
> try to add one again but can't because the counter is at max.
> Even if we decrement the counter on delete this still doesn't solve
> the problem because the port we delete won't necessarily be the last
> one we've added.

ok, got it - it's part of the name and that has to be unique.
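
For the archive, my reading of the scheme (sketch only - the helper is
invented here, not in the patch; only vhost_user_drv_ids[] is):

    /* A bare counter doesn't work because ports can be deleted out of
     * order; the first free slot must be found and reused so that the
     * ID embedded in the generated device name stays unique. */
    static int
    vhost_user_drv_id_alloc(void)
    {
        int i;

        for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
            if (!vhost_user_drv_ids[i]) {
                vhost_user_drv_ids[i] = 1;   /* Mark slot as in use. */
                return i;
            }
        }
        return -1;   /* All RTE_MAX_ETHPORTS IDs are in use. */
    }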

> 
> >
> > > +/* Maximum string length allowed to provide to rte_eth_attach function */
> > > +#define DEVARGS_MAX (RTE_ETH_NAME_MAX_LEN + PATH_MAX + 18)
> > > +
> > >  /*
> > >   * Maximum amount of time in micro seconds to try and enqueue to vhost.
> > >   */
> > > @@ -172,7 +178,8 @@ enum { DRAIN_TSC = 200000ULL };
> > >
> > >  enum dpdk_dev_type {
> > >      DPDK_DEV_ETH = 0,
> > > -    DPDK_DEV_VHOST = 1,
> > > +    DPDK_DEV_VHOST_USER = 1,
> > > +    DPDK_DEV_VHOST_CUSE = 2,
> > >  };
> > >
> > >  static int rte_eal_init_ret = ENODEV;
> > > @@ -358,12 +365,22 @@ struct netdev_dpdk {
> > >      int real_n_rxq;
> > >      bool txq_needs_locking;
> > >
> > > -    /* virtio-net structure for vhost device */
> > > +    /* Spinlock for vhost cuse transmission. Other DPDK devices use spinlocks
> > > +     * in dpdk_tx_queue */
> > > +    rte_spinlock_t vhost_cuse_tx_lock;
> > > +
> > > +    /* virtio-net structure for vhost cuse device */
> > >      OVSRCU_TYPE(struct virtio_net *) virtio_dev;
> > >
> > > +    /* Number of virtqueue pairs reported by the guest */
> > > +    uint32_t vhost_qp_nb;
> > > +
> > >      /* Identifier used to distinguish vhost devices from each other */
> > >      char vhost_id[PATH_MAX];
> > >
> > > +    /* ID of vhost user port given to the PMD driver */
> > > +    unsigned int vhost_pmd_id;
> > > +
> >
> > This could be removed if you just use a counter as per comment above.
> >
> > >      /* In dpdk_list. */
> > >      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> > >
> > > @@ -381,16 +398,20 @@ struct netdev_rxq_dpdk {
> > >  static bool dpdk_thread_is_pmd(void);
> > >
> > >  static int netdev_dpdk_construct(struct netdev *);
> > > +static int netdev_dpdk_vhost_user_construct(struct netdev *);
> > >
> > >  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
> > >
> > >  void link_status_changed_callback(uint8_t port_id,
> > >          enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> > > +void vring_state_changed_callback(uint8_t port_id,
> > > +        enum rte_eth_event_type type OVS_UNUSED, void *param OVS_UNUSED);
> > >
> > >  static bool
> > > -is_dpdk_class(const struct netdev_class *class)
> > > +is_dpdk_eth_class(const struct netdev_class *class)
> > >  {
> > > -    return class->construct == netdev_dpdk_construct;
> > > +    return ((class->construct == netdev_dpdk_construct) ||
> > > +            (class->construct == netdev_dpdk_vhost_user_construct));
> > >  }
> > >
> > >  /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
> > > @@ -592,7 +613,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> > >          }
> > >
> > >          dev->up.n_rxq = n_rxq;
> > > -        dev->real_n_txq = n_txq;
> > > +        /* Only set real_n_txq for physical devices. vHost User devices will
> > > +         * set this value correctly during vring_state_changed callbacks.
> > > +         */
> > > +        if (dev->type == DPDK_DEV_ETH) {
> > > +            dev->real_n_txq = n_txq;
> > > +        }
> > >
> > >          return 0;
> > >      }
> > > @@ -697,6 +723,118 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
> > >      }
> > >  }
> > >
> > > +/*
> > > + * Fixes mapping for vhost-user tx queues. Must be called after each
> > > + * enabling/disabling of queues and real_n_txq modifications.
> > > + */
> > > +static void
> > > +netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    int *enabled_queues, n_enabled = 0;
> > > +    int i, k, total_txqs = dev->real_n_txq;
> > > +
> > > +    enabled_queues = dpdk_rte_mzalloc(total_txqs * sizeof *enabled_queues);
> > > +
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        /* Enabled queues always mapped to themselves. */
> > > +        if (dev->tx_q[i].map == i) {
> > > +            enabled_queues[n_enabled++] = i;
> > > +        }
> > > +    }
> > > +
> > > +    if (n_enabled == 0 && total_txqs != 0) {
> > > +        enabled_queues[0] = OVS_VHOST_QUEUE_DISABLED;
> > > +        n_enabled = 1;
> > > +    }
> > > +
> > > +    k = 0;
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        if (dev->tx_q[i].map != i) {
> > > +            dev->tx_q[i].map = enabled_queues[k];
> > > +            k = (k + 1) % n_enabled;
> > > +        }
> > > +    }
> > > +
> > > +    VLOG_DBG("TX queue mapping for %s\n", dev->vhost_id);
> > > +    for (i = 0; i < total_txqs; i++) {
> > > +        VLOG_DBG("%2d --> %2d", i, dev->tx_q[i].map);
> > > +    }
> > > +
> > > +    rte_free(enabled_queues);
> > > +}
> > > +
> > > +static void
> > > +netdev_dpdk_vhost_set_queues(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    dev->real_n_rxq = dev->vhost_qp_nb;
> > > +    dev->real_n_txq = dev->vhost_qp_nb;
> > > +    dev->txq_needs_locking = true;
> > > +
> > > +    /* Enable TX queue 0 by default if it wasn't disabled. */
> > > +    if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) {
> > > +        dev->tx_q[0].map = 0;
> > > +    }
> > > +
> > > +    netdev_dpdk_remap_txqs(dev);
> > > +
> > > +    return;
> > > +}
> > > +
> > > +/* Clears mapping for all available queues of vhost interface. */
> > > +static void
> > > +netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev)
> > > +    OVS_REQUIRES(dev->mutex)
> > > +{
> > > +    int i;
> > > +
> > > +    for (i = 0; i < dev->real_n_txq; i++) {
> > > +        dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
> > > +    }
> > > +}
> > > +
> > > +void
> > > +vring_state_changed_callback(uint8_t port_id,
> > > +                             enum rte_eth_event_type type OVS_UNUSED,
> > > +                             void *param OVS_UNUSED)
> > > +{
> > > +    struct netdev_dpdk *dev;
> > > +    struct rte_eth_vhost_queue_event event;
> > > +    int err = 0;
> > > +
> > > +    err = rte_eth_vhost_get_queue_event(port_id, &event);
> > > +    if (err || (event.rx == 1)) {
> >
> > if (err || event.rx)  would be clearer
> ok
> >
> > > +        return;
> > > +    }
> > > +
> > > +    ovs_mutex_lock(&dpdk_mutex);
> > > +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
> > > +        if (port_id == dev->port_id) {
> > > +            ovs_mutex_lock(&dev->mutex);
> > > +            if (event.enable) {
> > > +                dev->tx_q[event.queue_id].map = event.queue_id;
> > > +                dev->vhost_qp_nb++;
> > > +            } else {
> > > +                dev->tx_q[event.queue_id].map = OVS_VHOST_QUEUE_DISABLED;
> > > +                dev->vhost_qp_nb--;
> > > +            }
> > > +            if (dev->vhost_qp_nb > dev->up.n_rxq) {
> >
> > There are a few places through the code where vhost_qp_nb is assumed to
> > have a value, though it is initialized as 0. It *should* be ok, but
> > perhaps the initialization could be changed or 0 could be incorporated
> > into the range checks. What do you think?
> 
> virt_qp_nb is only used after VM boot so it should be safe to assume
> it will have a valid value. Although zero may be a valid value if we
> boot with no queues (not sure if possible!)
> I think the init should stay as 0 - it's representative of the virtual
> queue pairs and we have zero when the port is being created before the
> VM exists.
> By incorporating 0 into the range checks are you suggesting reporting
> an error when the queues reported = 0?

My general point was that the rx/tx queue counts are initialized to 1
whereas the qp count is initialized to 0. As long as something catches
that state before we do anything we shouldn't, it's ok. Checking for
0 qp would be one way.
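
e.g. something along these lines (just a sketch; the helper name is
invented here, the member is from the patch):

    /* Hypothetical guard: don't trust the inited-to-1 rxq/txq counts
     * until the guest has reported at least one queue pair. */
    static bool
    netdev_dpdk_vhost_is_ready(const struct netdev_dpdk *dev)
    {
        return dev->vhost_qp_nb > 0;
    }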

[snip]

> > > +    if (dev->type == DPDK_DEV_ETH) {
> > > +        /* DPDK counts imissed as errors, but count them here as dropped
> > > +         * instead */
> > > +        stats->rx_errors = rte_stats.ierrors - rte_stats.imissed;
> > > +        stats->tx_errors = rte_stats.oerrors;
> > > +        stats->multicast = rte_stats.imcasts;
> > > +
> > > +        rte_spinlock_lock(&dev->stats_lock);
> > > +        stats->tx_dropped = dev->stats.tx_dropped;
> > > +        rte_spinlock_unlock(&dev->stats_lock);
> > > +    } else {
> > > +        stats->rx_errors = UINT64_MAX;
> > > +        stats->tx_errors = UINT64_MAX;
> > > +        stats->multicast = UINT64_MAX;
> > > +
> > > +        rte_spinlock_lock(&dev->stats_lock);
> > > +        stats->tx_dropped = UINT64_MAX;
> >
> > We should still be able to count tx dropped.
> You're right. Will fix in v4
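> Probably something along these lines (rough sketch, untested):
> 
>     } else {
>         stats->rx_errors = UINT64_MAX;
>         stats->tx_errors = UINT64_MAX;
>         stats->multicast = UINT64_MAX;
> 
>         /* tx_dropped is counted by OVS itself, not the PMD, so it is
>          * still valid for vhost ports. */
>         rte_spinlock_lock(&dev->stats_lock);
>         stats->tx_dropped = dev->stats.tx_dropped;
>         rte_spinlock_unlock(&dev->stats_lock);
>     }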
> 
> >
> > > +        rte_spinlock_unlock(&dev->stats_lock);
> > > +    }
> >
> > I assume there's some rework needed because of the extended stats that
> > have now been merged?
> I think when this was submitted it was rebased on top of these
> changes.

Are they reporting for vhost-pmd? If not, you may need to skip that check
or set some of them to UINT64_MAX.
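
Something like this is what I mean (sketch only - the exact members
depend on what the xstats rework landed with, so treat these as
placeholder examples from struct netdev_stats):

    if (dev->type != DPDK_DEV_ETH) {
        /* The vhost PMD doesn't report these extended stats, so mark
         * them as unavailable rather than leaving stale values. */
        stats->rx_length_errors = UINT64_MAX;
        stats->rx_crc_errors = UINT64_MAX;
        stats->rx_frame_errors = UINT64_MAX;
    }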
