[ovs-dev] [PATCH 1/1] PMD dpdk TX output SMP dpdk queue use and packet surge assoprtion.

Polehn, Mike A mike.a.polehn at intel.com
Thu Jun 26 16:32:06 UTC 2014


I'll rebase, study and correct to OVS coding style and repost.

There is a very good reason for putting constants on the left hand side of a compare statement.
For example:
if (NULL = x)  
will be a compiler error, while the following will compile and need debugging:
if (x = NULL)

Although I try not making the comparison mistakes, I have recently made that exact mistake and had to debug.
If I had used the second format, the complier would have output an error and saved the time of debugging.

Mike Polehn

-----Original Message-----
From: Ben Pfaff [mailto:blp at nicira.com] 
Sent: Wednesday, June 25, 2014 1:48 PM
To: Polehn, Mike A
Cc: dev at openvswitch.org
Subject: Re: [ovs-dev] [PATCH 1/1] PMD dpdk TX output SMP dpdk queue use and packet surge assoprtion.

On Fri, Jun 20, 2014 at 10:24:33PM +0000, Polehn, Mike A wrote:
> Put in a DPDK queue to receive from multiple core SMP input from vSwitch for NIC TX output.
> Eliminated the inside polling loop SMP TX output lock (DPDK queue handles SMP).
> Added a SMP lock for non-polling operation to allow TX output by the non-polling thread 
>      when interface not being polled. Lock accessed only when polling is not enabled.
> Added new netdev subroutine to control polling lock and enable and disable flag.
> Packets do not get discarded between TX pre-queue and NIC queue to handle surges.
> 
> Measured improved average PMD port to port 2544 zero loss packet rate 
> of 268,000 for packets 512 bytes and smaller. Predict double that when using 1 cpu core/interface.
> 
> Observed better persistence of obtaining 100% 10 GbE for larger 
> packets with the added DPDK queue, consistent with other tests outside 
> of OVS where large surges from fast path interfaces transferring 
> larger sized packets from VMs were being absorbed in the NIC TX pre-queue and TX queue and packet loss was suppressed.
> 
> Signed-off-by: Mike A. Polehn <mike.a.polehn at intel.com>

This doesn't apply to the current tree.  You'll need to rebase and repost it.

I have some stylistic comments.  Most of the following are cut-and-paste from CONTRIBUTING or CodingStyle (please read both).
Many of them apply in multiple places, but I only pointed them out once.

Please limit lines in the commit message to 79 characters in width.

Comments should be written as full sentences that start with a capital letter and end with a period:
> +    /* get poll ownership */

  Enclose single statements in braces:

    if (a > b) {
        return a;
    } else {
        return b;
    }
> +    for (i = 0; i < poll_cnt; i++)
> +       netdev_rxq_do_polling(poll_list[i].rx, true);
> +
>      for (;;) {
>          unsigned int c_port_seq;
>          int i;

When using a relational operator like "<" or "==", put an expression or variable argument on the left and a constant argument on the right, e.g. "x == 0", *not* "0 == x":
> +        if (NULL == dev->tx_q[i].tx_preq) {
> +            VLOG_ERR("eth dev tx pre-queue alloc error");
> +            return -ENOMEM;
> +        }
>      }

We don't generally put "inline" on functions in C files, since it suppresses otherwise useful "function not used" warnings and doesn't usually help code generation:
>  inline static void
> -dpdk_queue_flush(struct netdev_dpdk *dev, int qid)
> +dpdk_port_out(struct dpdk_tx_queue *tx_q, int qid)
>  {
> -    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
> -    uint32_t nb_tx;
> +    /* get packets from NIC tx staging queue */
> +    if (likely(tx_q->count == 0))
> +        tx_q->count =  rte_ring_sc_dequeue_burst(tx_q->tx_preq, 
> +            (void **)&tx_q->tx_trans[0], NIC_TX_PRE_Q_TRANS);
> +
> +    /* send packets to NIC tx queue */
> +    if (likely(tx_q->count != 0)) {
> +        unsigned sent = rte_eth_tx_burst(tx_q->port_id, qid, tx_q->tx_trans,
> +                                         tx_q->count);
> +        tx_q->count -= sent;
> +
> +        if (unlikely((tx_q->count != 0) && (sent > 0)))
> +            /* move unsent packets to front of list */
> +            memmove(&tx_q->tx_trans[0], &tx_q->tx_trans[sent],
> +                    (sizeof(struct rte_mbuf *) * tx_q->count));
> +    }
> +}
>  
> -    if (txq->count == 0) {
> -        return;

  Put the return type, function name, and the braces that surround the function's code on separate lines, all starting in column 0:
> +static void netdev_dpdk_do_poll(struct netdev_rxq *rxq_, unsigned 
> +enable) {
> +    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq_);
> +    struct netdev *netdev = rx->up.netdev;
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    struct dpdk_tx_queue *tx_q = &dev->tx_q[rxq_->queue_id];
> +
> +    if (enable) {
> +        tx_q->is_polled = true;
> +        /* get polling ownership */
> +        rte_spinlock_lock(&tx_q->tx_lock);
> +    } else {
> +        tx_q->is_polled = false;
> +	   /* clear queue after flag for race of the non-poll queuer */
> +        dpdk_port_out(tx_q, rxq_->queue_id);
> +        rte_spinlock_unlock(&tx_q->tx_lock);
>      }
> -    rte_spinlock_lock(&txq->tx_lock);
> -    nb_tx = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts, txq->count);
> -    if (nb_tx != txq->count) {
> -        /* free buffers if we couldn't transmit packets */
> -        rte_mempool_put_bulk(dev->dpdk_mp->mp,
> -                             (void **) &txq->burst_pkts[nb_tx],
> -                             (txq->count - nb_tx));
> +}

I guess that this "likely" macro is coming from the Linux kernel headers.  Please use OVS_LIKELY instead:
> +    if (likely(tx_q->is_polled))
> +        dpdk_port_out(tx_q, rxq_->queue_id);
>  
>      nb_rx = rte_eth_rx_burst(rx->port_id, rxq_->queue_id,
>                               (struct rte_mbuf **) packets,

Ditto OVS_UNLIKELY here:
> -    if (!pkt) {
> +    if (unlikely(!pkt)) {
>          ovs_mutex_lock(&dev->mutex);
>          dev->stats.tx_dropped++;
>          ovs_mutex_unlock(&dev->mutex);

This is much too short to explain how the client is supposed to use it:
> +    /* Get poll ownership for PMD, enable before starting RX polling loop and
> +     * disable after exiting the polling loop. NULL if not supported. */
> +    void (*rxq_do_polling)(struct netdev_rxq *rx, bool enable);
>  };

Ditto here:
> +/* Tell when entering and exiting polling mode for 'rx' interface */ 
> +void netdev_rxq_do_polling(struct netdev_rxq *rx, bool enable) {
> +    if (rx->netdev->netdev_class->rxq_do_polling)
> +        rx->netdev->netdev_class->rxq_do_polling(rx, enable); }

  Omit parameter names from function prototypes when the names do not give useful information, e.g.:

    int netdev_get_mtu(const struct netdev *, int *mtup);
> +void netdev_rxq_do_polling(struct netdev_rxq *rx, bool enable);



More information about the dev mailing list