[ovs-dev] [PATCH 1/1] PMD dpdk TX output SMP dpdk queue use and packet surge assoprtion.
Polehn, Mike A
mike.a.polehn at intel.com
Thu Jun 26 16:32:06 UTC 2014
I'll rebase, study and correct to OVS coding style and repost.
There is a very good reason for putting constants on the left hand side of a compare statement.
For example:
if (NULL = x)
will be a compiler error, while the following will compile and need debugging:
if (x = NULL)
Although I try not making the comparison mistakes, I have recently made that exact mistake and had to debug.
If I had used the second format, the complier would have output an error and saved the time of debugging.
Mike Polehn
-----Original Message-----
From: Ben Pfaff [mailto:blp at nicira.com]
Sent: Wednesday, June 25, 2014 1:48 PM
To: Polehn, Mike A
Cc: dev at openvswitch.org
Subject: Re: [ovs-dev] [PATCH 1/1] PMD dpdk TX output SMP dpdk queue use and packet surge assoprtion.
On Fri, Jun 20, 2014 at 10:24:33PM +0000, Polehn, Mike A wrote:
> Put in a DPDK queue to receive from multiple core SMP input from vSwitch for NIC TX output.
> Eliminated the inside polling loop SMP TX output lock (DPDK queue handles SMP).
> Added a SMP lock for non-polling operation to allow TX output by the non-polling thread
> when interface not being polled. Lock accessed only when polling is not enabled.
> Added new netdev subroutine to control polling lock and enable and disable flag.
> Packets do not get discarded between TX pre-queue and NIC queue to handle surges.
>
> Measured improved average PMD port to port 2544 zero loss packet rate
> of 268,000 for packets 512 bytes and smaller. Predict double that when using 1 cpu core/interface.
>
> Observed better persistence of obtaining 100% 10 GbE for larger
> packets with the added DPDK queue, consistent with other tests outside
> of OVS where large surges from fast path interfaces transferring
> larger sized packets from VMs were being absorbed in the NIC TX pre-queue and TX queue and packet loss was suppressed.
>
> Signed-off-by: Mike A. Polehn <mike.a.polehn at intel.com>
This doesn't apply to the current tree. You'll need to rebase and repost it.
I have some stylistic comments. Most of the following are cut-and-paste from CONTRIBUTING or CodingStyle (please read both).
Many of them apply in multiple places, but I only pointed them out once.
Please limit lines in the commit message to 79 characters in width.
Comments should be written as full sentences that start with a capital letter and end with a period:
> + /* get poll ownership */
Enclose single statements in braces:
if (a > b) {
return a;
} else {
return b;
}
> + for (i = 0; i < poll_cnt; i++)
> + netdev_rxq_do_polling(poll_list[i].rx, true);
> +
> for (;;) {
> unsigned int c_port_seq;
> int i;
When using a relational operator like "<" or "==", put an expression or variable argument on the left and a constant argument on the right, e.g. "x == 0", *not* "0 == x":
> + if (NULL == dev->tx_q[i].tx_preq) {
> + VLOG_ERR("eth dev tx pre-queue alloc error");
> + return -ENOMEM;
> + }
> }
We don't generally put "inline" on functions in C files, since it suppresses otherwise useful "function not used" warnings and doesn't usually help code generation:
> inline static void
> -dpdk_queue_flush(struct netdev_dpdk *dev, int qid)
> +dpdk_port_out(struct dpdk_tx_queue *tx_q, int qid)
> {
> - struct dpdk_tx_queue *txq = &dev->tx_q[qid];
> - uint32_t nb_tx;
> + /* get packets from NIC tx staging queue */
> + if (likely(tx_q->count == 0))
> + tx_q->count = rte_ring_sc_dequeue_burst(tx_q->tx_preq,
> + (void **)&tx_q->tx_trans[0], NIC_TX_PRE_Q_TRANS);
> +
> + /* send packets to NIC tx queue */
> + if (likely(tx_q->count != 0)) {
> + unsigned sent = rte_eth_tx_burst(tx_q->port_id, qid, tx_q->tx_trans,
> + tx_q->count);
> + tx_q->count -= sent;
> +
> + if (unlikely((tx_q->count != 0) && (sent > 0)))
> + /* move unsent packets to front of list */
> + memmove(&tx_q->tx_trans[0], &tx_q->tx_trans[sent],
> + (sizeof(struct rte_mbuf *) * tx_q->count));
> + }
> +}
>
> - if (txq->count == 0) {
> - return;
Put the return type, function name, and the braces that surround the function's code on separate lines, all starting in column 0:
> +static void netdev_dpdk_do_poll(struct netdev_rxq *rxq_, unsigned
> +enable) {
> + struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq_);
> + struct netdev *netdev = rx->up.netdev;
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + struct dpdk_tx_queue *tx_q = &dev->tx_q[rxq_->queue_id];
> +
> + if (enable) {
> + tx_q->is_polled = true;
> + /* get polling ownership */
> + rte_spinlock_lock(&tx_q->tx_lock);
> + } else {
> + tx_q->is_polled = false;
> + /* clear queue after flag for race of the non-poll queuer */
> + dpdk_port_out(tx_q, rxq_->queue_id);
> + rte_spinlock_unlock(&tx_q->tx_lock);
> }
> - rte_spinlock_lock(&txq->tx_lock);
> - nb_tx = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts, txq->count);
> - if (nb_tx != txq->count) {
> - /* free buffers if we couldn't transmit packets */
> - rte_mempool_put_bulk(dev->dpdk_mp->mp,
> - (void **) &txq->burst_pkts[nb_tx],
> - (txq->count - nb_tx));
> +}
I guess that this "likely" macro is coming from the Linux kernel headers. Please use OVS_LIKELY instead:
> + if (likely(tx_q->is_polled))
> + dpdk_port_out(tx_q, rxq_->queue_id);
>
> nb_rx = rte_eth_rx_burst(rx->port_id, rxq_->queue_id,
> (struct rte_mbuf **) packets,
Ditto OVS_UNLIKELY here:
> - if (!pkt) {
> + if (unlikely(!pkt)) {
> ovs_mutex_lock(&dev->mutex);
> dev->stats.tx_dropped++;
> ovs_mutex_unlock(&dev->mutex);
This is much too short to explain how the client is supposed to use it:
> + /* Get poll ownership for PMD, enable before starting RX polling loop and
> + * disable after exiting the polling loop. NULL if not supported. */
> + void (*rxq_do_polling)(struct netdev_rxq *rx, bool enable);
> };
Ditto here:
> +/* Tell when entering and exiting polling mode for 'rx' interface */
> +void netdev_rxq_do_polling(struct netdev_rxq *rx, bool enable) {
> + if (rx->netdev->netdev_class->rxq_do_polling)
> + rx->netdev->netdev_class->rxq_do_polling(rx, enable); }
Omit parameter names from function prototypes when the names do not give useful information, e.g.:
int netdev_get_mtu(const struct netdev *, int *mtup);
> +void netdev_rxq_do_polling(struct netdev_rxq *rx, bool enable);
More information about the dev
mailing list