[ovs-dev] [PATCH v2] Do not flush tx queue which is shared among CPUs since it is always flushed

Flavio Leitner fbl at sysclose.org
Wed Jun 17 15:47:28 UTC 2015


On Wed, Jun 17, 2015 at 12:28:23PM -0300, Flavio Leitner wrote:
> On Wed, Jun 17, 2015 at 10:54:25AM +0800, Dongjun wrote:
> > 
> > 
> > On 2015/6/17 1:44, Daniele Di Proietto wrote:
> > >
> > >On 16/06/2015 07:40, "Pravin Shelar" <pshelar at nicira.com> wrote:
> > >
> > >>On Mon, Jun 8, 2015 at 7:42 PM, Pravin Shelar <pshelar at nicira.com> wrote:
> > >>>On Mon, Jun 8, 2015 at 6:13 PM, Wei li <liw at dtdream.com> wrote:
> > >>>>When the tx queue is shared among CPUs, the packets are always
> > >>>>flushed in 'netdev_dpdk_eth_send', so there is no need to flush
> > >>>>again in netdev_dpdk_rxq_recv.  Doing so would also access the tx
> > >>>>queue without locking.
> > >>>>
> > >>>>Signed-off-by: Wei li <liw at dtdream.com>
> > >>>>---
> > >>>>v1->v2: fix typo
> > >>>>
> > >>>>  lib/netdev-dpdk.c | 7 +++++--
> > >>>>  1 file changed, 5 insertions(+), 2 deletions(-)
> > >>>>
> > >>>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > >>>>index 63243d8..1e0a483 100644
> > >>>>--- a/lib/netdev-dpdk.c
> > >>>>+++ b/lib/netdev-dpdk.c
> > >>>>@@ -892,8 +892,11 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
> > >>>>      int nb_rx;
> > >>>>
> > >>>>      /* There is only one tx queue for this core.  Do not flush other
> > >>>>-     * queueus. */
> > >>>>-    if (rxq_->queue_id == rte_lcore_id()) {
> > >>>>+     * queues.
> > >>>>+     * Do not flush tx queue which is shared among CPUs
> > >>>>+     * since it is always flushed */
> > >>>>+    if (rxq_->queue_id == rte_lcore_id() &&
> > >>>>+               OVS_LIKELY(!dev->txq_needs_locking)) {
> > >>>>          dpdk_queue_flush(dev, rxq_->queue_id);
> > >>>>      }
> > >>>Patch looks good, but Daniele has a plan to get rid of the queue
> > >>>flushing logic.  Do you see a problem with doing this?
> > >>Daniele,
> > >>I am not sure we should fix this queue flushing logic if we are going
> > >>to remove it, so I would like to discuss it first.
> > >>You mentioned that removing the flushing logic results in a performance
> > >>drop.  Can you explain it?  How big is the drop, and in which case?
> > >Hi Pravin,
> > >
> > >sorry for the late reply.  I suspected that removing the queues in
> > >netdev-dpdk was going to have a performance impact in the following
> > >scenario: a batch of packets from different megaflows with the same
> > >action (output on port 1).  Here's what happens:
> > >
> > >1) With the queues in netdev-dpdk.  dpif-netdev calls netdev_send() for
> > >   each packet.  netdev_dpdk_send() just copies the pointer into the
> > >   queue.  Before receiving the next batch, dpdk_queue_flush() calls
> > >   rte_eth_tx_burst() with a whole batch of packets.
> > >2) Without the queues in netdev-dpdk.  dpif-netdev calls netdev_send()
> > >   for each packet.  netdev_dpdk_send() calls rte_eth_tx_burst() for
> > >   each packet.
> > >
> > >Scenario 2 could be slower because rte_eth_tx_burst() is called for
> > >each packet (instead of for each batch).  After some testing this turns
> > >out not to be the case.  It appears that the bottleneck is not
> > >rte_eth_tx_burst(), and copying the pointers into the temporary queue
> > >makes the code slower.
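> > >
> > >To make the two paths concrete, here is a rough, self-contained sketch
> > >(names, types and constants are illustrative only, not the actual
> > >netdev-dpdk code):
> > >
> > >    #include <stdint.h>
> > >    #include <rte_ethdev.h>
> > >
> > >    #define SKETCH_TXQ_LEN 384               /* illustrative queue size */
> > >
> > >    struct sketch_tx_queue {
> > >        struct rte_mbuf *burst_pkts[SKETCH_TXQ_LEN];
> > >        int count;
> > >    };
> > >
> > >    /* 1) With the intermediate queue: sending only stores the pointer... */
> > >    static void
> > >    sketch_queued_send(struct sketch_tx_queue *txq, struct rte_mbuf *pkt)
> > >    {
> > >        txq->burst_pkts[txq->count++] = pkt;   /* bounds check omitted */
> > >    }
> > >
> > >    /* ...and one rte_eth_tx_burst() later covers the whole batch. */
> > >    static void
> > >    sketch_queue_flush(uint16_t port_id, uint16_t qid,
> > >                       struct sketch_tx_queue *txq)
> > >    {
> > >        /* Return value (packets actually accepted) ignored in this sketch. */
> > >        rte_eth_tx_burst(port_id, qid, txq->burst_pkts, txq->count);
> > >        txq->count = 0;
> > >    }
> > >
> > >    /* 2) Without the queue: every send ends in its own one-packet burst. */
> > >    static void
> > >    sketch_direct_send(uint16_t port_id, uint16_t qid, struct rte_mbuf *pkt)
> > >    {
> > >        rte_eth_tx_burst(port_id, qid, &pkt, 1);
> > >    }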
> > Hi Daniele,
> > I found something else impacting the performance too.  Here it is:
> > In netdev_dpdk_rxq_recv(), the flush condition is "txq->count != 0",
> > while in dpdk_queue_pkts() the flush condition is
> > "txq->count == MAX_TX_QUEUE_LEN" or
> > "(rte_get_timer_cycles() - txq->tsc) >= DRAIN_TSC".
> > They don't match.  The intent is to flush a whole batch, but
> > netdev_dpdk_rxq_recv() may sometimes do it earlier.
> > 
> > I made a local change, reusing the cycle-interval check from
> > dpdk_queue_pkts().  I got a performance improvement of at least about
> > 10% on the path from the guest VNIC through the DPDK phy NIC to the
> > outside.
> > I didn't compare the performance against "Scenario 2", since the
> > "Scenario 2" modification is not that easy. :(
> > 
> > $ git diff
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 3af1ee7..98d33c3 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -929,10 +929,13 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet **packets,
> > 
> >      /* There is only one tx queue for this core.  Do not flush other
> >       * queueus. */
> > -    if (rxq_->queue_id == rte_lcore_id()) {
> > -        dpdk_queue_flush(dev, rxq_->queue_id);
> > -    }
> > +    if (rxq_->queue_id == rte_lcore_id() && OVS_LIKELY(!dev->txq_needs_locking)) {
> > +        struct dpdk_tx_queue *txq = &dev->tx_q[rxq_->queue_id];
> > 
> > +        if (txq->count != 0 && (rte_get_timer_cycles() - txq->tsc) >= DRAIN_TSC) {
> > +            dpdk_queue_flush__(dev, rxq_->queue_id);  // instead of dpdk_queue_flush
> > +        }
> > +    }
> >      nb_rx = rte_eth_rx_burst(rx->port_id, rxq_->queue_id,
> >                               (struct rte_mbuf **) packets,
> >                               NETDEV_MAX_BURST);
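> > 
> > With this, the recv-side flush uses the same time-based drain condition
> > as dpdk_queue_pkts(), so a partially filled queue is no longer flushed
> > on every receive cycle.  A minimal sketch of the check, assuming an
> > illustrative drain interval (the real DRAIN_TSC value may differ):
> > 
> >     #include <stdbool.h>
> >     #include <stdint.h>
> >     #include <rte_cycles.h>
> > 
> >     #define SKETCH_DRAIN_TSC 200000ULL   /* illustrative cycle budget */
> > 
> >     /* Drain only if packets are queued and they have been waiting
> >      * longer than the drain interval since the last flush. */
> >     static bool
> >     sketch_txq_should_drain(int count, uint64_t last_flush_tsc)
> >     {
> >         return count != 0
> >                && (rte_get_timer_cycles() - last_flush_tsc)
> >                   >= SKETCH_DRAIN_TSC;
> >     }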
> 
> I haven't tried your patch yet, but indeed there is something
> weird going on with batching.
> 
> I am now at 9Mpps running l2fwd inside of a guest, which is +100%
> more than before, just because I increased the batching.

Scratch that, there was a typo in the scripts.
I will try your patch next.
fbl



