[ovs-dev] [RFC PATCH] netdev-dpdk: Remove vhost send retries when no packets have been sent.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Tue May 24 14:08:23 UTC 2016


>-----Original Message-----
>From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Kevin
>Traynor
>Sent: Monday, May 23, 2016 3:46 PM
>To: dev at openvswitch.org
>Cc: i.maximets at samsung.com; Traynor, Kevin <kevin.traynor at intel.com>
>Subject: [ovs-dev] [RFC PATCH] netdev-dpdk: Remove vhost send retries
>when no packets have been sent.
>
>If the guest is connected but not servicing the virt queue, this leads to vhost
>send retries until timeout. This is fine in isolation but if there are other high
>rate queues also being serviced by the same PMD it can lead to a performance
>hit on those queues. Change to only retry when at least some packets have
>been successfully sent on the previous attempt.

Thanks for the patch kevin, I verified the patch and this seems to fix the problem I reported.
Though the below(now removed) retry logic looks fair (given a 100 micro second timeout) and causes no issues in isolation, it leads to serious problems when scaling the VMs.
On a multi VM setup with explicit flows set, I see the aggregate throughput nearly collapses to few hundred thousand packets  when  few of my guests are not servicing their queues and sitting idle. The profiling shows significant cycles are spent in __netdev_dpdk_vhost_send() enqueuing  the packets on the vhost ports which are not drained there by triggering the retry logic and wasting cpu cycles there by significantly impacting the aggregate throughput.

I am not sure of corner cases where the retry logic is still needed, otherwise you can treat this as Acked.

Acked-by: Bhanuprakash Bodireddy <Bhanuprakash.bodireddy at intel.com>

>
>Reported-by: Bhanuprakash Bodireddy
><bhanuprakash.bodireddy at intel.com>
>Signed-off-by: Kevin Traynor <kevin.traynor at intel.com>
>---
>
>Sending for discussion in this mailing list thread
>http://openvswitch.org/pipermail/dev/2016-May/071115.html
>
> lib/netdev-dpdk.c |   28 +---------------------------
> 1 files changed, 1 insertions(+), 27 deletions(-)
>
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c7217ea..c39ce6c
>100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -110,11 +110,6 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
>ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> static char *cuse_dev_name = NULL;    /* Character device cuse_dev_name.
>*/
> static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>
>-/*
>- * Maximum amount of time in micro seconds to try and enqueue to vhost.
>- */
>-#define VHOST_ENQ_RETRY_USECS 100
>-
> static const struct rte_eth_conf port_conf = {
>     .rxmode = {
>         .mq_mode = ETH_MQ_RX_RSS,
>@@ -1261,7 +1256,6 @@ __netdev_dpdk_vhost_send(struct netdev
>*netdev, int qid,
>     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>     unsigned int total_pkts = cnt;
>     unsigned int qos_pkts = cnt;
>-    uint64_t start = 0;
>
>     qid = dev->tx_q[qid % dev->real_n_txq].map;
>
>@@ -1290,27 +1284,7 @@ __netdev_dpdk_vhost_send(struct netdev
>*netdev, int qid,
>             /* Prepare for possible next iteration.*/
>             cur_pkts = &cur_pkts[tx_pkts];
>         } else {
>-            uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() /
>1E6;
>-            unsigned int expired = 0;
>-
>-            if (!start) {
>-                start = rte_get_timer_cycles();
>-            }
>-
>-            /*
>-             * Unable to enqueue packets to vhost interface.
>-             * Check available entries before retrying.
>-             */
>-            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
>-                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
>-                    expired = 1;
>-                    break;
>-                }
>-            }
>-            if (expired) {
>-                /* break out of main loop. */
>-                break;
>-            }
>+            break;
>         }
>     } while (cnt);
>
>--
>1.7.4.1
>
>_______________________________________________
>dev mailing list
>dev at openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev


More information about the dev mailing list