[ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.

Jan Scheurich jan.scheurich at ericsson.com
Fri Aug 11 15:20:52 UTC 2017


Hi Bhanu,

Thanks a lot for all the work you put into making the original hackfest patch good enough for upstreaming in an increasingly complex and ever-evolving DPDK datapath.

I tend to agree that the complexity needed to achieve thread safety is becoming overwhelming. Ilya's v3 patch set, submitted yesterday, looks much simpler. I fully support focusing on that one, especially as it already covers time-based tx batching.

We will start testing Ilya's patch now and come back with results/comments next week.

BR, Jan

> -----Original Message-----
> From: Bodireddy, Bhanuprakash [mailto:bhanuprakash.bodireddy at intel.com]
> Sent: Friday, 11 August, 2017 15:12
> To: Darrell Ball <dball at vmware.com>; dev at openvswitch.org; Ilya Maximets <i.maximets at samsung.com>; Jan Scheurich
> <jan.scheurich at ericsson.com>; Eelco Chaudron <echaudro at redhat.com>
> Cc: Ben Pfaff <blp at ovn.org>; Stokes, Ian <ian.stokes at intel.com>; Gray, Mark D <mark.d.gray at intel.com>; Fischetti, Antonio
> <antonio.fischetti at intel.com>; Kevin Traynor <ktraynor at redhat.com>
> Subject: RE: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.
> 
> Hello All,
> 
> Adding all the people here who have either reviewed or provided feedback
> on the batching patches at some stage.
> 
> You are already aware that there are two different series on the ML implementing
> tx batching (at the netdev layer vs. the dpif layer) to improve DPDK datapath
> performance. Our output batching is in the netdev layer, whereas Ilya moved it
> to the dpif layer and simplified it. Each approach has its own pros and cons,
> which have been discussed in earlier threads.
> 
> While reviewing v4 of my patch series, Ilya detected a race condition that occurs
> when the queues in the guest are enabled/disabled at run time. Though we have
> a solution for this issue and have implemented it, I realized that the code
> complexity has increased, with changes spanning multiple functions and
> additional spin locks, all to address this one corner case.
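> 
> Just to illustrate the shape of that corner case (this is my own simplified
> model, not the actual patch or vhost code): another thread can disable a
> queue at run time while a PMD still holds packets buffered for it, so the
> enabled-check and the flush have to be made one atomic step under a lock.
> 
>     #include <pthread.h>
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     struct txq {
>         pthread_spinlock_t lock;
>         bool enabled;       /* Toggled at run time by the control thread. */
>         int buffered;       /* Packets sitting in the intermediate queue. */
>     };
> 
>     /* PMD side: check-and-transmit must be atomic, otherwise the
>      * queue can be disabled between the check and the flush. */
>     static void txq_flush(struct txq *q)
>     {
>         pthread_spin_lock(&q->lock);
>         if (q->enabled && q->buffered) {
>             printf("flushed %d buffered packets\n", q->buffered);
>             q->buffered = 0;
>         }
>         pthread_spin_unlock(&q->lock);
>     }
> 
>     /* Control side: guest disables the queue; discard what was buffered. */
>     static void txq_disable(struct txq *q)
>     {
>         pthread_spin_lock(&q->lock);
>         q->enabled = false;
>         q->buffered = 0;
>         pthread_spin_unlock(&q->lock);
>     }
> 
>     int main(void)
>     {
>         struct txq q = { .enabled = true, .buffered = 8 };
> 
>         pthread_spin_init(&q.lock, PTHREAD_PROCESS_PRIVATE);
>         txq_disable(&q);    /* Queue goes away first...                 */
>         txq_flush(&q);      /* ...so the flush must notice and skip it. */
>         pthread_spin_destroy(&q.lock);
>         return 0;
>     }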
> 
> I think that, though our patch series has flexibility, it has become a lot more
> complex and would be difficult to maintain in the future. At this stage I would
> like to lean towards the simpler, more maintainable solution implemented by Ilya.
> 
> I would like to thank Eelco, Darrell, Jan and Ilya for reviewing our series and providing their
> feedback.
> 
> Bhanuprakash.
> 
> >-----Original Message-----
> >From: Darrell Ball [mailto:dball at vmware.com]
> >Sent: Friday, August 11, 2017 2:03 AM
> >To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>;
> >dev at openvswitch.org
> >Subject: Re: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in
> >intermediate queue.
> >
> >Hi Bhanu
> >
> >Given that you ultimately intend changes beyond those in this patch, would it
> >make sense just to fold the follow-up series (at least its key elements) into
> >this series, essentially expanding on this patch 5?
> >
> >Thanks Darrell
> >
> >-----Original Message-----
> >From: <ovs-dev-bounces at openvswitch.org> on behalf of Bhanuprakash
> >Bodireddy <bhanuprakash.bodireddy at intel.com>
> >Date: Tuesday, August 8, 2017 at 10:06 AM
> >To: "dev at openvswitch.org" <dev at openvswitch.org>
> >Subject: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in
> >	intermediate queue.
> >
> >    Under low-rate traffic conditions, there can be two issues:
> >      (1) Packets can potentially get stuck in the intermediate queue.
> >      (2) Packet latency can increase significantly due to buffering
> >          in the intermediate queue.
> >
> >    This commit handles issue (1) by flushing the tx port queues using
> >    dp_netdev_flush_txq_ports() as part of the PMD packet processing loop.
> >    It also addresses issue (2) by flushing the tx queues after every rxq
> >    port is processed. This reduces latency without impacting forwarding
> >    throughput.
> >
> >       MASTER
> >      --------
> >       Pkt size  min(ns)   avg(ns)   max(ns)
> >        512      4,631      5,022    309,914
> >       1024      5,545      5,749    104,294
> >       1280      5,978      6,159     45,306
> >       1518      6,419      6,774    946,850
> >
> >      MASTER + COMMIT
> >      -----------------
> >       Pkt size  min(ns)   avg(ns)   max(ns)
> >        512      4,711      5,064    182,477
> >       1024      5,601      5,888    701,654
> >       1280      6,018      6,491    533,037
> >       1518      6,467      6,734    312,471
> >
> >    PMDs can be torn down and spawned at runtime, so the rxq and txq
> >    mapping of the PMD threads can change. In a few cases packets can get
> >    stuck in the queue due to reconfiguration, and this commit flushes
> >    the queues in those cases as well.
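> >
> >    The buffering/flush pattern boils down to something like the following
> >    self-contained sketch (illustrative names and sizes, a simplification
> >    rather than the patch code): packets accumulate per tx queue and leave
> >    either as a full burst or via the end-of-iteration flush, so nothing
> >    can sit in the intermediate queue indefinitely.
> >
> >        #include <stdio.h>
> >
> >        #define BURST_SIZE 32
> >        #define LAST_USED_QID_NONE -1
> >
> >        struct tx_port {
> >            int pkts[BURST_SIZE];   /* Stand-in for buffered packets. */
> >            int count;              /* Packets currently buffered. */
> >            int last_used_qid;      /* -1 while the queue is empty. */
> >        };
> >
> >        /* Stand-in for the NIC transmit; just reports the burst. */
> >        static void hw_transmit(struct tx_port *p, int qid)
> >        {
> >            printf("qid %d: burst of %d packets\n", qid, p->count);
> >            p->count = 0;
> >        }
> >
> >        /* Buffer one packet; transmit only on a full burst. */
> >        static void port_send(struct tx_port *p, int qid, int pkt)
> >        {
> >            p->pkts[p->count++] = pkt;
> >            p->last_used_qid = qid;
> >            if (p->count == BURST_SIZE) {
> >                hw_transmit(p, qid);
> >            }
> >        }
> >
> >        /* Like dp_netdev_flush_txq_ports(): push out any partial
> >         * burst left behind and mark the queue empty again. */
> >        static void port_flush(struct tx_port *p)
> >        {
> >            if (p->last_used_qid != LAST_USED_QID_NONE && p->count) {
> >                hw_transmit(p, p->last_used_qid);
> >            }
> >            p->last_used_qid = LAST_USED_QID_NONE;
> >        }
> >
> >        int main(void)
> >        {
> >            struct tx_port port = { .last_used_qid = LAST_USED_QID_NONE };
> >
> >            /* One PMD pass: 40 packets arrive; 32 leave as a full
> >             * burst, the trailing 8 would otherwise be stuck. */
> >            for (int pkt = 0; pkt < 40; pkt++) {
> >                port_send(&port, 0, pkt);
> >            }
> >            port_flush(&port);      /* Bounds the latency of the rest. */
> >            return 0;
> >        }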
> >
> >    Suggested-by: Eelco Chaudron <echaudro at redhat.com>
> >    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331039.html
> >    Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> >    Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
> >    Co-authored-by: Antonio Fischetti <antonio.fischetti at intel.com>
> >    Signed-off-by: Markus Magnusson <markus.magnusson at ericsson.com>
> >    Co-authored-by: Markus Magnusson <markus.magnusson at ericsson.com>
> >    Acked-by: Eelco Chaudron <echaudro at redhat.com>
> >    ---
> >     lib/dpif-netdev.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >     1 file changed, 51 insertions(+), 1 deletion(-)
> >
> >    diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >    index e2cd931..bfb9650 100644
> >    --- a/lib/dpif-netdev.c
> >    +++ b/lib/dpif-netdev.c
> >    @@ -340,6 +340,7 @@ enum pmd_cycles_counter_type {
> >     };
> >
> >     #define XPS_TIMEOUT_MS 500LL
> >    +#define LAST_USED_QID_NONE -1
> >
> >     /* Contained by struct dp_netdev_port's 'rxqs' member.  */
> >     struct dp_netdev_rxq {
> >    @@ -500,7 +501,13 @@ struct rxq_poll {
> >     struct tx_port {
> >         struct dp_netdev_port *port;
> >         int qid;
> >    -    long long last_used;
> >    +    int last_used_qid;        /* Last queue id where packets got
> >    +                                 enqueued. */
> >    +    long long last_used;      /* In case XPS is enabled, it contains the
> >    +                               * timestamp of the last time the port was
> >    +                               * used by the thread to send data.  After
> >    +                               * XPS_TIMEOUT_MS elapses the qid will be
> >    +                               * marked as -1. */
> >         struct hmap_node node;
> >     };
> >
> >    @@ -3101,6 +3108,25 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
> >         non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> >     }
> >
> >    +static void
> >    +dp_netdev_flush_txq_ports(struct dp_netdev_pmd_thread *pmd)
> >    +{
> >    +    struct tx_port *cached_tx_port;
> >    +    int tx_qid;
> >    +
> >    +    HMAP_FOR_EACH (cached_tx_port, node, &pmd->send_port_cache) {
> >    +        tx_qid = cached_tx_port->last_used_qid;
> >    +
> >    +        if (tx_qid != LAST_USED_QID_NONE) {
> >    +            netdev_txq_flush(cached_tx_port->port->netdev, tx_qid,
> >    +                             cached_tx_port->port->dynamic_txqs);
> >    +
> >    +            /* Queue flushed; mark it as empty. */
> >    +            cached_tx_port->last_used_qid = LAST_USED_QID_NONE;
> >    +        }
> >    +    }
> >    +}
> >    +
> >     static int
> >     dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> >                                struct netdev_rxq *rx,
> >    @@ -3667,6 +3693,9 @@ dpif_netdev_run(struct dpif *dpif)
> >                             dp_netdev_process_rxq_port(non_pmd,
> >                                                        port->rxqs[i].rx,
> >                                                        port->port_no);
> >    +
> >    +                    dp_netdev_flush_txq_ports(non_pmd);
> >    +
> >                         cycles_count_intermediate(non_pmd, process_packets ?
> >                                                            PMD_CYCLES_PROCESSING
> >                                                          : PMD_CYCLES_IDLE);
> >    @@ -3854,6 +3883,8 @@ reload:
> >                 process_packets =
> >                     dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
> >                                                poll_list[i].port_no);
> >    +            dp_netdev_flush_txq_ports(pmd);
> >    +
> >                 cycles_count_intermediate(pmd,
> >                                           process_packets ? PMD_CYCLES_PROCESSING
> >                                                           : PMD_CYCLES_IDLE);
> >    @@ -3879,6 +3910,9 @@ reload:
> >
> >         cycles_count_end(pmd, PMD_CYCLES_IDLE);
> >
> >    +    /* Flush the queues as part of reconfiguration logic. */
> >    +    dp_netdev_flush_txq_ports(pmd);
> >    +
> >         poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> >         exiting = latch_is_set(&pmd->exit_latch);
> >         /* Signal here to make sure the pmd finishes
> >    @@ -4481,6 +4515,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
> >
> >         tx->port = port;
> >         tx->qid = -1;
> >    +    tx->last_used_qid = LAST_USED_QID_NONE;
> >
> >         hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
> >         pmd->need_reload = true;
> >    @@ -5051,6 +5086,14 @@ dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
> >
> >         dpif_netdev_xps_revalidate_pmd(pmd, now, false);
> >
> >    +    /* The tx queue can change in the XPS case; make sure packets in the
> >    +     * previous queue are flushed properly. */
> >    +    if (tx->last_used_qid != LAST_USED_QID_NONE &&
> >    +        tx->qid != tx->last_used_qid) {
> >    +        netdev_txq_flush(port->netdev, tx->last_used_qid, port->dynamic_txqs);
> >    +        tx->last_used_qid = LAST_USED_QID_NONE;
> >    +    }
> >    +
> >         VLOG_DBG("Core %d: New TX queue ID %d for port \'%s\'.",
> >                  pmd->core_id, tx->qid, netdev_get_name(tx->port->netdev));
> >         return min_qid;
> >    @@ -5146,6 +5189,13 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> >                     tx_qid = pmd->static_tx_qid;
> >                 }
> >
> >    +            /* In case these packets get buffered into an intermediate
> >    +             * queue and XPS is enabled, the flush function could find a
> >    +             * different tx qid assigned to its thread.  We keep track of
> >    +             * the qid we are using now, so that the flush function can
> >    +             * select the right queue to flush. */
> >    +            p->last_used_qid = tx_qid;
> >    +
> >                 netdev_send(p->port->netdev, tx_qid, packets_, may_steal,
> >                             dynamic_txqs);
> >                 return;
> >    --
> >    2.4.11
> >
> >    _______________________________________________
> >    dev mailing list
> >    dev at openvswitch.org
> >    https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >


