[ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.
Jan Scheurich
jan.scheurich at ericsson.com
Fri Aug 11 15:20:52 UTC 2017
Hi Bhanu,
Thanks a lot for all the work you put into making the original hackfest patch good enough for upstreaming in an increasingly complex and ever-evolving DPDK datapath.
I tend to agree that the complexity needed to achieve thread safety is becoming overwhelming. Ilya's v3 patch set submitted yesterday looks much simpler. I fully support focusing on that one, especially as it already covers time-based tx batching.
We will start testing Ilya's patch now and come back with results/comments next week.
BR, Jan
> -----Original Message-----
> From: Bodireddy, Bhanuprakash [mailto:bhanuprakash.bodireddy at intel.com]
> Sent: Friday, 11 August, 2017 15:12
> To: Darrell Ball <dball at vmware.com>; dev at openvswitch.org; Ilya Maximets <i.maximets at samsung.com>; Jan Scheurich
> <jan.scheurich at ericsson.com>; Eelco Chaudron <echaudro at redhat.com>
> Cc: Ben Pfaff <blp at ovn.org>; Stokes, Ian <ian.stokes at intel.com>; Gray, Mark D <mark.d.gray at intel.com>; Fischetti, Antonio
> <antonio.fischetti at intel.com>; Kevin Traynor <ktraynor at redhat.com>
> Subject: RE: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in intermediate queue.
>
> Hello All,
>
> Adding all the people here who had either reviewed or provided their feedback
> on the batching patches at some stage.
>
> You are already aware that there are two different series on the mailing
> list implementing tx batching (netdev layer vs. dpif layer) to improve
> DPDK datapath performance. Our output batching sits in the netdev layer,
> whereas Ilya moved it to the dpif layer and simplified it. Each approach
> has its own pros and cons, which have been discussed in earlier threads.
>
> While reviewing v4 of my patch series, Ilya detected a race condition that
> happens when the queues in the guest are enabled/disabled at run time.
> Though we have a solution for this issue and have implemented it, I realized
> that the code complexity has increased, with changes spanning multiple
> functions and additional spin locks needed to address this one corner case.
>
> I think that, though our patch series has flexibility, it has become quite
> complex and would be difficult to maintain in the future. At this stage I
> would like to lean towards the simpler, more maintainable solution
> implemented by Ilya.
>
> I would like to thank Eelco, Darrell, Jan and Ilya for reviewing our series and providing their
> feedback.
>
> Bhanuprakash.
>
> >-----Original Message-----
> >From: Darrell Ball [mailto:dball at vmware.com]
> >Sent: Friday, August 11, 2017 2:03 AM
> >To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>;
> >dev at openvswitch.org
> >Subject: Re: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in
> >intermediate queue.
> >
> >Hi Bhanu
> >
> >Given that you ultimately intend changes beyond those in this patch, would it
> >make sense just to fold the follow up series (at least, the key elements) into
> >this series, essentially expanding on this patch 5 ?
> >
> >Thanks Darrell
> >
> >-----Original Message-----
> >From: <ovs-dev-bounces at openvswitch.org> on behalf of Bhanuprakash
> >Bodireddy <bhanuprakash.bodireddy at intel.com>
> >Date: Tuesday, August 8, 2017 at 10:06 AM
> >To: "dev at openvswitch.org" <dev at openvswitch.org>
> >Subject: [ovs-dev] [PATCH v4 5/5] dpif-netdev: Flush the packets in
> > intermediate queue.
> >
> > Under low-rate traffic conditions, there can be two issues:
> > (1) Packets can potentially get stuck in the intermediate queue.
> > (2) Packet latency can increase significantly due to buffering
> >     in the intermediate queue.
> >
> > This commit addresses issue (1) by flushing the tx port queues using
> > dp_netdev_flush_txq_ports() as part of the PMD packet processing loop.
> > It also addresses issue (2) by flushing the tx queues after every rxq
> > port is processed. This reduces latency without impacting forwarding
> > throughput.
> >
> > MASTER
> > --------
> > Pkt size min(ns) avg(ns) max(ns)
> > 512 4,631 5,022 309,914
> > 1024 5,545 5,749 104,294
> > 1280 5,978 6,159 45,306
> > 1518 6,419 6,774 946,850
> >
> > MASTER + COMMIT
> > -----------------
> > Pkt size min(ns) avg(ns) max(ns)
> > 512 4,711 5,064 182,477
> > 1024 5,601 5,888 701,654
> > 1280 6,018 6,491 533,037
> > 1518 6,467 6,734 312,471
> >
> > PMDs can be torn down and spawned at runtime, so the rxq and txq
> > mappings of the PMD threads can change. In a few cases packets can get
> > stuck in a queue due to reconfiguration, and this commit helps flush
> > those queues.
> >
> > Suggested-by: Eelco Chaudron <echaudro at redhat.com>
> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-April/331039.html
> > Signed-off-by: Bhanuprakash Bodireddy
> ><bhanuprakash.bodireddy at intel.com>
> > Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
> > Co-authored-by: Antonio Fischetti <antonio.fischetti at intel.com>
> > Signed-off-by: Markus Magnusson <markus.magnusson at ericsson.com>
> > Co-authored-by: Markus Magnusson <markus.magnusson at ericsson.com>
> > Acked-by: Eelco Chaudron <echaudro at redhat.com>
> > ---
> > lib/dpif-netdev.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 51 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index e2cd931..bfb9650 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -340,6 +340,7 @@ enum pmd_cycles_counter_type {
> > };
> >
> > #define XPS_TIMEOUT_MS 500LL
> > +#define LAST_USED_QID_NONE -1
> >
> > /* Contained by struct dp_netdev_port's 'rxqs' member. */
> > struct dp_netdev_rxq {
> > @@ -500,7 +501,13 @@ struct rxq_poll {
> > struct tx_port {
> > struct dp_netdev_port *port;
> > int qid;
> > - long long last_used;
> > + int last_used_qid; /* Last queue id where packets got
> > + enqueued. */
> > + long long last_used; /* In case XPS is enabled, it contains the
> > + * timestamp of the last time the port was
> > + * used by the thread to send data. After
> > + * XPS_TIMEOUT_MS elapses the qid will be
> > + * marked as -1. */
> > struct hmap_node node;
> > };
> >
> > @@ -3101,6 +3108,25 @@ cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
> > non_atomic_ullong_add(&pmd->cycles.n[type], interval);
> > }
> >
> > +static void
> > +dp_netdev_flush_txq_ports(struct dp_netdev_pmd_thread *pmd)
> > +{
> > + struct tx_port *cached_tx_port;
> > + int tx_qid;
> > +
> > + HMAP_FOR_EACH (cached_tx_port, node, &pmd->send_port_cache) {
> > + tx_qid = cached_tx_port->last_used_qid;
> > +
> > + if (tx_qid != LAST_USED_QID_NONE) {
> > + netdev_txq_flush(cached_tx_port->port->netdev, tx_qid,
> > + cached_tx_port->port->dynamic_txqs);
> > +
> > + /* Queue flushed and mark it empty. */
> > + cached_tx_port->last_used_qid = LAST_USED_QID_NONE;
> > + }
> > + }
> > +}
> > +
> > static int
> > dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
> > struct netdev_rxq *rx,
> > @@ -3667,6 +3693,9 @@ dpif_netdev_run(struct dpif *dpif)
> > dp_netdev_process_rxq_port(non_pmd,
> > port->rxqs[i].rx,
> > port->port_no);
> > +
> > + dp_netdev_flush_txq_ports(non_pmd);
> > +
> > cycles_count_intermediate(non_pmd, process_packets ?
> > PMD_CYCLES_PROCESSING
> > : PMD_CYCLES_IDLE);
> > @@ -3854,6 +3883,8 @@ reload:
> > process_packets =
> > dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
> > poll_list[i].port_no);
> > + dp_netdev_flush_txq_ports(pmd);
> > +
> > cycles_count_intermediate(pmd,
> > process_packets ? PMD_CYCLES_PROCESSING
> > : PMD_CYCLES_IDLE);
> > @@ -3879,6 +3910,9 @@ reload:
> >
> > cycles_count_end(pmd, PMD_CYCLES_IDLE);
> >
> > + /* Flush the queues as part of reconfiguration logic. */
> > + dp_netdev_flush_txq_ports(pmd);
> > +
> > poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
> > exiting = latch_is_set(&pmd->exit_latch);
> > /* Signal here to make sure the pmd finishes
> > @@ -4481,6 +4515,7 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
> >
> > tx->port = port;
> > tx->qid = -1;
> > + tx->last_used_qid = LAST_USED_QID_NONE;
> >
> > hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no));
> > pmd->need_reload = true;
> > @@ -5051,6 +5086,14 @@ dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
> >
> > dpif_netdev_xps_revalidate_pmd(pmd, now, false);
> >
> > + /* The tx queue can change in XPS case, make sure packets in previous
> > + * queue is flushed properly. */
> > + if (tx->last_used_qid != LAST_USED_QID_NONE &&
> > + tx->qid != tx->last_used_qid) {
> > + netdev_txq_flush(port->netdev, tx->last_used_qid, port->dynamic_txqs);
> > + tx->last_used_qid = LAST_USED_QID_NONE;
> > + }
> > +
> > VLOG_DBG("Core %d: New TX queue ID %d for port \'%s\'.",
> > pmd->core_id, tx->qid, netdev_get_name(tx->port->netdev));
> > return min_qid;
> > @@ -5146,6 +5189,13 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> > tx_qid = pmd->static_tx_qid;
> > }
> >
> > + /* In case these packets gets buffered into an intermediate
> > + * queue and XPS is enabled the flush function could find a
> > + * different tx qid assigned to its thread. We keep track
> > + * of the qid we're now using, that will trigger the flush
> > + * function and will select the right queue to flush. */
> > + p->last_used_qid = tx_qid;
> > +
> > netdev_send(p->port->netdev, tx_qid, packets_, may_steal,
> > dynamic_txqs);
> > return;
> > --
> > 2.4.11
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >