[ovs-dev] [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval in PMD main loop.

Ben Pfaff blp at ovn.org
Fri Oct 4 17:13:13 UTC 2019


For my own part, I usually do not review dpif-netdev patches.

On Fri, Oct 04, 2019 at 05:35:58AM +0000, Nitin Katiyar wrote:
> Hi,
> Can you please review the following patch and provide the feedback?
> 
> Regards,
> Nitin
> 
> > -----Original Message-----
> > From: Nitin Katiyar
> > Sent: Wednesday, August 28, 2019 2:00 PM
> > To: ovs-dev at openvswitch.org
> > Cc: Anju Thomas <anju.thomas at ericsson.com>; Ilya Maximets
> > <i.maximets at samsung.com>
> > Subject: RE: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed interval
> > in PMD main loop.
> > 
> > Hi,
> > Please provide your feedback.
> > 
> > Regards,
> > Nitin
> > 
> > > -----Original Message-----
> > > From: Nitin Katiyar
> > > Sent: Thursday, August 22, 2019 10:24 PM
> > > To: ovs-dev at openvswitch.org
> > > Cc: Nitin Katiyar <nitin.katiyar at ericsson.com>; Anju Thomas
> > > <anju.thomas at ericsson.com>
> > > Subject: [PATCH V2] dpif-netdev: Do RCU synchronization at fixed
> > > interval in PMD main loop.
> > >
> > > Each PMD updates the global sequence number for RCU synchronization
> > > with the other OVS threads. This is done on every 1025th iteration
> > > of the PMD main loop.
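> > >
> > > For reference, the pre-patch logic looks roughly like the sketch
> > > below (simplified; the housekeeping calls are taken from
> > > lib/dpif-netdev.c, the counter name lc is assumed, and everything
> > > else is elided):
> > >
> > >     for (;;) {
> > >         /* ... poll rx queues and process packet batches ... */
> > >
> > >         if (lc++ > 1024) {      /* Every 1025th iteration. */
> > >             lc = 0;
> > >             dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
> > >             /* ovsrcu_try_quiesce() returns 0 on success. */
> > >             if (!ovsrcu_try_quiesce()) {
> > >                 emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache));
> > >             }
> > >         }
> > >     }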
> > >
> > > If a PMD thread is responsible for polling a large number of queues
> > > that are carrying traffic, it spends most of its time processing
> > > packets, which results in a significant delay in performing these
> > > housekeeping activities.
> > >
> > > If the OVS main thread is waiting to synchronize with the PMD
> > > threads and those threads delay their housekeeping for more than
> > > 3 seconds, LACP processing is impacted and LACP flaps can result.
> > > Other control protocols run by the OVS main thread are similarly
> > > impacted.
> > >
> > > For example, a PMD thread polling 200 ports/queues, with an average
> > > of 1600 processing cycles per packet and a batch size of 32, may
> > > take 10240000 (200 * 1600 * 32) cycles per iteration. On a system
> > > with a 2.0 GHz CPU that is more than 5 ms per iteration, so
> > > completing 1024 iterations takes more than 5 seconds.
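> > >
> > > Spelling the arithmetic out (a standalone check of the numbers
> > > above, not part of the patch):
> > >
> > >     #include <stdint.h>
> > >     #include <stdio.h>
> > >
> > >     int main(void)
> > >     {
> > >         /* 200 queues * 1600 cycles/packet * 32 packets/batch. */
> > >         uint64_t cycles_per_iter = 200ULL * 1600 * 32;
> > >         double ms_per_iter = cycles_per_iter / 2.0e9 * 1e3;
> > >         double s_per_1024_iters = ms_per_iter * 1024 / 1e3;
> > >
> > >         /* Prints: 5.12 ms/iteration, 5.24 s per 1024 iterations. */
> > >         printf("%.2f ms/iteration, %.2f s per 1024 iterations\n",
> > >                ms_per_iter, s_per_1024_iters);
> > >         return 0;
> > >     }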
> > >
> > > This gets worse when some PMD threads are less loaded: they reduce
> > > the chance of a heavily loaded PMD acquiring the mutex in
> > > ovsrcu_try_quiesce(), and that PMD's next attempt to quiesce comes
> > > only after another 1024 iterations.
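> > >
> > > As an illustrative model of that contention (not the actual ovs-rcu
> > > internals; global_seq_mutex and publish_new_global_seqno are
> > > hypothetical names), a trylock-style quiesce attempt fails whenever
> > > another thread holds the lock, and the caller simply skips it:
> > >
> > >     #include <errno.h>
> > >     #include <pthread.h>
> > >
> > >     static pthread_mutex_t global_seq_mutex =
> > >         PTHREAD_MUTEX_INITIALIZER;
> > >
> > >     static void
> > >     publish_new_global_seqno(void)
> > >     {
> > >         /* Placeholder: advance the global sequence number. */
> > >     }
> > >
> > >     /* Returns 0 on success, EBUSY if the lock was contended,
> > >      * assuming the trylock-style 0-on-success convention that the
> > >      * !ovsrcu_try_quiesce() checks in the patch rely on. */
> > >     int
> > >     try_quiesce(void)
> > >     {
> > >         if (pthread_mutex_trylock(&global_seq_mutex)) {
> > >             return EBUSY;   /* Skipped; retry on a later attempt. */
> > >         }
> > >         publish_new_global_seqno();
> > >         pthread_mutex_unlock(&global_seq_mutex);
> > >         return 0;
> > >     }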
> > >
> > > With this patch, PMD RCU synchronization is performed at a fixed
> > > time interval instead of after a fixed number of iterations. This
> > > ensures that RCU synchronization is not delayed for long even when
> > > the packet processing load is high.
> > >
> > > Co-authored-by: Anju Thomas <anju.thomas at ericsson.com>
> > > Signed-off-by: Anju Thomas <anju.thomas at ericsson.com>
> > > Signed-off-by: Nitin Katiyar <nitin.katiyar at ericsson.com>
> > > ---
> > >  lib/dpif-netdev.c | 22 ++++++++++++++++++++++
> > >  1 file changed, 22 insertions(+)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index d0a1c58..63b6cb9 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -228,6 +228,9 @@ struct dfc_cache {
> > >   * and used during rxq to pmd assignment. */
> > >  #define PMD_RXQ_INTERVAL_MAX 6
> > >
> > > +/* Time in microseconds to try RCU quiescing. */
> > > +#define PMD_RCU_QUIESCE_INTERVAL 10000LL
> > > +
> > >  struct dpcls {
> > >      struct cmap_node node;      /* Within dp_netdev_pmd_thread.classifiers */
> > >      odp_port_t in_port;
> > > @@ -751,6 +754,9 @@ struct dp_netdev_pmd_thread {
> > >
> > >      /* Set to true if the pmd thread needs to be reloaded. */
> > >      bool need_reload;
> > > +
> > > +    /* Next time when PMD should try RCU quiescing. */
> > > +    long long next_rcu_quiesce;
> > >  };
> > >
> > >  /* Interface to netdev-based datapath. */
> > > @@ -5486,6 +5492,8 @@ reload:
> > >      pmd->intrvl_tsc_prev = 0;
> > >      atomic_store_relaxed(&pmd->intrvl_cycles, 0);
> > >      cycles_counter_update(s);
> > > +
> > > +    pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > >      /* Protect pmd stats from external clearing while polling. */
> > >      ovs_mutex_lock(&pmd->perf_stats.stats_mutex);
> > >      for (;;) {
> > > @@ -5493,6 +5501,17 @@ reload:
> > >
> > >          pmd_perf_start_iteration(s);
> > >
> > > +        /* Do RCU synchronization at fixed interval instead of doing it
> > > +         * at fixed number of iterations. This ensures that synchronization
> > > +         * would not be delayed long even at high load of packet
> > > +         * processing. */
> > > +        if (pmd->ctx.now > pmd->next_rcu_quiesce) {
> > > +            if (!ovsrcu_try_quiesce()) {
> > > +                pmd->next_rcu_quiesce =
> > > +                    pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > > +            }
> > > +        }
> > > +
> > >          for (i = 0; i < poll_cnt; i++) {
> > >
> > >              if (!poll_list[i].rxq_enabled) {
> > > @@ -5527,6 +5546,8 @@ reload:
> > >              dp_netdev_pmd_try_optimize(pmd, poll_list, poll_cnt);
> > >              if (!ovsrcu_try_quiesce()) {
> > >                  emc_cache_slow_sweep(&((pmd->flow_cache).emc_cache));
> > > +                pmd->next_rcu_quiesce =
> > > +                    pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > >              }
> > >
> > >              for (i = 0; i < poll_cnt; i++) {
> > > @@ -5986,6 +6007,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> > >      pmd->ctx.last_rxq = NULL;
> > >      pmd_thread_ctx_time_update(pmd);
> > >      pmd->next_optimization = pmd->ctx.now + DPCLS_OPTIMIZATION_INTERVAL;
> > > +    pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
> > >      pmd->rxq_next_cycle_store = pmd->ctx.now + PMD_RXQ_INTERVAL_LEN;
> > >      hmap_init(&pmd->poll_list);
> > >      hmap_init(&pmd->tx_ports);
> > > --
> > > 1.9.1
> 

