[ovs-dev] [PATCH] dpif-netdev: End the quiescent state for flow offloading thread.

Stokes, Ian ian.stokes at intel.com
Fri Nov 2 10:52:39 UTC 2018


> On 01.11.2018 3:30, Flavio Leitner wrote:
> > On Wed, Oct 31, 2018 at 06:44:09PM +0300, Ilya Maximets wrote:
> >> Flow offloading thread uses concurrent hash maps which are based on
> >> rcu protected variables. It must use them while in active state.
> >> Working in a quiescent state could cause segmentation faults because
> >> of possible cmap internal structure changes.
> >>
> >> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
> >> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
> >> ---
> >>  lib/dpif-netdev.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >> 5839f2375..8136b0389 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -2396,6 +2396,7 @@ dp_netdev_flow_offload_main(void *data
> OVS_UNUSED)
> >>              ovsrcu_quiesce_start();
> >>              ovs_mutex_cond_wait(&dp_flow_offload.cond,
> >>                                  &dp_flow_offload.mutex);
> >> +            ovsrcu_quiesce_end();
> >>          }
> >>          list = ovs_list_pop_front(&dp_flow_offload.list);
> >>          offload = CONTAINER_OF(list, struct dp_flow_offload_item,
> >> node);
> >
> > Indeed, this is very similar to system_stats_thread_func().
> > Must have been an interesting journey to find the root cause if not by
> > code inspection.
> 
> Yeah. Bugs like this are really tricky. We had one with quiescent state
> while iterating and deleting pmd threads a few years ago. Was fixed by
> Daniele.
> Fortunately, current one was found by inspection.
> 

Good catch, I'll add this to the pull request today and backport to 2.10.

Thanks
Ian
> >
> > Acked-by: Flavio Leitner <fbl at sysclose.org>
> >
> > fbl
> >
> >
> >


More information about the dev mailing list