[ovs-dev] [PATCH v2] dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

Ilya Maximets i.maximets at ovn.org
Fri Jul 17 01:56:45 UTC 2020


On 7/15/20 6:00 PM, Ilya Maximets wrote:
> Main thread will try to pause/stop all revalidators during datapath
> reconfiguration via datapath purge callback (dp_purge_cb) while
> holding 'dp->port_mutex'.  And deadlock happens in case any of
> revalidator threads is already waiting on 'dp->port_mutex' while
> dumping offloaded flows:
> 
>            main thread                           revalidator
>  ---------------------------------  ----------------------------------
> 
>  ovs_mutex_lock(&dp->port_mutex)
> 
>                                     dpif_netdev_flow_dump_next()
>                                     -> dp_netdev_flow_to_dpif_flow
>                                     -> get_dpif_flow_status
>                                     -> dpif_netdev_get_flow_offload_status()
>                                     -> ovs_mutex_lock(&dp->port_mutex)
>                                        <waiting for mutex here>
> 
>  reconfigure_datapath()
>  -> reconfigure_pmd_threads()
>  -> dp_netdev_del_pmd()
>  -> dp_purge_cb()
>  -> udpif_pause_revalidators()
>  -> ovs_barrier_block(&udpif->pause_barrier)
>     <waiting for revalidators to reach barrier>
> 
>                           <DEADLOCK>
> 
> We're not allowed to call offloading API without holding global
> port mutex from the userspace datapath due to thread safety
> restrictions on netdev-offload-dpdk module.  And it's also not easy
> to rework datapath reconfiguration process in order to move actual
> PMD removal and datapath purge out of the port mutex.
> 
> So, for now, not sleeping on a mutex if it's not immediately available
> seem like an easiest workaround.  This will have impact on flow
> statistics update rate and on ability to get the latest statistics
> before removing the flow (latest stats will be lost in case we were
> not able to take the mutex).  However, this will allow us to operate
> normally avoiding the deadlock.
> 
> The last bit is that to avoid flapping of flow attributes and
> statistics we're not failing the operation, but returning last
> statistics and attributes returned by offload provider.  Since those
> might be updated in different threads, stores and reads are atomic.
> 
> Reported-by: Frank Wang (王培辉) <wangpeihui at inspur.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> ---
> 
> Version 2:
>   - Storing error code from netdev_flow_get() to mimic the actual result
>     of the last call.
> 

Thanks, everyone!
Applied and backported to branch-2.13.

Best regards, Ilya Maximets.


More information about the dev mailing list