[ovs-dev] [PATCH v2] dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.
Ilya Maximets
i.maximets at ovn.org
Fri Jul 17 01:56:45 UTC 2020
On 7/15/20 6:00 PM, Ilya Maximets wrote:
> Main thread will try to pause/stop all revalidators during datapath
> reconfiguration via datapath purge callback (dp_purge_cb) while
> holding 'dp->port_mutex'. And deadlock happens in case any of
> revalidator threads is already waiting on 'dp->port_mutex' while
> dumping offloaded flows:
>
> main thread revalidator
> --------------------------------- ----------------------------------
>
> ovs_mutex_lock(&dp->port_mutex)
>
> dpif_netdev_flow_dump_next()
> -> dp_netdev_flow_to_dpif_flow
> -> get_dpif_flow_status
> -> dpif_netdev_get_flow_offload_status()
> -> ovs_mutex_lock(&dp->port_mutex)
> <waiting for mutex here>
>
> reconfigure_datapath()
> -> reconfigure_pmd_threads()
> -> dp_netdev_del_pmd()
> -> dp_purge_cb()
> -> udpif_pause_revalidators()
> -> ovs_barrier_block(&udpif->pause_barrier)
> <waiting for revalidators to reach barrier>
>
> <DEADLOCK>
>
> We're not allowed to call offloading API without holding global
> port mutex from the userspace datapath due to thread safety
> restrictions on netdev-offload-dpdk module. And it's also not easy
> to rework datapath reconfiguration process in order to move actual
> PMD removal and datapath purge out of the port mutex.
>
> So, for now, not sleeping on a mutex if it's not immediately available
> seem like an easiest workaround. This will have impact on flow
> statistics update rate and on ability to get the latest statistics
> before removing the flow (latest stats will be lost in case we were
> not able to take the mutex). However, this will allow us to operate
> normally avoiding the deadlock.
>
> The last bit is that to avoid flapping of flow attributes and
> statistics we're not failing the operation, but returning last
> statistics and attributes returned by offload provider. Since those
> might be updated in different threads, stores and reads are atomic.
>
> Reported-by: Frank Wang (王培辉) <wangpeihui at inspur.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
> Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> ---
>
> Version 2:
> - Storing error code from netdev_flow_get() to mimic the actual result
> of the last call.
>
Thanks, everyone!
Applied and backported to branch-2.13.
Best regards, Ilya Maximets.
More information about the dev
mailing list