[ovs-dev] [PATCH] dpif-netdev: Make datapath port mutex recursive.

Ilya Maximets i.maximets at ovn.org
Thu Jan 16 13:32:06 UTC 2020


On 16.01.2020 13:13, Stokes, Ian wrote:
> 
> 
> On 1/15/2020 5:29 PM, Ilya Maximets wrote:
>> Upcoming HW offloading will request flow statistics from the dpdk
>> offloading module.  This operation requires holding datapath port
>> mutex.  However, there is a possible scenarion in which flow deletion
>> happens during datapath reconfiguration process and the mutex already
>> acquired:
>>
>>    0  raise () from /lib64/libc.so.6
>>    1  abort () from /lib64/libc.so.6
>>    2  ovs_abort_valist ()
>>    3  ovs_abort ()
>>    4  ovs_mutex_lock_at ()
>>    5  dpif_netdev_get_flow_offload_status ()
>>    6  get_dpif_flow_status ()
>>    7  flow_del_on_pmd ()
>>    8  dpif_netdev_flow_del ()
>>    9  dpif_netdev_operate ()
>>    10 dpif_operate ()
>>    11 push_dp_ops ()
>>    12 push_ukey_ops ()
>>    13 dp_purge_cb ()
>>    14 dp_netdev_del_pmd ()
>>    15 reconfigure_pmd_threads ()
>>    16 reconfigure_datapath ()
>>    17 do_del_port ()
>>    18 dpif_netdev_port_del ()
>>    19 dpif_port_del ()
>>    20 port_del ()
>>    21 ofproto_port_del ()
>>    22 bridge_delete_or_reconfigure_ports ()
>>    23 bridge_reconfigure ()
>>    24 bridge_run ()
>>    25 main ()
>>
>> This happens while removing the last port of a particular PMD thread.
>> Reconfiguration process decides that we need to remove current PMD
>> thread and calls datapath purge callback in order to clean up resources
>> assigned to it.  This turns into flow removal and flow_del() tries to
>> request statistics.
>>
>> Turning the dp->mutex into recursive version as a quick fix for this
>> issue.  Better solutions might be to avoid statistics request somehow,
>> or fully disassociate offloaded flows from the datapath flows.
>>
>> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> 
> Thanks for the patch Ilya, makes snse to me as a workaround for the moment. Tested in my own deployment and didnt see any issues.
> 
> Acked.


Thanks.  I don't really like this solution.  Recursive mutexes usually
indicates architectural issues (which we definitely have) and should be
avoided as possible.  Hope, we'll find a way to revert this change in
the future.  For now, applied to master.

Best regards, Ilya Maximets.


More information about the dev mailing list