[ovs-dev] [PATCH] dpif-netdev: Enter quiescent state after each offloading operation.

Eli Britstein elibr at mellanox.com
Sun Feb 23 14:32:38 UTC 2020


On 2/21/2020 4:54 PM, Ilya Maximets wrote:
> If the offloading queue is big and filled continuously, offloading
> thread may have no chance to quiesce blocking rcu callbacks and
> other threads waiting for synchronization.
>
> Fix that by entering momentary quiescent state after each operation
> since we're not holding any rcu-protected memory here.
>
> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a thread")
> Reported-by: Eli Britstein <elibr at mellanox.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-February/049768.html
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> ---
>   lib/dpif-netdev.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d393aab5e..a798db45d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data OVS_UNUSED)
>           VLOG_DBG("%s to %s netdev flow\n",
>                    ret == 0 ? "succeed" : "failed", op);
>           dp_netdev_free_flow_offload(offload);
> +        ovsrcu_quiesce();

This seems to solve the responsiveness issue, but with it I have hit a
crash when there are many flow deletions:

#0  0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500) at lib/unaligned.h:86
#1  0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16, basis=0) at lib/hash.c:38
#2  0x0000000001f7f836 in ufid_to_rte_flow_data_find (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
#3  0x0000000001f88693 in netdev_offload_dpdk_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload-dpdk.c:3361
#4  0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296

In other trials, ovs-appctl gets stuck again, but then I cannot attach
gdb (or use pstack); it hangs too. In this scenario, dmesg shows:

[  404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip 0000000001e463c3 sp 00007f8cfaf33028 error 4 in ovs-vswitchd[400000+200c000]

Yanqin Wei <Yanqin.Wei at arm.com> suggested calling ovsrcu_try_quiesce()
in the same place instead. It seems more stable (I haven't seen a crash
like the one above), but it shows the same hang symptom.

>       }
>   
>       return NULL;

