[ovs-dev] [PATCH] dpif-netdev: Enter quiescent state after each offloading operation.
Eli Britstein
elibr at mellanox.com
Wed Mar 4 15:51:34 UTC 2020
On 3/3/2020 4:44 PM, Eli Britstein wrote:
>
> On 2/28/2020 11:03 AM, Ilya Maximets wrote:
>> On 2/27/20 10:00 AM, Eli Britstein wrote:
>>> On 2/26/2020 2:31 PM, Ilya Maximets wrote:
>>>> On 2/26/20 1:05 PM, Eli Britstein wrote:
>>>>> On 2/24/2020 1:06 PM, Ilya Maximets wrote:
>>>>>> On 2/23/20 3:32 PM, Eli Britstein wrote:
>>>>>>> On 2/21/2020 4:54 PM, Ilya Maximets wrote:
>>>>>>>> If the offloading queue is big and filled continuously, offloading
>>>>>>>> thread may have no chance to quiesce blocking rcu callbacks and
>>>>>>>> other threads waiting for synchronization.
>>>>>>>>
>>>>>>>> Fix that by entering momentary quiescent state after each
>>>>>>>> operation
>>>>>>>> since we're not holding any rcu-protected memory here.
>>>>>>>>
>>>>>>>> Fixes: 02bb2824e51d ("dpif-netdev: do hw flow offload in a
>>>>>>>> thread")
>>>>>>>> Reported-by: Eli Britstein <elibr at mellanox.com>
>>>>>>>> Reported-at:
>>>>>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fpipermail%2Fovs-discuss%2F2020-February%2F049768.html&data=02%7C01%7Celibr%40mellanox.com%7Ceb2c70875cb14a4d4b4108d7bc2d2676%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637184774411852418&sdata=8YgXyAJ248Mqhiq4tbUcAb357Dqdu1ue%2BBXp%2F6pX6tY%3D&reserved=0
>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
>>>>>>>> ---
>>>>>>>> lib/dpif-netdev.c | 1 +
>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>> index d393aab5e..a798db45d 100644
>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>> @@ -2512,6 +2512,7 @@ dp_netdev_flow_offload_main(void *data
>>>>>>>> OVS_UNUSED)
>>>>>>>> VLOG_DBG("%s to %s netdev flow\n",
>>>>>>>> ret == 0 ? "succeed" : "failed", op);
>>>>>>>> dp_netdev_free_flow_offload(offload);
>>>>>>>> + ovsrcu_quiesce();
Acked-by: Eli Britstein <elibr at mellanox.com>
>>>>>>> This seems to solve the issue of responsiveness, but I have
>>>>>>> encountered a crash using it, while there is a lot of flow
>>>>>>> deletions.
>>>>>>>
>>>>>>> #0 0x0000000001e462dc in get_unaligned_u32 (p=0x7f57f7813500)
>>>>>>> at lib/unaligned.h:86
>>>>>>> #1 0x0000000001e46388 in hash_bytes (p_=0x7f57f7813500, n=16,
>>>>>>> basis=0) at lib/hash.c:38
>>>>>>> #2 0x0000000001f7f836 in ufid_to_rte_flow_data_find
>>>>>>> (ufid=0x7f57f7813500) at lib/netdev-offload-dpdk.c:134
>>>>>>> #3 0x0000000001f88693 in netdev_offload_dpdk_flow_del
>>>>>>> (netdev=0x42260da0, ufid=0x7f57f7813500, stats=0x0) at
>>>>>>> lib/netdev-offload-dpdk.c:3361
>>>>>>> #4 0x0000000001e6a00d in netdev_flow_del (netdev=0x42260da0,
>>>>>>> ufid=0x7f57f7813500, stats=0x0) at lib/netdev-offload.c:296
>>>>>> This might mean 2 things:
>>>>>> 1. Incorrect usage of RCU protected data inside netdev-offload-dpdk.
>>>>>> 2. We're not allowed to quiesce if we have any item in offloading
>>>>>> queue,
>>>>>> i.e. bad RCU usage scheme in dpif-netdev.
>>>>>>
>>>>>> Both cases are bad and I'm not sure if we can easily fix case #2.
>>>>>> I'll take a look and try to find a root cause.
>>>>> It was my bad. Sorry. This is another issue, that exists in my
>>>>> branch only (CT offloads), so it's not related. I was able to fix it.
>>>> OK. Good to know.
>>>>
>>>>> However, I wanted to reproduce with master branch, but then I
>>>>> cannot create a lot of flows as I get: "upcall: datapath flow
>>>>> limit reached".
>>>>>
>>>>> I tried setting large number using "ovs-appctl
>>>>> upcall/set-flow-limit 400000", but it didn't help.
>>>> Number of flows in dpif-netdev is artificially limited by MAX_FLOWS
>>>> (65536)
>>>> value. There was a recent discussion about this limit and I think
>>>> we could
>>>> just remove it. I have a small patch for this, will send soon.
>>>> For now, you can just raise the hardcoded value of MAX_FLOWS in
>>>> dpif-netdev.c.
>>> The issue is not in dpif-netdev.c. It's in
>>> ofproto/ofproto-dpif-upcall.c. If I'm not wrong, when it happens,
>>> there is not attempt to install the flow, so anyway MAX_FLOWS in
>>> dpif-netdev.c is not the blocker.
>> There are couple of things that controls the maximum number of
>> datapath flows.
>> First is the other_config:flow-limit that defaults to 200000. Second
>> is the
>> datapath internal MAX_FLOWS define. And also, "ovs-appctl
>> upcall/set-flow-limit"
>> should be able to modify this value. Sounds strange.
>>
>>> BTW, my setup is the PF and VF rep in a bridge. I run testpmd with
>>> macswap on the VF, and send 10k different src MAC packets (OpenFlow
>>> configuration is NORMAL). So, I expect total 20k flows, also less
>>> than 64k of MAX_FLOWS.
>> Are you sure that packets are not corrupted? Could you dump datapath
>> flows
>> and verify that these flows are what you've expected?
>
> This is for example for 4 different MACs
> (e4:10:00:00:00:01-e4:10:00:00:00:04). As expected:
>
> $ ovs-appctl dpctl/dump-flows -m
> flow-dump from pmd on cpu core: 8
> ufid:65aae19f-06a9-4887-8d2d-60b04187f7cd,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:11316766, bytes:1244844260, used:0.013s, offloaded:yes,
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:a0e4b9bc-8973-41d1-b2cf-29f2a14935d8,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:03,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:109476803, bytes:12042448330, used:0.013s, offloaded:yes,
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:bbb7fe49-422b-434c-bd70-32d6a90c44c4,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:03),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:11147160, bytes:1226187600, used:0.013s, offloaded:yes,
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:bebe7de1-b2d6-48d6-87aa-3fab025a793d,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:02,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:110643486, bytes:12170783460, used:0.013s, offloaded:yes,
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:24e09efe-f02c-41d5-a7c1-3d0aea28cbe0,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:02),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:11304574, bytes:1243503140, used:0.013s, offloaded:yes,
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
> ufid:4fba2205-37d8-42a1-8899-57bdbbc1c539,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:01,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:110644871, bytes:12170935810, used:0.013s, offloaded:yes,
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:99848c40-a8cd-4732-b520-30a27010964b,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pf),packet_type(ns=0,id=0),eth(src=e4:10:00:00:00:04,dst=e4:11:00:00:00:01),eth_type(0x0800),ipv4(src=1.1.1.2/0.0.0.0,dst=1.1.1.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:110644198, bytes:12170861780, used:0.013s, offloaded:yes,
> dp:dpdk, actions:vm1, dp-extra-info:miniflow_bits(5,1)
> ufid:293e5fa1-cce6-41d2-9b3c-db01105f3cc2,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(vm1),packet_type(ns=0,id=0),eth(src=e4:11:00:00:00:01,dst=e4:10:00:00:00:04),eth_type(0x0800),ipv4(src=1.1.1.1/0.0.0.0,dst=1.1.1.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=1025/0,dst=1025/0),
> packets:11327727, bytes:1246049970, used:0.013s, offloaded:yes,
> dp:dpdk, actions:pf, dp-extra-info:miniflow_bits(5,1)
This issue is resolved by enlarging the MAC learning number:
ovs-vsctl set bridge br-int other_config:mac-table-size=200000
>
>>
>>>>>>> In other trials, I see the ovs-appctl stuck again, but then I
>>>>>>> cannot attach gdb (or use pstack). It hangs. In this scenario, I
>>>>>>> see this in dmesg:
>>>>>>>
>>>>>>> [ 404.450694] ovs-vswitchd[8574]: segfault at 7f8bd4e87280 ip
>>>>>>> 0000000001e463c3 sp 00007f8cfaf33028 error 4 in
>>>>>>> ovs-vswitchd[400000+200c000]
>>>>>>>
>>>>>>> Yanqin Wei <Yanqin.Wei at arm.com> suggested to add
>>>>>>> ovsrcu_try_quiesce call (in the same place). It seems more
>>>>>>> stable (haven't see such crash as above), but it has the same
>>>>>>> stuck symptom as above.
>>>>>> ovsrcu_try_quiesce() is the same as ovsrcu_quiesce() except that
>>>>>> is fails
>>>>>> if mutex can not be taken right now. There is no real
>>>>>> difference. Your
>>>>>> crashes are less frequent just because thread enters quiescent
>>>>>> state less
>>>>>> frequently.
>>>>> If so, I think "ovsrcu_try_quiesce" is more suitable here. Don't
>>>>> you think?
>>>> I don't think that taking of one more mutex could make harm in
>>>> offloading thread. If we'll want to care about minor sleeping
>>>> spikes, we should start from replacing the dp_flow_offload.mutex
>>>> and sleeping on the conditional variable.
>>> Agree it's not critical. However, if it's not mandatory, I think
>>> it's better to avoid it.
>> Since it's not critical, I think it's better to not block RCU callbacks
>> and other threads that waits for RCU synchronization.
>
> I am not clear. I thought ovsrcu_try_quiesce is more lightweight than
> ovsrcu_quiesce. Now I see in the code there is indeed a lock there
> (seq_try_lock()).
>
> So, what's the logic behind your previous comment:
>
> Your
> crashes are less frequent just because thread enters quiescent state less
> frequently.
>
> Is it because of that lock in ovsrcu_try_quiesce the thread enters
> quiescent less frequently? So actually ovsrcu_try_quiesce is heavier
> than ovsrcu_quiesce.
>
>>
>>> In addition I think of making it multi-threaded (thread per port) to
>>> boost insertion/deletion rates.
>>>
>> This will not give any performance unless you've found a good way how to
>> avoid taking of global dp->port_mutex. Also, "per port" seems too much.
> Of course. Need to find a way not to have a global lock. Regarding
> "per-port", maybe not a thread per-port, but per-port request queue.
> The number of threads can maybe configurable, each handling several
> queues.
More information about the dev
mailing list