[ovs-dev] [PATCH ovn v2 0/9] Incremental processing for flow installation.

Ilya Maximets i.maximets at ovn.org
Mon Sep 14 16:54:16 UTC 2020


On 9/14/20 6:45 PM, Dumitru Ceara wrote:
> On 9/14/20 6:36 PM, Han Zhou wrote:
>>
>>
>> On Mon, Sep 14, 2020 at 6:04 AM Dumitru Ceara <dceara at redhat.com
>> <mailto:dceara at redhat.com>> wrote:
>>>
>>> On 9/11/20 9:16 PM, Dumitru Ceara wrote:
>>>> On 9/11/20 8:39 PM, Han Zhou wrote:
>>>>>
>>>>>
>>>>> On Fri, Sep 11, 2020 at 6:17 AM Dumitru Ceara <dceara at redhat.com
>> <mailto:dceara at redhat.com>
>>>>> <mailto:dceara at redhat.com <mailto:dceara at redhat.com>>> wrote:
>>>>>>
>>>>>> On 9/9/20 11:13 PM, Han Zhou wrote:
>>>>>>> On Wed, Sep 9, 2020 at 6:39 AM Mark Michelson
>> <mmichels at redhat.com <mailto:mmichels at redhat.com>
>>>>> <mailto:mmichels at redhat.com <mailto:mmichels at redhat.com>>> wrote:
>>>>>>>>
>>>>>>>> Thanks for addressing my concerns on patch 9.
>>>>>>>>
>>>>>>>> Acked-by: Mark Michelson <mmichels at redhat.com
>> <mailto:mmichels at redhat.com>
>>>>> <mailto:mmichels at redhat.com <mailto:mmichels at redhat.com>>>
>>>>>>>>
>>>>>>>> Warning: I'm also ACKing Numan's patch series that also adds some
>>>>>>>> lflow-level caching (but he is caching expressions). So whoever
>> loses
>>>>>>>> this merge race may have some work to do in order to apply their
>> patch
>>>>>>>> cleanly.
>>>>>>>>
>>>>>>> Thanks Mark for the review, and the warning :).
>>>>>>> I applied the series to master, and also the first 4 patches to
>>>>>>> branch-20.06 and patch 1,2,4 to branch-20.03, to fix the conjunction
>>>>> flow
>>>>>>> bug in those branches.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Han
>>>>>>>
>>>>>>
>>>>>> Hi Han,
>>>>>>
>>>>>> This seems to cause an assertion failure in ovn-controller
>> compiled with
>>>>>> today's OVS/OVN master branches. I'm not completely sure about the
>> steps
>>>>>> to reproduce because I was testing some other ovn-k8s related
>> things but
>>>>>> I did get a core dump and the NB/DB/conf DBs.
>>>>>>
>>>>>> I opened a RedHat BZ [0] (more for my tracking than anything else)
>> and I
>>>>>> shared there the OVS/OVN revisions and how to build the exact same
>> image
>>>>>> that I was running when I hit the crash.
>>>>>>
>>>>>> I'll try to investigate what's happening but you're probably more
>>>>>> familiar with the code so any hints would be great.
>>>>>>
>>>>>> Thanks,
>>>>>> Dumitru
>>>>>>
>>>>>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1878139
>>>>>>
>>>>>
>>>>> Thanks Dumitru for reporting the issue. I reviewed the code and the
>> test
>>>>> cases again, and didn't figure out any problem yet. The failed
>> assertion
>>>>> "ovs_list_is_empty(&f->list_node) " is to assert that the desired_flow
>>>>> is not accessed more than once in a single flood_remove call. This
>>>>> should be true because the loop at
>>>>>
>> (https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L1126-L1134)
>>>>> ensures that all flows in to_be_removed (which is the only list that
>>>>> uses the "list_node" member) is detached from the cross reference
>> lists,
>>>>> so that in the following recursive calls these flows will never be
>>>>> accessed again. Now that this assert fails, it could be some orphaned
>>>>> pointer somewhere, leading to the "f" pointing to a flow that's already
>>>>> destroyed.
>>>>>
>>>>> One thing not clear to me is the version you used for the testing.
>> In BZ
>>>>> you mentioned revision: 520189bf313054702f5f802acd7944cca3b6baaa.
>>>>> However, in this revision, the line number of the assertion doesn't
>>>>> match the one you pasted in BZ. What you pasted was 1135:
>>>>
>>>> The revision should be good, I think the problem is with the way I
>>>> rebuilt the packages in the new container after I retrieved the core
>> dump.
>>>>
>>>> The assert message clearly points to line 1108. This is from the old
>>>> logs of the run where I saw the crash:
>>>>
>>>> 2020-09-11T11:53:18.426Z|00037|util|EMER|controller/ofctrl.c:1108:
>>>> assertion ovs_list_is_empty(&f->list_node) failed in
>>>> flood_remove_flows_for_sb_uuid()
>>>>
>>>>>
>>>>> #6  0x00005607d73502fa in flood_remove_flows_for_sb_uuid
>>>>> (flow_table=flow_table at entry=0x5607d8c4b380,
>>>>> sb_uuid=sb_uuid at entry=0x5607d8f91430,
>>>>> flood_remove_nodes=flood_remove_nodes at entry=0x7ffeccadb9c0) at
>>>>> controller/ofctrl.c:1135
>>>>>
>>>>> What's in the code for revision (520189bf) is 1108:
>>>>>
>> https://github.com/ovn-org/ovn/blob/520189bf313054702f5f802acd7944cca3b6baaa/controller/ofctrl.c#L1108
>>>>>
>>>>> Did you change anything while doing the testing?
>>>>>
>>>>
>>>> I'll try to replicate the issue again next week, maybe we get more info.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>
>>> Hi Han,
>>>
>>> I managed to minimize the way to replicate the crash (with OVN master).
>>> I do:
>>>
>>> make sandbox
>>> ovn-nbctl ls-add ls
>>> ovn-nbctl lsp-add ls vm1
>>> ovn-nbctl acl-add lsp-set-addresses vm1 "0a:58:fc:09:1d:4e
>> fd00:10:244:1::5"
>>
>> Thanks Dumitru! Here it seems a copy/paste problem. Could you provide
>> the ACLs added?

Just skip this line, it's not needed.
100% reproducible for me without it:

2020-09-14T16:52:04.039Z|00020|util|EMER|controller/ofctrl.c:1108:
  assertion ovs_list_is_empty(&f->list_node) failed in flood_remove_flows_for_sb_uuid()



> 
> There are no ACLs involved. This line was a mistake, please ignore it.
> The conjunctions come from the IPv6 port security logical flows.
> 
> https://paste.centos.org/view/bb93f7d3
> 
> Regards,
> Dumitru
> 
>>
>>> ovn-nbctl lsp-set-addresses vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
>>> ovn-nbctl lsp-set-port-security vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
>>> ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal -- set
>>> interface vm1 external-ids:iface-id=vm1
>>> sleep 1
>>> ovs-vsctl set interface vm1 external-ids:iface-id=foo
>>> sleep 1
>>> ovs-vsctl set interface vm1 external-ids:iface-id=vm1
>>> sleep 1
>>> ovs-vsctl set interface vm1 external-ids:iface-id=foo
>>>
>>> This crashes ovn-controller and if I look at the openflows before the
>>> last port unbind, I notice some weird flows with action:
>>>
>>>
>> actions=conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2)
>>>
>>> This looks broken. I tried to figure out what's happening but
>>> unfortunately I didn't manage to.
>>>
>>> Hope this helps pinpoint the root cause.
>>>
>>> Regards,
>>> Dumitru
>>>
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 



More information about the dev mailing list