[ovs-dev] [PATCH ovn v2 0/9] Incremental processing for flow installation.

Dumitru Ceara dceara at redhat.com
Fri Sep 11 19:16:37 UTC 2020


On 9/11/20 8:39 PM, Han Zhou wrote:
> 
> 
> On Fri, Sep 11, 2020 at 6:17 AM Dumitru Ceara <dceara at redhat.com
> <mailto:dceara at redhat.com>> wrote:
>>
>> On 9/9/20 11:13 PM, Han Zhou wrote:
>> > On Wed, Sep 9, 2020 at 6:39 AM Mark Michelson <mmichels at redhat.com
> <mailto:mmichels at redhat.com>> wrote:
>> >>
>> >> Thanks for addressing my concerns on patch 9.
>> >>
>> >> Acked-by: Mark Michelson <mmichels at redhat.com
> <mailto:mmichels at redhat.com>>
>> >>
>> >> Warning: I'm also ACKing Numan's patch series that also adds some
>> >> lflow-level caching (but he is caching expressions). So whoever loses
>> >> this merge race may have some work to do in order to apply their patch
>> >> cleanly.
>> >>
>> > Thanks Mark for the review, and the warning :).
>> > I applied the series to master, and also the first 4 patches to
>> > branch-20.06 and patch 1,2,4 to branch-20.03, to fix the conjunction
> flow
>> > bug in those branches.
>> >
>> > Thanks,
>> > Han
>> >
>>
>> Hi Han,
>>
>> This seems to cause an assertion failure in ovn-controller compiled with
>> today's OVS/OVN master branches. I'm not completely sure about the steps
>> to reproduce because I was testing some other ovn-k8s related things but
>> I did get a core dump and the NB/DB/conf DBs.
>>
>> I opened a RedHat BZ [0] (more for my tracking than anything else) and I
>> shared there the OVS/OVN revisions and how to build the exact same image
>> that I was running when I hit the crash.
>>
>> I'll try to investigate what's happening but you're probably more
>> familiar with the code so any hints would be great.
>>
>> Thanks,
>> Dumitru
>>
>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1878139
>>
> 
> Thanks Dumitru for reporting the issue. I reviewed the code and the test
> cases again, and didn't figure out any problem yet. The failed assertion
> "ovs_list_is_empty(&f->list_node) " is to assert that the desired_flow
> is not accessed more than once in a single flood_remove call. This
> should be true because the loop at
> (https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L1126-L1134)
> ensures that all flows in to_be_removed (which is the only list that
> uses the "list_node" member) is detached from the cross reference lists,
> so that in the following recursive calls these flows will never be
> accessed again. Now that this assert fails, it could be some orphaned
> pointer somewhere, leading to the "f" pointing to a flow that's already
> destroyed.
> 
> One thing not clear to me is the version you used for the testing. In BZ
> you mentioned revision: 520189bf313054702f5f802acd7944cca3b6baaa.
> However, in this revision, the line number of the assertion doesn't
> match the one you pasted in BZ. What you pasted was 1135:

The revision should be good, I think the problem is with the way I
rebuilt the packages in the new container after I retrieved the core dump.

The assert message clearly points to line 1108. This is from the old
logs of the run where I saw the crash:

2020-09-11T11:53:18.426Z|00037|util|EMER|controller/ofctrl.c:1108:
assertion ovs_list_is_empty(&f->list_node) failed in
flood_remove_flows_for_sb_uuid()

> 
> #6  0x00005607d73502fa in flood_remove_flows_for_sb_uuid
> (flow_table=flow_table at entry=0x5607d8c4b380,
> sb_uuid=sb_uuid at entry=0x5607d8f91430,
> flood_remove_nodes=flood_remove_nodes at entry=0x7ffeccadb9c0) at
> controller/ofctrl.c:1135
> 
> What's in the code for revision (520189bf) is 1108:
> https://github.com/ovn-org/ovn/blob/520189bf313054702f5f802acd7944cca3b6baaa/controller/ofctrl.c#L1108
> 
> Did you change anything while doing the testing?
> 

I'll try to replicate the issue again next week, maybe we get more info.

Thanks,
Dumitru



More information about the dev mailing list