[ovs-dev] [PATCH ovn v2 0/9] Incremental processing for flow installation.

Han Zhou hzhou at ovn.org
Fri Sep 11 18:39:52 UTC 2020


On Fri, Sep 11, 2020 at 6:17 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 9/9/20 11:13 PM, Han Zhou wrote:
> > On Wed, Sep 9, 2020 at 6:39 AM Mark Michelson <mmichels at redhat.com>
wrote:
> >>
> >> Thanks for addressing my concerns on patch 9.
> >>
> >> Acked-by: Mark Michelson <mmichels at redhat.com>
> >>
> >> Warning: I'm also ACKing Numan's patch series that also adds some
> >> lflow-level caching (but he is caching expressions). So whoever loses
> >> this merge race may have some work to do in order to apply their patch
> >> cleanly.
> >>
> > Thanks Mark for the review, and the warning :).
> > I applied the series to master, and also the first 4 patches to
> > branch-20.06 and patch 1,2,4 to branch-20.03, to fix the conjunction
flow
> > bug in those branches.
> >
> > Thanks,
> > Han
> >
>
> Hi Han,
>
> This seems to cause an assertion failure in ovn-controller compiled with
> today's OVS/OVN master branches. I'm not completely sure about the steps
> to reproduce because I was testing some other ovn-k8s related things but
> I did get a core dump and the NB/DB/conf DBs.
>
> I opened a RedHat BZ [0] (more for my tracking than anything else) and I
> shared there the OVS/OVN revisions and how to build the exact same image
> that I was running when I hit the crash.
>
> I'll try to investigate what's happening but you're probably more
> familiar with the code so any hints would be great.
>
> Thanks,
> Dumitru
>
> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1878139
>

Thanks Dumitru for reporting the issue. I reviewed the code and the test
cases again, and didn't figure out any problem yet. The failed assertion
"ovs_list_is_empty(&f->list_node) " is to assert that the desired_flow is
not accessed more than once in a single flood_remove call. This should be
true because the loop at (
https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L1126-L1134)
ensures that all flows in to_be_removed (which is the only list that uses
the "list_node" member) is detached from the cross reference lists, so that
in the following recursive calls these flows will never be accessed again.
Now that this assert fails, it could be some orphaned pointer somewhere,
leading to the "f" pointing to a flow that's already destroyed.

One thing not clear to me is the version you used for the testing. In BZ
you mentioned revision: 520189bf313054702f5f802acd7944cca3b6baaa. However,
in this revision, the line number of the assertion doesn't match the one
you pasted in BZ. What you pasted was 1135:

#6  0x00005607d73502fa in flood_remove_flows_for_sb_uuid
(flow_table=flow_table at entry=0x5607d8c4b380,
sb_uuid=sb_uuid at entry=0x5607d8f91430,
flood_remove_nodes=flood_remove_nodes at entry=0x7ffeccadb9c0) at
controller/ofctrl.c:1135

What's in the code for revision (520189bf) is 1108:
https://github.com/ovn-org/ovn/blob/520189bf313054702f5f802acd7944cca3b6baaa/controller/ofctrl.c#L1108

Did you change anything while doing the testing?

Thanks,
Han


More information about the dev mailing list