[ovs-dev] [PATCH ovn v2 0/9] Incremental processing for flow installation.

Han Zhou hzhou at ovn.org
Mon Sep 14 16:36:19 UTC 2020


On Mon, Sep 14, 2020 at 6:04 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 9/11/20 9:16 PM, Dumitru Ceara wrote:
> > On 9/11/20 8:39 PM, Han Zhou wrote:
> >>
> >>
> >> On Fri, Sep 11, 2020 at 6:17 AM Dumitru Ceara <dceara at redhat.com
> >> <mailto:dceara at redhat.com>> wrote:
> >>>
> >>> On 9/9/20 11:13 PM, Han Zhou wrote:
> >>>> On Wed, Sep 9, 2020 at 6:39 AM Mark Michelson <mmichels at redhat.com
> >> <mailto:mmichels at redhat.com>> wrote:
> >>>>>
> >>>>> Thanks for addressing my concerns on patch 9.
> >>>>>
> >>>>> Acked-by: Mark Michelson <mmichels at redhat.com
> >> <mailto:mmichels at redhat.com>>
> >>>>>
> >>>>> Warning: I'm also ACKing Numan's patch series that also adds some
> >>>>> lflow-level caching (but he is caching expressions). So whoever
loses
> >>>>> this merge race may have some work to do in order to apply their
patch
> >>>>> cleanly.
> >>>>>
> >>>> Thanks Mark for the review, and the warning :).
> >>>> I applied the series to master, and also the first 4 patches to
> >>>> branch-20.06 and patch 1,2,4 to branch-20.03, to fix the conjunction
> >> flow
> >>>> bug in those branches.
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>
> >>> Hi Han,
> >>>
> >>> This seems to cause an assertion failure in ovn-controller compiled
with
> >>> today's OVS/OVN master branches. I'm not completely sure about the
steps
> >>> to reproduce because I was testing some other ovn-k8s related things
but
> >>> I did get a core dump and the NB/DB/conf DBs.
> >>>
> >>> I opened a RedHat BZ [0] (more for my tracking than anything else)
and I
> >>> shared there the OVS/OVN revisions and how to build the exact same
image
> >>> that I was running when I hit the crash.
> >>>
> >>> I'll try to investigate what's happening but you're probably more
> >>> familiar with the code so any hints would be great.
> >>>
> >>> Thanks,
> >>> Dumitru
> >>>
> >>> [0] https://bugzilla.redhat.com/show_bug.cgi?id=1878139
> >>>
> >>
> >> Thanks Dumitru for reporting the issue. I reviewed the code and the
test
> >> cases again, and didn't figure out any problem yet. The failed
assertion
> >> "ovs_list_is_empty(&f->list_node) " is to assert that the desired_flow
> >> is not accessed more than once in a single flood_remove call. This
> >> should be true because the loop at
> >> (
https://github.com/ovn-org/ovn/blob/master/controller/ofctrl.c#L1126-L1134)
> >> ensures that all flows in to_be_removed (which is the only list that
> >> uses the "list_node" member) is detached from the cross reference
lists,
> >> so that in the following recursive calls these flows will never be
> >> accessed again. Now that this assert fails, it could be some orphaned
> >> pointer somewhere, leading to the "f" pointing to a flow that's already
> >> destroyed.
> >>
> >> One thing not clear to me is the version you used for the testing. In
BZ
> >> you mentioned revision: 520189bf313054702f5f802acd7944cca3b6baaa.
> >> However, in this revision, the line number of the assertion doesn't
> >> match the one you pasted in BZ. What you pasted was 1135:
> >
> > The revision should be good, I think the problem is with the way I
> > rebuilt the packages in the new container after I retrieved the core
dump.
> >
> > The assert message clearly points to line 1108. This is from the old
> > logs of the run where I saw the crash:
> >
> > 2020-09-11T11:53:18.426Z|00037|util|EMER|controller/ofctrl.c:1108:
> > assertion ovs_list_is_empty(&f->list_node) failed in
> > flood_remove_flows_for_sb_uuid()
> >
> >>
> >> #6  0x00005607d73502fa in flood_remove_flows_for_sb_uuid
> >> (flow_table=flow_table at entry=0x5607d8c4b380,
> >> sb_uuid=sb_uuid at entry=0x5607d8f91430,
> >> flood_remove_nodes=flood_remove_nodes at entry=0x7ffeccadb9c0) at
> >> controller/ofctrl.c:1135
> >>
> >> What's in the code for revision (520189bf) is 1108:
> >>
https://github.com/ovn-org/ovn/blob/520189bf313054702f5f802acd7944cca3b6baaa/controller/ofctrl.c#L1108
> >>
> >> Did you change anything while doing the testing?
> >>
> >
> > I'll try to replicate the issue again next week, maybe we get more info.
> >
> > Thanks,
> > Dumitru
> >
>
> Hi Han,
>
> I managed to minimize the way to replicate the crash (with OVN master).
> I do:
>
> make sandbox
> ovn-nbctl ls-add ls
> ovn-nbctl lsp-add ls vm1
> ovn-nbctl acl-add lsp-set-addresses vm1 "0a:58:fc:09:1d:4e
fd00:10:244:1::5"

Thanks Dumitru! Here it seems a copy/paste problem. Could you provide the
ACLs added?

> ovn-nbctl lsp-set-addresses vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
> ovn-nbctl lsp-set-port-security vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
> ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal -- set
> interface vm1 external-ids:iface-id=vm1
> sleep 1
> ovs-vsctl set interface vm1 external-ids:iface-id=foo
> sleep 1
> ovs-vsctl set interface vm1 external-ids:iface-id=vm1
> sleep 1
> ovs-vsctl set interface vm1 external-ids:iface-id=foo
>
> This crashes ovn-controller and if I look at the openflows before the
> last port unbind, I notice some weird flows with action:
>
>
actions=conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2),conjunction(2,2/2)
>
> This looks broken. I tried to figure out what's happening but
> unfortunately I didn't manage to.
>
> Hope this helps pinpoint the root cause.
>
> Regards,
> Dumitru
>


More information about the dev mailing list