[ovs-dev] [PATCH ovn 2/2] if-status: Add OVS interface status management module.

Numan Siddique numans at ovn.org
Tue May 4 14:41:43 UTC 2021


On Tue, May 4, 2021 at 8:35 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 5/4/21 10:20 AM, Dumitru Ceara wrote:
> > On 4/29/21 9:51 AM, Dumitru Ceara wrote:
> >> On 4/28/21 6:39 PM, Numan Siddique wrote:
> >>> On Fri, Apr 23, 2021 at 10:18 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >>>>
> >>>> The initial implementation of the notification mechanism that indicates
> >>>> if OVS flows corresponding to a logical switch port have been installed
> >>>> is not resilient enough.  It didn't cover the case when transactions to
> >>>> the local OVS DB or to the Southbound fail.
> >>>>
> >>>> In order to deal with that, factor out the code that tracks the state
> >>>> of the interfaces to a new module, 'if-status', and implement it with
> >>>> a state machine that will retry setting Port_Binding.UP and
> >>>> OVS.Interface.external_ids:ovn-installed in the Southbound and local
> >>>> OVS databases.
> >>>>
> >>>> Having a separate module makes sense because tracking the interface
> >>>> state doesn't really depend on the rest of the binding module, except
> >>>> for interacting with the Port_Binding and Interface IDL records.  For
> >>>> this we add some additional APIs to binding.c.
> >>>>
> >>>> Reported-at: https://bugzilla.redhat.com/1952846
> >>>> Fixes: 4d3cb42b076b ("binding: Set Logical_Switch_Port.up when all OVS flows are installed.")
> >>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> >>>
> >>> Hi Dumitru,
> >>
> >> Hi Numan,
> >>
> >>>
> >>> Thanks for fixing this issue.
> >>>
> >>> During my testing, I found a regression.  If suppose an ovs port
> >>> "sw0p1" is bound on
> >>> a chassis for the logical port - sw0-port1, and if I delete the ovs port sw0p1,
> >>> ovn-controller clears the chassis column of Port_Binding but it
> >>> doesn't set the "up" column to "false".
> >>> It still remains "true".
> >>>
> >>> All the tests passed for me locally but still I see this issue when I
> >>> ran in fake-multinode setup.
> >>>
> >>> Can you please check if there are existing test cases which check for
> >>> the "up" column of port binding to be
> >>> false when it is released ?
> >>
> >> You're right, the "up" field of the port binding in the SB will stay
> >> "true" but the "chassis" column is unset so the NB.LSP.up field will be
> >> set to "false".
> >>
> >> I agree this is a bit confusing though and I'll try to fix it in v2.
> >>
> >>>
> >>>
> >>> I would also suggest adding a few test cases to replicate the issue it
> >>> is trying to address. If possible.
> >>>
> >>> To test this scenario, I have one idea.  Make the SB ovsdb-server read-only
> >>> (ovn-appctl -t ovnsb_db.ctl ovsdb-server/set-active-ovsdb-server
> >>> tcp:192.0.0.1:6642 and
> >>> ovn-appctl -t ovnsb_db.ctl ovsdb-server/connect-active-ovsdb-server)
> >>> after a port is claimed, and then run ovs-vsctl commands to release
> >>> the ovs port. At this point ovn-controller will fail
> >>> to set the "chassis" and "up" column.  After a few seconds, make the
> >>> ovsdb-server active
> >>> and verify that the "chassis" and "up" columns are set properly.
> >>
> >> This is a great idea, thanks for the suggestion, I'll try to add some
> >> tests like this.
> >>
> >>>
> >>> Is this possible to do ?  Or is this a wrong configuration ? Because
> >>> ovn-controller can not come to
> >>> know when ovsdb-server becomes active. Unless it keeps trying continuously.
> >>>
> >>> In my testing, ovn-controller doesn't set it properly when I make the
> >>> ovsdb-server active again.
> >>>
> >>> What do you think ?
> >>
> >> Actually, I think this just uncovered an already existing bug.  Even
> >> without my changes, changing the iface-id while ovn-controller is
> >> connected to a backup server triggers a transaction error but when the
> >> SB becomes active again, ovn-controller will not retry the transaction
> >> and won't even clear the chassis column of the port binding.
> >>
> >> The only thing that helps is a full recompute.
> >>
> >> I suspect there's a problem either with the way ovn-controller checks
> >> for the transaction commit result or within the IDL.
> >>
> >> I'll investigate some more but likely this will be a separate patch in v2.
> >
> > This is happening due to 0401cf5f9e06 ("ovsdb idl: Try committing the
> > pending txn in ovsdb_idl_loop_run.").
> >
> > Since this commit we try to commit the transaction early in the run
> > (processing it's result) but if the transaction errored out we don't
> > return the result to the caller of ovsdb_idl_loop_run().
> >
>
> I discussed a bit with Ilya offline about 0401cf5f9e06 ("ovsdb idl: Try
> committing the pending txn in ovsdb_idl_loop_run.") and how
> ovn-controller should behave.
>
> I think it's best to revert 0401cf5f9e06 and change ovn-controller
> behavior so it doesn't process updates/clear IDL tracked data when
> transactions are in progress.  But that's a separate change and likely
> outside the scope of this fix.

Ok.  Thanks for the investigation.

>
> However, without it, I can't use the approach you suggested for the self
> test as writing to a backup ovsdb-server will not trigger recomputes.
> Do you maybe have a different suggestion in order to test the scenario
> this patch is trying to fix?
>
I can't think of any other ideas.

> Otherwise I can add a few sanity tests and send a v2.

Sounds good to me.

Regards
Numan

>
> Thanks,
> Dumitru
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


More information about the dev mailing list