[ovs-dev] [PATCH ovn] ovn-northd-ddlog: Fix startup race updating nb_cfg_timestamp.

Ben Pfaff blp at ovn.org
Mon Mar 8 23:47:42 UTC 2021


On Mon, Mar 08, 2021 at 02:41:09PM -0800, Han Zhou wrote:
> On Mon, Mar 8, 2021 at 1:50 PM Ben Pfaff <blp at ovn.org> wrote:
> >
> > On Mon, Mar 08, 2021 at 10:47:40AM -0800, Han Zhou wrote:
> > > On Fri, Mar 5, 2021 at 5:24 PM Ben Pfaff <blp at ovn.org> wrote:
> > > >
> > > > The DDlog code used by ovn-northd-ddlog relies on always having a
> value
> > > > (exactly one of them) in the NbCfgTimestamp relation.  The C code
> > > > inserts and updates this value.  It's supposed to insert the initial
> > > > value the first time that it processes any update from the northbound
> > > > database.
> > > >
> > > > However, it actually only did this if the first update it processed
> was
> > > > from the northbound database--if it was from the southbound database,
> > > > it forgot to do so even when the first northbound update showed up.
> > > > The DDlog code would then go kind of berzerk creating and deleting the
> > > > NB_Global record until something actually happened to increment nb_cfg
> > > > (such as "ovn-nbctl --wait=sb sync").
> > > >
> > > > This fixes the problem by only recording that a value was inserted to
> > > > NbCfgTimestamp when the first update from the northbound database is
> > > > received, same as originally intended.
> > > >
> > > > Signed-off-by: Ben Pfaff <blp at ovn.org>
> > > > ---
> > > >  northd/ovn-northd-ddlog.c | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c
> > > > index aa0ea73e401d..f373bb129a89 100644
> > > > --- a/northd/ovn-northd-ddlog.c
> > > > +++ b/northd/ovn-northd-ddlog.c
> > > > @@ -445,8 +445,10 @@ northd_parse_updates(struct northd_ctx *ctx,
> struct
> > > ovs_list *updates)
> > > >      if (ddlog_commit(ctx->ddlog)) {
> > > >          goto error;
> > > >      }
> > > > -    old_nb_cfg = new_nb_cfg;
> > > > -    old_nb_cfg_timestamp = new_nb_cfg_timestamp;
> > > > +    if (ctx->has_timestamp_columns) {
> > > > +        old_nb_cfg = new_nb_cfg;
> > > > +        old_nb_cfg_timestamp = new_nb_cfg_timestamp;
> > > > +    }
> > >
> > > Hi Ben, sorry that I didn't understand this patch. It seems this change
> > > wouldn't change the behavior. It only adds the if condition before
> > > assigning old_nb_cfg and old_nb_cfg_timestamp. If
> > > ctx->has_timestamp_columns is true, it is the same behavior as before;
> if
> > > ctx->has_timestamp_columns is false, new_nb_cfg and new_nb_cfg_timestamp
> > > wouldn't change (in the previous if-block not shown in this diff), then
> > > before this change the old_nb_cfg and old_nb_cfg_timestamp would just be
> > > re-assigned to their original values (a no-op) - same effect as this
> > > change. Did I misunderstand anything? I also couldn't see any link
> between
> > > this change and the commit message.
> >
> > This changes the scenario where ctx->has_timestamp_columns is false and
> > old_nb_cfg is INT64_MIN.  In the previous version of the code would
> > change old_nb_cfg to 0; in the new version of the code, it keeps its
> > value of INT64_MIN.
> 
> Oh, I guess we are looking at the patch based on different versions of
> code. Although the patch applies cleanly on top of master, I think it makes
> more sense if it were added after
> https://patchwork.ozlabs.org/project/ovn/patch/20210304041012.4128938-10-blp@ovn.org/

Oh, gosh, sorry, I didn't realize that there was a dependency.  I
actually thought that this was a race that had always been there and was
just triggering more often now.  I was developing on top of the patch
you cite ("ovn-northd-ddlog: Apply multiple database updates in single
ddlog txn.") and sent a separate patch because I thought it was separate.

This should be folded into the previous patch, as a bug fix, then.  I'll
put it into v2.

My apologies.


More information about the dev mailing list