[ovs-dev] [PATCH ovn branch-20.03] chassis: Fix the way encaps are updated for a chassis record.

Numan Siddique numans at ovn.org
Tue Sep 8 07:46:26 UTC 2020


On Tue, Sep 8, 2020 at 12:58 PM Han Zhou <zhouhan at gmail.com> wrote:

> On Mon, Sep 7, 2020 at 3:11 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >
> > ovn-controller always stores the last configured system-id/chassis-id in
> > memory regardless if the connection to the SB is up or down. This is OK
> > as long as the change can be committed successfully when the SB DB
> > connection comes back up.
> >
> > Without this change, if the chassis-id changes while the SB connection is
> > down, ovn-controller will fail to create the new record but nevertheless
> > update its in-memory chassis-id. When the SB connection is restored
> > ovn-controller tries to find the record corresponding to the chassis-id
> > it stored in memory. This fails causing ovn-controller to try to insert
> > a new record. But at this point a constraint violation is hit in the SB
> > because the Encap records of the "stale" chassis still exist in the DB,
> > along with the old chassis record.
> >
> > This commit changes the way we search for a "stale" chassis record in the
> > SB to cover the above mentioned case. Also, in such cases there's no need
> > to recreate the Encaps, it's sufficient to update the chassis_name field.
> >
> > Fixes: 5344f24ecb1a ("ovn-controller: Refactor chassis.c to abstract the
> string parsing")
> > Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> >
> > (cherry-picked from master commit
> 94a32fca2d2b825fece0ef5b1873459bd9857dd3)
>
> Hi Dumitru,
>
> Sorry that I missed the original review, but this patch seems causing
> problem in master already. When RBAC is enabled, it will result in below
> error:
> 2020-09-08T07:20:31.991Z|00014|binding|INFO|Claiming lport lsp1 for this
> chassis.
> 2020-09-08T07:20:31.992Z|00015|ovsdb_idl|WARN|transaction error:
> {"details":"RBAC rules for client \"chassis-1\" role \"ovn-controller\"
> prohibit modification of table \"Encap\".","error":"permission error"}
>
> You can simply reproduce by: make sandbox, and then run: ovn-nbctl
> --wait=hv sync, which will hang forever because of the problem.
> This patch seems to be updating the "name" field of Encap table, which
> violates the design of RBAC, which uses "name" as the identity of the
> client. We shouldn't allow an user to change system-id/chassis-id directly.
> I think the proper way is firstly destroying the old chassis and then
> creating a new chassis, which would also avoid the complex logic in
> ovn-controller regarding the "stale record" handling. What do you think?
>
>
Hi Han,

Yes. Dumitru submitted the patch to fix this issue in master and I applied
it just now.
Can you please test again with the latest master.

https://github.com/ovn-org/ovn/commit/849c5a492a26337789db1a2dc97570989ef08c34

We see the same issue with branch-20.06 too.

Thanks
Numan

Thanks,
> Han
>
> > ---
> >  controller/chassis.c    | 60
> ++++++++++++++++++++++++++++++-------------------
> >  tests/ovn-controller.at | 17 ++++++++++++++
> >  2 files changed, 54 insertions(+), 23 deletions(-)
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index 522893e..d525463 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -380,10 +380,7 @@ chassis_tunnels_changed(const struct sset
> *encap_type_set,
> >  {
> >      size_t encap_type_count = 0;
> >
> > -    for (int i = 0; i < chassis_rec->n_encaps; i++) {
> > -        if (strcmp(chassis_rec->name,
> chassis_rec->encaps[i]->chassis_name)) {
> > -            return true;
> > -        }
> > +    for (size_t i = 0; i < chassis_rec->n_encaps; i++) {
> >
> >          if (!sset_contains(encap_type_set,
> chassis_rec->encaps[i]->type)) {
> >              return true;
> > @@ -456,6 +453,19 @@ chassis_build_encaps(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >  }
> >
> >  /*
> > + * Updates encaps for a given chassis. This can happen when the chassis
> > + * name has changed. Also, the only thing we support updating is the
> > + * chassis_name. For other changes the encaps will be recreated.
> > + */
> > +static void
> > +chassis_update_encaps(const struct sbrec_chassis *chassis)
> > +{
> > +    for (size_t i = 0; i < chassis->n_encaps; i++) {
> > +        sbrec_encap_set_chassis_name(chassis->encaps[i], chassis->name);
> > +    }
> > +}
> > +
> > +/*
> >   * Returns a pointer to a chassis record from 'chassis_table' that
> >   * matches at least one tunnel config.
> >   */
> > @@ -486,9 +496,10 @@ chassis_get_stale_record(const struct
> sbrec_chassis_table *chassis_table,
> >  /* If this is a chassis config update after we initialized the record
> once
> >   * then we should always be able to find it with the ID we saved in
> >   * chassis_state.
> > - * Otherwise (i.e., first time we create the record) then we check if
> there's
> > - * a stale record from a previous controller run that didn't end
> gracefully
> > - * and reuse it. If not then we create a new record.
> > + * Otherwise (i.e., first time we create the record or if the system-id
> > + * changed) then we check if there's a stale record from a previous
> > + * controller run that didn't end gracefully and reuse it. If not then
> we
> > + * create a new record.
> >   */
> >  static const struct sbrec_chassis *
> >  chassis_get_record(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > @@ -497,29 +508,31 @@ chassis_get_record(struct ovsdb_idl_txn
> *ovnsb_idl_txn,
> >                     const struct ovs_chassis_cfg *ovs_cfg,
> >                     const char *chassis_id)
> >  {
> > -    const struct sbrec_chassis *chassis_rec;
> > +    const struct sbrec_chassis *chassis = NULL;
> >
> >      if (chassis_info_id_inited(&chassis_state)) {
> > -        chassis_rec = chassis_lookup_by_name(sbrec_chassis_by_name,
> > -
> chassis_info_id(&chassis_state));
> > -        if (!chassis_rec) {
> > -            VLOG_DBG("Could not find Chassis, will create it"
> > -                     ": stored (%s) ovs (%s)",
> > +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> > +
> chassis_info_id(&chassis_state));
> > +        if (!chassis) {
> > +            VLOG_DBG("Could not find Chassis, will check if the id
> changed: "
> > +                     "stored (%s) ovs (%s)",
> >                       chassis_info_id(&chassis_state), chassis_id);
> > -            if (ovnsb_idl_txn) {
> > -                /* Recreate the chassis record.  */
> > -                chassis_rec = sbrec_chassis_insert(ovnsb_idl_txn);
> > -            }
> >          }
> > -    } else {
> > -        chassis_rec =
> > -            chassis_get_stale_record(chassis_table, ovs_cfg,
> chassis_id);
> > +    }
> >
> > -        if (!chassis_rec && ovnsb_idl_txn) {
> > -            chassis_rec = sbrec_chassis_insert(ovnsb_idl_txn);
> > +    if (!chassis) {
> > +        chassis = chassis_get_stale_record(chassis_table, ovs_cfg,
> chassis_id);
> > +    }
> > +
> > +    if (!chassis) {
> > +        /* Recreate the chassis record. */
> > +        VLOG_DBG("Could not find Chassis, will create it: %s",
> chassis_id);
> > +        if (ovnsb_idl_txn) {
> > +            return sbrec_chassis_insert(ovnsb_idl_txn);
> >          }
> >      }
> > -    return chassis_rec;
> > +
> > +    return chassis;
> >  }
> >
> >  /* Update a Chassis record based on the config in the ovs config. */
> > @@ -567,6 +580,7 @@ chassis_update(const struct sbrec_chassis
> *chassis_rec,
> >                                  &ovs_cfg->encap_ip_set,
> ovs_cfg->encap_csum,
> >                                  chassis_rec);
> >      if (!tunnels_changed) {
> > +        chassis_update_encaps(chassis_rec);
> >          return;
> >      }
> >
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 63b2581..1c7aa58 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -196,6 +196,23 @@ OVS_WAIT_UNTIL([
> >      test "${sysid}" = "${chassis_id}"
> >  ])
> >
> > +# Simulate system-id changing while ovn-controller is disconnected from
> the
> > +# SB.
> > +valid_remote=$(ovs-vsctl get Open_vSwitch . external_ids:ovn-remote)
> > +invalid_remote=tcp:0.0.0.0:4242
> > +ovs-vsctl set Open_vSwitch . external_ids:ovn-remote=${invalid_remote}
> > +expected_state="not connected"
> > +OVS_WAIT_UNTIL([
> > +    test "${expected_state}" = "$(ovn-appctl -t ovn-controller
> connection-status)"
> > +])
> > +sysid=${sysid}-bar
> > +ovs-vsctl set Open_vSwitch . external-ids:system-id="${sysid}"
> > +ovs-vsctl set Open_vSwitch . external_ids:ovn-remote=${valid_remote}
> > +OVS_WAIT_UNTIL([
> > +    chassis_id=$(ovn-sbctl get Chassis "${sysid}" name)
> > +    test "${sysid}" = "${chassis_id}"
> > +])
> > +
> >  # Gracefully terminate daemons
> >  OVN_CLEANUP_SBOX([hv])
> >  OVN_CLEANUP_VSWITCH([main])
> > --
> > 1.8.3.1
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>


More information about the dev mailing list