[ovs-dev] [PATCH ovn branch-20.03] chassis: Fix the way encaps are updated for a chassis record.

Dumitru Ceara dceara at redhat.com
Tue Sep 8 09:24:05 UTC 2020


On 9/8/20 9:47 AM, Numan Siddique wrote:
> 
> 
> On Tue, Sep 8, 2020 at 1:16 PM Numan Siddique <numans at ovn.org
> <mailto:numans at ovn.org>> wrote:
> 
> 
> 
>     On Tue, Sep 8, 2020 at 12:58 PM Han Zhou <zhouhan at gmail.com
>     <mailto:zhouhan at gmail.com>> wrote:
> 
>         On Mon, Sep 7, 2020 at 3:11 AM Dumitru Ceara <dceara at redhat.com
>         <mailto:dceara at redhat.com>> wrote:
>         >
>         > ovn-controller always stores the last configured
>         system-id/chassis-id in
>         > memory regardless if the connection to the SB is up or down.
>         This is OK
>         > as long as the change can be committed successfully when the SB DB
>         > connection comes back up.
>         >
>         > Without this change, if the chassis-id changes while the SB
>         connection is
>         > down, ovn-controller will fail to create the new record but
>         nevertheless
>         > update its in-memory chassis-id. When the SB connection is
>         restored
>         > ovn-controller tries to find the record corresponding to the
>         chassis-id
>         > it stored in memory. This fails causing ovn-controller to try
>         to insert
>         > a new record. But at this point a constraint violation is hit
>         in the SB
>         > because the Encap records of the "stale" chassis still exist
>         in the DB,
>         > along with the old chassis record.
>         >
>         > This commit changes the way we search for a "stale" chassis
>         record in the
>         > SB to cover the above mentioned case. Also, in such cases
>         there's no need
>         > to recreate the Encaps, it's sufficient to update the
>         chassis_name field.
>         >
>         > Fixes: 5344f24ecb1a ("ovn-controller: Refactor chassis.c to
>         abstract the
>         string parsing")
>         > Signed-off-by: Dumitru Ceara <dceara at redhat.com
>         <mailto:dceara at redhat.com>>
>         >
>         > (cherry-picked from master commit
>         94a32fca2d2b825fece0ef5b1873459bd9857dd3)
> 
>         Hi Dumitru,
> 

Hi Han, Numan,

>         Sorry that I missed the original review, but this patch seems
>         causing
>         problem in master already. When RBAC is enabled, it will result
>         in below
>         error:
>         2020-09-08T07:20:31.991Z|00014|binding|INFO|Claiming lport lsp1
>         for this
>         chassis.
>         2020-09-08T07:20:31.992Z|00015|ovsdb_idl|WARN|transaction error:
>         {"details":"RBAC rules for client \"chassis-1\" role
>         \"ovn-controller\"
>         prohibit modification of table \"Encap\".","error":"permission
>         error"}
> 
>         You can simply reproduce by: make sandbox, and then run: ovn-nbctl
>         --wait=hv sync, which will hang forever because of the problem.
>         This patch seems to be updating the "name" field of Encap table,
>         which
>         violates the design of RBAC, which uses "name" as the identity
>         of the
>         client. We shouldn't allow an user to change
>         system-id/chassis-id directly.

How can we enforce this from within OVN?

>         I think the proper way is firstly destroying the old chassis and
>         then
>         creating a new chassis, which would also avoid the complex logic in
>         ovn-controller regarding the "stale record" handling. What do
>         you think?
> 
> 

I'm afraid this would be a behavior change that might introduce bugs in
CMSs that allow changing system-ids (without RBAC enabled) without
explicitly deleting the "stale" Chassis *and* Chassis_private record.

Also, forcing CMS to do this housekeeping will probably add complex code
in the CMS. For example, if the "old" Chassis record is deleted before
ovn-controller receives the update about the system-id change in OVS DB
then ovn-controller will just recreate it.

>     Hi Han,
> 
>     Yes. Dumitru submitted the patch to fix this issue in master and I
>     applied it just now.
>     Can you please test again with the latest master.
> 
>     https://github.com/ovn-org/ovn/commit/849c5a492a26337789db1a2dc97570989ef08c34
> 
>     We see the same issue with branch-20.06 too.
> 
> 
> 
> Hi Dumitru,
> 
> Can you please make a 2 patch series for branch-20.03 which includes
> this patch and the other one
> which fixes the RBAC issue.
> 
> Thanks
> Numan
>  

Like you said, we have two options:

1. Leave it up to the CMS to cleanup Chassis/Chassis_private when
system-id changes and completely ignore in ovn-controller any stale
chassis records that might exist in the SB DB. In this case we'd just
let transactions fail due to constraint violations if the CMS cleanup of
the Chassis/Chassis_private records doesn't happen or is incorrect.

2. Make it more user friendly for the case when RBAC is _not used_ and
try to cleanup after ourselves in ovn-controller and reuse the stale
records. This is already happening with the code on the master branch
and if we cherry pick [0] as Numan suggested and add it to this series
it would be the case for branch-20.03 too. [0] would have to go to
branch-20.06 as well.

I would say, at least for now, the least intrusive change, with the
least chance of breaking CMSs, is #2 above but I'll wait for
confirmation before sending a new version of this patch.

Thanks,
Dumitru

[0]
https://github.com/ovn-org/ovn/commit/849c5a492a26337789db1a2dc97570989ef08c34



More information about the dev mailing list