[ovs-dev] ovn-northd-ddlog bug with HA_Chassis_Groups
Mark Michelson
mmichels at redhat.com
Fri Apr 9 16:01:42 UTC 2021
Hi guys,
While developing a new feature, I found a defect in ovn-northd-ddlog
when HA_Chassis_Groups are used. I have attached a script that
replicates the issue.
If you start a sandbox environment on OVN master (`make sandbox
SANDBOXFLAGS="--ddlog"`) and then run the script, you'll find that the
script hangs and you must press Ctrl+C to terminate it. At that point, my
system shows ovn-northd-ddlog using 60-80% CPU. If you run `ovn-sbctl list
port_binding`, you'll see that the chassis-resident port bindings are not
present. If you run `ovn-sbctl list ha_chassis_group`, nothing is returned.
If you remove the "--wait=sb" from the final ovn-nbctl command, then the
script will not hang, but the same symptoms occur.
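The checks above can be gathered into a small shell sketch. It uses only the commands already mentioned plus standard `pgrep` and `ps`. The function name `check_symptoms` is made up for illustration, and the sketch assumes it is run from inside the sandbox shell, where `ovn-sbctl` is on the PATH.

```shell
#!/bin/sh
# Sketch of the symptom checks described above. Assumes the sandbox was
# started with: make sandbox SANDBOXFLAGS="--ddlog"
# and that the attached script has already been run (and interrupted).

check_symptoms() {
    # The chassis-resident port bindings should appear here but do not.
    ovn-sbctl list port_binding

    # Should list the HA chassis groups; prints nothing when the bug hits.
    ovn-sbctl list ha_chassis_group

    # Show CPU usage of ovn-northd-ddlog. Note that ps reports pcpu as an
    # average over the process lifetime, not an instantaneous value.
    pid=$(pgrep -f ovn-northd-ddlog | head -n 1)
    [ -n "$pid" ] && ps -o pid,pcpu,comm -p "$pid"
}

# Only run the checks when the OVN tools are actually available.
if command -v ovn-sbctl >/dev/null 2>&1; then
    check_symptoms
else
    echo "ovn-sbctl not found; start the sandbox first" >&2
fi
```

With the bug present, the second command prints nothing at all.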
The attached script is the smallest reproducer I could manage. I attempted
to remove the second router and second HA_Chassis_Group, but doing that made
the issue disappear.
Ideally, rather than reporting the issue to you guys, I would be
diagnosing the issue myself and then presenting a patch to fix it.
However, I'm at a bit of a loss for how to debug this. I could try to
inspect the source for the issue, but finding what the running process
is doing and stepping through the running code would be much more sensible.
I'm looking for two pieces of information here:
1) How would you go about debugging this particular issue?
2) What's going on here? :) If you know what is going on, then how did
you make that determination?
Thanks,
Mark Michelson