[ovs-dev] ovn-northd-ddlog bug with HA_Chassis_Groups

Mark Michelson mmichels at redhat.com
Fri Apr 9 16:01:42 UTC 2021


Hi guys,

While developing a new feature, I found a defect in ovn-northd-ddlog 
when HA_Chassis_Groups are used. I have attached a script that 
replicates the issue.

If you start a sandbox environment on OVN master (`make sandbox 
SANDBOXFLAGS="--ddlog"`) and then run the script, you'll find that the 
script hangs and you must press Ctrl+C to terminate it. At this point, my 
system shows ovn-northd-ddlog using 60-80% CPU. If you run `ovn-sbctl list 
port_binding`, you'll see that chassis-resident port bindings are not 
present. If you run `ovn-sbctl list ha_chassis_group`, nothing is returned.

If you remove "--wait=sb" from the final ovn-nbctl command, the script 
no longer hangs, but the same symptoms occur.

The attached script is the smallest reproducer I could manage. I tried 
removing the second router and the second HA_Chassis_Group, but doing so 
made the issue disappear.

Ideally, rather than reporting the issue to you guys, I would be 
diagnosing it myself and then presenting a patch to fix it. 
However, I'm at a bit of a loss as to how to debug this. I could try to 
find the problem by inspecting the source, but finding out what the 
running process is doing and stepping through the running code would be 
much more sensible.

I'm looking for two pieces of information here:
1) How would you go about debugging this particular issue?
2) What's going on here? :) If you know what is going on, then how did 
you make that determination?

Thanks,
Mark Michelson

