[ovs-discuss] Regarding ovn-controller unavailability with RAFT

Ramteja Tadishetti ramteja.osc at gmail.com
Wed Feb 13 02:54:38 UTC 2019


Hi,

I am facing a problem where ovn-controller stalls indefinitely when I
remove a node from a RAFT cluster.

Scenario details:
Deployment:
I am running the scale test described in
https://ovn-scale-test.readthedocs.io/en/latest/ .
[The above link assumes ovn-central runs as a single node. To fit the
rally test suite to a RAFT setup, we need to change the ovn-remote in
rally/lib/python2.7/site-packages/rally_ovs/plugins/ovs/deployment/engines/ovs
]
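For reference, a clustered southbound database is normally reached by giving ovn-remote a comma-separated list of all cluster members, so that ovn-controller can reconnect to another member when one goes away. The sketch below only builds that string. The helper name build_ovn_remote and the 192.0.2.x addresses are hypothetical, it assumes plain TCP on the default southbound port 6642, and the ovs-vsctl step is left as a comment rather than executed.

```shell
#!/bin/sh
# build_ovn_remote PORT HOST... (hypothetical helper)
# Joins every southbound cluster member into one comma-separated
# ovn-remote value of the form tcp:HOST:PORT,tcp:HOST:PORT,...
build_ovn_remote() {
    port=$1
    shift
    remote=""
    for host in "$@"; do
        # ${remote:+$remote,} adds a comma only after the first entry
        remote="${remote:+$remote,}tcp:$host:$port"
    done
    printf '%s\n' "$remote"
}

# Example addresses from the documentation range; replace with real nodes.
OVN_REMOTE=$(build_ovn_remote 6642 192.0.2.11 192.0.2.12 192.0.2.13)
echo "$OVN_REMOTE"

# On each chassis the value would then be applied with something like:
#   ovs-vsctl set Open_vSwitch . external-ids:ovn-remote="$OVN_REMOTE"
```

Run as-is, this prints tcp:192.0.2.11:6642,tcp:192.0.2.12:6642,tcp:192.0.2.13:6642.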

Test details:
1) Create logical networks/switches
2) Create logical router
3) Connect logical switches to logical router
Thereafter, we create ports in batches of 25, which involves these steps:
a) Create 25 logical ports and bind each port to one of the chassis
b) Wait for the ports to come up on the chassis, i.e.

While waiting for the ports to come up on the chassis, I tried removing the
current leader node from the RAFT cluster using the kick command:

$ ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/kick OVN_Southbound <Node1>

After the leader is removed, failover happens gracefully, i.e. another
node becomes the leader. The cluster is still available after the removal
(I am able to run ovn-nbctl to add a logical switch and ovn-sbctl commands
to add a chassis).

But the ovn-controller that was waiting for the port to come up stalls
indefinitely, i.e. it is unable to make progress.
The debug log shows it is stuck in poll_loop|DBG|wakeup due to
[POLLIN][POLLHUP] on fd 18 ../lib/stream-fd.c.

Could you please let us know how to remove a node without stalling any
ovn-controller?

Thanks,
Ramteja