[ovs-discuss] ovsdb-server unkillable, need some help
Jeff Bachtel
jbachtel at bericotechnologies.com
Thu Feb 20 05:54:50 UTC 2014
I'm running Open vSwitch 1.11 from the RDO Havana repository, along
with OpenStack Havana, Neutron, and Ceph Emperor, all on CentOS 6.5
machines.
After installing Bacula on the previous OpenStack version (Grizzly), I
noticed the networking had become somewhat load sensitive. ovsdb-server
was freezing: not responding to queries on its unix socket and becoming
unkillable in process state R<. Believing that this was probably due to
running an old OVS version, I pushed ahead with an upgrade, only to find
my stability problems became much worse. Every 20-30 minutes I can
count on an ovsdb-server process freezing.
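
For anyone who wants to watch for the same symptom, here is a minimal
sketch of how the ps-style state (such as R<) could be derived from
/proc/<pid>/stat on Linux. It is an illustration only and not part of
the shared diagnostics; the helper names are hypothetical, and it
assumes the usual procps convention that "<" marks a negative nice
value and "N" a positive one.

```python
def parse_proc_stat(stat_line):
    """Return (state, nice) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' rather than on spaces.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    state = rest[0]        # field 3: R, S, D, Z, ...
    nice = int(rest[16])   # field 19: nice value
    return state, nice


def ps_style_state(stat_line):
    """Approximate the first two characters of the ps STAT column."""
    state, nice = parse_proc_stat(stat_line)
    if nice < 0:
        return state + "<"   # high priority
    if nice > 0:
        return state + "N"   # low priority
    return state


def read_state(pid):
    """Read and classify a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return ps_style_state(f.read())
```

A process that stays in R< across many samples while ignoring SIGKILL
would match the behaviour described above; a single sample proves
nothing on its own.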
At
https://drive.google.com/folderview?id=0B-wx2_T_hW-_OXZJWGJNc0l0MzQ&usp=sharing
please find a folder with shared copies of diagnostic files from a
machine with a hung ovsdb-server. There is a process list (.ps;
apologies, I forgot that extension means PostScript until the upload
was done), strace, dmesg, and /var/log/messages.
The strace didn't reveal anything suspicious to me. To mitigate, I
tried lowering log verbosity, completely recreating conf.db, compacting
frequently (every minute), and putting the db on a ramdisk. None of
these solved the problem.
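
For reference, once-a-minute compaction could be driven by something
like the sketch below. It is an illustration rather than the exact
script used here. It assumes ovs-appctl is on PATH and that the daemon
answers under the target name "ovsdb-server"; the function names and
the 10-second timeout are made up for the example.

```python
import subprocess
import time

# Assumption: default control socket, reachable via the target name.
COMPACT_CMD = ["ovs-appctl", "-t", "ovsdb-server", "ovsdb-server/compact"]


def compact_once(cmd=COMPACT_CMD, timeout=10.0):
    """Send one compaction request; return True only if it succeeded.

    The timeout matters: if ovsdb-server has frozen, the request never
    returns, and the caller should find out instead of hanging too.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # no answer in time; the server may be hung
    except FileNotFoundError:
        return False  # the command is not installed on this host
    return result.returncode == 0


def compact_forever(interval=60.0):
    """Compact every `interval` seconds and report failures."""
    while True:
        if not compact_once():
            print("compaction failed or timed out; ovsdb-server may be hung")
        time.sleep(interval)
```

Calling compact_forever() from a small service or a screen session
would give the every-minute schedule; a cron entry running the same
ovs-appctl command would do equally well.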
The ovsdb-server processes most likely to lock up run on Ceph hosts
running OSDs, meaning they see a lot of network traffic as well as
disk I/O.
I don't understand what a simple database RPC server could be doing that
would cause it to become unkillable, especially after the attempt to
minimize disk I/O by putting the db file on a ramdisk.
I hope someone has some ideas of what I might do to test or mitigate the
situation. Not running ceph osd on the hosts is, unfortunately, not a
solution I can use.
Thanks,
Jeff