[ovs-discuss] ovsdb-server unkillable, need some help

Jeff Bachtel jbachtel at bericotechnologies.com
Thu Feb 20 05:54:50 UTC 2014


I'm running Open vSwitch 1.11 from the RDO Havana repository, along
with OpenStack Havana, Neutron, and Ceph Emperor, all on CentOS 6.5
machines.

After installing Bacula on the previous OpenStack release (Grizzly), I
noticed that networking had become load-sensitive. ovsdb-server was
freezing: it stopped responding to queries on its Unix socket and
became unkillable in process state R<. Believing the cause was probably
an outdated OVS version, I pushed ahead with an upgrade, only to find
that my stability problems became much worse. I can now count on an
ovsdb-server process freezing every 20-30 minutes.
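
For anyone wanting to look at a process in this state, here is a
minimal Python sketch (not part of the shared diagnostics) that reads
the state, nice value, and CPU counters from /proc/<pid>/stat. The
function names are hypothetical. The "<" in ps output corresponds to a
negative nice value, and comparing two samples shows whether user time
(utime) or kernel time (stime) is the one growing, both in clock ticks.

```python
def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat into a small dict."""
    # The command name may contain spaces or parentheses, so split on
    # the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5)
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],        # field 3: R, S, D, Z, ...
        "utime": int(fields[11]),  # field 14: user-mode clock ticks
        "stime": int(fields[12]),  # field 15: kernel-mode clock ticks
        "nice": int(fields[16]),   # field 19: negative shows as "<" in ps
    }


def read_proc_stat(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())


def cpu_delta(before, after):
    """Return (user_ticks, kernel_ticks) consumed between two samples."""
    return (after["utime"] - before["utime"],
            after["stime"] - before["stime"])
```

Calling read_proc_stat() on the hung pid twice, a few seconds apart,
and passing both results to cpu_delta() would show where the CPU time
is going. If almost all of the growth is in stime, that would suggest
the process is spinning inside the kernel rather than in ovsdb-server
code, though that is an interpretation and not something the files
above confirm.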

At
https://drive.google.com/folderview?id=0B-wx2_T_hW-_OXZJWGJNc0l0MzQ&usp=sharing
please find a folder with shared copies of diagnostic files from a
machine with a hung ovsdb-server: a process list (named .ps; apologies,
I forgot that extension means PostScript until the upload was done),
strace output, dmesg, and /var/log/messages.

The strace didn't reveal anything suspicious to me. To mitigate, I
tried lowering log verbosity, completely recreating conf.db, compacting
frequently (every minute), and putting the db on a ramdisk. None of
these solved the problem.
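
For reference, a periodic compaction like the one described above can
be sketched in Python roughly as follows. This is an illustration, not
the exact script used here: it assumes ovs-appctl is on the PATH and
that the server answers on its default control target, and the helper
names are hypothetical. The timeout doubles as a crude hang detector,
since a frozen ovsdb-server never answers the request.

```python
import subprocess


def compact_command(db=None, target="ovsdb-server"):
    """Build the ovs-appctl argv that asks ovsdb-server to compact."""
    cmd = ["ovs-appctl", "-t", target, "ovsdb-server/compact"]
    if db:
        cmd.append(db)  # optional database name; omit to compact all
    return cmd


def compact_once(db=None, timeout_s=10, runner=subprocess.run):
    """Run one compaction; return 'ok', 'failed', or 'timeout'."""
    try:
        result = runner(compact_command(db), capture_output=True,
                        text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # No reply on the control socket within timeout_s seconds.
        return "timeout"
    except FileNotFoundError:
        # ovs-appctl is not installed or not on the PATH.
        return "failed"
    return "ok" if result.returncode == 0 else "failed"
```

Running compact_once() from cron or a loop once a minute reproduces the
schedule mentioned above; a "timeout" result marks roughly when the
server stopped responding.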

The ovsdb-server processes most likely to lock up run on Ceph hosts
running OSDs, meaning they can see a lot of network traffic as well as
disk I/O.

I don't understand what a simple database RPC server could be doing that 
would cause it to become unkillable, especially with the attempt at 
minimizing disk i/o by putting the db file on a ramdisk.

I hope someone has some ideas of what I might do to test or mitigate the 
situation. Not running ceph osd on the hosts is, unfortunately, not a 
solution I can use.

Thanks,
Jeff


