[ovs-discuss] Huge number of netlink file descriptors open
paul at compose.io
Fri Mar 3 17:06:54 UTC 2017
We use OVS extensively and have a fair amount of experience operating it.
Recently we've come up against an issue we've not seen before.
I should say we are running an older build of OVS, because it has worked
for years and an upgrade would be somewhat disruptive:
ovs-vsctl (Open vSwitch) 2.3.0
Compiled Oct 29 2014 18:25:11
DB Schema 7.6.0
Linux REDACTED 3.19.0-59-generic #65~14.04.1-Ubuntu SMP Tue Apr 19 18:57:09
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
On one or two hosts (out of hundreds) we are seeing errors like this:
2017-03-03T16:51:48.021Z|26523023|dpif|WARN|system at ovs-system: failed to
add veth548eth1 as port: Too many open files
We have ulimit set to 65k file descriptors and indeed they are all in use
and almost all are netlink sockets to the kernel:
$ sudo lsof -p $(cat /var/run/openvswitch/ovs-vswitchd.pid) | grep netlink | wc -l
$ sudo lsof -p $(cat /var/run/openvswitch/ovs-vswitchd.pid) | wc -l
$ cat /proc/$(cat /var/run/openvswitch/ovs-vswitchd.pid)/limits | grep open
Max open files 65535 65535 files
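For what it's worth, the same count can be taken without lsof by matching the
socket inodes behind /proc/&lt;pid&gt;/fd against the inode column of
/proc/net/netlink. A minimal sketch (the function name is mine; standard Linux
procfs layout assumed):

```shell
# Count netlink sockets a process holds, without lsof, by cross-referencing
# /proc/<pid>/fd/* with the inodes listed in /proc/net/netlink.
count_netlink_fds() {
    pid=$1
    # Inodes of every netlink socket on the system (field 10, skip header).
    nl_inodes=$(awk 'NR > 1 { print $10 }' /proc/net/netlink)
    count=0
    for fd in /proc/"$pid"/fd/*; do
        # readlink yields e.g. "socket:[123456]" for socket fds.
        inode=$(readlink "$fd" 2>/dev/null \
            | sed -n 's/^socket:\[\([0-9][0-9]*\)\]$/\1/p')
        if [ -n "$inode" ] && echo "$nl_inodes" | grep -qx "$inode"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Against the vswitchd pid it would be:
# count_netlink_fds "$(cat /var/run/openvswitch/ovs-vswitchd.pid)"
count_netlink_fds $$
```

Run against the current shell ($$) it just demonstrates the mechanics; a plain
shell normally holds no netlink sockets.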
I understand that the switch uses 3 descriptors per bridge and 1 per port,
but we have only 2 bridges, one with about 5 ports and the other with about 300:
$ ovs-vsctl show | grep -c Port
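Under that accounting the expected total is nowhere near the limit; a quick
sanity check using the rough figures above (2 bridges, ~5 + ~300 ports):

```shell
# Expected fd usage under the "3 per bridge + 1 per port" model,
# with the rough figures quoted above.
bridges=2
ports=$((5 + 300))
expected=$((3 * bridges + ports))
echo "$expected"   # 311 -- far below the 65535 limit
```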
I've ensured there aren't any ports/interfaces in OVS that are no longer in
existence on the host.
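That check amounts to comparing what OVS lists against what the kernel has;
roughly something like the following (a sketch, not the exact commands I ran,
and guarded so it no-ops on hosts without ovs-vsctl on PATH):

```shell
# Sketch: flag OVS ports that have no matching kernel interface.
check_stale_ports() {
    # Exit cleanly on hosts without OVS installed.
    command -v ovs-vsctl >/dev/null 2>&1 || { echo "ovs-vsctl not found"; return 0; }
    for br in $(ovs-vsctl list-br); do
        for port in $(ovs-vsctl list-ports "$br"); do
            # NB: patch and tunnel ports have no kernel netdev and will be
            # flagged here too; filter those out before acting on the list.
            ip link show "$port" >/dev/null 2>&1 || echo "stale: $br/$port"
        done
    done
}

check_stale_ports
```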
We've seen this once before on this host and remedied it by restarting OVS,
but that is obviously disruptive to our production workloads, so we would
like to understand what is happening.
I checked a few other hosts in our fleet and I've found a mixture - a few
that have been up for years have ~10-40k descriptors, many have more like
600. There seems to be no correlation between actual number of ovs ports
and the number of descriptors.
If anyone has any suggestions for where to look or has seen this before,
please let us know. I've found very little online or on this list that
seems directly relevant.