[ovs-discuss] ovs-vswitchd segment fault when set controller_rate_limit

Tony van der Peet Tony.vanderPeet at alliedtelesis.co.nz
Thu Feb 23 21:34:32 UTC 2017


This is in response to a posting by Xu Wen on 26/Jan/2016. He reported a crash, but as far as I can tell there was never a response.


We had the same issue ourselves, and I have just submitted a patch to fix it. Here's my analysis.


When many packet-in messages are generated, the pinqueues fill up until they hold controller_burst_limit packets in total. There is one queue per port. Because the discard policy always drops from the queue with the most packet-ins on it, all queues eventually end up the same length. Each time a packet is discarded, the longest queue is found, and if that queue's length is going from 1 to 0, the queue is destroyed.


From the above, you can see that this will only ever happen if the number of ports (queues) is greater than or equal to controller_burst_limit, since only then can the longest queue be down to a single packet. The crash happens when the next_txq pointer happens to be pointing to the destroyed queue, because the pointer wasn't being cleared. Of the two places that call the destroy queue function, one should be OK because it manages next_txq; it's the other one that causes the problem.
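
To make the failure mode concrete, here is a minimal sketch of it in C. This is a simplified model, not the OVS source and not the submitted patch: apart from next_txq, every name (struct port_queue, struct sched, destroy_queue, drop_one, the repair_next_txq flag) is hypothetical, and "destroying" a queue only marks a slot in a fixed array as dead, so the stale pointer can be detected safely instead of crashing. It assumes every port holds exactly one packet-in, which is the state described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical stand-in for a per-port packet-in queue. */
struct port_queue {
    bool live;      /* false once the queue has been "destroyed" */
    int n_queued;   /* packet-ins currently waiting on this port */
};

/* Hypothetical stand-in for the scheduler that owns the queues. */
struct sched {
    struct port_queue queues[MAX_PORTS];
    struct port_queue *next_txq;   /* next queue to transmit from, or NULL */
};

/* Set up 'n_ports' live queues holding one packet each, with next_txq
 * pointing at the first of them. */
static void
sched_init(struct sched *s, int n_ports)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        s->queues[i].live = i < n_ports;
        s->queues[i].n_queued = i < n_ports ? 1 : 0;
    }
    s->next_txq = n_ports > 0 ? &s->queues[0] : NULL;
}

/* Return the next live queue after 'q' in round-robin order, or NULL if
 * 'q' is the only live queue. */
static struct port_queue *
next_live_queue(struct sched *s, const struct port_queue *q)
{
    size_t start = (size_t) (q - s->queues);

    for (size_t i = 1; i <= MAX_PORTS; i++) {
        struct port_queue *cand = &s->queues[(start + i) % MAX_PORTS];
        if (cand->live && cand != q) {
            return cand;
        }
    }
    return NULL;
}

/* Destroy 'q'.  When 'repair_next_txq' is true, first move next_txq off
 * the queue that is about to go away; when false, next_txq is left
 * untouched, which models the problematic call path. */
static void
destroy_queue(struct sched *s, struct port_queue *q, bool repair_next_txq)
{
    if (repair_next_txq && s->next_txq == q) {
        s->next_txq = next_live_queue(s, q);
    }
    q->live = false;
    q->n_queued = 0;
}

/* Discard one packet from the longest live queue, destroying that queue
 * if its length goes from 1 to 0. */
static void
drop_one(struct sched *s, bool repair_next_txq)
{
    struct port_queue *longest = NULL;

    for (size_t i = 0; i < MAX_PORTS; i++) {
        struct port_queue *q = &s->queues[i];
        if (q->live && (!longest || q->n_queued > longest->n_queued)) {
            longest = q;
        }
    }
    if (!longest) {
        return;
    }
    if (longest->n_queued == 1) {
        destroy_queue(s, longest, repair_next_txq);
    } else {
        longest->n_queued--;
    }
}

/* True if next_txq is NULL or points at a live queue. */
static bool
next_txq_is_valid(const struct sched *s)
{
    return !s->next_txq || s->next_txq->live;
}
```

In this model, calling sched_init(&s, 3) and then drop_one(&s, false) leaves next_txq pointing at a dead queue, which corresponds to the dangling pointer in the real crash. The same sequence with drop_one(&s, true) moves next_txq to the next live queue, or sets it to NULL if none remains.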


Xu Wen, if you see this email (1 year later!) could you confirm that your setup had 200+ ports on it?


Cheers

Tony