[ovs-discuss] [OVN] random BFD timeouts between chassis

Krzysztof Klimonda kklimonda at syntaxhighlighted.com
Tue Mar 16 14:34:10 UTC 2021


Hi,

I'm trying to track down an issue that results in BFD session timeouts in our deployment.

What I'm seeing is that, seemingly at random, one chassis stops sending BFD packets to some of its neighbors (apparently one neighbor at a time, and one chassis currently appears more prone to this behavior than the others). Once the timeout is reached, the neighbor signals that the session is down, and the two re-establish it promptly. I've captured BFD packets on both chassis, and it appears that one chassis stops sending its BFD packets, or at least they are not showing up on the wire. At the same time I can see incoming BFD packets from the neighbor, so it doesn't look like an underlying networking issue is causing it.
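To make the "stops sending" observation concrete, the gaps in a capture can be located with something like the Python sketch below. It is only an illustration: it assumes the transmit timestamps of the BFD packets have already been exported from the capture as seconds, and the function name `find_tx_gaps`, the sample timestamps and the 3-second detection time are hypothetical, not values from our deployment.

```python
from typing import List, Tuple


def find_tx_gaps(timestamps: List[float], detect_time: float) -> List[Tuple[float, float]]:
    """Return (last_seen, next_seen) pairs where consecutive BFD packets
    are more than detect_time seconds apart."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each packet with the one that follows it in time.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > detect_time:
            gaps.append((prev, cur))
    return gaps


# Hypothetical capture: one packet per second, then silence between t=3 and t=9.
tx_times = [0.0, 1.0, 2.0, 3.0, 9.0, 10.0]
print(find_tx_gaps(tx_times, detect_time=3.0))
```

Running the same check on the timestamps captured on both chassis would show whether the gap exists only in the sender's capture or on both sides.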

There is nothing BFD-related in the logs until the session is torn down by the neighbor. The only correlated log entries I can see right now are constant messages like this in syslog:

```
ovs-system: deferred action limit reached, drop recirc action
```

Those seem to be caused by a constant barrage of ARP requests (500-600/s) coming from the external network router for IP addresses that are not currently in use. That seems to put some extra load on the ovs-vswitchd process, but nowhere near enough to stop it from processing other packets (the ovs-vswitchd logs don't report increased CPU usage).
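For anyone wanting to reproduce the ARP rate estimate, a rough Python sketch of the counting is below. It assumes the ARP requests have already been extracted from a capture as (timestamp, target IP) pairs; the helper names and the sample data (documentation addresses from 192.0.2.0/24) are hypothetical and only illustrate the approach.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

ArpRequest = Tuple[float, str]  # (timestamp in seconds, requested target IP)


def arp_rate_per_second(requests: Iterable[ArpRequest]) -> Dict[int, int]:
    """Count ARP requests falling into each whole-second bucket."""
    return dict(Counter(math.floor(ts) for ts, _ in requests))


def top_targets(requests: Iterable[ArpRequest], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequently requested target IPs with their counts."""
    return Counter(ip for _, ip in requests).most_common(n)


# Hypothetical sample, far smaller than the 500-600/s seen in practice.
sample = [
    (0.1, "192.0.2.10"),
    (0.5, "192.0.2.11"),
    (0.9, "192.0.2.10"),
    (1.2, "192.0.2.10"),
]
print(arp_rate_per_second(sample))
print(top_targets(sample, 1))
```

Lining up the per-second counts with the times of the BFD gaps would show whether the two are actually correlated.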

openvswitch version: 2.11.0
ovn version: 20.09.90 (a build from 20.09 branch from 2020.12.07)

-- 
  Krzysztof Klimonda
  kklimonda at syntaxhighlighted.com

