[ovs-discuss] OpenFlow port number leak causing OVN GW data-plane down
blp at ovn.org
Fri Nov 9 23:11:38 UTC 2018
On Fri, Nov 09, 2018 at 03:06:49PM -0800, Han Zhou wrote:
> On Fri, Nov 9, 2018 at 2:34 PM Ben Pfaff <blp at ovn.org> wrote:
> > On Wed, Nov 07, 2018 at 11:01:20PM -0800, Han Zhou wrote:
> > > Now to my question. The time when all the GW BFD statuses went down
> > > matches perfectly with the time when port number 65535 was used.
> > > However, I still don't understand why using port number 65535 would
> > > cause BFD status down on all tunnels (to other GWs and all hypervisors).
> > > Could someone help explain this, so that we can be confident that there
> > > are no other potential problems?
> > It's not obvious to me why it would cause a BFD problem. Is it
> > difficult to look into it?
> It was in a live environment. It recovered after a quick restart of OVS.
> I can't find any more hints in the logs. In a test environment I could
> easily reproduce the port number 65535 problem, but it didn't trigger the
> tunnel BFD status down problem. I may try more to reproduce and debug, but
> in general, what could cause all BFD status to go down (while the network
> connection to the node is fine)?
My first thought is something that keeps the BFD thread from receiving
or sending BFD packets. Maybe the BFD thread is confused by the
out-of-range port number somehow.
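
For context on why 65535 counts as out of range here: in OpenFlow 1.0,
port numbers are 16 bits wide, values above OFPP_MAX (0xff00) are reserved,
and 0xffff is the reserved value OFPP_NONE. A minimal Python sketch of that
check follows. The constant values are taken from the OpenFlow 1.0
specification; the helper name is hypothetical, and this is not code from
OVS.

```python
# Reserved port numbers as defined by the OpenFlow 1.0 specification.
OFPP_MAX = 0xFF00    # upper bound for ordinary switch port numbers
OFPP_LOCAL = 0xFFFE  # the switch's local port
OFPP_NONE = 0xFFFF   # "no port"; numerically 65535


def is_reserved_port(port_no):
    """Hypothetical helper: True if port_no lies above OFPP_MAX."""
    return port_no > OFPP_MAX


# 65535 collides with OFPP_NONE, so it cannot name an ordinary port.
print(is_reserved_port(65535), 65535 == OFPP_NONE)
```

Whether such a collision is what actually upset BFD in this case is still
unknown; the sketch only shows why the number is special.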