[ovs-discuss] OpenFlow port number leak causing OVN GW data-plane down
zhouhan at gmail.com
Tue Nov 13 01:34:22 UTC 2018
On Fri, Nov 9, 2018 at 3:11 PM Ben Pfaff <blp at ovn.org> wrote:
> On Fri, Nov 09, 2018 at 03:06:49PM -0800, Han Zhou wrote:
> > On Fri, Nov 9, 2018 at 2:34 PM Ben Pfaff <blp at ovn.org> wrote:
> > >
> > > On Wed, Nov 07, 2018 at 11:01:20PM -0800, Han Zhou wrote:
> > > > > Now comes to my question. The time when all the GW BFD status went down
> > > > > matches perfectly with the time when the port number 65535 is used.
> > > > However, I still didn't understand why would using the port number
> > > > cause BFD status down on all tunnels (to other GWs and all
> > > > > Could someone help explain here, so that we are confident that there are
> > > > > no other potential problems?
> > >
> > > It's not obvious to me why it would cause a BFD problem. Is it
> > > difficult to look into it?
> > It was on a live environment. It was recovered after quickly restart
> > From the logs I can't find any more hints. In a test environment I could
> > reproduce the port number 65535 problem easily, but it didn't trigger the
> > tunnel BFD status down problem. I may try more to reproduce and debug,
> > in general what could cause all BFD status down (while network
> > to the node is fine).
> My first thought is something that keeps the BFD thread from receiving
> or sending BFD packets. Maybe the BFD thread is confused by the
> out-of-range port number somehow.
Sorry that I didn't have time to dig further into the link between the
out-of-range port number and the BFD problem. I de-prioritized this since
the problem is now fixed. (In addition, I observed on hypervisors that have
this port number 65535 allocated, followed by an OVS restart after a while, so
there are different behaviors resulting from the out-of-range port.)
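
For reference, here is a minimal sketch of why 65535 counts as out of range.
It assumes the OpenFlow 1.0 16-bit port numbering, where OFPP_MAX is 0xff00
and OFPP_NONE is 0xffff. The helper name classify_ofport is hypothetical and
is not part of OVS; treating every number below OFPP_MAX as an ordinary port
is also a simplifying assumption.

```python
# Constants from the OpenFlow 1.0 ofp_port enum (16-bit port numbers).
OFPP_MAX = 0xFF00   # upper bound for ordinary switch ports
OFPP_NONE = 0xFFFF  # 65535: means "no port", not a usable port number


def classify_ofport(n):
    """Hypothetical helper: classify a 16-bit OpenFlow 1.0 port number."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("not a 16-bit port number: %r" % (n,))
    if n == OFPP_NONE:
        return "none"
    if n >= OFPP_MAX:
        # Range holding the reserved ports (LOCAL, CONTROLLER, ...).
        return "reserved"
    # Assumption: everything below OFPP_MAX is an ordinary port.
    return "regular"


if __name__ == "__main__":
    for port in (1, 0xFEFF, 0xFF00, 65535):
        print(port, classify_ofport(port))
```

Under these assumptions, a leaked allocator that reaches 65535 hands out
the value that OpenFlow 1.0 uses for "no port", which is one plausible reason
for the inconsistent behavior, although I have not confirmed the actual
failure path.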
Now as a follow up, I submitted a fix to avoid the duplicated chassis IP
I didn't go ahead and update ovn-controller to detect and remove the old
entry, because that would violate the RBAC design.