[ovs-discuss] OpenFlow port number leak causing OVN GW data-plane down

Han Zhou zhouhan at gmail.com
Tue Nov 13 01:34:22 UTC 2018


On Fri, Nov 9, 2018 at 3:11 PM Ben Pfaff <blp at ovn.org> wrote:
>
> On Fri, Nov 09, 2018 at 03:06:49PM -0800, Han Zhou wrote:
> > On Fri, Nov 9, 2018 at 2:34 PM Ben Pfaff <blp at ovn.org> wrote:
> > >
> > > On Wed, Nov 07, 2018 at 11:01:20PM -0800, Han Zhou wrote:
> > > > Now to my question. The time when all the GW BFD statuses went
> > > > down matches perfectly with the time when port number 65535 was
> > > > used. However, I still don't understand why using port number 65535
> > > > would bring BFD status down on all tunnels (to other GWs and all
> > > > hypervisors). Could someone help explain, so that we can be
> > > > confident that there are no other potential problems?
> > >
> > > It's not obvious to me why it would cause a BFD problem.  Is it
> > > difficult to look into it?
> >
> > It was in a live environment. It recovered after a quick restart of OVS.
> > I can't find more hints in the logs. In a test environment I could
> > easily reproduce the port number 65535 problem, but it didn't trigger
> > the tunnel BFD status down problem. I may try more to reproduce and
> > debug, but in general, what could cause all BFD statuses to go down
> > (while the network connection to the node is fine)?
>
> My first thought is something that keeps the BFD thread from receiving
> or sending BFD packets.  Maybe the BFD thread is confused by the
> out-of-range port number somehow.

Sorry that I didn't have time to dig further into the link between the
out-of-range port number and the BFD problem. I de-prioritized this since
the problem is now fixed. (In addition, I observed that on hypervisors that
had port number 65535 allocated, the allocation was followed by an OVS
restart after a while, so the out-of-range port results in different
behaviors.)
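
For reference, 65535 (0xffff) is the reserved value OFPP_NONE in OpenFlow
1.0, above OFPP_MAX (0xff00), which is why it is out of range for a regular
port. A minimal sketch of such a range check follows. The helper name is
hypothetical and this is not OVS code; treating OFPP_MAX itself as unusable
is an assumption.

```python
# OpenFlow 1.0 port number constants (16-bit port space).
OFPP_MAX = 0xFF00   # upper bound for regular switch ports
OFPP_NONE = 0xFFFF  # reserved value meaning "no port" (65535)


def is_regular_ofport(port: int) -> bool:
    """Hypothetical helper: True if `port` is a regular, non-reserved port.

    Assumption: valid regular ports are 1 .. OFPP_MAX - 1.
    """
    return 1 <= port < OFPP_MAX


if __name__ == "__main__":
    for p in (1, 100, OFPP_MAX, OFPP_NONE):
        print(p, is_regular_ofport(p))
```

With this check, an allocator would reject 65535 instead of handing it out
as a port number.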

Now, as a follow-up, I submitted a fix to avoid the duplicated chassis IP
problem:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/353855.html

I didn't go ahead and update ovn-controller to detect and remove the old
entry, because that would violate the RBAC design.