[ovs-discuss] controller's role mismatch?

Ben Pfaff blp at ovn.org
Tue May 10 17:16:31 UTC 2016


I'm unlikely to continue along either of these paths because both of
them are quite onerous for me.  I think it's probably better if you
debug the problem yourself.

If you have a simpler reproduction case then I'm happy to work on that,
but I'm not going to install a 3-node ODL cluster to try to debug the
problem.

It sounds like you're not really a developer, so I guess that's part of
the problem.

On Tue, May 10, 2016 at 04:40:23PM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> Hi.
> 
> How will we continue? I haven't noticed your answer.
> 
> Peter Gubka
> 
> -----Original Message-----
> From: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) 
> Sent: Wednesday, May 04, 2016 9:00 AM
> To: 'Ben Pfaff' <blp at ovn.org>
> Cc: bugs at openvswitch.org
> Subject: RE: [ovs-discuss] controller's role mismatch?
> 
> Hi.
> 
> 1) For you to reproduce the problem:
> 
> Download the ODL controller and install it as a cluster; you will need 3 nodes (probably VMs or Docker containers). If you haven't done this before, it can be time-consuming and you won't be sure that everything is configured correctly. https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/distribution-karaf/0.4.1-Beryllium-SR1/distribution-karaf-0.4.1-Beryllium-SR1.zip
> Then follow the steps I wrote at the beginning (the ovs-vsctl commands). If you want, I can prepare a Python script for you to test automatically, but you'll have to wait 1-2 days.
> 
> 2) For me to reproduce the problem:
> Build an RPM for me for Fedora 22 (preferably) or 23 with improved logging.
> 
> 3) Send me a zipped OVS tree with your changes and the steps to build it. I don't like to build things if I don't have to, but I can do it. I will use Fedora 22 for building.
> 
> 
> Peter Gubka
> 
> 
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org] 
> Sent: Tuesday, May 03, 2016 7:35 PM
> To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) <pgubka at cisco.com>
> Cc: bugs at openvswitch.org
> Subject: Re: [ovs-discuss] controller's role mismatch?
> 
> On Tue, May 03, 2016 at 07:02:45AM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> > Are you sure about "reconnecting" switches? As I wrote before, to reproduce the problem I had to use 2 switches/bridges.
> 
> You're right, now that I look again.  I missed the differences between the connections.
> 
> > $ grep -r rconn ovs-vswitchd.log | grep 6653
> > 2016-04-22T08:48:52.725Z|00022|rconn|INFO|s2<->tcp:127.0.0.1:6653: connecting...
> > 2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.726Z|00024|rconn|INFO|s2<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> > 2016-04-22T08:48:52.726Z|00029|rconn|INFO|s1<->tcp:127.0.0.1:6653: connecting...
> > 2016-04-22T08:48:52.726Z|00030|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.726Z|00031|rconn|INFO|s1<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> > 2016-04-22T08:48:52.811Z|00032|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.811Z|00033|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
> > 2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
> > 2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...
> > 2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
> > 2016-04-22T08:48:56.690Z|00184|rconn|INFO|s1<->tcp:10.25.2.12:6653: connecting...
> > 2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
> > 2016-04-22T08:48:56.854Z|00199|rconn|INFO|s1<->tcp:10.25.2.13:6653: connecting...
> > 2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
> > 2016-04-22T08:48:57.039Z|00214|rconn|INFO|s2<->tcp:10.25.2.12:6653: connecting...
> > 2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
> > 2016-04-22T08:48:57.184Z|00229|rconn|INFO|s2<->tcp:10.25.2.14:6653: connecting...
> > 2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected
> > 
> > There are only 6 "connected" lines, so I believe there was no reconnection: 2 bridges with 3 controllers each.
> > 1) Around 08:48:53, 14 became master for s1 and 13 for s2.
> > 2) After 08:48:56, I set up 2 more controllers for both s1 (12, 13) and s2 (12, 14).
> > 
> > How do I know, when I see "vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)", whether it is a request towards s1 or s2?
> 
> You can't tell.  This hasn't been an issue for me before, so probably, as one outcome here, we should improve the logging.
> 
> I can think of two different ways to hunt down what you're seeing.  One way would be for you to explain to me some simple way to reproduce it.
> If I can have that, I'm willing to spend some time trying to find or explain the problem.  The other way would be to suggest some places that you can add additional logging to OVS, which would help to explain what is going on.  That will probably take more back-and-forth and trial and error, but it wouldn't require that I be able to reproduce the problem here.  What's your preference?
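
The "only 6 connected lines, so no reconnection" check from the quoted log can be done mechanically. The following is a minimal sketch, not part of OVS: it assumes the rconn line layout shown in the log above (timestamp|sequence|rconn|level|bridge<->target: message) with each log entry on a single line, and it treats a second "connected" for the same bridge/controller pair as a sign of a reconnect. The names LINE_RE, SAMPLE and count_connected are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the log excerpt in this thread:
#   <timestamp>|<seq>|rconn|<LEVEL>|<bridge><-><target>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|rconn\|(?P<level>[A-Z]+)\|"
    r"(?P<bridge>[^<]+)<->(?P<target>\S+): (?P<msg>.*)$"
)

# A few lines copied from the excerpt (unwrapped, one entry per line).
SAMPLE = [
    "2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)",
    "2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...",
    "2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected",
    "2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected",
    "2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected",
    "2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected",
    "2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected",
    "2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected",
]


def count_connected(lines):
    """Count 'connected' events per (bridge, controller target) pair."""
    counts = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only the exact message "connected" counts; "connecting..." and
        # "connection failed ..." are ignored.
        if m and m.group("msg") == "connected":
            counts[(m.group("bridge"), m.group("target"))] += 1
    return dict(counts)


if __name__ == "__main__":
    for (bridge, target), n in sorted(count_connected(SAMPLE).items()):
        note = "" if n == 1 else "  <- connected more than once"
        print(f"{bridge} {target}: {n}{note}")
```

To run it against a real file, pass the open file object instead of SAMPLE, for example count_connected(open("ovs-vswitchd.log")). This only covers the rconn lines; it does not help attribute the vconn OFPT_ROLE_REQUEST lines to a bridge, since those lines do not carry the bridge name.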


