[ovs-discuss] controller's role mismatch?

Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) pgubka at cisco.com
Wed May 11 06:38:40 UTC 2016


Hi.
Sounds like you're not a developer either. I had hoped that at least preparing a patch, or sending me a zip of patched OVS sources, might be OK for you (my proposal #3).
Your proposal was "The other way would be to suggest some places that you can add additional logging to OVS, which would help to explain what is going on.", so let's go this way.

Peter Gubka 


-----Original Message-----
From: Ben Pfaff [mailto:blp at ovn.org] 
Sent: Tuesday, May 10, 2016 7:17 PM
To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) <pgubka at cisco.com>
Cc: bugs at openvswitch.org
Subject: Re: [ovs-discuss] controller's role mismatch?

I'm unlikely to continue along either of these paths because both of them are quite onerous for me.  I think it's probably better if you debug the problem yourself.

If you have a simpler reproduction case then I'm happy to work on that, but I'm not going to install a 3-node ODL cluster to try to debug the problem.

It sounds like you're not really a developer, so I guess that's part of the problem.

On Tue, May 10, 2016 at 04:40:23PM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> Hi.
> 
> How will we continue? I haven't seen your answer.
> 
> Peter Gubka
> 
> -----Original Message-----
> From: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco)
> Sent: Wednesday, May 04, 2016 9:00 AM
> To: 'Ben Pfaff' <blp at ovn.org>
> Cc: bugs at openvswitch.org
> Subject: RE: [ovs-discuss] controller's role mismatch?
> 
> Hi.
> 
> 1) For you to reproduce the problem:
> 
> Download the ODL controller and install it as a cluster; you will need 3
> nodes (probably VMs or Docker containers). But if you haven't done it
> before, it can be time consuming and you won't be sure whether everything
> is configured correctly.
> https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/distribution-karaf/0.4.1-Beryllium-SR1/distribution-karaf-0.4.1-Beryllium-SR1.zip
> Then follow the steps I wrote at the beginning (the ovs-vsctl commands). If you want, I can prepare a Python script for you to test automatically, but you'll have to wait 1-2 days.
> 
> 2) For me to reproduce the problem:
> Build an RPM for me for Fedora 22 (preferably) or 23 with improved logging.
> 
> 3) Send me zipped OVS sources with your changes, along with the steps to
> build them. I don't like to build things if I don't have to, but I can do
> it. I will use Fedora 22 for building.
> 
> 
> Peter Gubka
> 
> 
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: Tuesday, May 03, 2016 7:35 PM
> To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) 
> <pgubka at cisco.com>
> Cc: bugs at openvswitch.org
> Subject: Re: [ovs-discuss] controller's role mismatch?
> 
> On Tue, May 03, 2016 at 07:02:45AM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> > Are you sure about "reconnecting" switches? As I wrote before, to reproduce the problem, I had to use 2 switches/bridges.
> 
> You're right, now that I look again.  I missed the differences between the connections.
> 
> > $ grep -r rconn ovs-vswitchd.log | grep 6653
> > 2016-04-22T08:48:52.725Z|00022|rconn|INFO|s2<->tcp:127.0.0.1:6653: connecting...
> > 2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.726Z|00024|rconn|INFO|s2<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> > 2016-04-22T08:48:52.726Z|00029|rconn|INFO|s1<->tcp:127.0.0.1:6653: connecting...
> > 2016-04-22T08:48:52.726Z|00030|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.726Z|00031|rconn|INFO|s1<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> > 2016-04-22T08:48:52.811Z|00032|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:52.811Z|00033|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> > 2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
> > 2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
> > 2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...
> > 2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
> > 2016-04-22T08:48:56.690Z|00184|rconn|INFO|s1<->tcp:10.25.2.12:6653: connecting...
> > 2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
> > 2016-04-22T08:48:56.854Z|00199|rconn|INFO|s1<->tcp:10.25.2.13:6653: connecting...
> > 2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
> > 2016-04-22T08:48:57.039Z|00214|rconn|INFO|s2<->tcp:10.25.2.12:6653: connecting...
> > 2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
> > 2016-04-22T08:48:57.184Z|00229|rconn|INFO|s2<->tcp:10.25.2.14:6653: connecting...
> > 2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected
> > 
> > There are only six "connected" lines, so I believe there was no reconnection: 2 bridges with 3 controllers each.
> > 1) Around 08:48:53, 10.25.2.14 became master for s1 and 10.25.2.13 for s2.
> > 2) After 08:48:56, I set up 2 more controllers for both s1 (10.25.2.12, 10.25.2.13) and s2 (10.25.2.12, 10.25.2.14).
> > 
> > When I see "vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)", how do I know whether it is a request towards s1 or s2?
> 
> You can't tell.  This hasn't been an issue for me before, so probably, as one outcome here, we should improve the logging.
> 
> I can think of two different ways to hunt down what you're seeing.  One way would be for you to explain to me some simple way to reproduce it.
> If I can have that, I'm willing to spend some time trying to find or explain the problem.  The other way would be to suggest some places that you can add additional logging to OVS, which would help to explain what is going on.  That will probably take more back-and-forth and trial and error, but it wouldn't require that I be able to reproduce the problem here.  What's your preference?
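
The "six connected lines, so no reconnection" reasoning quoted above can be checked mechanically. The following is only a rough sketch, not an OVS tool: it assumes unwrapped rconn lines in exactly the format shown in this thread, and the names RCONN_RE, count_connected and reconnections are made up for illustration. It counts "connected" events per (bridge, controller) pair; a pair seen more than once would suggest a reconnect.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt quoted in this thread:
# <timestamp>|<seq>|rconn|<level>|<bridge><-><target>: <message>
RCONN_RE = re.compile(
    r"^(?P<ts>\S+)\|\d+\|rconn\|(?P<level>\w+)\|"
    r"(?P<bridge>[^<|]+)<->(?P<target>\S+): (?P<msg>.*)$"
)


def count_connected(lines):
    """Count 'connected' events per (bridge, controller target) pair."""
    counts = Counter()
    for line in lines:
        m = RCONN_RE.match(line.strip())
        if m and m.group("msg") == "connected":
            counts[(m.group("bridge"), m.group("target"))] += 1
    return counts


def reconnections(counts):
    """Return the pairs that reached 'connected' more than once."""
    return {pair: n for pair, n in counts.items() if n > 1}


# A subset of the log lines quoted in this thread, unwrapped.
SAMPLE = """\
2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected
"""

counts = count_connected(SAMPLE.splitlines())
for (bridge, target), n in sorted(counts.items()):
    print(f"{bridge} -> {target}: connected {n}x")
print("total connected:", sum(counts.values()))
print("pairs connected more than once:", reconnections(counts))
```

To run it against a real log, the SAMPLE text could be replaced by the lines of ovs-vswitchd.log. Note that this only counts rconn events; it does not address the separate question of which bridge a given vconn message belongs to.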


