[ovs-discuss] controller's role mismatch?

Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) pgubka at cisco.com
Tue May 10 16:40:23 UTC 2016


Hi.

How shall we continue? I haven't seen your answer.

Peter Gubka

-----Original Message-----
From: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) 
Sent: Wednesday, May 04, 2016 9:00 AM
To: 'Ben Pfaff' <blp at ovn.org>
Cc: bugs at openvswitch.org
Subject: RE: [ovs-discuss] controller's role mismatch?

Hi.

1) For you to reproduce the problem:

Download the ODL controller and install it as a cluster; you will need 3 nodes (probably VMs or Docker containers). If you haven't done this before, it can be time-consuming and you won't be sure that everything is configured correctly. https://nexus.opendaylight.org/content/repositories/opendaylight.release/org/opendaylight/integration/distribution-karaf/0.4.1-Beryllium-SR1/distribution-karaf-0.4.1-Beryllium-SR1.zip
Then follow the steps I wrote at the beginning (the ovs-vsctl commands). If you want, I can prepare a Python script for you to test automatically, but you'll have to wait 1-2 days.

2) For me to reproduce the problem:
Build an RPM for me for Fedora 22 (preferably) or 23 with improved logging.

3) Send me a zipped OVS tree with your changes and the steps to build it. I don't like to build things if I don't have to, but I can do it. I will use F22 for building.


Peter Gubka



-----Original Message-----
From: Ben Pfaff [mailto:blp at ovn.org] 
Sent: Tuesday, May 03, 2016 7:35 PM
To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) <pgubka at cisco.com>
Cc: bugs at openvswitch.org
Subject: Re: [ovs-discuss] controller's role mismatch?

On Tue, May 03, 2016 at 07:02:45AM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> Are you sure about "reconnecting" switches? As I wrote before, to reproduce the problem, I had to use 2 switches/bridges.

You're right, now that I look again.  I missed the differences between the connections.

> $ grep -r rconn ovs-vswitchd.log | grep 6653
> 2016-04-22T08:48:52.725Z|00022|rconn|INFO|s2<->tcp:127.0.0.1:6653: connecting...
> 2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> 2016-04-22T08:48:52.726Z|00024|rconn|INFO|s2<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> 2016-04-22T08:48:52.726Z|00029|rconn|INFO|s1<->tcp:127.0.0.1:6653: connecting...
> 2016-04-22T08:48:52.726Z|00030|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> 2016-04-22T08:48:52.726Z|00031|rconn|INFO|s1<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
> 2016-04-22T08:48:52.811Z|00032|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> 2016-04-22T08:48:52.811Z|00033|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
> 2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
> 2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
> 2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...
> 2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
> 2016-04-22T08:48:56.690Z|00184|rconn|INFO|s1<->tcp:10.25.2.12:6653: connecting...
> 2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
> 2016-04-22T08:48:56.854Z|00199|rconn|INFO|s1<->tcp:10.25.2.13:6653: connecting...
> 2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
> 2016-04-22T08:48:57.039Z|00214|rconn|INFO|s2<->tcp:10.25.2.12:6653: connecting...
> 2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
> 2016-04-22T08:48:57.184Z|00229|rconn|INFO|s2<->tcp:10.25.2.14:6653: connecting...
> 2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected
> 
> There are only 6 "connected" messages, so I believe there was no reconnection: 2 bridges with 3 controllers each.
> 1) Around 08:48:53, 14 became master for s1 and 13 for s2.
> 2) After 08:48:56, I set up 2 more controllers for each bridge: s1 (12, 13) and s2 (12, 14).
> 
> If I see "vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)", how do I know whether it is a request towards s1 or s2?

You can't tell.  This hasn't been an issue for me before, so probably, as one outcome here, we should improve the logging.
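
Unlike the vconn line quoted above, the rconn lines in the excerpt do carry the bridge name as a prefix (s1<->..., s2<->...), so the "connected" count per bridge and controller can be checked mechanically. The following is only a minimal sketch: the regular expression is inferred from the log lines shown in this thread, not from any documented OVS log format, the name count_connected is hypothetical, and it assumes every log entry sits on a single line, as in the log file itself.

```python
import re
from collections import defaultdict

# Pattern inferred from the rconn lines quoted in this thread (an assumption):
# <timestamp>|<sequence>|rconn|<level>|<bridge><-><target>: <message>
RCONN_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|rconn\|(?P<level>[A-Z]+)\|"
    r"(?P<bridge>[^<\s]+)<->(?P<target>\S+): (?P<msg>.*)$"
)


def count_connected(lines):
    """Return {(bridge, target): number of 'connected' events}."""
    counts = defaultdict(int)
    for line in lines:
        match = RCONN_RE.match(line.strip())
        # Only the exact message "connected" counts; "connecting..." does not.
        if match and match.group("msg").strip() == "connected":
            counts[(match.group("bridge"), match.group("target"))] += 1
    return dict(counts)


# Four lines copied from the excerpt above, each on a single line.
SAMPLE = [
    "2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...",
    "2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected",
    "2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...",
    "2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected",
]

if __name__ == "__main__":
    for (bridge, target), n in sorted(count_connected(SAMPLE).items()):
        print(bridge, target, n)
```

A (bridge, target) pair with a count above 1 would point to a reconnection; a count of exactly 1 for each of the six pairs would match the reading that no reconnection took place.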

I can think of two different ways to hunt down what you're seeing.  One way would be for you to explain to me some simple way to reproduce it.
If I can have that, I'm willing to spend some time trying to find or explain the problem.  The other way would be to suggest some places that you can add additional logging to OVS, which would help to explain what is going on.  That will probably take more back-and-forth and trial and error, but it wouldn't require that I be able to reproduce the problem here.  What's your preference?


