[ovs-discuss] controller's role mismatch?

Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) pgubka at cisco.com
Tue May 3 07:02:45 UTC 2016


Hi.
Are you sure the switches are "reconnecting"? As I wrote before, I had to use 2 switches/bridges to reproduce the problem.

$ grep -r rconn ovs-vswitchd.log | grep 6653
2016-04-22T08:48:52.725Z|00022|rconn|INFO|s2<->tcp:127.0.0.1:6653: connecting...
2016-04-22T08:48:52.726Z|00023|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.726Z|00024|rconn|INFO|s2<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
2016-04-22T08:48:52.726Z|00029|rconn|INFO|s1<->tcp:127.0.0.1:6653: connecting...
2016-04-22T08:48:52.726Z|00030|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.726Z|00031|rconn|INFO|s1<->tcp:127.0.0.1:6653: waiting 1 seconds before reconnect
2016-04-22T08:48:52.811Z|00032|rconn|WARN|s2<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:52.811Z|00033|rconn|WARN|s1<->tcp:127.0.0.1:6653: connection failed (Connection refused)
2016-04-22T08:48:53.317Z|00070|rconn|INFO|s1<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
2016-04-22T08:48:53.449Z|00085|rconn|INFO|s2<->tcp:10.25.2.13:6653: connecting...
2016-04-22T08:48:53.459Z|00090|rconn|INFO|s2<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:56.690Z|00184|rconn|INFO|s1<->tcp:10.25.2.12:6653: connecting...
2016-04-22T08:48:56.706Z|00189|rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:56.854Z|00199|rconn|INFO|s1<->tcp:10.25.2.13:6653: connecting...
2016-04-22T08:48:56.865Z|00204|rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
2016-04-22T08:48:57.039Z|00214|rconn|INFO|s2<->tcp:10.25.2.12:6653: connecting...
2016-04-22T08:48:57.049Z|00219|rconn|INFO|s2<->tcp:10.25.2.12:6653: connected
2016-04-22T08:48:57.184Z|00229|rconn|INFO|s2<->tcp:10.25.2.14:6653: connecting...
2016-04-22T08:48:57.199Z|00234|rconn|INFO|s2<->tcp:10.25.2.14:6653: connected

There are only 6 "connected" lines, so I believe there was no reconnection: 2 bridges with 3 controllers each.
1) Around 08:48:53, 14 became master for s1 and 13 became master for s2.
2) After 08:48:56, I set up 2 more controllers for each bridge: s1 (12, 13) and s2 (12, 14).
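
The "6 connected lines, one per bridge/controller pair" check can be automated. The following is only a minimal sketch: the function names are made up for illustration, and the regular expression assumes the rconn line layout shown in the grep output above.

```python
import re
from collections import Counter

# Matches rconn lines such as:
#   2016-04-22T08:48:53.330Z|00075|rconn|INFO|s1<->tcp:10.25.2.14:6653: connected
# "connecting..." and "connection failed" lines do not match.
RCONN_CONNECTED = re.compile(
    r"\|rconn\|INFO\|(?P<bridge>[^<|]+)<->(?P<target>\S+): connected\s*$"
)


def count_connections(lines):
    """Count 'connected' events per (bridge, controller target) pair."""
    counts = Counter()
    for line in lines:
        match = RCONN_CONNECTED.search(line)
        if match:
            counts[(match.group("bridge"), match.group("target"))] += 1
    return counts


def reconnected(counts):
    """Return the pairs that connected more than once, i.e. reconnected."""
    return sorted(pair for pair, n in counts.items() if n > 1)


if __name__ == "__main__":
    with open("ovs-vswitchd.log") as log:
        counts = count_connections(log)
    for (bridge, target), n in sorted(counts.items()):
        print(bridge, target, n)
    print("reconnected:", reconnected(counts))
```

If every pair shows a count of 1, no bridge reconnected to the same controller within the captured log.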

When I see "vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3)", how do I know whether it is a request towards s1 or s2?

Peter Gubka

-----Original Message-----
From: Ben Pfaff [mailto:blp at ovn.org] 
Sent: Monday, May 02, 2016 11:14 PM
To: Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) <pgubka at cisco.com>
Cc: bugs at openvswitch.org
Subject: Re: [ovs-discuss] controller's role mismatch?

On Fri, Apr 22, 2016 at 09:32:26AM +0000, Peter Gubka -X (pgubka - PANTHEON TECHNOLOGIES at Cisco) wrote:
> Hello,
> 
> I had to use 2 switches/bridges to reproduce the problem. Logs in attachments.
> 
> Just for the time orientation:
> Enabling 2 masters for 2 switches (the controller first sends slave 
> automatically and, once it finds out that it is the first connection 
> from that device, it then sends master)

Thanks for the logs.  Here is my interpretation.

First, 14 makes itself master:

vconn|DBG|tcp:10.25.2.14:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x3): role=master generation_id=1
vconn|DBG|tcp:10.25.2.14:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x3): role=master generation_id=1
vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x3): role=nochange
vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x3): role=slave generation_id=0

Then 13 makes itself master:

vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x4): role=master generation_id=1
vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x4): role=master generation_id=1
rconn|INFO|s1<->tcp:10.25.2.12:6653: connected
vconn|DBG|tcp:10.25.2.12:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x0): role=nochange
vconn|DBG|tcp:10.25.2.12:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x0): role=equal generation_id=1
vconn|DBG|tcp:10.25.2.12:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x1): role=slave generation_id=2
vconn|DBG|tcp:10.25.2.12:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x1): role=slave generation_id=2

Then 13 drops the connection and reconnects.  Therefore it's initially "equal" and there's no master:

rconn|INFO|s1<->tcp:10.25.2.13:6653: connected
vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x0): role=nochange
vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x0): role=equal generation_id=2

Then 13 requests that it become a "slave" and there's still no master:

vconn|DBG|tcp:10.25.2.13:6653: received: OFPT_ROLE_REQUEST (OF1.3) 
vconn|DBG|(xid=0x1): role=slave generation_id=3
vconn|DBG|tcp:10.25.2.13:6653: sent (Success): OFPT_ROLE_REPLY (OF1.3) 
vconn|DBG|(xid=0x1): role=slave generation_id=3

And no controller ever after in the logs asks to become master, so there's never any master.
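
The reasoning above can be replayed with a small model of the role bookkeeping on a single switch. This is a simplified sketch, not the OVS implementation: the class and method names are hypothetical, generation_id staleness checks are left out, and it assumes that a new connection starts as "equal" and that a master request demotes any existing master to slave.

```python
class RoleTable:
    """Toy model of controller roles on ONE switch (hypothetical names)."""

    def __init__(self):
        self.roles = {}

    def connect(self, controller):
        # Assumption: every new connection, including a reconnect,
        # starts in the "equal" role.
        self.roles[controller] = "equal"

    def request(self, controller, role):
        """Apply a role request and return the role reported in the reply."""
        if role == "nochange":
            return self.roles[controller]
        if role == "master":
            # Only one master at a time: demote the previous one.
            for other, current in self.roles.items():
                if other != controller and current == "master":
                    self.roles[other] = "slave"
        self.roles[controller] = role
        return role

    def master(self):
        """Return the current master, or None if there is none."""
        for controller, role in self.roles.items():
            if role == "master":
                return controller
        return None


if __name__ == "__main__":
    table = RoleTable()
    table.connect("10.25.2.13")
    table.request("10.25.2.13", "master")
    print(table.master())            # 10.25.2.13
    table.connect("10.25.2.13")      # connection dropped and re-established
    print(table.request("10.25.2.13", "nochange"))  # equal
    table.request("10.25.2.13", "slave")
    print(table.master())            # None
```

The model covers one switch only. With two bridges in the same log, each bridge would need its own table, so the outcome depends on which bridge each vconn line belongs to.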


