[ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch

Tony Liu tonyliu0592 at hotmail.com
Fri Jul 24 02:50:54 UTC 2020


Hi Han,

Thanks for taking the time to look into this. This problem is not consistently reproducible.
Developers normally ignore it. :) I think we have collected enough context, so we can let it go for now.
I will rebuild the setup, tune the RAFT election timer, and rerun the test. Will keep you posted.


Thanks again!

Tony

________________________________
From: Han Zhou <zhouhan at gmail.com>
Sent: July 23, 2020 06:53 PM
To: Tony Liu <tonyliu0592 at hotmail.com>; Ben Pfaff <blp at ovn.org>
Cc: Numan Siddique <numans at ovn.org>; ovs-dev <ovs-dev at openvswitch.org>; ovs-discuss at openvswitch.org <ovs-discuss at openvswitch.org>
Subject: Re: [ovs-dev] OVN: Two datapath-bindings are created for the same logical-switch


On Thu, Jul 23, 2020 at 10:33 AM Tony Liu <tonyliu0592 at hotmail.com> wrote:
>
> Changed the title for this specific problem.
> I looked into the logs and have more findings.
> The problem happened when the sb-db leader switched.

Hi Tony,

Thanks for this detailed information. Could you confirm which version of OVS is used (to understand the OVSDB behavior)?

>
> For the ovsdb cluster, what may trigger the leader switch? Given the log,
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, starting election
> the election was requested by a follower node. Is that because the connection from the follower to the leader timed out,
> so the follower assumed the leader was dead and started an election?

You are right, the RAFT heartbeat times out when the server is too busy and the election timer is too small (the default is 1s). For a large-scale test, please increase the election timer with:
ovn-appctl -t <ovn-sb> cluster/change-election-timer OVN_Southbound <time>

I suggest setting <time> to at least 10000 in your case. (You need to increase the value gradually - 2000, 4000, 8000, 16000 - because each change can at most double the current value, so it takes four commands to reach that from the initial default of 1000. Not very convenient, I know. :)

Here, <ovn-sb> is the path to the ctl socket file of the SB DB server, usually under /var/run/ovn.
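
For example, a rough sketch of the full sequence (the exact ctl socket path is an assumption - check /var/run/ovn on your hosts, and run the command against the current cluster leader):

    # Each call can at most double the current election timer, so step up gradually.
    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 2000
    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 4000
    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 8000
    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 16000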

>
> For ovn-northd (3 instances), they all connect to the sb-db leader, and whoever holds the lock is the master.
> When the sb-db leader switches, all ovn-northd instances look for the new leader. In this case, there is no
> guarantee that the old ovn-northd master keeps the role; another ovn-northd instance may find the
> leader and acquire the lock first. So the sb-db leader switch may also cause an ovn-northd master switch.
> Such a switch may happen in the middle of an ovn-northd transaction; in that case, is there any guarantee of
> transaction completeness? My guess is that the old master created a datapath-binding for a logical-switch,
> the switch happened before that transaction completed, and then the new master/leader created another
> datapath-binding for the same logical-switch. Does that make any sense?

I agree with you that it could be related to the failover and the lock behavior during the failover: a lock problem might have caused two northds to be active at the same time for a short moment. However, I still can't see how the duplicated entries were created with different tunnel keys. If both northds created the datapath binding for the same LS at the same time, they should have allocated the same tunnel key, and then one of them should have failed during the transaction commit because of the index conflict in the DB. But here the rows have different keys, so both were inserted into the DB.

(OVSDB transaction is atomic even during failover and no client should see partial data of a transaction.)

(cc Ben to comment more on the possibility of both clients acquiring the lock during failover)
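
In the meantime, if you want to clean up the stale row by hand (as Numan suggests further down, ovn-northd should sync up once it is gone), a rough sketch using the generic ovn-sbctl database commands could look like this, with the logical-switch and row UUIDs taken from your logs (please double-check which tunnel_key the existing Port_Binding rows reference before destroying anything):

    # List the Datapath_Binding rows that reference the duplicated logical switch.
    ovn-sbctl --columns=_uuid,tunnel_key,external_ids find Datapath_Binding \
        external_ids:logical-switch=8a5d1d3c-e9fc-4cbe-a461-98ff838e6473
    # Remove the duplicate that ovn-northd keeps trying to delete.
    ovn-sbctl destroy Datapath_Binding 108cf745-db82-43c0-a9d3-afe27a41e4aa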

>
> From the log, when the sb-db leader switched, the ovn-northd master connected to the new leader and lost the master role,
> but immediately it acquired the lock and became master again. Not sure how this happened.

From the ovn-northd logs, the ovn-northd on .86 first connected to the SB DB on .85, which suggests that it regarded .85 as the leader (otherwise it would have disconnected and retried another server). Immediately after connecting to .85 and acquiring the lock, it disconnected because it somehow noticed that .85 is not the leader; it then retried, connected to .86 (the new leader), and found that the lock was already acquired by the .85 northd, so it switched to standby. The .85 northd luckily connected to .86 on the first try, so it was able to acquire the lock on the leader node first. Maybe the key thing is to figure out why the .86 northd initially connected to the .85 DB, which is not the leader, and acquired the lock there.
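
For the rerun, it may also help to capture each server's view of the cluster around the failover. Something like the following (again assuming the usual ctl socket path; adjust for your install) prints each node's role, term, and who it currently believes is the leader:

    ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound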

Thanks,
Han

>
> Here are some loggings.
> ==== .84 sb-db leader =================
> 2020-07-21T01:08:20.221Z|01408|raft|INFO|current entry eid 639238ba-bc00-4efe-bb66-6ac766bb5f4b does not match prerequisite 78e8e167-8b4c-4292-8e25-d9975631b010 in execute_command_request
>
> 2020-07-21T01:08:38.450Z|01409|timeval|WARN|Unreasonably long 1435ms poll interval (1135ms user, 43ms system)
> 2020-07-21T01:08:38.451Z|01410|timeval|WARN|faults: 5942 minor, 0 major
> 2020-07-21T01:08:38.451Z|01411|timeval|WARN|disk: 0 reads, 50216 writes
> 2020-07-21T01:08:38.452Z|01412|timeval|WARN|context switches: 60 voluntary, 25 involuntary
> 2020-07-21T01:08:38.453Z|01413|coverage|INFO|Skipping details of duplicate event coverage for hash=45329980
>
> 2020-07-21T01:08:38.455Z|01414|raft|WARN|ignoring vote request received as leader
> 2020-07-21T01:08:38.456Z|01415|raft|INFO|server 1f9e is leader for term 2
> 2020-07-21T01:08:38.457Z|01416|raft|INFO|rejected append_reply (not leader)
> 2020-07-21T01:08:38.471Z|01417|raft|INFO|rejected append_reply (not leader)
>
> 2020-07-21T01:23:00.890Z|01418|timeval|WARN|Unreasonably long 1336ms poll interval (1102ms user, 20ms system)
>
> ==== .85 sb-db ==================
> 2020-07-21T01:05:25.734Z|00061|coverage|INFO|Skipping details of duplicate event coverage for hash=a1e16f5b
>
> 2020-07-21T01:08:38.166Z|00062|raft|INFO|server 1f9e is leader for term 2
> 2020-07-21T01:08:38.455Z|00063|raft|INFO|rejecting term 1 < current term 2 received in append_request message from server 39a9
>
> 2020-07-21T01:23:44.500Z|00064|timeval|WARN|Unreasonably long 1447ms poll interval (1118ms user, 33ms system)
>
> ==== .86 sb-db =================
> 2020-07-21T01:02:35.607Z|00073|coverage|INFO|70 events never hit
>
> 2020-07-21T01:08:38.119Z|00074|raft|INFO|term 2: 1135 ms timeout expired, starting election
> 2020-07-21T01:08:38.165Z|00075|raft|INFO|term 2: elected leader by 2+ of 3 servers
> 2020-07-21T01:08:38.455Z|00076|raft|INFO|rejecting term 1 < current term 2 received in append_request message from server 39a9
> 2020-07-21T01:08:38.456Z|00077|raft|INFO|rejecting term 1 < current term 2 received in vote_reply message from server 39a9
>
> 2020-07-21T01:12:51.363Z|00078|timeval|WARN|Unreasonably long 1466ms poll interval (1186ms user, 27ms system)
>
> ==== .85 ovn-northd master ==============
> 2020-07-21T01:08:25.925Z|00479|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 (10.6.20.85:45478<->10.6.20.84:6641) at lib/stream-fd.c:157 (54% CPU usage)
>
> 2020-07-21T01:08:38.464Z|00480|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.465Z|00481|reconnect|INFO|tcp:10.6.20.84:6642: connection attempt timed out
> 2020-07-21T01:08:38.465Z|00482|reconnect|INFO|tcp:10.6.20.86:6642: connecting...
> 2020-07-21T01:08:38.465Z|00483|reconnect|INFO|tcp:10.6.20.86:6642: connected
> 2020-07-21T01:08:38.465Z|00484|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> 2020-07-21T01:08:38.465Z|00485|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> 2020-07-21T01:08:38.966Z|00486|ovn_northd|INFO|deleting Datapath_Binding 108cf745-db82-43c0-a9d3-afe27a41e4aa with duplicate external-ids:logical-switch/router 8a5d1d3c-e9fc-4cbe-a461-98ff838e6473
>
> 2020-07-20T21:32:48.803Z|00019|reconnect|INFO|tcp:10.6.20.84:6641: connected
>
> ==== .86 ovn-northd ==============
> 2020-07-21T01:08:38.463Z|00020|ovsdb_idl|INFO|tcp:10.6.20.84:6642: clustered database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.464Z|00021|reconnect|INFO|tcp:10.6.20.84:6642: connection attempt timed out
> 2020-07-21T01:08:38.464Z|00022|reconnect|INFO|tcp:10.6.20.85:6642: connecting...
> 2020-07-21T01:08:38.464Z|00023|reconnect|INFO|tcp:10.6.20.85:6642: connected
> 2020-07-21T01:08:38.464Z|00024|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> 2020-07-21T01:08:38.465Z|00025|ovsdb_idl|INFO|tcp:10.6.20.85:6642: clustered database server is not cluster leader; trying another server
> 2020-07-21T01:08:38.466Z|00026|reconnect|INFO|tcp:10.6.20.85:6642: connection attempt timed out
> 2020-07-21T01:08:39.467Z|00027|reconnect|INFO|tcp:10.6.20.86:6642: connecting...
> 2020-07-21T01:08:39.468Z|00028|reconnect|INFO|tcp:10.6.20.86:6642: connected
> 2020-07-21T01:08:39.468Z|00029|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>
> 2020-07-21T02:04:22.718Z|00030|ovsdb_idl|INFO|tcp:10.6.20.86:6642: clustered database server is not cluster leader; trying another server
>
>
> Thanks!
>
> Tony
> ________________________________
>
> From: Numan Siddique <numans at ovn.org>
> Sent: July 23, 2020 03:15 AM
> To: Tony Liu <tonyliu0592 at hotmail.com>
> Cc: ovs-dev <ovs-dev at openvswitch.org>; ovs-discuss at openvswitch.org <ovs-discuss at openvswitch.org>
> Subject: Re: [ovs-discuss] OVN nb-db and sb-db out of sync
>
>
>
> On Thu, Jul 23, 2020 at 8:22 AM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> Hi,
>
> I see why the sb-db broke at the 1568th port-binding.
> The 1568th datapath-binding in sb-db references the same logical-switch as the 1567th:
>
> _uuid               : 108cf745-db82-43c0-a9d3-afe27a41e4aa
> external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key          : 1567
>
> _uuid               : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
> external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key          : 1568
>
> I don't believe this is supposed to happen. Any idea how it could happen?
> ovn-northd is then stuck trying to delete this duplicate, and it ignores all the following updates.
> That caused the nb-db and sb-db to go out of sync.
> Is there any way I can fix it manually, e.g. by deleting it with ovn-sbctl?
>
> If you delete the OVN SB DB resources manually, ovn-northd should sync things up.
>
> But I'm surprised that ovn-northd didn't sync up earlier. There's something wrong related to RAFT going
> on here. Not sure what.
>
> Thanks
> Numan
>
>
>
>
> Thanks!
>
> Tony
>
> ________________________________
> From: dev <ovs-dev-bounces at openvswitch.org> on behalf of Tony Liu <tonyliu0592 at hotmail.com>
> Sent: July 22, 2020 11:33 AM
> To: ovs-dev <ovs-dev at openvswitch.org>
> Subject: [ovs-dev] OVN nb-db and sb-db out of sync
>
> Hi,
>
> During a scaling test where 4000 networks are created from OpenStack, I see that
> the nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS ports
> (a GW port and a service port for each network) are created in nb-db. In sb-db, there are
> only 1567 port-bindings, while 4000 are expected.
>
> [root at ovn-db-2 ~]# ovn-nbctl list nb_global
> _uuid               : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
> connections         : []
> external_ids        : {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
> hv_cfg              : 312
> ipsec               : false
> name                : ""
> nb_cfg              : 2636
> options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> sb_cfg              : 2005
> ssl                 : []
>
> [root at ovn-db-2 ~]# ovn-sbctl list sb_global
> _uuid               : 3720bc1d-b0da-47ce-85ca-96fa8d398489
> connections         : []
> external_ids        : {}
> ipsec               : false
> nb_cfg              : 312
> options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> ssl                 : []
>
> Is there any way to force ovn-northd to rebuild sb-db so that it syncs with nb-db,
> like manipulating nb_cfg or anything else? Note, it's a 3-node RAFT cluster for both
> nb-db and sb-db.
>
> Is the "incremental update" implemented in 20.03?
> If not, in which release will it be available?
>
>
> Thanks!
>
> Tony
>

