[ovs-discuss] ovsdb cluster locking not working (*)

Numan Siddique numans at ovn.org
Fri Jun 19 16:48:00 UTC 2020


On Fri, Jun 19, 2020 at 10:13 PM Matthew Booth <mbooth at redhat.com> wrote:

> (*) ... as expected by me.
>
> I'm currently attempting to configure a 3-node OVN control plane on
> Kubernetes. This consists of 3 pods, each running an nb db, an sb db,
> and northd. I expected only 1 of the northds to be active at any one
> time, as (IIUC) they are supposed to hold a northd lock in the south
> db. However, I can see from the logs that they are all active. All 3
> northds report:
>
> 2020-06-19T15:53:20Z|00009|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
>
> I executed the following on 2 different nodes:
>
> $ ovsdb-client lock unix:/pod-run/ovnnb_db.sock foo_lock
>
> Both locks were granted concurrently. The same behaviour occurs over
> tcp, except when both clients connect to the same db node. I would
> assume that the nodes are not clustered correctly, but they appear to
> be:
>
> # ovs-appctl -t /pod-run/ovnsb_db.ctl cluster/status OVN_Southbound
> 5c61
> Name: OVN_Southbound
> Cluster ID: aa50 (aa505b7b-f0c4-4219-b357-bf7d0e2b79e9)
> Server ID: 5c61 (5c6196d8-6a08-4b55-bbd9-b039d57a8e6d)
> Address: tcp:10.131.0.12:6644
> Status: cluster member
> Role: candidate
> Term: 3
> Leader: 4cba
> Vote: self
>
> Log: [2, 8]
> Entries not yet committed: 0
> Entries not yet applied: 0
> Connections: ->4cba ->7283 <-7283 <-4cba
> Servers:
>     5c61 (5c61 at tcp:10.131.0.12:6644) (self) (voted for 5c61)
>     4cba (4cba at tcp:10.128.2.15:6644) (voted for 4cba)
>     7283 (7283 at tcp:10.129.2.18:6644)
>
> The dbs are launched slightly differently on node 0 from nodes 1 and 2
> to account for bootstrapping. The bootstrapping node runs:
>
>             exec /usr/share/openvswitch/scripts/ovn-ctl \
>             --no-monitor \
>             --db-nb-create-insecure-remote=yes \
>             --db-nb-cluster-local-addr="${LOCAL_IP}" \
>             --db-nb-cluster-local-proto=tcp \
>             --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
>             run_sb_ovsdb
>
> The others run:
>
>             exec /usr/share/openvswitch/scripts/ovn-ctl \
>             --no-monitor \
>             --db-nb-create-insecure-remote=yes \
>             --db-nb-cluster-remote-addr="${BOOTSTRAP_IP}" \
>             --db-nb-cluster-local-addr="${LOCAL_IP}" \
>             --db-nb-cluster-local-proto=tcp \
>             --db-nb-cluster-remote-proto=tcp \
>             --ovn-nb-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
>             run_sb_ovsdb
>
> The north db is launched the same way, and northd in each pod connects
> to the local unix sockets for both the north and south dbs.
>
>
I think this is wrong. You need to tell ovn-northd to connect to the
cluster IPs, e.g. tcp:IP1:6641,tcp:IP2:6641,tcp:IP3:6641 for the north
db and the same for the south db.
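
With ovn-ctl, something along these lines should work for starting
northd. This is only a rough sketch: the exact option names may differ
slightly on your version, and IP1/IP2/IP3 (and the 6641/6642 ports)
are placeholders for your three pod addresses:

            exec /usr/share/openvswitch/scripts/ovn-ctl \
            --no-monitor \
            --ovn-northd-nb-db="tcp:${IP1}:6641,tcp:${IP2}:6641,tcp:${IP3}:6641" \
            --ovn-northd-sb-db="tcp:${IP1}:6642,tcp:${IP2}:6642,tcp:${IP3}:6642" \
            --ovn-northd-log="-vconsole:${OVN_LOG_LEVEL} -vfile:off" \
            run_northd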

If you start it this way, all the ovn-northd instances will connect to
the cluster leader and only one of them will get the lock.
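
Once they are all pointed at the cluster, a quick sanity check is to
grep each pod's logs for the lock message; only one of the three
should report acquiring it. Something like the following (the pod name
here is just a placeholder):

$ kubectl logs <northd-pod> | grep 'ovn-northd lock'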

Thanks
Numan


> OVS_RUNDIR=/pod-run and OVS_DBDIR=/var/lib/openvswitch in all cases.
>
> Any idea what's going on here?
>
> Thanks,
>
> Matt
> --
> Matthew Booth
> Red Hat OpenStack Engineer, Compute DFG
>
> Phone: +442070094448 (UK)
>