[ovs-discuss] Question to OVN DB pacemaker script

aginwala aginwala at asu.edu
Thu May 10 20:54:56 UTC 2018


Hi,

As a further update: with the additional code changes below, I am able to
re-open the tcp ports on failover when a new master is promoted. This does
require stopping the ovsdb servers (ovn-ctl stop_ovsdb) on the newly selected
master so that the tcp settings are reset:


diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..8cb4c25 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

     set ${OVN_CTL}

-    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+    set $@ --db-nb-port=${NB_MASTER_PORT}
+    set $@ --db-sb-port=${SB_MASTER_PORT}

     if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
         set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
     fi

     if [ "x${present_master}" = x ]; then
+        set $@ --db-nb-create-insecure-remote=yes
+        set $@ --db-sb-create-insecure-remote=yes
         # No master detected, or the previous master is not among the
         # set starting.
         #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
         set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}

     elif [ ${present_master} != ${host_name} ]; then
+        set $@ --db-nb-create-insecure-remote=no
+        set $@ --db-sb-create-insecure-remote=no
         # An existing master is active, connect to it
         set $@ --db-nb-sync-from-addr=${MASTER_IP} --db-sb-sync-from-addr=${MASTER_IP}
         set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
@@ -416,6 +420,8 @@ ovsdb_server_promote() {
             ;;
     esac

+    ${OVN_CTL} stop_ovsdb
+    ovsdb_server_start
     ${OVN_CTL} promote_ovnnb
     ${OVN_CTL} promote_ovnsb
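
The effect of the diff above on the tcp listener can be summarised as a small
decision function. This is only a sketch and not part of the patch:
insecure_remote_for is a hypothetical helper name, and it mirrors just the two
branches the diff touches (present_master empty, or present_master different
from host_name). The case where this node is already the master is not changed
by the diff and is reported as "unchanged" here.

```shell
#!/bin/sh
# Hypothetical helper (not in ovndb-servers.ocf): prints the value that the
# patched ovsdb_server_start() appends for --db-nb-create-insecure-remote
# and --db-sb-create-insecure-remote.
#   $1 = present_master (may be empty)
#   $2 = host_name
insecure_remote_for() {
    present_master=$1
    host_name=$2
    if [ "x${present_master}" = x ]; then
        # No master detected: this node may become master, so open tcp.
        echo yes
    elif [ "${present_master}" != "${host_name}" ]; then
        # Another node is master: run as slave with the tcp port closed.
        echo no
    else
        # This node is already master: the diff adds nothing in this case.
        echo unchanged
    fi
}
```

For example, insecure_remote_for "" node1 prints yes, and
insecure_remote_for node2 node1 prints no. Because a slave is started with
"no", the promote path in the diff has to stop and restart ovsdb so that the
listener is created on the promoted node.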



Below are the scenarios tested:

Master | Slave | Scenario       | Result
-      | -     | reboot/failure | New master gets promoted with tcp ports enabled to start taking LB traffic.
-      | -     | reboot/failure | No change; the current master continues taking traffic and the slave continues to sync from the master.
-      | -     | reboot/failure | New master gets promoted with tcp ports enabled to start taking LB traffic.

Sync from the master to the slaves also works as expected:
# On master
ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add 556
# On slave, the tcp port is shut down as expected
ovn-nbctl --db=tcp:10.169.129.34:6641 show
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
# On slave via the local unix socket, lswitch 556 above is replicated too,
# since the slave runs with --sync-from=tcp:10.149.4.252:6641
ovn-nbctl show
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)

# Same testing for the sb db
# Slave port 6642 is shut down too; this command hangs:
ovn-sbctl --db=tcp:10.169.129.34:6642 show
# Using the master ip works:
ovn-sbctl --db=tcp:10.169.129.33:6642 show
Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
    hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
    Encap geneve
        ip: "10.169.129.34"
        options: {csum="true"}



# Accessing via LB vip works fine too as only one member is active:
for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
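
Since a probe against a port that is shut down can hang rather than fail
immediately (as the sb check against the slave did), a bounded probe may be
handy when repeating these checks. The following is a minimal sketch under
stated assumptions: probe_endpoints and nb_probe are hypothetical helper names,
and it assumes coreutils timeout(1) is installed on the node.

```shell
#!/bin/sh
# Sketch only; probe_endpoints and nb_probe are hypothetical helpers.

# probe_endpoints PROBE ENDPOINT...
# Runs "PROBE ENDPOINT" for each endpoint and prints "<endpoint> reachable"
# when the probe succeeds and "<endpoint> unreachable" otherwise.
probe_endpoints() {
    probe=$1
    shift
    for endpoint in "$@"; do
        if "$probe" "$endpoint" >/dev/null 2>&1; then
            echo "$endpoint reachable"
        else
            echo "$endpoint unreachable"
        fi
    done
}

# Probe for the nb db. It is wrapped in timeout(1) so that a connection
# attempt that hangs is reported as unreachable after 5 seconds instead of
# blocking the loop.
nb_probe() {
    timeout 5 ovn-nbctl --db="$1" show
}
```

It could be invoked as
probe_endpoints nb_probe tcp:10.169.129.33:6641 tcp:10.169.129.34:6641
to check both members directly; passing the probe as an argument keeps the
loop independent of which db or client is being checked.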


Everything works as expected. Let me know if I have missed any corner case.
I will submit a formal patch that adds LISTEN_ON_MASTER_IP_ONLY for using an
LB with tcp, so that existing functionality is not broken.
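
As a rough illustration of how such a flag could gate the listen address, here
is a minimal sketch. It is not the formal patch: build_listen_args is a
hypothetical helper, the yes/no values are an assumption, and the default of
"yes" is chosen only so that the existing behaviour (bind to MASTER_IP) is
kept unless the flag is explicitly turned off.

```shell
#!/bin/sh
# Sketch only: LISTEN_ON_MASTER_IP_ONLY handling has not been written yet.
# build_listen_args is a hypothetical helper; it prints the address/port
# options that would be appended to the ovn-ctl command line.
#   $1 = LISTEN_ON_MASTER_IP_ONLY (yes/no; empty is treated as yes)
#   $2 = MASTER_IP
#   $3 = NB_MASTER_PORT
#   $4 = SB_MASTER_PORT
build_listen_args() {
    listen_on_master_ip_only=${1:-yes}
    master_ip=$2
    nb_port=$3
    sb_port=$4
    if [ "x${listen_on_master_ip_only}" = xyes ]; then
        # Existing behaviour: bind the nb and sb dbs to the master IP.
        echo "--db-nb-addr=${master_ip} --db-nb-port=${nb_port} --db-sb-addr=${master_ip} --db-sb-port=${sb_port}"
    else
        # LB case: omit the address options so the dbs are not bound to the VIP.
        echo "--db-nb-port=${nb_port} --db-sb-port=${sb_port}"
    fi
}
```

With the flag set to no, build_listen_args no 10.149.4.252 6641 6642 prints
only the two port options, which corresponds to the change made in the diff
above; MASTER_IP itself would still be needed for the sync-from options.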



Regards,
Aliasgar



On Thu, May 10, 2018 at 9:55 AM, aginwala <aginwala at asu.edu> wrote:

> Thanks folks for suggestions:
>
> For LB vip configurations, I did further testing and yes, it does try to
> hit the slave db as per the logs below, and fails because the slave does
> not have write permission, which the LB is not aware of:
> for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add $i590;done
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>
> Hence, with a few more code changes (in the same patch, without the
> suggested flag variable), I am able to shut down the tcp port on the slave,
> and it works fine as below:
> #Master Node
> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
> #Slave Node
> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
>
> Code to shut down the tcp port on the slave db so that only the master
> listens on the tcp ports:
> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index 164b6bc..b265df6 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -295,8 +295,8 @@ ovsdb_server_start() {
>
>      set ${OVN_CTL}
>
> -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
> -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
> +    set $@ --db-nb-port=${NB_MASTER_PORT}
> +    set $@ --db-sb-port=${SB_MASTER_PORT}
>
>      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>          set $@ --db-nb-create-insecure-remote=yes
> @@ -307,6 +307,8 @@ ovsdb_server_start() {
>      fi
>
>      if [ "x${present_master}" = x ]; then
> +        set $@ --db-nb-create-insecure-remote=yes
> +        set $@ --db-sb-create-insecure-remote=yes
>          # No master detected, or the previous master is not among the
>          # set starting.
>          #
> @@ -316,6 +318,8 @@ ovsdb_server_start() {
>          set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS}
> --db-sb-sync-from-addr=${INVALID_IP_ADDR
>
>      elif [ ${present_master} != ${host_name} ]; then
> +        set $@ --db-nb-create-insecure-remote=no
> +        set $@ --db-sb-create-insecure-remote=no
>
>
> But I noticed that if the slave becomes active after a failover caused by
> an active node reboot/failure, pacemaker shows it online but I am not able
> to access the dbs.
>
> # crm status
> Online: [ test-pace2-2365308 ]
> OFFLINE: [ test-pace1-2365293 ]
>
> Full list of resources:
>
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Masters: [ test-pace2-2365308 ]
>      Stopped: [ test-pace1-2365293 ]
>
>
> # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
> ovn-nbctl: tcp:10.169.129.33:6641: database connection failed (Connection refused)
> # ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
> ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)
>
> Hence, when failover happens, the slave is already running with
> --sync-from=lbVIP:6641/6642 for the nb and sb dbs respectively, so the tcp
> ports for the nb and sb dbs are not re-opened automatically on the slave
> that is being promoted to master.
>
> Let me know if there is a valid approach that I am missing for handling
> this in the slave promote logic. I will make further code changes
> accordingly.
>
> Note: The current code changes for use with an LB will need to be handled
> for ssl too. That will have to be done separately; I want to get tcp
> working first, and we can add ssl support later.
>
>
> Regards,
> Aliasgar
>
> On Wed, May 9, 2018 at 12:19 PM, Numan Siddique <nusiddiq at redhat.com>
> wrote:
>
>>
>>
>> On Thu, May 10, 2018 at 12:44 AM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique <nusiddiq at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou <zhouhan at gmail.com> wrote:
>>>>
>>>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>>>
>>>>> On Wed, May 9, 2018 at 9:30 AM, aginwala <aginwala at asu.edu> wrote:
>>>>> >
>>>>> > Thanks Han and Numan for the clarity to help sort it out.
>>>>> >
>>>>> > To make the vip work with an LB in my two-node setup, I changed the
>>>>> > code below to skip setting the master IP when creating the pcs
>>>>> > resource for ovndbs, and to listen on 0.0.0.0 instead. Hence, the
>>>>> > discussion seems in line with the code change, which is small, as below:
>>>>> >
>>>>> >
>>>>> > diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
>>>>> > index 164b6bc..d4c9ad7 100755
>>>>> > --- a/ovn/utilities/ovndb-servers.ocf
>>>>> > +++ b/ovn/utilities/ovndb-servers.ocf
>>>>> > @@ -295,8 +295,8 @@ ovsdb_server_start() {
>>>>> >
>>>>> >      set ${OVN_CTL}
>>>>> >
>>>>> > -    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>> > -    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>>> > +    set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>> > +    set $@ --db-sb-port=${SB_MASTER_PORT}
>>>>> >
>>>>> >      if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
>>>>> >          set $@ --db-nb-create-insecure-remote=yes
>>>>> >
>>>>>
>>>>> This change solves the IP binding problem. It will just listen on
>>>>> 0.0.0.0.
>>>>>
>>>>
>>>> One problem I see with this approach is that it would listen on all the
>>>> IPs. Maybe it's not a good idea, and it may have some security issues.
>>>>
>>>> Can we instead check the value of the MASTER_IP param, something like
>>>> below?
>>>>
>>>>  if [ "$MASTER_IP" == "0.0.0.0" ]; then
>>>>      set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
>>>>      set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
>>>> else
>>>>      set $@ --db-nb-port=${NB_MASTER_PORT}
>>>>      set $@ --db-sb-port=${SB_MASTER_PORT}
>>>> fi
>>>>
>>>> And when you create OVN pacemaker resource in your deployment, you can
>>>> pass master_ip=0.0.0.0
>>>>
>>>> Will this work?
>>>>
>>>>
>>> Maybe some misunderstanding here. We still need to use master_ip = LB
>>> VIP, so that the standby nodes can "sync-from" the active node. So we
>>> cannot pass 0.0.0.0 explicitly.
>>>
>>
>> I misunderstood earlier. I thought you wouldn't need master ip at all.
>> Thanks for the clarification.
>>
>>>
>>> I didn't understand your code above either. Why would we specify the
>>> master_ip if we know it is 0.0.0.0? Or do you mean the other way around but
>>> just a typo in the code?
>>>
>>> As for the security of listening on any IP, I am not quite sure. It may be
>>> a problem if the nodes sit on multiple networks, some of which are
>>> considered insecure, and you want to listen on the secure one only. If
>>> this is the concern, we can add a parameter, e.g. LISTEN_ON_MASTER_IP_ONLY,
>>> and set it to true by default. What do you think?
>>>
>>
>> I would prefer adding the parameter as you have suggested, so that the
>> existing behavior remains intact.
>>
>> Thanks
>> Numan
>>
>>
>>> Thanks,
>>> Han
>>>
>>>
>>
>