[ovs-dev] OVN L3-HA request for feedback

Miguel Angel Ajo Pelayo majopela at redhat.com
Wed May 24 11:19:14 UTC 2017


I wanted to share a small status update:

Anil and I have been working on this [1] and we expect to post
some preliminary  patches before the end of the week.

I can confirm that the BFD + bundle(active_backup) strategy
works well from the hypervisors' point of view. With 1 sec BFD
pings we get a ~2.8s failover time.
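
For reference, this is roughly the plumbing involved, written by hand
here just to illustrate the idea (tunnel interface names and OpenFlow
port numbers are made up, and in practice ovn-controller is what
programs all of this):

    # ~1s BFD on the tunnel interfaces towards the gateway chassis
    $ ovs-vsctl set Interface ovn-gw1-0 bfd:enable=true \
          bfd:min_tx=1000 bfd:min_rx=1000
    $ ovs-vsctl set Interface ovn-gw2-0 bfd:enable=true \
          bfd:min_tx=1000 bfd:min_rx=1000

    # example flow: send traffic for the external network through the
    # first live gateway tunnel (OpenFlow ports 10 and 11 here)
    $ ovs-ofctl add-flow br-int "ip,nw_dst=172.16.1.0/24,actions=bundle(eth_src,0,active_backup,ofport,slaves:10,11)"

    # check the BFD session state on the hypervisor
    $ ovs-appctl bfd/show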

So far we have only focused on case "2", the distributed
routers where we specify a set of hosts to act as gateway chassis.

"""
     ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
                  -- set Logical_Router_Port alice \
                  options:redirect-chassis=gw1:10,gw2:20,gw3:30
"""

We wonder if there's any value at all in exploring support for case
"1", the old way of pinning a logical router to a single chassis.



If anybody wants to give it a try, you can use [2] to quickly deploy
2 gw hosts, 2 "hv" hosts, and 1 service host (accessible through an
'external' network via gw1 and gw2); see the ascii diagram in [3] and
the details in [4].
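
Assuming the usual Vagrant workflow for [2], bringing the environment
up should be roughly:

    $ git clone https://github.com/mangelajo/vagrants.git
    $ cd vagrants/ovn-l3-ha
    $ vagrant up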


Then you can ping the external service from a port in hv1 with:

    $ vagrant ssh hv1 -c "sudo ip netns exec vm1 ping 10.0.0.111"

or ping vm3 via its floating IP with:

    $ vagrant ssh svc1 -c "ping 10.0.0.16"

You can trigger a failover at any time with:

    $ vagrant ssh gw1 -c "sudo ifdown eth1"

and a failback with:

     $ vagrant ssh gw1 -c "sudo ifup eth1"
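
To watch the failover from a hypervisor's point of view you can check
the BFD state of the tunnels (the interface name below is illustrative,
it depends on how ovn-controller named the tunnel ports in your setup):

    $ vagrant ssh hv1 -c "sudo ovs-appctl bfd/show"
    $ vagrant ssh hv1 -c "sudo ovs-vsctl get Interface ovn-gw1-0 bfd_status"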



We are currently working on:

1) Monitoring the inter-gateway BFD sessions, to make sure that
   non-master routers drop any packet (external/internal) and any
   ARP request.
2) Same as 1, but for sending gratuitous ARPs (GARPs) when a router
   moves to a new chassis (see the sketch after this list).
3) Documentation changes.
4) Tests
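
For item 2, what we need to emit is a plain gratuitous ARP for the
router port address on the external side; done by hand (with the
addresses from the example above, purely illustrative) it would look
like:

    # iputils arping; -U sends an unsolicited (gratuitous) ARP
    $ arping -U -I eth1 -c 3 172.16.1.1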

And we have some questions:

    About preemption (see the failover/failback example above), we have
several options:
   a) we stick with preemptive failbacks (if a gateway chassis comes
      back online, the routers which were scheduled there come back to it),
   b) non-preemptive (when a chassis goes down, all logical router ports
      with a redirect-chassis are recalculated), or
   c) we make it configurable (a hypothetical example follows this list).
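
To make (c) concrete, a hypothetical knob (the option name below is
made up, it does not exist today) could live next to redirect-chassis:

    # 'redirect-preempt' is only an illustration of what (c) could be
    $ ovn-nbctl set Logical_Router_Port alice options:redirect-preempt=false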

My intuition says that with very low failover times "a" could be a
reasonable default for most cases, since your load stays balanced when
your gateway chassis comes back. But I'm not an operator; how could we
gather feedback in this area?

Best regards,
Miguel Ángel Ajo

[1] https://github.com/mangelajo/ovs/commits/l3ha
[2] https://github.com/mangelajo/vagrants/tree/master/ovn-l3-ha
[3] https://github.com/mangelajo/vagrants/blob/master/ovn-l3-ha/Vagrantfile#L16
[4] https://github.com/mangelajo/vagrants/blob/master/ovn-l3-ha/gw1.sh#L67

On Fri, Apr 7, 2017 at 9:14 AM, Miguel Angel Ajo Pelayo
<majopela at redhat.com> wrote:

> Updating what I wrote yesterday (I hope I won't make people's
> eyes hurt today) after a talk on IRC (thank you Mickey Spiegel
> and Gurucharan Shetty):
>
> I propose having:
>
>    1) chassis on NB/Logical_Router accept multiple chassis, to cover
> HA on the centralized gateway case for DNAT/SNAT.
>
>            ovn-nbctl create Logical_Router name=edge1 \
>                      options:chassis=gw1:10,gw2:20,gw3:30
>
>         Or multiple chassis without priorities:
>
>            ovn-nbctl create Logical_Router name=edge1 \
>                      options:chassis=gw1,gw2,gw3
>
>         and in this case we let ovn decide -and rewrite the option-
>         to balance priorities between gateways to spread the load.
>
>    2) redirect-chassis on NB/Logical_Router_Port to accept multiple
> chassis to cover HA for centralized SNAT on distributed routers.
>
>         ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
>                   -- set Logical_Router_Port alice \
>                   options:redirect-chassis=gw1:10,gw2:20,gw3:30
>
>         (or again, without priorities)
>
>         ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
>                   -- set Logical_Router_Port alice \
>                   options:redirect-chassis=gw1,gw2,gw3
>
>
> These logical model changes allow for Active/Active L3 when we have
> that implemented, for example by assigning the same priorities.
>
> Alternatively in such case we could add another option
> for case (1):  ha-chassis-mode=active_standby/active_active,
> and ha-redirect-mode=active_standby/active_active for case (2).
>
> For the dataplane implementation I propose following what [1] defines
> for the Active/Standby per-router implementation, with BFD monitoring
> of tunnel endpoints, where the location of the master router is
> independently calculated at every chassis, making the solution
> independent of the controller connection via the SB database.
>
> To start with, there are a few gaps that we still need to properly define:
>
> 1) I'd like to see the master gateway reported somehow through the
> SB db [up to the NB db?], so that the administrator can inspect the
> system and see its current state (see the sketch after this list).
>
> 2) While hypervisors will direct traffic to the calculated master
> router via the bundle action with the active_backup algorithm, I
> believe we can't have anything in OpenFlow to drop packets in the
> standby routers based on the inter-gateway link matrix status.
>
> 3) Other related changes in the SouthBound DB.
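>
> Regarding 1, today the generic SB inspection commands are the closest
> thing we have; they show where ports are bound, but not the BFD-derived
> master, which is the part we would like to report:
>
>     $ ovn-sbctl show
>     $ ovn-sbctl list Port_Binding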
>
> Best regards,
> Miguel Ángel Ajo
>
> [1] https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst
>
>
> On Thu, Apr 6, 2017 at 12:13 PM, Miguel Angel Ajo Pelayo
> <majopela at redhat.com> wrote:
>
>> Hello everybody,
>>
>>      First I'd like to say hello, because I'll be able to spend more
>> time working with this community, and I'm sure it will be an enjoyable
>> journey, from what I've seen (silently watching) during the last few
>> years.
>>
>>      I'm planning to start work (together with Anil) on the L3 High
>> availability area of OVN. We've been reading [1], and it seems quite
>> reasonable.
>>
>>      We're wondering whether to fast-forward and skip the naive
>> active/backup implementation in favor of the Active/Standby (per
>> router) approach based on BFD + bundle(active_backup) output actions,
>> since the proposal of having ovn-northd monitor the gateways seems a
>> bit unnatural, and the difference in effort (naive vs active/standby)
>> is probably not very big (warning: I tend to be optimistic).
>>
>>      I spent a couple of days looking at how L3 works now and, very
>> naively, I would propose having the redirect-chassis option of
>> Logical_Router_Port accept multiple chassis with priorities.
>>
>>     For example:
>>
>>         ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
>>              -- set Logical_Router_Port alice \
>>              options:redirect-chassis=gw1:10,gw2:20,gw3:30
>>
>> Or multiple chassis without priorities:
>>
>>         ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.1/24 \
>>              -- set Logical_Router_Port alice \
>>              options:redirect-chassis=gw1,gw2,gw3
>>
>>         (and in this case we let ovn decide -and maybe rewrite the
>> option- how to balance priorities between gateways to spread the load)
>>
>>
>>         We may want another field in the Logical_Router_Port to let
>> us know which one(s) is(are) the active gateway(s).
>>
>>
>>        This logical model would also allow for Active/Active L3 when
>> we have that implemented, for example by assigning the same priorities.
>>
>>
>>        Alternatively we could have two options:
>>           * ha-redirect-chassis=<chassis>:<priority>[,<chassis2>:<priority2>,...]
>>           * ha-redirect-mode=active_standby/active_active
>>
>>
>> Best regards,
>> Miguel Ángel Ajo
>>
>> [1] https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst
>>
>>
>

