[ovs-discuss] active_backup failover issue

Numan Siddique numans at ovn.org
Tue Apr 27 20:19:50 UTC 2021


On Tue, Apr 27, 2021 at 9:11 AM Francois <rigault.francois at gmail.com> wrote:
>
> Hello OpenvSwitch!
> I have 2 chassis with external connectivity, chassis-1 hosts port-1
> and chassis-2 hosts port-2. SNAT is done through a gateway hosted on
> chassis-1, and both chassis exchange BFD. There is no floating IP.
>
> I see chassis-1 does not have any flow for tunnelling, which is logical
> since it hosts the gateway. Traffic goes straight to the external port
> of the chassis, which is fine.
> I see however, chassis-2 having an extra flow:
>
>  cookie=0x7a15360f, duration=4116.970s, table=37, n_packets=1471,
> n_bytes=144158, priority=100,reg15=0x3,metadata=0x4
> actions=load:0x4->NXM_NX_TUN_ID[0..23],set_field:0x3->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],bundle(eth_src,0,active_backup,ofport,members:"ovn-chassi-0")
>
> In my case I have only 2 chassis, so the bundle only contains a single member.
>
> I am now killing the ovs process from chassis-1. Chassis-2 properly
> detects that chassis-1 is dead, however packets going out are still
> using this flow, and are not sent outside.
>
> If I add a third chassis chassis-3, I see it monitors properly
> chassis-1 and chassis-2, and the bundle members contain both chassis.
> This case is fine and chassis-2 does the SNAT for chassis-3.
>
> I am wondering if there is something wrong with my set-up. I would
> expect that when chassis-1 dies and the gateway fails over to
> chassis-2, traffic from port-2 actually goes out from chassis-2. It
> should not be dropped (or be sent to the next chassis in the list,
> although I did not try this). Any help would be very appreciated!
>
> (this should be the master branch of ovn).

ovn-controller learns about BFD failures only when ovs-vswitchd detects
them and updates the BFD status of the OVS interface in the local OVS
conf.db. In your case, since you killed the ovs process, the BFD status
is never updated in the local OVS conf.db, so the ovn-controller running
on chassis-1 will not detect the BFD failover.
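
To make the dependency on conf.db concrete, here is a minimal Python sketch
that interprets the bfd_status column of an OVS Interface record, as it
might be printed by "ovs-vsctl get Interface <name> bfd_status". The
helper names and the sample value are assumptions for illustration, not
output captured from this setup. The check only sees what ovs-vswitchd
last wrote, so a value left behind by a killed ovs-vswitchd would still
be read as "up".

```python
import re


def parse_ovs_map(text):
    """Parse an OVSDB map printed as {key=value, key="quoted value"}."""
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    # A value is either a double-quoted string or a bare token.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,\s}]+)', body)
    return {key: value.strip('"') for key, value in pairs}


def tunnel_is_up(bfd_status):
    """Return True if the recorded BFD session is up and forwarding."""
    status = parse_ovs_map(bfd_status)
    return status.get("state") == "up" and status.get("forwarding") == "true"


# Illustrative sample value (assumed, not taken from the reporter's chassis).
SAMPLE = ('{diagnostic="No Diagnostic", flap_count="1", forwarding="true", '
          'remote_diagnostic="No Diagnostic", remote_state=up, state=up}')

if __name__ == "__main__":
    print(tunnel_is_up(SAMPLE))
```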

The other issue is that, with ovs-vswitchd down, ovn-controller loses
its connection to ovs-vswitchd. Traffic originating from the VMs on that
chassis will still go through as long as matching datapath flows exist,
but any new traffic will be dropped since there is no ovs-vswitchd to
handle the upcall.
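
The difference between existing and new traffic can be pictured with a toy
model in Python. This is only a conceptual sketch: the class, its method
names and the "output:1" action are hypothetical and do not correspond to
real OVS code. It models nothing beyond a flow cache and an optional
slow path.

```python
class ToyDatapath:
    """Toy model of a datapath flow cache backed by a userspace slow path."""

    def __init__(self):
        self.flow_cache = {}          # flow key -> cached action
        self.vswitchd_running = True  # is the slow path available?

    def slow_path(self, flow_key):
        # Stand-in for the userspace lookup; the action is a made-up value.
        return "output:1"

    def handle_packet(self, flow_key):
        # Fast path: a cached datapath flow needs no help from userspace.
        if flow_key in self.flow_cache:
            return self.flow_cache[flow_key]
        # Cache miss: the packet needs an upcall, which nobody can answer
        # once the userspace daemon is gone.
        if not self.vswitchd_running:
            return "drop"
        action = self.slow_path(flow_key)
        self.flow_cache[flow_key] = action
        return action


if __name__ == "__main__":
    dp = ToyDatapath()
    dp.handle_packet("vm1->host-a")         # installs a cached flow
    dp.vswitchd_running = False             # simulate killing the daemon
    print(dp.handle_packet("vm1->host-a"))  # existing flow
    print(dp.handle_packet("vm1->host-b"))  # new flow
```

In this model the first lookup after the simulated kill is served from the
cache, while the second has no cached entry and is dropped.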

In my opinion, the correct way to test this is to disconnect chassis-1
from your physical network rather than killing ovs-vswitchd.

Thanks
Numan

> Thanks
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>

