[ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

Daniel Alvarez Sanchez dalvarez at redhat.com
Fri Nov 23 15:40:08 UTC 2018


Hi Han,

Yes, I agree that the patch is not enough. I'll take a look at the
GARP thing because it's either not implemented or not working. Here's
a reproducer while I jump back into it.

When you ping 172.24.4.200 from namespace ns1 for the first time, a
MAC_Binding entry gets created:

# ovn-sbctl list mac_binding | grep 200 -C2
_uuid               : 07967416-c89c-4233-8cc2-4dc929720838
datapath            : 918a9363-fa6e-4086-98ee-8d073b924d29
ip                  : "172.24.4.200"
logical_port        : "lr0-public"
mac                 : "00:00:20:20:12:15"


After recreating lr1 and sw1 with a different MAC address on lr1-public,
172.24.4.200 becomes unreachable from sw0 because the MAC_Binding entry
is never updated.


reproducer.sh

#!/bin/bash
for i in $(ovn-sbctl list mac_binding | grep uuid | awk '{print $3}'); do
    ovn-sbctl destroy mac_binding $i
done

ip net del ns1
ip net del ns2
ovs-vsctl del-port ns1
ovs-vsctl del-port ns2
ovn-nbctl lr-del lr0
ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw0
ovn-nbctl ls-del sw1
ovn-nbctl ls-del public

chassis_name=`ovn-sbctl find chassis | grep ^name | awk '{print $3}'`
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 10.0.0.10"


ovn-nbctl lr-add lr0
# Connect sw0 to lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.254/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0


ovn-nbctl ls-add public
ovn-nbctl lrp-add lr0  lr0-public 00:00:20:20:12:13 172.24.4.220/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public

# localnet port
ovn-nbctl lsp-add public ln-public
ovn-nbctl lsp-set-type ln-public localnet
ovn-nbctl lsp-set-addresses ln-public unknown
ovn-nbctl lsp-set-options ln-public network_name=public

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1

ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:15 172.24.4.221/24
ovn-nbctl lsp-add public public-lr1
ovn-nbctl lsp-set-type public-lr1 router
ovn-nbctl lsp-set-addresses public-lr1 router
ovn-nbctl lsp-set-options public-lr1 router-port=lr1-public


ovn-nbctl lr-nat-add lr0 snat 172.24.4.220 10.0.0.0/24
ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24

# Create the FIPs
ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.24.4.100 10.0.0.10
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

# Schedule the gateways
ovn-nbctl lrp-set-gateway-chassis lr0-public $chassis_name 20
ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name  20


add_phys_port() {
    name=$1
    mac=$2
    ip=$3
    mask=$4
    gw=$5
    iface_id=$6
    ip netns add $name
    ovs-vsctl add-port br-int $name -- set interface $name type=internal
    ip link set $name netns $name
    ip netns exec $name ip link set $name address $mac
    ip netns exec $name ip addr add $ip/$mask dev $name
    ip netns exec $name ip link set $name up
    ip netns exec $name ip route add default via $gw
    ovs-vsctl set Interface $name external_ids:iface-id=$iface_id
}


add_phys_port ns1 50:54:00:00:00:01 10.0.0.10  24 10.0.0.254 sw0-port1
add_phys_port ns2 50:57:00:00:00:02 20.0.0.10  24 20.0.0.254 sw1-port1

# Pinging from sw0
ip net e ns1 ping -c 4 172.24.4.200

ovn-nbctl lr-del lr1
ovn-nbctl ls-del sw1

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "50:57:00:00:00:02 20.0.0.10"

ovn-nbctl lr-add lr1
# Connect sw1 to lr1
ovn-nbctl lrp-add lr1 lr1-sw1 00:00:00:00:ff:02 20.0.0.254/24
ovn-nbctl lsp-add sw1 sw1-lr1
ovn-nbctl lsp-set-type sw1-lr1 router
ovn-nbctl lsp-set-addresses sw1-lr1 router
ovn-nbctl lsp-set-options sw1-lr1 router-port=lr1-sw1


# Change the MAC address of the LRP
ovn-nbctl lrp-add lr1  lr1-public 00:00:20:20:12:95 172.24.4.221/24

ovn-nbctl lr-nat-add lr1 snat 172.24.4.221  20.0.0.0/24
ovn-nbctl lr-nat-add lr1 dnat_and_snat 172.24.4.200 20.0.0.10

ovn-nbctl lrp-set-gateway-chassis lr1-public $chassis_name 20

# Pinging from sw0 won't work now, although pinging from outside will.
ip net e ns1 ping -c 4 172.24.4.200

On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <zhouhan at gmail.com> wrote:
>
>
>
> On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson <mmichels at redhat.com> wrote:
> >
> > Hi Daniel,
> >
> > I agree with Numan that this seems like a good approach to take.
> >
> > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > >
> > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <blp at ovn.org> wrote:
> > >  >
> > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > >  > > <dalvarez at redhat.com> wrote:
> > >  > >
> > >  > > > Hi,
> > >  > > >
> > >  > > > After digging further, the problem seems to boil down to reusing an
> > >  > > > old gateway IP address for a dnat_and_snat entry.
> > >  > > > When a gateway port is bound to a chassis, its entry will show up in
> > >  > > > the MAC_Binding table (at least when that Logical Switch is connected
> > >  > > > to more than one Logical Router). After deleting the Logical Router
> > >  > > > and all its ports, this entry will remain there. If a new Logical
> > >  > > > Router is created and a Floating IP (dnat_and_snat) is assigned to a
> > >  > > > VM with the old gw IP address, it will become unreachable.
> > >  > > >
> > >  > > > The current workaround in networking-ovn (the OpenStack integration)
> > >  > > > is to delete the MAC_Binding entries for that IP address upon FIP
> > >  > > > creation. However, I think this should be done by OVN itself; what
> > >  > > > do you folks think?
> > >  > > >
> > >  > > >
> > >  > > Agree. Since the MAC_Binding table row is created by ovn-controller, it
> > >  > > should be handled properly within OVN.
> > >  >
> > >  > I see that this has been sitting here for a while.  The solution seems
> > >  > reasonable to me.  Are either of you working on it?
> > >
> > > I started working on it. I came up with a solution (see the patch below)
> > > which works, but I wanted to give you a bit more context and get your
> > > feedback:
> > >
> > >
> > >                             ^ localnet
> > >                             |
> > >                         +---+---+
> > >                         |       |
> > >                  +------+  pub  +------+
> > >                  |      |       |      |
> > >                  |      +-------+      |
> > >                  |    172.24.4.0/24    |
> > >                  |                     |
> > >     172.24.4.220 |                     | 172.24.4.221
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > >              |  LR0  |             |  LR1  |
> > >              |       |             |       |
> > >              +---+---+             +---+---+
> > >       10.0.0.254 |                     | 20.0.0.254
> > >                  |                     |
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > > 10.0.0.0/24  |  SW0  |             |  SW1  | 20.0.0.0/24
> > >              |       |             |       |
> > >              +---+---+             +---+---+
> > >                  |                     |
> > >                  |                     |
> > >              +---+---+             +---+---+
> > >              |       |             |       |
> > >              |  VM0  |             |  VM1  |
> > >              |       |             |       |
> > >              +-------+             +-------+
> > >              10.0.0.10             20.0.0.10
> > >            172.24.4.100           172.24.4.200
> > >
> > >
> > > When I ping VM1's floating IP from the external network, a new entry for
> > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
> > >
> > > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> > > ip                  : "172.24.4.221"
> > > logical_port        : "lrp02"
> > > mac                 : "00:00:02:01:02:04"
> > >
> > >
> > > Now, suppose LR1 is removed, the old gateway IP (172.24.4.221) is reused
> > > as the FIP of a new VM2 with a different MAC, and a new gateway IP is
> > > created (for example 172.24.4.222 00:00:02:01:02:99). VM2's FIP then
> > > becomes unreachable from VM0 until the old MAC_Binding entry is deleted,
> > > because traffic to 172.24.4.221 still uses the stale MAC address
> > > ("00:00:02:01:02:04").
> > >
> > > With the patch below, removing LR1 results in deleting all MAC_Binding
> > > entries for every datapath where '172.24.4.221' appears in the 'ip'
> > > column so the problem goes away.
> > >
> > > Another solution would be implementing some kind of 'aging' for
> > > MAC_Binding entries but perhaps it's more complex.
> > > Looking forward to your comments :)
> > >
> > >
> > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > index 58bef7d..a86733e 100644
> > > --- a/ovn/northd/ovn-northd.c
> > > +++ b/ovn/northd/ovn-northd.c
> > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx, struct hmap *ports)
> > >       }
> > >   }
> > >
> > > +static void
> > > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > > +{
> > > +    const struct sbrec_mac_binding *b, *n;
> > > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > +        if (strstr(ip, b->ip)) {
> > > +            sbrec_mac_binding_delete(b);
> > > +        }
> > > +    }
> > > +}
> > > +
> > > +
> > >   /* Updates the southbound Port_Binding table so that it contains the logical
> > >    * switch ports specified by the northbound database.
> > >    *
> > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> > >       /* Delete southbound records without northbound matches. */
> > >       LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> > >           ovs_list_remove(&op->list);
> > > +
> > > +        /* Delete all MAC_Binding entries which match the IP addresses of the
> > > +         * deleted logical router port (i.e. port with a peer). */
> > > +        const char *peer = smap_get(&op->sb->options, "peer");
> > > +        if (peer) {
> > > +            for (int i = 0; i < op->sb->n_mac; i++) {
> > > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > > +            }
> > > +        }
> > >           sbrec_port_binding_delete(op->sb);
> > >           ovn_port_destroy(ports, op);
> > >       }
> > >
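One detail in the patch above: `strstr(ip, b->ip)` is a substring test, so when the port owning 172.24.4.221 is deleted, an unrelated MAC_Binding for, say, 172.24.4.22 or 172.24.4.2 would also match and be deleted. A minimal sketch of an exact comparison follows. It assumes the Port_Binding `mac` strings look like `"00:00:20:20:12:15 172.24.4.221/24"` (a MAC followed by IP tokens, with or without a `/prefix`); `ip_in_addresses()` is a hypothetical helper, and a real patch would more likely reuse OVN's existing address-parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if 'ip' exactly equals the address part
 * of one of the IP tokens in 'addresses'.  Assumes 'addresses' has the form
 * "MAC IP[/PLEN] [IP[/PLEN] ...]", e.g. "00:00:20:20:12:15 172.24.4.221/24".
 * The leading MAC token is skipped and any "/PLEN" suffix is ignored. */
bool
ip_in_addresses(const char *addresses, const char *ip)
{
    assert(addresses != NULL && ip != NULL);

    size_t ip_len = strlen(ip);
    bool is_mac_token = true;   /* The first token is the MAC address. */
    const char *p = addresses;

    for (;;) {
        p += strspn(p, " ");                /* Skip separating spaces. */
        if (*p == '\0') {
            return false;
        }
        size_t tok_len = strcspn(p, " ");   /* Whole token, e.g. "a.b.c.d/24". */
        size_t addr_len = strcspn(p, "/ "); /* Address part before any '/'. */
        if (!is_mac_token && addr_len == ip_len
            && strncmp(p, ip, ip_len) == 0) {
            return true;
        }
        is_mac_token = false;
        p += tok_len;
    }
}
```

Inside `delete_mac_binding_by_ip()`, whose `ip` parameter actually receives the whole `mac` column string, the condition could then read `ip_in_addresses(ip, b->ip)`.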
>
> Hi,
>
> Sorry I didn't notice this discussion until now. I have encountered similar problems before, not in the floating IP scenario but with external IPs: ports on the same networks that OVN is not aware of. When an IP relocates from one MAC to another, the previous mac-binding entry is not updated, so the relocated IP becomes unreachable.
>
> This happens for external router IPs on the localnet network behind the gateways (the network hosting the 172.24.4.221 port in Daniel's example). It also happens for nested workloads running inside a VM: the VM port is known to OVN, but the internal workloads (e.g. containers) run on the same subnets and rely on mac-binding to communicate.
>
> For both of my use cases, the problem has been solved by this patch (merged): https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
>
> The idea is that a mac-binding entry should be updated whenever the IP is announced at a new location by a GARP, ARP request or ARP response. So I think the best way to solve the floating IP problem is similar: we just need to generate a GARP when a new FIP is attached. I was under the impression that OVN already sends GARPs when a new NAT entry is added. If the problem is still there, either something is wrong in that code or GARP is not yet implemented for the NAT case, and I need to check the code.
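The update rule Han describes can be sketched as a toy, in-memory table keyed by logical port and IP (roughly like the MAC_Binding columns shown earlier in the thread): every ARP announcement overwrites the MAC for a known IP instead of leaving it stale. The table, `learn_from_arp()` and `lookup_mac()` are hypothetical stand-ins for illustration, not the code in the referenced commit or in ovn-controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_BINDINGS 16

/* Toy, in-memory stand-in for MAC_Binding rows.  Hypothetical: the real
 * rows live in the OVN Southbound database and are written by
 * ovn-controller. */
struct mac_binding {
    char logical_port[32];
    char ip[46];    /* Large enough for an IPv6 string. */
    char mac[18];   /* "xx:xx:xx:xx:xx:xx" plus NUL. */
};

struct mac_binding_table {
    struct mac_binding rows[MAX_BINDINGS];
    size_t n;
};

static struct mac_binding *
find_binding(struct mac_binding_table *t, const char *port, const char *ip)
{
    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->rows[i].logical_port, port) == 0
            && strcmp(t->rows[i].ip, ip) == 0) {
            return &t->rows[i];
        }
    }
    return NULL;
}

/* Handles the sender (IP, MAC) pair of a GARP, ARP request or ARP reply
 * received on 'port'.  A new IP is inserted; a known IP has its MAC
 * overwritten, so a relocated IP does not keep a stale binding.
 * Returns false only if the toy table is full. */
bool
learn_from_arp(struct mac_binding_table *t, const char *port,
               const char *sender_ip, const char *sender_mac)
{
    assert(t && port && sender_ip && sender_mac);

    struct mac_binding *b = find_binding(t, port, sender_ip);
    if (!b) {
        if (t->n == MAX_BINDINGS) {
            return false;
        }
        b = &t->rows[t->n++];
        snprintf(b->logical_port, sizeof b->logical_port, "%s", port);
        snprintf(b->ip, sizeof b->ip, "%s", sender_ip);
    }
    snprintf(b->mac, sizeof b->mac, "%s", sender_mac);
    return true;
}

/* Returns the MAC currently bound to ('port', 'ip'), or NULL. */
const char *
lookup_mac(struct mac_binding_table *t, const char *port, const char *ip)
{
    const struct mac_binding *b = find_binding(t, port, ip);
    return b ? b->mac : NULL;
}
```

With this rule, a GARP for 172.24.4.200 carrying the recreated router port's new MAC would replace the old entry, which is exactly what the reproducer at the top of the thread shows not happening.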
>
> For the patch proposed in this discussion, I think there are two problems.
>
> Firstly, I don't think it solves the problem completely. It only deletes mac-binding entries when a logical router port is deleted, but in all of the above use cases (including FIP), IP relocation can happen without deleting the router port. Or did I misunderstand something here?
>
> Secondly, northd just reconciles the current SB state with the desired state; it is declarative. We should avoid relying on northd's cleanup logic to trigger important operations. I think the design principle of northd should be to make sure the desired state is reached, without caring how it is reached. For example, it can get there by deleting extra records one by one, but it would also be correct to delete everything and recreate the desired entries. That is just an example and may be inefficient, but it could be reasonable in some scenarios. Adding logic to northd that relies on *how* the desired state is computed would make it unreliable and hard to maintain. I think it would also create challenges for the DDlog implementation.
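Han's argument can be illustrated with a toy reconciler. Both strategies below end in the same state, but they delete different rows along the way (only the extras versus everything), so a side effect such as MAC_Binding cleanup hooked onto the deletion path would behave differently depending on which strategy happens to be used. The `row_set` type and both functions are hypothetical and are not northd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROWS 32

/* A toy set of row keys (e.g. logical port names); assumed to contain no
 * duplicates.  Hypothetical, not a northd data structure. */
struct row_set {
    const char *keys[MAX_ROWS];
    size_t n;
};

bool
row_set_contains(const struct row_set *s, const char *key)
{
    for (size_t i = 0; i < s->n; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            return true;
        }
    }
    return false;
}

/* True if both sets have the same members, ignoring order. */
bool
row_set_equal(const struct row_set *a, const struct row_set *b)
{
    if (a->n != b->n) {
        return false;
    }
    for (size_t i = 0; i < a->n; i++) {
        if (!row_set_contains(b, a->keys[i])) {
            return false;
        }
    }
    return true;
}

/* Strategy 1: delete only rows that are not desired, then insert missing
 * ones.  Returns the number of rows deleted. */
size_t
reconcile_incremental(struct row_set *current, const struct row_set *desired)
{
    size_t old_n = current->n, kept = 0;
    for (size_t i = 0; i < old_n; i++) {
        if (row_set_contains(desired, current->keys[i])) {
            current->keys[kept++] = current->keys[i];
        }
    }
    current->n = kept;
    for (size_t i = 0; i < desired->n; i++) {
        if (!row_set_contains(current, desired->keys[i])) {
            assert(current->n < MAX_ROWS);
            current->keys[current->n++] = desired->keys[i];
        }
    }
    return old_n - kept;
}

/* Strategy 2: delete everything, then recreate the desired rows.
 * Returns the number of rows deleted. */
size_t
reconcile_wipe(struct row_set *current, const struct row_set *desired)
{
    size_t deleted = current->n;
    *current = *desired;
    return deleted;
}
```

Starting from {lr0-public, lr1-public} with desired state {lr0-public, lr2-public}, the incremental strategy deletes one row and the wipe strategy deletes two, yet both finish with identical contents.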
>
> Regarding the mac-binding aging mechanism mentioned by Daniel, I agree. It is required for fault scenarios where the SB DB is temporarily down: since we rely on the SB DB to store the ARP cache/neighbor table for the virtual routers, any ARP updates that happen while the DB is down are lost. However, aging seems tricky at scale. Only idle entries should time out, but it is costly to update state every time a mac-binding entry is hit. I haven't thought of a clever way to achieve this without sacrificing scalability. Any thoughts here? A workaround is to resend GARPs periodically (e.g. every minute).
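To make the scalability concern concrete, here is a hypothetical idle-timeout sketch. Expiring entries is a cheap periodic scan, but it is only correct if `last_used` is refreshed on every hit, and in OVN that would presumably mean feeding per-entry usage (for example derived from flow statistics) back into the SB DB. The struct and functions are illustrative assumptions, not OVN code, and the periodic-GARP workaround is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical aging metadata for one learned binding. */
struct aged_binding {
    time_t last_used;   /* Must be refreshed on every hit: the costly part. */
    bool in_use;
};

/* Records a hit on 'b' at time 'now'.  Doing this for every lookup is
 * what makes idle-based aging expensive at scale. */
void
binding_hit(struct aged_binding *b, time_t now)
{
    assert(b && b->in_use);
    b->last_used = now;
}

/* Expires entries idle for longer than 'idle_timeout' seconds and returns
 * how many were removed.  This scan itself is cheap. */
size_t
expire_idle(struct aged_binding *rows, size_t n, time_t now,
            time_t idle_timeout)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].in_use && now - rows[i].last_used > idle_timeout) {
            rows[i].in_use = false;
            removed++;
        }
    }
    return removed;
}
```

An entry hit recently survives the scan while an untouched one is dropped and must be relearned from the next ARP exchange.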
>
> Thanks,
> Han

