[ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

Han Zhou zhouhan at gmail.com
Wed Nov 21 20:04:37 UTC 2018


On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson <mmichels at redhat.com> wrote:
>
> Hi Daniel,
>
> I agree with Numan that this seems like a good approach to take.
>
> On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> >
> > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <blp at ovn.org
> > <mailto:blp at ovn.org>> wrote:
> >  >
> >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique wrote:
> >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > <dalvarez at redhat.com <mailto:dalvarez at redhat.com>>
> >  > > wrote:
> >  > >
> >  > > > Hi,
> >  > > >
> >  > > > After digging further. The problem seems to be reduced to
reusing an
> >  > > > old gateway IP address for a dnat_and_snat entry.
> >  > > > When a gateway port is bound to a chassis, its entry will show
up in
> >  > > > the MAC_Binding table (at least when that Logical Switch is
connected
> >  > > > to more than one Logical Router). After deleting the Logical
Router
> >  > > > and all its ports, this entry will remain there. If a new Logical
> >  > > > Router is created and a Floating IP (dnat_and_snat) is assigned
to a
> >  > > > VM with the old gw IP address, it will become unreachable.
> >  > > >
> >  > > > A workaround now from networking-ovn (OpenStack integration) is
to
> >  > > > delete MAC_Binding entries for that IP address upon a FIP
creation. I
> >  > > > think that this however should be done from OVN, what do you
folks
> >  > > > think?
> >  > > >
> >  > > >
> >  > > Agree. Since the MAC_Binding table row is created by
ovn-controller, it
> >  > > should
> >  > > be handled properly within OVN.
> >  >
> >  > I see that this has been sitting here for a while.  The solution
seems
> >  > reasonable to me.  Are either of you working on it?
> >
> > I started working on it. I came up with a solution (see patch below)
> > which works but I wanted to give you a bit more of context and get your
> > feedback:
> >
> >
> >                             ^ localnet
> >                             |
> >                         +---+---+
> >                         |       |
> >                  +------+  pub  +------+
> >                  |      |       |      |
> >                  |      +-------+      |
> >                  | 172.24.4.0/24 <http://172.24.4.0/24>    |
> >                  |                     |
> >     172.24.4.220 |                     | 172.24.4.221
> >              +---+---+             +---+---+
> >              |       |             |       |
> >              |  LR0  |             |  LR1  |
> >              |       |             |       |
> >              +---+---+             +---+---+
> >       10.0.0.254 |                     | 20.0.0.254
> >                  |                     |
> >              +---+---+             +---+---+
> >              |       |             |       |
> > 10.0.0.0/24 <http://10.0.0.0/24> |  SW0  |             |  SW1  |
> > 20.0.0.0/24 <http://20.0.0.0/24>
> >              |       |             |       |
> >              +---+---+             +---+---+
> >                  |                     |
> >                  |                     |
> >              +---+---+             +---+---+
> >              |       |             |       |
> >              |  VM0  |             |  VM1  |
> >              |       |             |       |
> >              +-------+             +-------+
> >              10.0.0.10             20.0.0.10
> >            172.24.4.100           172.24.4.200
> >
> >
> > When I ping VM1 floating IP from the external network, a new entry for
> > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding table:
> >
> > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> > ip                  : "172.24.4.221"
> > logical_port        : "lrp02"
> > mac                 : "00:00:02:01:02:04"
> >
> >
> > Now, if LR1 gets removed and the old gateway IP (172.24.4.221) is reused
> > for VM2 FIP with different MAC and new gateway IP is created (for
> > example 172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes unreachable
> > from VM1 until the old MAC_Binding entry gets deleted as pinging
> > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> >
> > With the patch below, removing LR1 results in deleting all MAC_Binding
> > entries for every datapath where '172.24.4.221' appears in the 'ip'
> > column so the problem goes away.
> >
> > Another solution would be implementing some kind of 'aging' for
> > MAC_Binding entries but perhaps it's more complex.
> > Looking forward for your comments :)
> >
> >
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 58bef7d..a86733e 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct northd_context *ctx,
> > struct hmap *ports)
> >       }
> >   }
> >
> > +static void
> > +delete_mac_binding_by_ip(struct northd_context *ctx, const char *ip)
> > +{
> > +    const struct sbrec_mac_binding *b, *n;
> > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > +        if (strstr(ip, b->ip)) {
> > +            sbrec_mac_binding_delete(b);
> > +        }
> > +    }
> > +}
> > +
> > +
> >   /* Updates the southbound Port_Binding table so that it contains the
> > logical
> >    * switch ports specified by the northbound database.
> >    *
> > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> >       /* Delete southbound records without northbound matches. */
> >       LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> >           ovs_list_remove(&op->list);
> > +
> > +        /* Delete all MAC_Binding entries which match the IP addresses
> > of the
> > +         * deleted logical router port (ie. port with a peer). */
> > +        const char *peer = smap_get(&op->sb->options, "peer");
> > +        if (peer) {
> > +            for (int i = 0; i < op->sb->n_mac; i++) {
> > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > +            }
> > +        }
> >           sbrec_port_binding_delete(op->sb);
> >           ovn_port_destroy(ports, op);
> >       }
> >

Hi,

Sorry that I didn't notice this discussion until now. I encountered similar
problems before. It was not in floating IP scenario, but for external IPs -
ports on the same networks but not aware by OVN. When IP relocates from one
MAC to another, the previous mac-binding entry will not get updated and
therefore the re-located IP is unreachable.

This happens for external router IPs on the localnet network behind the
gateways (which hosts the 172.24.4.221 port in Daniel's example). It also
happens for nested workloads that run inside a VM - the VM port is known by
OVN, but the internal workloads (e.g. containers) runs on same subnets but
relies on mac-binding to communicate.

For both of my use cases, the problem has been solved by this patch
(merged):
https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178

The idea is, mac-binding entry should be updated when the IP is announced
in a new location by GARP/ARP request/ARP response. So I think the best way
to solve the problem for floating IP is similar. We just need to generate
GARP when a new FIP is attached. I was under the impression that OVN
already supports GARP when a new NAT entry is added. But if the problem is
still there it means something is wrong there (or the GARP feature is not
there yet for the NAT case), and I need to check the code.

For the patch proposed in this discussion, I think there are two problems.

Firstly, I think it doesn't solve the problem completely. It only deletes
mac-binding when a logical router port is deleted. However, in any of the
above use cases (including FIP), IP relocation can happen without deleting
the router port. Or did I misunderstood anything here?

Secondly, northd just reconciles between current state and desired state
for SB - it is declarative. We should avoid relying on the northd cleanup
logic to trigger important operations. I think the design principle of
northd should be making sure the desired state is reached, but not care
about how is it reached. For example, it can be reached by deleting extra
records one by one, but it is also correct if it deletes everything and
recreate the desired entries - this is just an example, it may be
inefficient, but it may be reasonable in some scenarios. Adding logic in
northd that relies on *how* the desired state is computed would make it
unreliable and hard to maintain. I think it would also create challenges
for the DDlog implementation.

For the mac-binding aging mechanism mentioned by Daniel, I agree. It is
required for fault scenarios when SB is temporarily down. Since we rely on
SB DB to store the ARP cache/Neighbor table for the virtual routers, if ARP
updates happens when the DB is down, changes are lost. However, the aging
mechanism seems tricky when scale is considered. Only the idle entries
should be timed out, but it is costly to update states whenever a
mac-binding entry is hit. I haven't thought about any clever way to achieve
it without sacrificing scalability. Any thoughts here? A workaround to the
problem is to resend GARP periodically (e.g. every 1 min).

Thanks,
Han
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20181121/092fdce9/attachment-0001.html>


More information about the discuss mailing list