[ovs-discuss] OVN: MAC_Binding entries not getting updated leads to unreachable destinations

Han Zhou zhouhan at gmail.com
Sat Dec 1 20:44:46 UTC 2018


On Fri, Nov 30, 2018 at 7:29 AM Daniel Alvarez Sanchez <dalvarez at redhat.com>
wrote:
>
> Thanks folks again for the discussion.
> I sent an RFC patch here [0]. I tried it out with my reproducer and it
> seems to work well. Instead of outputting the packet to the localnet
> ofport, it will inject it to the public switch pipeline so it'll get
> broadcasted to the rest of the ports resulting in other Logical
> Routers connected to the external switch updating their neighbours. As
> it's broadcasted, the GARP will also be sent out through the localnet
> port as before.
>
> Looking forward to your comments before moving on and writing tests.
>
> Thanks Numan for your help!
>
> [0]
https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354220.html
> On Wed, Nov 28, 2018 at 3:32 PM Daniel Alvarez Sanchez
> <dalvarez at redhat.com> wrote:
> >
> > Hi all,
> >
> > As this thread is getting big I'm summarizing the issue I see so far:
> >
> > * When a dnat_and_snat entry is added to a logical router (or port
> > gets bound to a chassis), ovn-controller will send GARPs to announce
> > the MAC address of the FIP(s) (either the gw port or of the actual FIP
> > MAC address if distributed) only through localnet ports [0].
> >
> > * This means that gateway ports bound to that same chassis and
> > connected to the public switch won't get the GARPs, so they won't
> > update their MAC_Binding entries causing unreachability. In the
> > diagram of this thread, LR0 won't get the GARP sent by ovn-controller
> > if both gateway ports are bound to the same chassis.
> >
> > I tried out sending GARPs from the external network using master
> > branch and MAC_Binding entries get updated. However, in order to cover
> > missing cases, I think it would make sense to send the GARPs not only
> > to localnet ports but to all ports of those logical switches that have
> > a localnet port. What do you think?
> >
> > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073
> >
> > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2073On
> > Fri, Nov 23, 2018 at 5:28 PM Daniel Alvarez Sanchez
> > <dalvarez at redhat.com> wrote:
> > >
> > > On Wed, Nov 21, 2018 at 9:04 PM Han Zhou <zhouhan at gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Nov 20, 2018 at 5:21 AM Mark Michelson <mmichels at redhat.com>
wrote:
> > > > >
> > > > > Hi Daniel,
> > > > >
> > > > > I agree with Numan that this seems like a good approach to take.
> > > > >
> > > > > On 11/16/2018 12:41 PM, Daniel Alvarez Sanchez wrote:
> > > > > >
> > > > > > On Sat, Nov 10, 2018 at 12:21 AM Ben Pfaff <blp at ovn.org
> > > > > > <mailto:blp at ovn.org>> wrote:
> > > > > >  >
> > > > > >  > On Mon, Oct 29, 2018 at 05:21:13PM +0530, Numan Siddique
wrote:
> > > > > >  > > On Mon, Oct 29, 2018 at 5:00 PM Daniel Alvarez Sanchez
> > > > > > <dalvarez at redhat.com <mailto:dalvarez at redhat.com>>
> > > > > >  > > wrote:
> > > > > >  > >
> > > > > >  > > > Hi,
> > > > > >  > > >
> > > > > >  > > > After digging further. The problem seems to be reduced
to reusing an
> > > > > >  > > > old gateway IP address for a dnat_and_snat entry.
> > > > > >  > > > When a gateway port is bound to a chassis, its entry
will show up in
> > > > > >  > > > the MAC_Binding table (at least when that Logical Switch
is connected
> > > > > >  > > > to more than one Logical Router). After deleting the
Logical Router
> > > > > >  > > > and all its ports, this entry will remain there. If a
new Logical
> > > > > >  > > > Router is created and a Floating IP (dnat_and_snat) is
assigned to a
> > > > > >  > > > VM with the old gw IP address, it will become
unreachable.
> > > > > >  > > >
> > > > > >  > > > A workaround now from networking-ovn (OpenStack
integration) is to
> > > > > >  > > > delete MAC_Binding entries for that IP address upon a
FIP creation. I
> > > > > >  > > > think that this however should be done from OVN, what do
you folks
> > > > > >  > > > think?
> > > > > >  > > >
> > > > > >  > > >
> > > > > >  > > Agree. Since the MAC_Binding table row is created by
ovn-controller, it
> > > > > >  > > should
> > > > > >  > > be handled properly within OVN.
> > > > > >  >
> > > > > >  > I see that this has been sitting here for a while.  The
solution seems
> > > > > >  > reasonable to me.  Are either of you working on it?
> > > > > >
> > > > > > I started working on it. I came up with a solution (see patch
below)
> > > > > > which works but I wanted to give you a bit more of context and
get your
> > > > > > feedback:
> > > > > >
> > > > > >
> > > > > >                             ^ localnet
> > > > > >                             |
> > > > > >                         +---+---+
> > > > > >                         |       |
> > > > > >                  +------+  pub  +------+
> > > > > >                  |      |       |      |
> > > > > >                  |      +-------+      |
> > > > > >                  | 172.24.4.0/24 <http://172.24.4.0/24>    |
> > > > > >                  |                     |
> > > > > >     172.24.4.220 |                     | 172.24.4.221
> > > > > >              +---+---+             +---+---+
> > > > > >              |       |             |       |
> > > > > >              |  LR0  |             |  LR1  |
> > > > > >              |       |             |       |
> > > > > >              +---+---+             +---+---+
> > > > > >       10.0.0.254 |                     | 20.0.0.254
> > > > > >                  |                     |
> > > > > >              +---+---+             +---+---+
> > > > > >              |       |             |       |
> > > > > > 10.0.0.0/24 <http://10.0.0.0/24> |  SW0  |             |  SW1  |
> > > > > > 20.0.0.0/24 <http://20.0.0.0/24>
> > > > > >              |       |             |       |
> > > > > >              +---+---+             +---+---+
> > > > > >                  |                     |
> > > > > >                  |                     |
> > > > > >              +---+---+             +---+---+
> > > > > >              |       |             |       |
> > > > > >              |  VM0  |             |  VM1  |
> > > > > >              |       |             |       |
> > > > > >              +-------+             +-------+
> > > > > >              10.0.0.10             20.0.0.10
> > > > > >            172.24.4.100           172.24.4.200
> > > > > >
> > > > > >
> > > > > > When I ping VM1 floating IP from the external network, a new
entry for
> > > > > > 172.24.4.221 in the LR0 datapath appears in the MAC_Binding
table:
> > > > > >
> > > > > > _uuid               : 85e30e87-3c59-423e-8681-ec4cfd9205f9
> > > > > > datapath            : ac5984b9-0fea-485f-84d4-031bdeced29b
> > > > > > ip                  : "172.24.4.221"
> > > > > > logical_port        : "lrp02"
> > > > > > mac                 : "00:00:02:01:02:04"
> > > > > >
> > > > > >
> > > > > > Now, if LR1 gets removed and the old gateway IP (172.24.4.221)
is reused
> > > > > > for VM2 FIP with different MAC and new gateway IP is created
(for
> > > > > > example 172.24.4.222 00:00:02:01:02:99),  VM2 FIP becomes
unreachable
> > > > > > from VM1 until the old MAC_Binding entry gets deleted as pinging
> > > > > > 172.24.4.221 will use the wrong address ("00:00:02:01:02:04").
> > > > > >
> > > > > > With the patch below, removing LR1 results in deleting all
MAC_Binding
> > > > > > entries for every datapath where '172.24.4.221' appears in the
'ip'
> > > > > > column so the problem goes away.
> > > > > >
> > > > > > Another solution would be implementing some kind of 'aging' for
> > > > > > MAC_Binding entries but perhaps it's more complex.
> > > > > > Looking forward for your comments :)
> > > > > >
> > > > > >
> > > > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > > > > index 58bef7d..a86733e 100644
> > > > > > --- a/ovn/northd/ovn-northd.c
> > > > > > +++ b/ovn/northd/ovn-northd.c
> > > > > > @@ -2324,6 +2324,18 @@ cleanup_mac_bindings(struct
northd_context *ctx,
> > > > > > struct hmap *ports)
> > > > > >       }
> > > > > >   }
> > > > > >
> > > > > > +static void
> > > > > > +delete_mac_binding_by_ip(struct northd_context *ctx, const
char *ip)
> > > > > > +{
> > > > > > +    const struct sbrec_mac_binding *b, *n;
> > > > > > +    SBREC_MAC_BINDING_FOR_EACH_SAFE (b, n, ctx->ovnsb_idl) {
> > > > > > +        if (strstr(ip, b->ip)) {
> > > > > > +            sbrec_mac_binding_delete(b);
> > > > > > +        }
> > > > > > +    }
> > > > > > +}
> > > > > > +
> > > > > > +
> > > > > >   /* Updates the southbound Port_Binding table so that it
contains the
> > > > > > logical
> > > > > >    * switch ports specified by the northbound database.
> > > > > >    *
> > > > > > @@ -2383,6 +2395,15 @@ build_ports(struct northd_context *ctx,
> > > > > >       /* Delete southbound records without northbound matches.
*/
> > > > > >       LIST_FOR_EACH_SAFE(op, next, list, &sb_only) {
> > > > > >           ovs_list_remove(&op->list);
> > > > > > +
> > > > > > +        /* Delete all MAC_Binding entries which match the IP
addresses
> > > > > > of the
> > > > > > +         * deleted logical router port (ie. port with a peer).
*/
> > > > > > +        const char *peer = smap_get(&op->sb->options, "peer");
> > > > > > +        if (peer) {
> > > > > > +            for (int i = 0; i < op->sb->n_mac; i++) {
> > > > > > +                delete_mac_binding_by_ip(ctx, op->sb->mac[i]);
> > > > > > +            }
> > > > > > +        }
> > > > > >           sbrec_port_binding_delete(op->sb);
> > > > > >           ovn_port_destroy(ports, op);
> > > > > >       }
> > > > > >
> > > >
> > > > Hi,
> > > >
> > > > Sorry that I didn't notice this discussion until now. I encountered
similar problems before. It was not in floating IP scenario, but for
external IPs - ports on the same networks but not aware by OVN. When IP
relocates from one MAC to another, the previous mac-binding entry will not
get updated and therefore the re-located IP is unreachable.
> > > >
> > > > This happens for external router IPs on the localnet network behind
the gateways (which hosts the 172.24.4.221 port in Daniel's example). It
also happens for nested workloads that run inside a VM - the VM port is
known by OVN, but the internal workloads (e.g. containers) runs on same
subnets but relies on mac-binding to communicate.
> > > >
> > > > For both of my use cases, the problem has been solved by this patch
(merged):
https://github.com/openvswitch/ovs/commit/b068454082f5d76727ffde34542ff19fed20e178
> > > >
> > > > The idea is, mac-binding entry should be updated when the IP is
announced in a new location by GARP/ARP request/ARP response. So I think
the best way to solve the problem for floating IP is similar. We just need
to generate GARP when a new FIP is attached. I was under the impression
that OVN already supports GARP when a new NAT entry is added. But if the
problem is still there it means something is wrong there (or the GARP
feature is not there yet for the NAT case), and I need to check the code.
> > >
> > > I think we're only sending the GARPs only for distributed floating IPs
> > > (nat_addresses field in the Port_Binding table) [0].
> > > Anyways even with that, I'm not quite sure if the MAC_Binding table
> > > would get updated as I think that first time I hit this issue it was
> > > on a DVR environment (ie. distributed FIPs, dnat_and_snat entries with
> > > a logical_port and external_mac).
> > >
> > > [0]
https://github.com/openvswitch/ovs/blob/master/ovn/controller/pinctrl.c#L2497
> > > >
> > > > For the patch proposed in this discussion, I think there are two
problems.
> > > >
> > > > Firstly, I think it doesn't solve the problem completely. It only
deletes mac-binding when a logical router port is deleted. However, in any
of the above use cases (including FIP), IP relocation can happen without
deleting the router port. Or did I misunderstood anything here?
> > > >
> > > > Secondly, northd just reconciles between current state and desired
state for SB - it is declarative. We should avoid relying on the northd
cleanup logic to trigger important operations. I think the design principle
of northd should be making sure the desired state is reached, but not care
about how is it reached. For example, it can be reached by deleting extra
records one by one, but it is also correct if it deletes everything and
recreate the desired entries - this is just an example, it may be
inefficient, but it may be reasonable in some scenarios. Adding logic in
northd that relies on *how* the desired state is computed would make it
unreliable and hard to maintain. I think it would also create challenges
for the DDlog implementation.
> > > >
> > > > For the mac-binding aging mechanism mentioned by Daniel, I agree.
It is required for fault scenarios when SB is temporarily down. Since we
rely on SB DB to store the ARP cache/Neighbor table for the virtual
routers, if ARP updates happens when the DB is down, changes are lost.
However, the aging mechanism seems tricky when scale is considered. Only
the idle entries should be timed out, but it is costly to update states
whenever a mac-binding entry is hit. I haven't thought about any clever way
to achieve it without sacrificing scalability. Any thoughts here? A
workaround to the problem is to resend GARP periodically (e.g. every 1 min).
> > > >
> > > > Thanks,
> > > > Han

Hi Daniel,

Thanks for digging it out! Now the root cause is well explained, and the
RFC patch looks good to me. It is a simple and correct fix. However, I'd
like to bring up an alternative for discussion.

Alternatively, since the NAT IPs are already known in NB, instead of
relying on dynamic ARP (mac-binding), the problem can be solved from northd
directly by adding static ARP-resolving flows for floating IPs, just like
how it is handling the IPs in "addresses" column of lports. This would be
more straightforward and reliable than relying on mac-binding updating by
GARP.

I think we can still go ahead with the mac-binding patch, since it fixes an
improperly injected broadcast packet, and I don't see any negative impact.
In addition, we may think of the northd change as a further improvement of
the feature, if necessary. What do you think?

Thanks,
Han
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20181201/e97875d4/attachment-0001.html>


More information about the discuss mailing list