[ovs-dev] [PATCH] ofproto-dpif: Fix nondeterministic flow revalidation behavior.
Ben Pfaff
blp at nicira.com
Tue Dec 27 21:29:43 UTC 2011
I reposted this patch as the first in a longer series:
http://openvswitch.org/pipermail/dev/2011-December/013948.html
Please review it as part of that series.
On Thu, Dec 22, 2011 at 04:49:05PM -0800, Ben Pfaff wrote:
> SLB bonds are very strange beasts. It's taken OVS a while to figure out
> how they should really work. Way back in the mists of time, when we were
> in the midst of this process, we noticed that the following could happen:
>
> 1. Local VM sends a packet to the OVS bridge.
> 2. OVS bridge learns VM's MAC, forwards packets to SLB bond.
> 3. Remote switch hasn't learned packet's destination and
> forwards packet back to other interfaces in the SLB bond.
>
> Normally nothing bad happens in this scenario because OVS has already
> learned the local port for the VM's MAC in step 2 and the rules for SLB
> bonding (this is rule #2 in vswitchd/INTERNALS). But at the time we were
> implementing this, OVS didn't yet use active flows to keep MAC learning
> entries alive; only new flows prevented a MAC entry from aging out. So
> in steady state (e.g. just "ping" traffic) OVS would regularly forget MAC
> addresses. If the remote switch also happened to forward a packet back to
> one of the SLB bond interfaces, then OVS would learn the VM's address on
> the bond, with the result that any traffic coming in from the remote switch
> would be black-holed until the VM sent a new packet. This was not good.
>
> The fix we applied at the time was commit 2416b8ecea (bridge: Eject NORMAL
> flows without a learning entry from datapath.) followed by a small
> refinement in commit e96a4d8035 (bridge: Feed flow stats into learning
> table.). This fix causes flows that don't have a learning entry to be
> ejected from the datapath if revalidation occurs. This forced the next
> packet in the flow to go to userspace, which in turn caused learning to
> happen, fixing the problem.
>
> However, this isn't a good solution for several reasons:
>
> * It forces more packets to userspace, which is expensive.
>
> * It doesn't just affect the cases where it helps, those where an
> SLB bond is actually involved. (This could be fixed, but it is
> not worth it.)
>
> * It means that flow installability becomes nondeterministic. When
> the first packet shows up for a flow, we install it. But later
> if we revalidate it, we have to uninstall it. That doesn't make
> sense; a flow should be either installable or not installable,
> not some weird mix.
>
> Fortunately, the situation has improved since this fix was originally
> designed. First, active flows now keep MAC learning entries alive, since
> commit e96a4d8035367 (bridge: Feed flow stats into learning table.)
> Second, gratuitous ARP locking, added in commit 7febb9100b (bridge: Filte
> some gratuitous ARPs on bond slaves.) means that gratuitous ARPs reflected
> on bond slaves don't cause confusion (this is rule #4 in
> vswitchd/INTERNALS).
>
> These improvements mean that it is no longer necessary to have this
> strange special case at all. Therefore, this commit removes it.
>
> I found this while investigating reports from code that I added to
> occasionally check that flow actions were correct.
>
> Signed-off-by: Ben Pfaff <blp at nicira.com>
> ---
> ofproto/ofproto-dpif.c | 8 --------
> 1 files changed, 0 insertions(+), 8 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 32d90da..a84dcf9 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -5312,14 +5312,6 @@ xlate_normal(struct action_xlate_ctx *ctx)
> if (mac->port.p != in_bundle) {
> output_normal(ctx, mac->port.p, vlan);
> }
> - } else if (!ctx->packet && !eth_addr_is_multicast(ctx->flow.dl_dst)) {
> - /* If we are revalidating but don't have a learning entry then eject
> - * the flow. Installing a flow that floods packets opens up a window
> - * of time where we could learn from a packet reflected on a bond and
> - * blackhole packets before the learning table is updated to reflect
> - * the correct port. */
> - ctx->may_set_up_flow = false;
> - return;
> } else {
> struct ofbundle *bundle;
>
> --
> 1.7.2.5
>
More information about the dev
mailing list