[ovs-dev] [ovs-discuss] Commit 355fef6f2 seems to break connectivity in my setup

Ilya Maximets i.maximets at ovn.org
Sat May 14 00:10:17 UTC 2022


On 5/13/22 10:36, Frode Nordahl wrote:
> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.young at canonical.com> wrote:
>>
>> Hi,
>>
>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
> 
> After OVN started colocating sNAT and dNAT operations in the same CT
> zone [0] the above mentioned OVS commit appears to also break OVN [1].
> I have been discussing with Numan and he has found a correlation with
> the behavior of the open vswitch kernel data path conntrack
> `skb_nfct_cached` function, i.e. if that is changed to always return
> 'false' the erratic behavior disappears.
> 
> So I guess the question then becomes, is this a OVS userspace or OVS
> kernel problem?
> 
> We have a reproducer in [3].
> 
> 0: https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
> 2: https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
> 

Hrm.  I think, there is a logical bug in implementations of ct()
datapath action in both datapaths.

The problem appears when the OpenFlow pipeline for the packet contains
several ct() actions.  NXAST_CT action (i.e. every ct() action) must
always put the packet into an untracked state.  Tracked state will be
set only in the 'recirc_table' where the packet is cloned by the ct()
action for further processing.

If an OF pipeline looks like this:

  actions=ct(),something,something,ct(),something

each action must be entered with the 'untracked' conntrack state in
the packet metadata.

However, ct() action in the datapath, unlike OpenFlow, doesn't work
this way.  It modifies the conntrack state for the packet it is processing.
During the flow translation OVS inserts recirculation right after the
datapath ct() action.  This way conntrack state affects only processing
after recirculation.  Sort of.

The issue is that not all OpenFlow ct() actions have recirc_table
specified.  These actions supposed to change the state of the conntrack
subsystem, but they still must leave the packet itself in the untracked
state with no strings attached.  But since datapath ct() actions doesn't
work that way and since recirculation is not inserted (no recirc_table
specified), packet after conntrack execution carries the state along to
all other actions.
This doesn't impact normal actions, but seems to break subsequent ct()
actions on the same pipeline.

In general, it looks to me that ct() action in the datapath is
not supposed to cache any connection data inside the packet's metadata.
This seems to be the root cause of the problem.  Fields that OF knows
about should not trigger any issues if carried along with a packet,
but datapath-specific metadata can not be cleared, because OFproto
simply doesn't know about it.

But, I guess, not caching the connection and other internal structures
might significantly impact datapath performance.  Will it?

Looking at the reproducer in [3], it has, for example, the flow with the
following actions:

  actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
          set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
          set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
          ct(zone=8),recirc(0x4)

So, if the first ct() will change some datapath-specific packet metadata,
the second ct() may try to use that information.

It looks like during the flow translation we must add ct_clear action
explicitly after every ct() action, unless it was the last action in
the list.  This will end up with adding a lot of ct_clear actions
everywhere.

Another option is the patch below that tries to track if ct_clear
is required and adds that action before the next ct() action in
the datapath.

BTW, the test [3] fails with both kernel and userspace datapaths.

The change below should fix the problem, I think.
It looks like we also have to put ct_clear action before translating
output to the patch port if we're in 'conntracked' state.

And I do not know how to fix the problem for kernels without ct_clear
support.  I don't think we can clear metadata that is internal to
the kernel.  Ideas are welcome.

CC: Aaron, Paolo.

Any thoughts?

Best regards, Ilya Maximets.

---
diff --git a/include/openvswitch/ofp-packet.h b/include/openvswitch/ofp-packet.h
index 77128d829..26b07e07f 100644
--- a/include/openvswitch/ofp-packet.h
+++ b/include/openvswitch/ofp-packet.h
@@ -133,6 +133,9 @@ struct ofputil_packet_in_private {
     /* NXCPT_CONNTRACKED. */
     bool conntracked;
 
+    /* NXCPT_PENDING_CT_CLEAR. */
+    bool pending_ct_clear;
+
     /* NXCPT_ACTIONS. */
     struct ofpact *actions;
     size_t actions_len;
diff --git a/lib/ofp-packet.c b/lib/ofp-packet.c
index 9485ddfc9..1df092471 100644
--- a/lib/ofp-packet.c
+++ b/lib/ofp-packet.c
@@ -425,6 +425,7 @@ enum nx_continuation_prop_type {
     NXCPT_ACTIONS,
     NXCPT_ACTION_SET,
     NXCPT_ODP_PORT,
+    NXCPT_PENDING_CT_CLEAR,
 };
 
 /* Only NXT_PACKET_IN2 (not NXT_RESUME) should include NXCPT_USERDATA, so this
@@ -460,6 +461,10 @@ ofputil_put_packet_in_private(const struct ofputil_packet_in_private *pin,
         ofpprop_put_flag(msg, NXCPT_CONNTRACKED);
     }
 
+    if (pin->pending_ct_clear) {
+        ofpprop_put_flag(msg, NXCPT_PENDING_CT_CLEAR);
+    }
+
     if (pin->actions_len) {
         /* Divide 'pin->actions' into groups that begins with an
          * unroll_xlate action.  For each group, emit a NXCPT_TABLE_ID and
@@ -863,6 +868,10 @@ ofputil_decode_packet_in_private(const struct ofp_header *oh, bool loose,
             pin->conntracked = true;
             break;
 
+        case NXCPT_PENDING_CT_CLEAR:
+            pin->pending_ct_clear = true;
+            break;
+
         case NXCPT_TABLE_ID:
             error = ofpprop_parse_u8(&payload, &table_id);
             break;
diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h
index 4df630c62..af6aee475 100644
--- a/ofproto/ofproto-dpif-rid.h
+++ b/ofproto/ofproto-dpif-rid.h
@@ -153,6 +153,7 @@ struct frozen_state {
     size_t stack_size;
     mirror_mask_t mirrors;        /* Mirrors already output. */
     bool conntracked;             /* Conntrack occurred prior to freeze. */
+    bool pending_ct_clear;        /* ct_clear required before the next ct(). */
     bool was_mpls;                /* MPLS packet */
     struct uuid xport_uuid;       /* UUID of 1st port packet received on. */
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 8e5d030ac..2826b3d10 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -396,6 +396,10 @@ struct xlate_ctx {
      * state from the datapath should be honored after thawing. */
     bool conntracked;
 
+    /* True if datapath ct_clear action has to be performed before the next
+     * ct() action. */
+    bool pending_ct_clear;
+
     /* Pointer to an embedded NAT action in a conntrack action, or NULL. */
     struct ofpact_nat *ct_nat_action;
 
@@ -3858,6 +3862,7 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
     struct flow old_flow = ctx->xin->flow;
     struct flow_tnl old_flow_tnl_wc = ctx->wc->masks.tunnel;
     bool old_conntrack = ctx->conntracked;
+    bool old_ct_clear = ctx->pending_ct_clear;
     bool old_was_mpls = ctx->was_mpls;
     ovs_version_t old_version = ctx->xin->tables_version;
     struct ofpbuf old_stack = ctx->stack;
@@ -3877,6 +3882,12 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
     ctx->wc->masks.tunnel.metadata.tab = flow->tunnel.metadata.tab;
     memset(flow->regs, 0, sizeof flow->regs);
     flow->actset_output = OFPP_UNSET;
+    if (ctx->conntracked) {
+        /* Patch ports do not carry conntrack information, so any conntrack
+         * state must be removed from the packet before processing ct actions
+         * in another bridge. */
+        ctx->pending_ct_clear = true;
+    }
     clear_conntrack(ctx);
     ctx->xin->trace = xlate_report(ctx, OFT_BRIDGE, "bridge(\"%s\")",
                                    out_dev->xbridge->name);
@@ -3945,6 +3956,7 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
     /* The out bridge's conntrack execution should have no effect on the
      * original bridge. */
     ctx->conntracked = old_conntrack;
+    ctx->pending_ct_clear = old_ct_clear;
 
     /* The fact that the out bridge exits (for any reason) does not mean
      * that the original bridge should exit.  Specifically, if the out
@@ -4930,6 +4942,7 @@ xlate_controller_action(struct xlate_ctx *ctx, int len,
         .stack_size = ctx->stack.size,
         .mirrors = ctx->mirrors,
         .conntracked = ctx->conntracked,
+        .pending_ct_clear = ctx->pending_ct_clear,
         .was_mpls = ctx->was_mpls,
         .ofpacts = NULL,
         .ofpacts_len = 0,
@@ -5005,6 +5018,7 @@ finish_freezing__(struct xlate_ctx *ctx, uint8_t table)
         .stack_size = ctx->stack.size,
         .mirrors = ctx->mirrors,
         .conntracked = ctx->conntracked,
+        .pending_ct_clear = ctx->pending_ct_clear,
         .was_mpls = ctx->was_mpls,
         .xport_uuid = ctx->xin->xport_uuid,
         .ofpacts = ctx->frozen_actions.data,
@@ -5835,6 +5849,7 @@ clone_xlate_actions(const struct ofpact *actions, size_t actions_len,
     struct flow old_base = ctx->base_flow;
     bool old_was_mpls = ctx->was_mpls;
     bool old_conntracked = ctx->conntracked;
+    bool old_ct_clear = ctx->pending_ct_clear;
 
     /* The actions are not reversible, a datapath clone action is
      * required to encode the translation. Select the clone action
@@ -5883,6 +5898,7 @@ dp_clone_done:
     /* The clone's conntrack execution should have no effect on the original
      * packet. */
     ctx->conntracked = old_conntracked;
+    ctx->pending_ct_clear = old_ct_clear;
 
     /* Popping MPLS from the clone should have no effect on the original
      * packet. */
@@ -6128,6 +6144,18 @@ freeze_unroll_actions(const struct ofpact *a, const struct ofpact *end,
     }
 }
 
+static void
+compose_ct_clear_action(struct xlate_ctx *ctx)
+{
+    clear_conntrack(ctx);
+    ctx->pending_ct_clear = false;
+    /* This action originally existed without dpif support. So to preserve
+     * compatibility, only append it if the dpif supports it. */
+    if (ctx->xbridge->support.ct_clear) {
+        nl_msg_put_flag(ctx->odp_actions,  OVS_ACTION_ATTR_CT_CLEAR);
+    }
+}
+
 static void
 put_ct_mark(const struct flow *flow, struct ofpbuf *odp_actions,
             struct flow_wildcards *wc)
@@ -6336,25 +6364,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
         compose_recirculate_and_fork(ctx, ofc->recirc_table, zone);
     }
 
+    /* Datapath will update the conntrack state for the packet, but
+     * NXAST_CT action must always put the packet into an untracked
+     * state.  Execution continues in the current table without
+     * recirculation, hence conntrack state should not be available.
+     * We'll ask the datapath to clear what OVS_ACTION_ATTR_CT might
+     * have changed in the packet metadata before the next ct() action. */
+    ctx->pending_ct_clear = true;
+
     ctx->ct_nat_action = NULL;
 
     /* The ct_* fields are only available in the scope of the 'recirc_table'
      * call chain. */
-    flow_clear_conntrack(&ctx->xin->flow);
     xlate_report(ctx, OFT_DETAIL, "Sets the packet to an untracked state, "
                  "and clears all the conntrack fields.");
-    ctx->conntracked = false;
-}
-
-static void
-compose_ct_clear_action(struct xlate_ctx *ctx)
-{
     clear_conntrack(ctx);
-    /* This action originally existed without dpif support. So to preserve
-     * compatibility, only append it if the dpif supports it. */
-    if (ctx->xbridge->support.ct_clear) {
-        nl_msg_put_flag(ctx->odp_actions,  OVS_ACTION_ATTR_CT_CLEAR);
-    }
 }
 
 /* check_pkt_larger action checks the packet length and stores the
@@ -6415,6 +6439,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx,
     struct flow old_base = ctx->base_flow;
     bool old_was_mpls = ctx->was_mpls;
     bool old_conntracked = ctx->conntracked;
+    bool old_ct_clear = ctx->pending_ct_clear;
 
     size_t offset = nl_msg_start_nested(ctx->odp_actions,
                                         OVS_ACTION_ATTR_CHECK_PKT_LEN);
@@ -6436,6 +6461,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx,
     ctx->base_flow = old_base;
     ctx->was_mpls = old_was_mpls;
     ctx->conntracked = old_conntracked;
+    ctx->pending_ct_clear = old_ct_clear;
     ctx->xin->flow = old_flow;
 
     /* If the flow translation for the IF_GREATER case requires freezing,
@@ -6466,6 +6492,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx,
     ctx->base_flow = old_base;
     ctx->was_mpls = old_was_mpls;
     ctx->conntracked = old_conntracked;
+    ctx->pending_ct_clear = old_ct_clear;
     ctx->xin->flow = old_flow;
     ctx->exit = old_exit;
 }
@@ -7316,11 +7343,17 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
         }
 
         case OFPACT_CT:
+            if (ctx->pending_ct_clear) {
+                /* Previous ct() action might have made some changes but
+                 * the packet must be untracked.  Clearing the state before
+                 * executing a new ct() action. */
+                compose_ct_clear_action(ctx);
+            }
             compose_conntrack_action(ctx, ofpact_get_CT(a), last);
             break;
 
         case OFPACT_CT_CLEAR:
-            if (ctx->conntracked) {
+            if (ctx->conntracked || ctx->pending_ct_clear) {
                 compose_ct_clear_action(ctx);
             }
             break;
@@ -7710,6 +7743,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
 
         .was_mpls = false,
         .conntracked = false,
+        .pending_ct_clear = false,
 
         .ct_nat_action = NULL,
 
@@ -7774,6 +7808,8 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
             clear_conntrack(&ctx);
         }
 
+        ctx.pending_ct_clear = state->pending_ct_clear;
+
         /* Restore pipeline metadata. May change flow's in_port and other
          * metadata to the values that existed when freezing was triggered. */
         frozen_metadata_to_flow(&ctx.xbridge->ofproto->up,
@@ -8113,6 +8149,7 @@ xlate_resume(struct ofproto_dpif *ofproto,
         .stack_size = pin->stack_size,
         .mirrors = pin->mirrors,
         .conntracked = pin->conntracked,
+        .pending_ct_clear = pin->pending_ct_clear,
         .xport_uuid = UUID_ZERO,
 
         /* When there are no actions, xlate_actions() will search the flow
---


More information about the dev mailing list