[ovs-dev] [PATCH ovn v3 2/2] northd: Provide the option to not use ct.inv in lflows.
numans at ovn.org
numans at ovn.org
Mon Apr 19 20:58:03 UTC 2021
From: Numan Siddique <numans at ovn.org>
Presently we add 65535 priority lflows in the stages -
'ls_in_acl' and 'ls_out_acl' to drop packets which
match on 'ct.inv'.
This patch provides an option to not use this field in the
logical flow matches for the following
reasons:
• Some of the smart NICs which support offloading datapath
flows may not support this field. In such cases, almost
all the datapath flows (matching on the ct_state field +inv
or -inv cannot be offloaded if ACLs/load balancers
are configured on the logical switch datapath.
• A recent commit in kernel ovs datapath sets the committed
connection tracking entry to be liberal for out-of-window
tcp packets (nf_ct_set_tcp_be_liberal()). Such TCP
packets will not be marked as invalid.
There are still certain scenarios where a packet can be marked
invalid. Like
• If the tcp flags are not correct.
• If there are checksum errors.
In such cases, the packet will be delivered to the destination
port. So CMS should evaluate their usecases before enabling
this option.
Signed-off-by: Numan Siddique <numans at ovn.org>
Co-authored-by: Ben Pfaff <blp at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>
---
v2 -> v3
----
* Updated the documentation in ovn-nb.xml as per Mark G's suggestion.
v1 -> v2
----
* Adopted the code changes shared by Ben Pfaff for the ddlog code.
NEWS | 7 +-
northd/lswitch.dl | 7 +
northd/ovn-northd.c | 40 +++---
northd/ovn_northd.dl | 304 ++++++++++++++++++++++---------------------
ovn-nb.xml | 15 +++
5 files changed, 207 insertions(+), 166 deletions(-)
diff --git a/NEWS b/NEWS
index 1ddde15f86..5f1365775d 100644
--- a/NEWS
+++ b/NEWS
@@ -6,11 +6,16 @@ Post-v21.03.0
expected to scale better than the C implementation, for large deployments.
(This may take testing and tuning to be effective.) This version of OVN
requires DDLog 0.36.
- - Introduce ovn-controller incremetal processing engine statistics
+ - Introduce ovn-controller incremental processing engine statistics
- Introduced parallel processing in ovn-northd with the NB_Global config option
'use_parallel_build' to enable it. It is disabled by default.
- Support vlan-passthru mode for tag=0 localnet ports.
- Support custom 802.11ad EthType for localnet ports.
+ - Add a new NB Global option - use_ct_inv_match with the default value of true.
+ If this is set to false, then the logical field - "ct.inv" will not be
+ used in the logical flow matches. CMS can consider setting this to false,
+ if they want to use smart NICs which don't support offloading datapath flows
+ with this field used.
OVN v21.03.0 - 12 Mar 2021
-------------------------
diff --git a/northd/lswitch.dl b/northd/lswitch.dl
index 47c497e0cf..27ac3cadc5 100644
--- a/northd/lswitch.dl
+++ b/northd/lswitch.dl
@@ -742,3 +742,10 @@ function put_svc_monitor_mac(options: Map<string,string>,
relation SvcMonitorMac(mac: eth_addr)
SvcMonitorMac(get_svc_monitor_mac(options, uuid)) :-
nb::NB_Global(._uuid = uuid, .options = options).
+
+relation UseCtInvMatch[bool]
+UseCtInvMatch[options.get_bool_def("use_ct_inv_match", true)] :-
+ nb::NB_Global(.options = options).
+UseCtInvMatch[true] :-
+ Unit(),
+ not nb in nb::NB_Global().
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index de1aa896d3..750e943ab8 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -4099,6 +4099,9 @@ do_ovn_lflow_add(struct hmap *lflow_map, bool shared,
hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
}
+/* If this option is 'true' northd will make use of ct.inv match fields.
+ * Otherwise, it will avoid using it. The default is true. */
+static bool use_ct_inv_match = true;
/* Adds a row with the specified contents to the Logical_Flow table. */
static void
@@ -5712,12 +5715,13 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
* for deletion (bit 0 of ct_label is set).
*
* This is enforced at a higher priority than ACLs can be defined. */
+ char *match = xasprintf("%s(ct.est && ct.rpl && ct_label.blocked == 1)",
+ use_ct_inv_match ? "ct.inv || " : "");
ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
- "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
- "drop;");
+ match, "drop;");
ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
- "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
- "drop;");
+ match, "drop;");
+ free(match);
/* Ingress and Egress ACL Table (Priority 65535).
*
@@ -5728,14 +5732,15 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
* direction to hit the currently defined policy from ACLs.
*
* This is enforced at a higher priority than ACLs can be defined. */
+ match = xasprintf("ct.est && !ct.rel && !ct.new%s && "
+ "ct.rpl && ct_label.blocked == 0",
+ use_ct_inv_match ? " && !ct.inv" : "");
+
ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
- "ct.est && !ct.rel && !ct.new && !ct.inv "
- "&& ct.rpl && ct_label.blocked == 0",
- "next;");
+ match, "next;");
ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
- "ct.est && !ct.rel && !ct.new && !ct.inv "
- "&& ct.rpl && ct_label.blocked == 0",
- "next;");
+ match, "next;");
+ free(match);
/* Ingress and Egress ACL Table (Priority 65535).
*
@@ -5748,14 +5753,14 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows,
* a dynamically negotiated FTP data channel), but will allow
* related traffic such as an ICMP Port Unreachable through
* that's generated from a non-listening UDP port. */
+ match = xasprintf("!ct.est && ct.rel && !ct.new%s && "
+ "ct_label.blocked == 0",
+ use_ct_inv_match ? " && !ct.inv" : "");
ovn_lflow_add(lflows, od, S_SWITCH_IN_ACL, UINT16_MAX,
- "!ct.est && ct.rel && !ct.new && !ct.inv "
- "&& ct_label.blocked == 0",
- "next;");
+ match, "next;");
ovn_lflow_add(lflows, od, S_SWITCH_OUT_ACL, UINT16_MAX,
- "!ct.est && ct.rel && !ct.new && !ct.inv "
- "&& ct_label.blocked == 0",
- "next;");
+ match, "next;");
+ free(match);
/* Ingress and Egress ACL Table (Priority 65535).
*
@@ -13195,6 +13200,9 @@ ovnnb_db_run(struct northd_context *ctx,
use_logical_dp_groups = smap_get_bool(&nb->options,
"use_logical_dp_groups", false);
+ use_ct_inv_match = smap_get_bool(&nb->options,
+ "use_ct_inv_match", true);
+
/* deprecated, use --event instead */
controller_event_en = smap_get_bool(&nb->options,
"controller_event", false);
diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index 2f5d36c599..a0f7944103 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -2280,173 +2280,179 @@ for (Reject(lsuuid, pipeline, stage, acl, fair_meter, extra_match_, extra_action
}
/* build_acls */
-for (sw in &Switch(.ls = ls))
-var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
-{
- /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
- * default. A related rule at priority 1 is added below if there
- * are any stateful ACLs in this datapath. */
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_IN_ACL(),
- .priority = 0,
- .__match = "1",
- .actions = "next;",
- .external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 0,
- .__match = "1",
- .actions = "next;",
- .external_ids = map_empty());
-
- if (has_stateful) {
- /* Ingress and Egress ACL Table (Priority 1).
- *
- * By default, traffic is allowed. This is partially handled by
- * the Priority 0 ACL flows added earlier, but we also need to
- * commit IP flows. This is because, while the initiater's
- * direction may not have any stateful rules, the server's may
- * and then its return traffic would not have an associated
- * conntrack entry and would return "+invalid".
- *
- * We use "ct_commit" for a connection that is not already known
- * by the connection tracker. Once a connection is committed,
- * subsequent packets will hit the flow at priority 0 that just
- * uses "next;"
- *
- * We also check for established connections that have ct_label.blocked
- * set on them. That's a connection that was disallowed, but is
- * now allowed by policy again since it hit this default-allow flow.
- * We need to set ct_label.blocked=0 to let the connection continue,
- * which will be done by ct_commit() in the "stateful" stage.
- * Subsequent packets will hit the flow at priority 0 that just
- * uses "next;". */
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_IN_ACL(),
- .priority = 1,
- .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
- .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
- .external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 1,
- .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
- .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
- .external_ids = map_empty());
-
- /* Ingress and Egress ACL Table (Priority 65535).
- *
- * Always drop traffic that's in an invalid state. Also drop
- * reply direction packets for connections that have been marked
- * for deletion (bit 0 of ct_label is set).
- *
- * This is enforced at a higher priority than ACLs can be defined. */
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_IN_ACL(),
- .priority = 65535,
- .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
- .actions = "drop;",
- .external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 65535,
- .__match = "ct.inv || (ct.est && ct.rpl && ct_label.blocked == 1)",
- .actions = "drop;",
- .external_ids = map_empty());
-
- /* Ingress and Egress ACL Table (Priority 65535).
- *
- * Allow reply traffic that is part of an established
- * conntrack entry that has not been marked for deletion
- * (bit 0 of ct_label). We only match traffic in the
- * reply direction because we want traffic in the request
- * direction to hit the currently defined policy from ACLs.
- *
- * This is enforced at a higher priority than ACLs can be defined. */
+for (UseCtInvMatch[use_ct_inv_match]) {
+ (var ct_inv_or, var and_not_ct_inv) = match (use_ct_inv_match) {
+ true -> ("ct.inv || ", "&& !ct.inv "),
+ false -> ("", ""),
+ } in
+ for (sw in &Switch(.ls = ls))
+ var has_stateful = sw.has_stateful_acl or sw.has_lb_vip in
+ {
+ /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
+ * default. A related rule at priority 1 is added below if there
+ * are any stateful ACLs in this datapath. */
Flow(.logical_datapath = ls._uuid,
.stage = s_SWITCH_IN_ACL(),
- .priority = 65535,
- .__match = "ct.est && !ct.rel && !ct.new && !ct.inv "
- "&& ct.rpl && ct_label.blocked == 0",
+ .priority = 0,
+ .__match = "1",
.actions = "next;",
.external_ids = map_empty());
Flow(.logical_datapath = ls._uuid,
.stage = s_SWITCH_OUT_ACL(),
- .priority = 65535,
- .__match = "ct.est && !ct.rel && !ct.new && !ct.inv "
- "&& ct.rpl && ct_label.blocked == 0",
+ .priority = 0,
+ .__match = "1",
.actions = "next;",
.external_ids = map_empty());
- /* Ingress and Egress ACL Table (Priority 65535).
- *
- * Allow traffic that is related to an existing conntrack entry that
- * has not been marked for deletion (bit 0 of ct_label).
- *
- * This is enforced at a higher priority than ACLs can be defined.
- *
- * NOTE: This does not support related data sessions (eg,
- * a dynamically negotiated FTP data channel), but will allow
- * related traffic such as an ICMP Port Unreachable through
- * that's generated from a non-listening UDP port. */
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_IN_ACL(),
- .priority = 65535,
- .__match = "!ct.est && ct.rel && !ct.new && !ct.inv "
- "&& ct_label.blocked == 0",
- .actions = "next;",
- .external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 65535,
- .__match = "!ct.est && ct.rel && !ct.new && !ct.inv "
- "&& ct_label.blocked == 0",
- .actions = "next;",
- .external_ids = map_empty());
+ if (has_stateful) {
+ /* Ingress and Egress ACL Table (Priority 1).
+ *
+ * By default, traffic is allowed. This is partially handled by
+ * the Priority 0 ACL flows added earlier, but we also need to
+ * commit IP flows. This is because, while the initiater's
+ * direction may not have any stateful rules, the server's may
+ * and then its return traffic would not have an associated
+ * conntrack entry and would return "+invalid".
+ *
+ * We use "ct_commit" for a connection that is not already known
+ * by the connection tracker. Once a connection is committed,
+ * subsequent packets will hit the flow at priority 0 that just
+ * uses "next;"
+ *
+ * We also check for established connections that have ct_label.blocked
+ * set on them. That's a connection that was disallowed, but is
+ * now allowed by policy again since it hit this default-allow flow.
+ * We need to set ct_label.blocked=0 to let the connection continue,
+ * which will be done by ct_commit() in the "stateful" stage.
+ * Subsequent packets will hit the flow at priority 0 that just
+ * uses "next;". */
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_IN_ACL(),
+ .priority = 1,
+ .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
+ .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
+ .external_ids = map_empty());
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 1,
+ .__match = "ip && (!ct.est || (ct.est && ct_label.blocked == 1))",
+ .actions = "${rEGBIT_CONNTRACK_COMMIT()} = 1; next;",
+ .external_ids = map_empty());
- /* Ingress and Egress ACL Table (Priority 65535).
- *
- * Not to do conntrack on ND packets. */
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Always drop traffic that's in an invalid state. Also drop
+ * reply direction packets for connections that have been marked
+ * for deletion (bit 0 of ct_label is set).
+ *
+ * This is enforced at a higher priority than ACLs can be defined. */
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_IN_ACL(),
+ .priority = 65535,
+ .__match = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
+ .actions = "drop;",
+ .external_ids = map_empty());
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 65535,
+ .__match = ct_inv_or ++ "(ct.est && ct.rpl && ct_label.blocked == 1)",
+ .actions = "drop;",
+ .external_ids = map_empty());
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Allow reply traffic that is part of an established
+ * conntrack entry that has not been marked for deletion
+ * (bit 0 of ct_label). We only match traffic in the
+ * reply direction because we want traffic in the request
+ * direction to hit the currently defined policy from ACLs.
+ *
+ * This is enforced at a higher priority than ACLs can be defined. */
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_IN_ACL(),
+ .priority = 65535,
+ .__match = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
+ "&& ct.rpl && ct_label.blocked == 0",
+ .actions = "next;",
+ .external_ids = map_empty());
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 65535,
+ .__match = "ct.est && !ct.rel && !ct.new " ++ and_not_ct_inv ++
+ "&& ct.rpl && ct_label.blocked == 0",
+ .actions = "next;",
+ .external_ids = map_empty());
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Allow traffic that is related to an existing conntrack entry that
+ * has not been marked for deletion (bit 0 of ct_label).
+ *
+ * This is enforced at a higher priority than ACLs can be defined.
+ *
+ * NOTE: This does not support related data sessions (eg,
+ * a dynamically negotiated FTP data channel), but will allow
+ * related traffic such as an ICMP Port Unreachable through
+ * that's generated from a non-listening UDP port. */
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_IN_ACL(),
+ .priority = 65535,
+ .__match = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
+ "&& ct_label.blocked == 0",
+ .actions = "next;",
+ .external_ids = map_empty());
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 65535,
+ .__match = "!ct.est && ct.rel && !ct.new " ++ and_not_ct_inv ++
+ "&& ct_label.blocked == 0",
+ .actions = "next;",
+ .external_ids = map_empty());
+
+ /* Ingress and Egress ACL Table (Priority 65535).
+ *
+ * Not to do conntrack on ND packets. */
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_IN_ACL(),
+ .priority = 65535,
+ .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+ .actions = "next;",
+ .external_ids = map_empty());
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 65535,
+ .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+ .actions = "next;",
+ .external_ids = map_empty())
+ };
+
+ /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
+ * if the CMS has configured DNS records for the datapath.
+ */
+ if (sw.has_dns_records) {
+ Flow(.logical_datapath = ls._uuid,
+ .stage = s_SWITCH_OUT_ACL(),
+ .priority = 34000,
+ .__match = "udp.src == 53",
+ .actions = if has_stateful "ct_commit; next;" else "next;",
+ .external_ids = map_empty())
+ };
+
+ /* Add a 34000 priority flow to advance the service monitor reply
+ * packets to skip applying ingress ACLs. */
Flow(.logical_datapath = ls._uuid,
.stage = s_SWITCH_IN_ACL(),
- .priority = 65535,
- .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
+ .priority = 34000,
+ .__match = "eth.dst == $svc_monitor_mac",
.actions = "next;",
.external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 65535,
- .__match = "nd || nd_ra || nd_rs || mldv1 || mldv2",
- .actions = "next;",
- .external_ids = map_empty())
- };
-
- /* Add a 34000 priority flow to advance the DNS reply from ovn-controller,
- * if the CMS has configured DNS records for the datapath.
- */
- if (sw.has_dns_records) {
Flow(.logical_datapath = ls._uuid,
.stage = s_SWITCH_OUT_ACL(),
.priority = 34000,
- .__match = "udp.src == 53",
- .actions = if has_stateful "ct_commit; next;" else "next;",
+ .__match = "eth.src == $svc_monitor_mac",
+ .actions = "next;",
.external_ids = map_empty())
- };
-
- /* Add a 34000 priority flow to advance the service monitor reply
- * packets to skip applying ingress ACLs. */
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_IN_ACL(),
- .priority = 34000,
- .__match = "eth.dst == $svc_monitor_mac",
- .actions = "next;",
- .external_ids = map_empty());
- Flow(.logical_datapath = ls._uuid,
- .stage = s_SWITCH_OUT_ACL(),
- .priority = 34000,
- .__match = "eth.src == $svc_monitor_mac",
- .actions = "next;",
- .external_ids = map_empty())
+ }
}
/* This stage builds hints for the IN/OUT_ACL stage. Based on various
diff --git a/ovn-nb.xml b/ovn-nb.xml
index feb38b5d31..e337be4eb3 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -240,6 +240,21 @@
</p>
</column>
+ <column name="options" key="use_ct_inv_match">
+ <p>
+ If set to false, <code>ovn-northd</code> will not use the
+ <code>ct.inv</code> field in any of the logical flow matches.
+ The default value is true. If the NIC supports offloading
+ OVS datapath flows but doesn't support offloading ct_state
+ <code>inv</code> flag, then the datapath flows matching on this flag
+ (either <code>+inv</code> or <code>-inv</code>) will not be
+ offloaded. CMS should consider setting <code>use_ct_inv_match</code>
+ to <code>false</code> in such cases. This results in a side effect
+ of the invalid packets getting delivered to the destination VIF, which
+ otherwise would have been dropped by <code>OVN</code>.
+ </p>
+ </column>
+
<group title="Options for configuring interconnection route advertisement">
<p>
These options control how routes are advertised between OVN
--
2.29.2
More information about the dev
mailing list