[ovs-dev] [PATCH] ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.

Ilya Maximets i.maximets at ovn.org
Mon Nov 1 20:14:38 UTC 2021

Commit dc0bd12f5b04 removed restriction that tunnel endpoint must be a
bridge port.  So, currently OVS has to check if the native tunnel needs
to be terminated regardless of the output port.  Unfortunately, there
is a side effect: tnl_port_map_lookup() always adds at least 'dl_dst'
match to the megaflow that ends up in the corresponding datapath flow.
And since tunneling works on L3 level and not restricted by any
particular bridge, this extra match criteria is added to every
datapath flow on every bridge even if that bridge cannot be part of
a tunnel processing.

For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between two ports, there still will be a match on
a destination mac address:

 1. <create a tunnel configuration in OVS>
 2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev
 3. ovs-vsctl add-port br-non-tunnel port0
           -- add-port br-non-tunnel port1
 4. ovs-ofctl del-flows br-non-tunnel
 5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
 6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0

 # ovs-appctl ofproto/trace br-non-tunnel in_port=port0

 Flow: in_port=1,vlan_tci=0x0000,

  0. in_port=1, priority 32768

 Final flow: unchanged
 Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
 Datapath actions: 5                 ^^^^^^^^^^^^^^^^^^^^^^^^

This increases the number of upcalls and installed datapath flows,
since separate flow needs to be installed per destination MAC, reducing
the switching performance.  This also blocks datapath performance
optimizations that are based on the datapath flow simplicity.

In general, in order to be a tunnel endpoint, port has to have an IP
address.  Hence native tunnel termination should be attempted only
for such ports.  This allows to avoid extra matches in most cases.

Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

I'm sure not an expert in this part of a code, but the change seems to pass
all unit and system tests.  It also passes OVN tests.  So, I'm assuming that
it's correct.

 ofproto/ofproto-dpif-xlate.c | 31 +++++++++++++++----
 tests/packet-type-aware.at   |  4 +--
 tests/tunnel-push-pop.at     | 58 ++++++++++++++++++++++++++++++++++++
 3 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 9d336bc6a..701902b0c 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4081,14 +4081,35 @@ is_neighbor_reply_correct(const struct xlate_ctx *ctx, const struct flow *flow)
 static bool
-terminate_native_tunnel(struct xlate_ctx *ctx, struct flow *flow,
-                        struct flow_wildcards *wc, odp_port_t *tnl_port)
+xport_has_ip(const struct xport *xport)
+    struct in6_addr *ip_addr, *mask;
+    int n_in6 = 0;
+    if (netdev_get_addr_list(xport->netdev, &ip_addr, &mask, &n_in6)) {
+        n_in6 = 0;
+    } else {
+        free(ip_addr);
+        free(mask);
+    }
+    return n_in6 ? true : false;
+static bool
+terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
+                        struct flow *flow, struct flow_wildcards *wc,
+                        odp_port_t *tnl_port)
     *tnl_port = ODPP_NONE;
     /* XXX: Write better Filter for tunnel port. We can use in_port
-     * in tunnel-port flow to avoid these checks completely. */
-    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
+     * in tunnel-port flow to avoid these checks completely.
+     *
+     * Port without an IP address cannot be a tunnel termination point.
+     * Not performing a lookup in this case to avoid unwildcarding extra
+     * flow fields (dl_dst). */
+    if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)
+        && xport_has_ip(xport)) {
         *tnl_port = tnl_port_map_lookup(flow, wc);
         /* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
@@ -4242,7 +4263,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
             native_tunnel_output(ctx, xport, flow, odp_port, truncate);
             flow->tunnel = flow_tnl; /* Restore tunnel metadata */
-        } else if (terminate_native_tunnel(ctx, flow, wc,
+        } else if (terminate_native_tunnel(ctx, xport, flow, wc,
                                            &odp_tnl_port)) {
             /* Intercept packet to be received on native tunnel port. */
             nl_msg_put_odp_port(ctx->odp_actions, OVS_ACTION_ATTR_TUNNEL_POP,
diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
index 73aa14cea..054dcc9cc 100644
--- a/tests/packet-type-aware.at
+++ b/tests/packet-type-aware.at
@@ -892,7 +892,7 @@ ovs-appctl time/warp 1000
     ovs-appctl dpctl/dump-flows --names dummy at ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from the main thread:
-recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:n1,pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=,dst=,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
+recirc_id(0),in_port(n0),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:n1,pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=de:af:be:ef:ba:be,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=,dst=,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br2)),n2)
@@ -1021,7 +1021,7 @@ AT_CHECK([
 ], [0], [flow-dump from the main thread:
 recirc_id(0),in_port(p0),packet_type(ns=0,id=0),eth(src=aa:bb:cc:00:00:02,dst=aa:bb:cc:00:00:01),eth_type(0x0800),ipv4(dst=,proto=47,frag=no), packets:3, bytes:378, used:0.0s, actions:tnl_pop(gre_sys)
 tunnel(src=,dst=,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=1,id=0x8847),eth_type(0x8847),mpls(label=999/0x0,tc=0/0,ttl=64/0x0,bos=1/1), packets:3, bytes:264, used:0.0s, actions:push_eth(src=00:00:00:00:00:00,dst=00:00:00:00:00:00),pop_mpls(eth_type=0x800),recirc(0x1)
-tunnel(src=,dst=,flags(-df-csum)),recirc_id(0x1),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=00:00:00:00:00:00),eth_type(0x0800),ipv4(ttl=64,frag=no), packets:3, bytes:294, used:0.0s, actions:set(ipv4(ttl=63)),int-br
+tunnel(src=,dst=,flags(-df-csum)),recirc_id(0x1),in_port(gre_sys),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(ttl=64,frag=no), packets:3, bytes:294, used:0.0s, actions:set(ipv4(ttl=63)),int-br
 ovs-appctl time/warp 1000
diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 636465397..ed72ff986 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -703,3 +703,61 @@ NXST_FLOW reply:
+AT_SETUP([tunnel_push_pop - flow translation on unrelated bridges])
+    [add-port br0 p0 dnl
+     -- set Interface p0 type=dummy ofport_request=1 dnl
+                         other-config:hwaddr=aa:55:aa:55:00:00])
+AT_CHECK([ovs-appctl vlog/set dpif_netdev:dbg])
+AT_CHECK([ovs-vsctl add-br int-br -- set bridge int-br datapath_type=dummy])
+AT_CHECK([ovs-vsctl add-port int-br t2 dnl
+          -- set Interface t2 type=geneve options:remote_ip= dnl
+                              options:key=123 ofport_request=2])
+dnl First setup dummy interface IP address, then add the route
+dnl so that tnl-port table can get valid IP address for the device.
+AT_CHECK([ovs-appctl netdev-dummy/ip4addr br0], [0], [OK
+AT_CHECK([ovs-appctl ovs/route/add br0], [0], [OK
+AT_CHECK([ovs-ofctl add-flow br0 action=normal])
+dnl This ARP reply from p0 has two effects:
+dnl 1. The ARP cache will learn that is at f8:bc:12:44:34:b6.
+dnl 2. The br0 mac learning will learn that f8:bc:12:44:34:b6 is on p0.
+  ovs-appctl netdev-dummy/receive p0 dnl
+      'recirc_id(0),in_port(2),dnl
+       eth(src=f8:bc:12:44:34:b6,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),dnl
+       arp(sip=,tip=,op=2,sha=f8:bc:12:44:34:b6,tha=00:00:00:00:00:00)'
+dnl Creating a separate bridge that is completely unrelated to a tunnel
+dnl configuration.  Ports in this bridge cannot be tunnel endpoints.
+AT_CHECK([ovs-vsctl add-br br-non-tunnel dnl
+          -- set bridge br-non-tunnel datapath_type=dummy fail-mode=secure dnl
+          -- add-port br-non-tunnel port0 -- set Interface port0 type=dummy dnl
+          -- add-port br-non-tunnel port1 -- set Interface port1 type=dummy])
+AT_CHECK([ovs-ofctl del-flows br-non-tunnel])
+AT_CHECK([ovs-ofctl add-flow br-non-tunnel in_port=port0,action=port1])
+AT_CHECK([ovs-ofctl add-flow br-non-tunnel in_port=port1,action=port0])
+dnl Checking that tunnel configuration doesn't impact flow translation
+dnl on this bridge (Megaflow should contain a bare minimum of fields
+dnl according to installed OF rules).
+AT_CHECK([ovs-appctl ofproto/trace br-non-tunnel in_port=port0], [0], [stdout])
+AT_CHECK([tail -2 stdout], [0], [dnl
+Megaflow: recirc_id=0,eth,in_port=2,dl_type=0x0000
+Datapath actions: 3
+AT_CHECK([ovs-appctl ofproto/trace br-non-tunnel in_port=port1], [0], [stdout])
+AT_CHECK([tail -2 stdout], [0], [dnl
+Megaflow: recirc_id=0,eth,in_port=1,dl_type=0x0000
+Datapath actions: 5

More information about the dev mailing list