[ovs-dev] [PATCH] Allow to create tunnel ports with the same config

Ihar Hrachyshka ihrachys at redhat.com
Wed Dec 2 01:48:56 UTC 2020


Hi Ben,

I was trying to make local_ip work for my needs and hit some
unexpected connectivity loss between tunnel endpoints when source IPs
enforced in port config. Perhaps there's something I am missing about
using local_ip with ovs userspace routing. Perhaps you'll be able to
validate my thinking below.

So, for OVN purposes, I would like to have two tunnel ports that
differ in local_ip only (same remote_ip). In OVN test suite, the
following configuration is set up:

(br-int<->br-phys) - <n1> - (br-phys<->br-int)

Each (...) block is a separate (emulated) host with its own OVS/OVN
stack, interconnected through a n1 global bridge that emulates fabric
(everything runs on the same physical host using three sets of OVS
software). Each br-int has a geneve port that has both remote_ip and
local_ip set (point to each other). When br-phys bridges are created,
the following commands are executed to set up userspace IP/routing
stack inside OVS:

ovs-appctl netdev-dummy/ip4addr $bridge $ip/$masklen >/dev/null || return 1
ovs-appctl ovs/route/add $ip/$masklen $bridge >/dev/null || return 1

where $ip on two sides belong to the same /$masklen subnet,
$bridge==br-phys. By doing this, OVS route entries are created for the
subnet.

Without local_ip set for tunnel ports, traffic gets tagged on br-int,
and, as per created route entries, sent over br-phys->n1->br-phys
chain. When local_ip are set on tunnel ports, then packets are lost
(they don't reach br-phys).

After some debugging, I realized this happens because
ovs_router_lookup returns false when looking for a proper routing
entry.

This happens because a code path added by
8e4e45887ec3eb5f5833fd7b415b63ff47fc9642 ("ofproto-dpif-xlate: makes
OVS native tunneling honor tunnel-specified source addresses") is
triggered, where an attempt is made to check that the source IP indeed
belongs to the host. For that, the entry added by the ovs/route/add
command above is extracted and then its corresponding route entry
'local' property is checked. Since all route entries added by the
command are set to local==false, the code assumes it's not a local IP
address and it denies to steer the packet forward.

There's probably a reason why ovs/route/add doesn't support setting
local==true. My understanding is that in real life, local routing
table entries are created *implicitly* when a new IP address is added
to an interface. If one were to try to emulate the same behavior in
OVS routing userspace, one would probably have to add corresponding
routing entry points when adding a IP address to the bridge via
netdev-dummy/ip[46]addr commands.

When I modify netdev_dummy_ip4addr to do the following after setting
the IP address for the bridge, the test in question now works:

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 71df29184..ee01381d2 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -39,6 +39,7 @@
 #include "pcap-file.h"
 #include "openvswitch/poll-loop.h"
 #include "openvswitch/shash.h"
+#include "ovs-router.h"
 #include "sset.h"
 #include "stream.h"
 #include "unaligned.h"
@@ -1941,6 +1942,12 @@ netdev_dummy_ip4addr(struct unixctl_conn *conn,
int argc OVS_UNUSED,
         error = ip_parse_masked(argv[2], &ip.s_addr, &mask.s_addr);
         if (!error) {
             netdev_dummy_set_in4(netdev, ip, mask);
+
+            // insert local route entry for the new address
+            struct in6_addr ip6;
+            in6_addr_set_mapped_ipv4(&ip6, ip.s_addr);
+            ovs_router_insert(0, &ip6, 128, true, argv[1], &in6addr_any);
+
             unixctl_command_reply(conn, "OK");
         } else {
             unixctl_command_reply_error(conn, error);
@@ -1970,6 +1977,10 @@ netdev_dummy_ip6addr(struct unixctl_conn *conn,
int argc OVS_UNUSED,

             mask = ipv6_create_mask(plen);
             netdev_dummy_set_in6(netdev, &ip6, &mask);
+
+            // insert local route entry for the new address
+            ovs_router_insert(0, &ip6, 128, true, argv[1], &in6addr_any);
+
             unixctl_command_reply(conn, "OK");
         } else {
             unixctl_command_reply_error(conn, error);

(I believe the code should also remove the local entries on IP removed.)

Is there a good reason why netdev-dummy/ip4addr doesn't set local
routing entries for the new IP? Or is it just an oversight in
userspace routing implementation?

Thanks for any suggestions,
Ihar

On Fri, Nov 20, 2020 at 2:32 PM Ben Pfaff <blp at ovn.org> wrote:
>
> On Fri, Nov 20, 2020 at 02:23:04PM -0500, Ihar Hrachyshka wrote:
> > On Fri, Nov 20, 2020 at 1:16 PM Ben Pfaff <blp at ovn.org> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 12:44:30PM -0500, Ihar Hrachyshka wrote:
> > > > On Thu, Nov 19, 2020 at 12:19 PM Ben Pfaff <blp at ovn.org> wrote:
> > > > >
> > > > > On Thu, Nov 19, 2020 at 08:31:44AM -0500, Ihar Hrachyshka wrote:
> > > > > > On Thu, Nov 19, 2020 at 12:51 AM Ben Pfaff <blp at ovn.org> wrote:
> > > > > > >
> > > > > > > On Wed, Nov 18, 2020 at 10:35:01PM -0500, Ihar Hrachyshka wrote:
> > > > > > > > It's a legal setup where tunnel ports with the same config are created
> > > > > > > > on different bridges served by Open vSwitch. Specifically, multiple
> > > > > > > > OVN controllers may emulate multiple chassis running on the same
> > > > > > > > physical host, in which case they may need to create separate tunnel
> > > > > > > > ports to connect to the same remote chassis on their respective
> > > > > > > > bridges.
> > > > > > > >
> > > > > > > > Signed-off-by: Ihar Hrachyshka <ihrachys at redhat.com>
> > > > > > >
> > > > > > > That makes sense for sending packets, but which one is supposed to
> > > > > > > receive a packet when one arrives for that tunnel?
> > > > > > >
> > > > > >
> > > > > > If we are talking in OVN context, each virtual ovn-controller chassis
> > > > > > on the same host has to have a different IP / port to distinguish
> > > > > > between chassis. I don't think it's a useful configuration to have two
> > > > > > two incoming tunnel ports with the same config for the same IP that
> > > > > > are not served by separate DST IP addresses.
> > > > >
> > > > > I think that this code rejects tunnel ports with exactly the same
> > > > > configuration, though.  If the two ports were configured with different
> > > > > local IP addresses, then this code would not flag a conflict.  Do I
> > > > > misunderstand?
> > > > >
> > > >
> > > > Local IP addresses are not part of tunnel interface options map. Only
> > > > remote_ip and dst_port are. Does it address your concern?
> > >
> > > I don't see where you're getting that.  Tunnel configuration allows both
> > > local and remote IPs to be specified.  The match structure has slots for
> > > both IP addresses, and both of them may be specified.
> > >
> > > If the remote (or local) IP addresses are different, then this warning
> > > won't come up.  And you said that the remote IP addresses are different:
> > > "I don't think it's a useful configuration to have two two incoming
> > > tunnel ports with the same config for the same IP that are not served by
> > > separate DST IP addresses."
> > >
> > > tnl_port_receive() looks up the tunnel that should receive a packet.  It
> > > calls into tnl_find().  This calls tnl_find_exact() for each of several
> > > lookup tables.  In the case where a warning would be given here, we know
> > > that tnl_find_exact() would find two matching elements in the map,
> > > because that's the function that we use to know to issue the warning.
> > > If we disable the warning and insert the second match anyway, there will
> > > be two elements in the map with exactly the same key, tnl_find_exact()
> > > will only ever return one of them, thus the other will never receive any
> > > packets.
> > >
> > > What am I missing?
> > >
> >
> > Oh I missed that local_ip *may* be configured for a tunnel endpoint,
> > it's just not configured by OVN controller *right now*. We can deal
> > with it in OVN controller instead of OVS then. Sorry for the noise,
> > this patch can be dropped.
>
> Oh, wonderful, we understand each other now, and there's nothing
> mysterious going on in the code that I didn't know about.  Cool!
>



More information about the dev mailing list