[ovs-discuss] Tunneling to the same machine

Wed Jun 12 14:30:13 UTC 2013

On Jun 10, 2013, at 1:41 PM, Jesse Gross wrote:

> On Fri, Jun 7, 2013 at 7:07 PM, Murphy McCauley
> <murphy.mccauley at gmail.com> wrote:
>> So I'm doing something that's probably a bit strange and (perhaps unsurprisingly) getting results that seem a bit strange.
>> 
>> What I've got is two bridges on a single machine, and I'd like to have GRE/VXLAN tunnels between them.  The reason for this is that while ultimately the controller code is meant to control bridges across multiple physical machines or VMs, I'd like to be able to test it in a single Mininet VM.  In this case, the ports attached to the bridges are all veth pairs which run into separate network namespaces, but I need the bridges to communicate between each other in the root namespace.  I realize that a more common way to link the bridges would be with patch interfaces, but that's not applicable to the configuration this will be in when running "for real", and moreover, it won't work with the NXM_NX_TUN_IPV4_DST approach I'm taking.
>> 
>> .. which is all to say that it'd be nice if I could get this working.
>> 
>> So what I'm doing is setting up two interfaces with IPs of, say, 172.16.0.1 and 172.16.0.2.  I'm then adding tunnels along the lines of:
>> ovs-vsctl add-port s1 tun0 -- set interface tun0 type=gre options:remote_ip=172.16.0.1 options:local_ip=172.16.0.2
>> ovs-vsctl add-port s2 tun1 -- set interface tun1 type=gre options:remote_ip=172.16.0.2 options:local_ip=172.16.0.1
>> 
>> For very loose definitions of "works", this works.  If I try to ping across the tunnel, I get *one* successful ping.  Snooping the traffic, I see a successful ARP, the first echo request and reply, and then… lots of requests with no replies.  If I kill ping and try to ping again immediately, I get nothing.  If I kill ping and wait a while or try pinging another address -- it works.
>> 
>> Investigating a bit further, I find there's something along the lines of a five or six second flow timeout at play here.  If I keep up activity, further packets never go through (neither ARP nor ICMP).  But after 5 (or 6?) seconds of silence, the whole thing is repeatable.  So a ping -i 6 will appear to work just fine.  It seems weird to me that the initial ARP and ping go through, but subsequent ones don't until the 5/6 seconds elapse, but there it is.
> 
> This is likely because the packets that work are being sent up to
> userspace as part of a flow setup. When they are sent back down, they
> are essentially new packets, cleaned of previous metadata. Anything
> that matches an existing flow will be carried through the kernel
> directly.
> 
> My guess is that there is some information from the sending IP stack
> that is causing problems when it is received as a tunnel. You could
> look through the skb since nothing immediately comes to mind (we
> already clear out the obvious fields).

Thanks, Jesse, this is exactly the direction-pointing I needed, and it was exactly right.  The problem is pkt_type.

I believe what's happening is that, since the packets originally came in via promiscuous capture addressed to the wrong MAC, their pkt_type is set to PACKET_OTHERHOST.  This is fine when they're just being shoved out another interface, but when they are delivered locally after all, I think they're getting thrown away by ip_rcv() in net/ipv4/ip_input.c, which specifically checks for PACKET_OTHERHOST.
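
To make the suspected failure concrete, here is a minimal userspace sketch of that check.  It is not kernel code: struct toy_skb and toy_ip_rcv_accepts() are hypothetical stand-ins for struct sk_buff and the early test in ip_rcv(), and only the PACKET_* values are taken from the kernel's if_packet.h.

```c
#include <stdbool.h>

/* Packet type values as defined in the kernel's linux/if_packet.h. */
#define PACKET_HOST      0  /* addressed to this host */
#define PACKET_BROADCAST 1  /* link-layer broadcast */
#define PACKET_MULTICAST 2  /* link-layer multicast */
#define PACKET_OTHERHOST 3  /* addressed to some other host's MAC */

/* Hypothetical stand-in for struct sk_buff, keeping only the one
 * field that matters for this illustration. */
struct toy_skb {
    unsigned char pkt_type;
};

/* Toy model of the early check in ip_rcv(): a packet that is still
 * marked as belonging to another host is dropped before any IP
 * processing happens.  Returns true if the packet would go on. */
bool toy_ip_rcv_accepts(const struct toy_skb *skb)
{
    return skb->pkt_type != PACKET_OTHERHOST;
}
```

In this model, a packet still carrying PACKET_OTHERHOST from the promiscuous capture is rejected, while the same packet marked PACKET_HOST is accepted.  That would fit the symptom that packets rebuilt on their way back down from userspace get through while packets handled entirely in the kernel do not, assuming the rebuilt packets do not keep the old pkt_type.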

I'm out of my domain far enough that I'm not sure what the proper solution is, though.