[ovs-dev] Flow-based tunneling design ([PATCH] support NXAST_SET_TUNNEL_DST action)

Jarno Rajahalme jarno.rajahalme at nsn.com
Mon Dec 17 14:52:29 UTC 2012


On Dec 13, 2012, at 21:26, ext Jesse Gross wrote:

> On Wed, Dec 12, 2012 at 11:08 PM, Jarno Rajahalme
> <jarno.rajahalme at nsn.com> wrote:
>> On Dec 13, 2012, at 8:22, ext Rich Lane wrote:
>> On Mon, Dec 10, 2012 at 6:42 PM, Jesse Gross <jesse at nicira.com> wrote:
>>> On Mon, Dec 10, 2012 at 5:16 PM, Rich Lane <rich.lane at bigswitch.com>
>>> wrote:
>>> Another issue is that this doesn't help on the receive side,
>>> particularly because packets are matched to ports with their
>>> associated IP addresses.  In general, I'm not sure that current port
>>> configuration makes sense in this context.
>> 
>> 
>> I thought the "null_port" change in the kernel datapath already allowed
>> packets from any remote IP to match, if userspace didn't supply
>> OVS_TUNNEL_ATTR_DST_IPV4 when creating the tunnel port. Or do you mean that
>> we should remove OVS_TUNNEL_ATTR_* entirely now? I'd think you'd want to do
>> that last, after all the flow-based tunneling patches are merged.
> 
> Yes, NULL ports definitely need to be created in some form here
> although userspace doesn't currently do this.  The work that is
> currently ongoing will essentially replicate the current kernel
> behavior in userspace on top of the new flow infrastructure.  However,
> this means that without some further work, you'll have the same
> problems with packets being dropped before you see them because they
> don't match.
> 
> I think the first step is to figure out what the external view of this
> should be from the OpenFlow/configuration database level.  It might be
> a version of NULL ports but then you have similar issues to the
> kernel, where you don't know which datapath to map a packet on to.
> 
>> The OVS_TUNNEL_ATTR_SRC_IPV4 configuration or equivalent would still need to
>> exist for flow-based tunneling, right? If there are multiple datapaths and
>> each one has a tunnel port we need some way to choose between them.
>> 
>> 
>> 
>> Aren't we now with a single datapath? Does that make a difference here?
> 
> Yes, at the kernel level there is no mapping from tunnel IP
> information to either port or datapath - it just extracts the
> information and stores it in the flow.  This is why the single
> datapath changes were necessary.
> 
> Almost everything in include/openvswitch/tunnel.h is going away since
> it all relates to kernel port level configuration.  The only exception
> is OVS_TUNNEL_ATTR_DST_PORT.
> 

So the planned final outcome, from the kernel module's point of view, is
that only one null port, devoid of any policy, is configured for each
tunneling mechanism? If so, it might be helpful to specify well-known
ODP port numbers for those (right now there is only one: OVSP_NONE).
That would let the kernel module recognize them autonomously, so that
userspace would not need to configure them at all (apart from the VXLAN
destination port number, at least until it is standardized).

If I have understood it right, an exact-match (ODP) flow that is to be
sent to a tunnel will need to have OVS_KEY_ATTR_IPV4_TUNNEL
within an OVS_ACTION_ATTR_SET, and an OVS_ACTION_ATTR_OUTPUT
with the (null) port number for the desired tunneling mechanism. Since
there will be only one null port number per tunneling mechanism, the
OUTPUT action will in essence just specify which tunneling protocol is to
be used (implicitly asking the kernel to worry about the actual output).
So the tunnel-output-related information is split between two different
OVS/ODP actions.  This could be reconciled a bit by including a tunneling
mechanism id (like TNL_T_PROTO_*) within the IPV4_TUNNEL
metadata, so as to keep the tunneling-related information in a single action.
This would also open the possibility of specifying a real port in the output
action, or a pseudo port like "OVSP_KERNEL" to indicate that the
encapsulated packet needs to be routed (and possibly fragmented) by the
kernel.
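
Purely as an illustration of the split described above, the current
encoding could be modeled like this (the null-port values, the
TNL_T_* ids, and all struct/function names here are made up for the
sketch, not actual datapath definitions):

```c
#include <stdint.h>

/* Hypothetical well-known null ports, one per tunneling mechanism. */
enum { OVSP_GRE_NULL = 100, OVSP_VXLAN_NULL = 101 };
enum { TNL_T_PROTO_NONE = 0, TNL_T_PROTO_GRE = 1, TNL_T_PROTO_VXLAN = 2 };

/* Stand-in for the OVS_KEY_ATTR_IPV4_TUNNEL metadata carried in a
 * SET action: note that it says nothing about the protocol. */
struct tun_key {
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint64_t tun_id;
};

/* Under the split encoding, the tunneling protocol must be recovered
 * from the OUTPUT port number, not from the tunnel metadata itself. */
int tun_proto_of_port(uint32_t port)
{
    switch (port) {
    case OVSP_GRE_NULL:   return TNL_T_PROTO_GRE;
    case OVSP_VXLAN_NULL: return TNL_T_PROTO_VXLAN;
    default:              return TNL_T_PROTO_NONE;
    }
}
```

Folding a tun_type field into the tunnel metadata would make
tun_proto_of_port() unnecessary: the SET action alone would then carry
everything the encapsulation needs.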

What I have described above would be a bit clearer if we replaced
SET(OVS_KEY_ATTR_IPV4_TUNNEL) with a dedicated
OVS_ACTION_ATTR_PUSH_TUNNEL action. This would make it explicit that for
a flow to have TUNNEL metadata on output, a specific action is needed.
Right now it is not so explicit, except here:

datapath/actions.c: ovs_execute_actions():

	OVS_CB(skb)->tun_key = NULL;
	error = do_execute_actions(dp, skb, acts->actions,
					 acts->actions_len, &tun_key, false);

I.e., by default tun_key is NULL when action execution starts, and it
needs to be explicitly set to "exist". All other "settable" key attributes
have some value regardless, either from packet headers or carried over
from input to output (priority, skb_mark). So even though
OVS_KEY_ATTR_IPV4_TUNNEL is meaningful for matching, as key
material it is essentially read-only (like IN_PORT). Just as the
IN_PORT key value does not carry over as a (default) OUTPUT port,
but a specific OUTPUT action is needed, the incoming IPV4_TUNNEL key
value does not carry over to the output side; an explicit action is
needed there too. Using a SET action to set IPV4_TUNNEL feels like a
confusing choice from this viewpoint: we do not use
SET(IN_PORT) to select the OUTPUT port either!

To summarize, in this alternative future things would look like this:

- OVS_KEY_ATTR_IPV4_TUNNEL is read only (like IN_PORT)
- A new action OVS_ACTION_ATTR_PUSH_TUNNEL is used to
  specify the outer header tunnel fields for output
  - Both include the identification of one of the supported tunneling
    mechanisms
- OUTPUT action is used to specify the outgoing port, which can be the
  OVSP_KERNEL pseudo port to indicate that the packet should be
  given to the (Linux) kernel for further processing (e.g. forwarding)
- OVSP_KERNEL would also be used as IN_PORT for kernel-provided
  incoming tunneled packets
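
The execution model summarized above could be sketched like this
(ACT_PUSH_TUNNEL, the structs, and tun_key_at_output() are all
hypothetical names for the sketch; only the NULL-initialization mirrors
what ovs_execute_actions() does today):

```c
#include <stddef.h>
#include <stdint.h>

struct tun_key {
    uint32_t ipv4_src, ipv4_dst;
    uint64_t tun_id;
    uint16_t tun_type;  /* proposed: tunneling mechanism id */
};

enum act_type { ACT_PUSH_TUNNEL, ACT_OUTPUT };

struct action {
    enum act_type type;
    struct tun_key tun;  /* meaningful for ACT_PUSH_TUNNEL */
    uint32_t port;       /* meaningful for ACT_OUTPUT */
};

/* Returns the tunnel key in effect when the first OUTPUT action is
 * reached, or NULL if no PUSH_TUNNEL preceded it; output then means
 * "send unencapsulated". */
const struct tun_key *
tun_key_at_output(const struct action *acts, size_t n)
{
    const struct tun_key *tun_key = NULL;  /* as in ovs_execute_actions() */

    for (size_t i = 0; i < n; i++) {
        switch (acts[i].type) {
        case ACT_PUSH_TUNNEL:
            tun_key = &acts[i].tun;  /* only the explicit action creates it */
            break;
        case ACT_OUTPUT:
            return tun_key;          /* encapsulate iff a key was pushed */
        }
    }
    return NULL;
}
```

The point of the sketch is the read-only/explicit-action split: matching
on incoming tunnel metadata and producing outgoing tunnel metadata go
through entirely separate paths, just like IN_PORT and OUTPUT.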

To make this somewhat symmetric we could also specify an
OVS_ACTION_ATTR_POP_TUNNEL that decapsulates the packet; further
actions would then operate on the (formerly inner) packet, such as
outputting it on any port, or resubmitting it to the exact-match flow
table for further matching.
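
The intended POP_TUNNEL semantics could be sketched as a plain
header-strip (pop_tunnel() and struct buf are hypothetical names; real
decapsulation would of course also validate the outer headers):

```c
#include <stddef.h>
#include <stdint.h>

/* A packet viewed as a byte range. */
struct buf {
    const uint8_t *data;
    size_t len;
};

/* Decapsulate: drop hdr_len bytes of outer headers and return a view
 * of the inner packet.  Returns a zero-length buffer if the packet is
 * too short to contain the outer headers. */
struct buf pop_tunnel(struct buf outer, size_t hdr_len)
{
    struct buf inner = { NULL, 0 };

    if (outer.len >= hdr_len) {
        inner.data = outer.data + hdr_len;
        inner.len = outer.len - hdr_len;
    }
    return inner;
}
```

Subsequent actions in the same list would then see only the inner
packet, which is what makes the resubmit-after-decapsulation example
below possible.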

A sketchy example of use (with some yet-to-be-specified OF-level actions):

OF level: 
flow1: in_port=1 udp tp_dst=8472 actions=pop_tunnel(vxlan),resubmit(TUNNEL)
flow2: in_port=TUNNEL in_phys_port=1 icmp actions=output:CONTROLLER

ODP level (exact match):
key(tun_key(0),...,in_port(1), ..., ip_proto(udp),...,tp_dst(8472), ...) actions=pop_tunnel(vxlan),output(OVSP_TABLE)
...
key(tun_key(ipv4_src(..),ipv4_dst(..),tun_type(vxlan),tun_id(..)),...in_port(1), ..., ip_proto(icmp), ...) actions=USERSPACE(...)

This would allow selective tunnel switching (forwarding tunneled
traffic without decapsulation, which keeps the exact-match table from
exploding with inner-header flows) as well as tunnel termination.
Currently this is impossible, since packets are first seen in the kernel
flow table only after they have been decapsulated by one of the tunnel
vports, and it seems to me that this would still be the case after the
ongoing/planned flow-based tunneling work is done.

The behavior shown in the example above could also be produced with a
special vport that does the decapsulation and then feeds the inner
packet back to the kernel flow table as input. I have tried it and it
works, and it might indeed be an easier implementation alternative at
the kernel datapath level. But the most straightforward implementation
would need a separate "pop_tunnel" port for each tunneling protocol,
and that kind of semantic overloading of port numbers is not pretty.

Userspace could still provide the notion of logical ports and standard OF
behavior on top of all of this. IMO this does not change any of that.

  Jarno



