[ovs-dev] Flow-based tunneling design ([PATCH] support NXAST_SET_TUNNEL_DST action)

Jesse Gross jesse at nicira.com
Mon Dec 17 22:10:09 UTC 2012


On Mon, Dec 17, 2012 at 6:52 AM, Jarno Rajahalme
<jarno.rajahalme at nsn.com> wrote:
> On Dec 13, 2012, at 21:26 , ext Jesse Gross wrote:
>
>> On Wed, Dec 12, 2012 at 11:08 PM, Jarno Rajahalme
>> <jarno.rajahalme at nsn.com> wrote:
>>> On Dec 13, 2012, at 8:22 , ext Rich Lane wrote:
>>> On Mon, Dec 10, 2012 at 6:42 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>> On Mon, Dec 10, 2012 at 5:16 PM, Rich Lane <rich.lane at bigswitch.com>
>>>> wrote:
>>>> Another issue is that this doesn't help on the receive side,
>>>> particularly because packets are matched to ports with their
>>>> associated IP addresses.  In general, I'm not sure that current port
>>>> configuration makes sense in this context.
>>>
>>>
>>> I thought the "null_port" change in the kernel datapath already allowed
>>> packets from any remote IP to match, if userspace didn't supply
>>> OVS_TUNNEL_ATTR_DST_IPV4 when creating the tunnel port. Or do you mean that
>>> we should remove OVS_TUNNEL_ATTR_* entirely now? I'd think you'd want to do
>>> that last, after all the flow-based tunneling patches are merged.
>>
>> Yes, NULL ports definitely need to be created in some form here
>> although userspace doesn't currently do this.  The work that is
>> currently ongoing will essentially replicate the current kernel
>> behavior in userspace on top of the new flow infrastructure.  However,
>> this means that without some further work, you'll have the same
>> problems with packets being dropped before you see them because they
>> don't match.
>>
>> I think the first step is to figure out what the external view of this
>> should be from the OpenFlow/configuration database level.  It might be
>> a version of NULL ports, but then you have similar issues to the
>> kernel, where you don't know which datapath to map a packet onto.
>>
>>> The OVS_TUNNEL_ATTR_SRC_IPV4 configuration or equivalent would still need to
>>> exist for flow-based tunneling, right? If there are multiple datapaths and
>>> each one has a tunnel port we need some way to choose between them.
>>>
>>>
>>>
>>> Aren't we down to a single datapath now? Does that make a difference here?
>>
>> Yes, at the kernel level there is no mapping from tunnel IP
>> information to either port or datapath - it just extracts the
>> information and stores it in the flow.  This is why the single
>> datapath changes were necessary.
>>
>> Almost everything in include/openvswitch/tunnel.h is going away since
>> it all relates to kernel port level configuration.  The only exception
>> is OVS_TUNNEL_ATTR_DST_PORT.
>>
>
> So the planned final outcome from the kernel module point of view is
> that only one null port, void of any policy, is configured for each
> tunneling mechanism? If this is the case, then it might be helpful to
> specify well-known ODP port numbers for those (right now there is only
> one: OVSP_NONE). That would allow the kernel module code to
> recognize those autonomously, so that userspace would not need to
> configure them at all (apart from the VXLAN destination port number, at
> least until it is standardized).

I'm not sure that I see the benefit in special-casing tunnel ports in
this manner.  I think the ability to configure the VXLAN destination
port will stick around even after standardization, and it's possible
that there will be other configuration parameters that we'll want in
the future, so we'd have to reintroduce a new configuration mechanism.
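
(For reference, a minimal sketch of what that remaining configuration
looks like at the netlink level, using the userspace helpers in
lib/netlink.h. The be16 encoding of the attribute is an assumption
here, and 8472 is the pre-standard VXLAN port.)

    /* Sketch: OVS_TUNNEL_ATTR_DST_PORT as the only remaining per-vport
     * tunnel option, nested under OVS_VPORT_ATTR_OPTIONS in the vport
     * request.  "request" is a struct ofpbuf * being built into an
     * OVS_VPORT_CMD_NEW message. */
    size_t options = nl_msg_start_nested(request, OVS_VPORT_ATTR_OPTIONS);
    nl_msg_put_be16(request, OVS_TUNNEL_ATTR_DST_PORT, htons(8472));
    nl_msg_end_nested(request, options);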

> If I have understood it right, an exact-match (ODP) flow that is to be sent
> to a tunnel will need to have OVS_KEY_ATTR_IPV4_TUNNEL
> within an OVS_ACTION_ATTR_SET, and an OVS_ACTION_ATTR_OUTPUT
> with the (null) port number for the desired tunneling mechanism. Since
> there will be only one null port number for each tunneling mechanism, the
> OUTPUT action will in essence just specify which tunneling protocol is to
> be used (implicitly asking the kernel to worry about the actual output).
> So, the tunnel output information is split between two different
> OVS/ODP actions.  This could be reconciled a bit by including a tunneling
> mechanism id (like TNL_T_PROTO_*) within the IPV4_TUNNEL
> metadata, so as to keep the tunneling-related information in a single action.
> This would also open the possibility of specifying a real port in the output
> action, or a pseudo port like "OVSP_KERNEL" to indicate that the
> encapsulated packet needs to be routed (and possibly fragmented) by the
> kernel.
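
(To make that split concrete, here is a sketch of composing the two
actions with the userspace netlink helpers. struct ovs_key_ipv4_tunnel
follows the layout in the in-progress flow-based tunneling patches and
may still change; the addresses, key, and port number are placeholders.)

    #include <string.h>
    #include <netinet/in.h>
    #include "byte-order.h"   /* htonll() */
    #include "netlink.h"      /* nl_msg_*() */
    #include "ofpbuf.h"

    /* Sketch: build actions=set(tunnel(...)),output(port) into
     * "actions".  "port" would be the null vport number of the chosen
     * tunneling mechanism. */
    static void
    build_tunnel_actions(struct ofpbuf *actions, uint32_t port)
    {
        struct ovs_key_ipv4_tunnel tun_key;
        size_t set_ofs;

        memset(&tun_key, 0, sizeof tun_key);
        tun_key.tun_id = htonll(10);            /* example key/VNI */
        tun_key.ipv4_src = htonl(0x0a000001);   /* 10.0.0.1 */
        tun_key.ipv4_dst = htonl(0x0a000002);   /* 10.0.0.2 */
        tun_key.ipv4_ttl = 64;

        set_ofs = nl_msg_start_nested(actions, OVS_ACTION_ATTR_SET);
        nl_msg_put_unspec(actions, OVS_KEY_ATTR_IPV4_TUNNEL,
                          &tun_key, sizeof tun_key);
        nl_msg_end_nested(actions, set_ofs);
        nl_msg_put_u32(actions, OVS_ACTION_ATTR_OUTPUT, port);
    }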

Using OVSP_KERNEL doesn't seem significantly semantically different
from what already exists, and it has the same configuration problems
that I mentioned above.  I also don't think this design can be easily
extended to support direct output, since there are still
considerations like ARP that aren't handled.

> What I have described above would be a bit clearer if we renamed
> OVS_KEY_ATTR_IPV4_TUNNEL to OVS_ACTION_ATTR_PUSH_TUNNEL.
> This would make it explicit that for a flow to have TUNNEL metadata
> on output, a specific action is needed. Now it is not so explicit, except here:
>
> datapath/actions.c: ovs_execute_actions():
>
>         OVS_CB(skb)->tun_key = NULL;
>         error = do_execute_actions(dp, skb, acts->actions,
>                                    acts->actions_len, &tun_key, false);
>
> I.e., by default tun_key is NULL when action execution is started, and it
> needs to be explicitly set to "exist". All other "settable" KEY attributes
> have some value regardless, taken from packet headers or otherwise
> carried from input to output (priority, skb_mark). So, even though
> OVS_KEY_ATTR_IPV4_TUNNEL is meaningful for matching, as key
> material it is essentially read-only (like IN_PORT). Just as the
> IN_PORT key value does not get carried over as a (default) OUTPUT port,
> but needs a specific OUTPUT action, the incoming IPV4_TUNNEL key
> value does not get carried over to the output side; an explicit action is
> needed. Using a SET action to set IPV4_TUNNEL feels like a
> confusing choice from this viewpoint: we do not use
> SET(IN_PORT) to select the OUTPUT port either!

PUSH generally implies that the headers are stackable, which they
aren't in this case, so I'm not sure that it's a great choice either.
Changing this name would require extra code and care for
compatibility, so I think there would have to be a pretty compelling
case for it.

> To summarize, in this alternative future things would look like this:
>
> - OVS_KEY_ATTR_IPV4_TUNNEL is read only (like IN_PORT)
> - A new action OVS_ACTION_ATTR_PUSH_TUNNEL is used to
>   specify the outer header tunnel fields for output
>  - Both include the identification of one of the supported tunneling
>    mechanisms
> - OUTPUT action is used to specify the outgoing port, which can be
>   OVSP_KERNEL pseudo port to indicate that the packet should be
>   given to (Linux) kernel for further processing (e.g. forwarding)
> - OVSP_KERNEL would also be used as IN_PORT for kernel
>   provided incoming tunneled packets
>
> To make this somewhat symmetric we could also specify an
> OVS_ACTION_ATTR_POP_TUNNEL that could be used to decapsulate
> packets; further actions would then operate on the (formerly inner)
> packet, like outputting the packet on any port, or resubmitting the inner
> packet to the exact-match flow table for further exact matching.
>
> A rough example of use (with some yet-to-be-specified OF-level actions):
>
> OF level:
> flow1: in_port=1 udp tp_dst=8472 actions=pop_tunnel(vxlan),resubmit(TUNNEL)
> flow2: in_port=TUNNEL in_phys_port=1 icmp actions=output:CONTROLLER
>
> ODP level (exact match):
> key(tun_key(0),...,in_port(1), ..., ip_proto(udp),...,tp_dst(8472), ...) actions=pop_tunnel(vxlan),output(OVSP_TABLE)
> ...
> key(tun_key(ipv4_src(..),ipv4_dst(..),tun_type(vxlan),tun_id(..)),...in_port(1), ..., ip_proto(icmp), ...) actions=USERSPACE(...)
>
> This would allow selective tunnel switching (forwarding of tunneled
> traffic without decapsulation, keeping the exact match table from
> exploding due to inner header flows) and termination. Currently this
> is impossible, since packets are first seen in the kernel flow table only
> after they have been decapsulated by one of the tunnel vports.
> It seems to me that this would still be the case after the ongoing/planned
> flow-based tunneling mechanisms are done.
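
(A purely hypothetical sketch of what such an action could look like
inside the do_execute_actions() loop in datapath/actions.c; neither
OVS_ACTION_ATTR_POP_TUNNEL nor the tnl_pop() helper below exists
today.)

    case OVS_ACTION_ATTR_POP_TUNNEL:
            /* Hypothetical: strip the outer headers of the tunneling
             * mechanism named by the attribute and record the outer
             * addresses and key in OVS_CB(skb)->tun_key, so that
             * further matching sees the inner packet plus the tunnel
             * metadata. */
            err = tnl_pop(dp, skb, nla_get_u32(a));
            if (unlikely(err))
                    return err;
            break;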

It's not clear to me how the kernel IP stack fits into this picture.
Essentially the choice of whether to do decapsulation comes down to
whether or not you hand the packet to the IP stack.  Currently, you
can have two bridges, one at the physical layer connected to the
Ethernet adapter and one at the tunnel level.  If you replace TUNNEL
with LOCAL in your example above, you'll achieve the same result.

> The behavior shown in the example above can be produced with a special
> vport that does the decapsulation and then provides the inner packet as
> input to the kernel flow table. I have tried it and it works. Indeed, it might
> be an easier alternative implementation at the kernel datapath level. But
> the most straightforward implementation would need a separate "pop_tunnel"
> port for each tunneling protocol. This semantic overloading of port numbers
> is not pretty.

The current design is that there are special vports that do the
decapsulation and provide the correct header information.
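
(Roughly, on receive a tunnel vport extracts the outer header into the
flow metadata before the packet reaches the flow table. The following
is simplified from the in-progress patches, so names and details are
approximate; "key" stands for the key/VNI already parsed from the
GRE/VXLAN header.)

    struct ovs_key_ipv4_tunnel tun_key;
    const struct iphdr *iph = ip_hdr(skb);

    memset(&tun_key, 0, sizeof tun_key);
    tun_key.tun_id = key;                  /* GRE key or VXLAN VNI */
    tun_key.ipv4_src = iph->saddr;
    tun_key.ipv4_dst = iph->daddr;
    tun_key.ipv4_tos = iph->tos;
    tun_key.ipv4_ttl = iph->ttl;

    /* The flow key extracted from this packet will include a copy of
     * tun_key, so matching can use the outer addresses and key. */
    OVS_CB(skb)->tun_key = &tun_key;
    ovs_vport_receive(vport, skb);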


