[ovs-dev] MPLS and VLAN QinQ patch

ravi kerur rkerur at gmail.com
Wed Jun 13 15:04:26 UTC 2012


Inline <rk>

On Wed, Jun 13, 2012 at 5:55 AM, Jesse Gross <jesse at nicira.com> wrote:
> On Jun 13, 2012, at 12:12 PM, ravi kerur <rkerur at gmail.com> wrote:
>
>> Thanks Pravin for the review comments. Inline <rk>
>>
>> On Tue, Jun 12, 2012 at 5:31 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>> Hi Ravi,
>>>
>>> I see following issues with datapath mpls support:
>>>
>>> Actions:
>>> I do not see skb->csum updated when mpls action is performed on packet.
>>
>> <rk> csum calculation is usually done by the kernel stack, so I left
>> it that way. Moreover, there could be multiple mpls actions for a
>> single header, so I thought it would not be efficient to do it every
>> time in ovs but rather to rely on the kernel at transmit time. Please
>> let me know otherwise.
>>
>> Note: (I have verified via tcpdump that packets are transmitted with
>> the correct checksum)
>
> skb->csum is used on the receive side by NICs that just generically compute a checksum over the entire packet rather than giving a yes/no on the validity. You must update it as you change the packet.

<rk> ok will fix it.
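
Roughly, what I have in mind for the fix (just a sketch, not the final
patch; the helper names are only for illustration) is to fold the pushed
or popped label stack entry into skb->csum whenever the NIC gave us
CHECKSUM_COMPLETE:

#include <linux/skbuff.h>
#include <net/checksum.h>

#define MPLS_HLEN 4	/* one label stack entry: label/TC/S/TTL */

/* Fold a newly pushed label stack entry into skb->csum. */
static void mpls_csum_push(struct sk_buff *skb, const __be32 *lse)
{
	if (skb->ip_summed == CHECKSUM_COMPLETE)
		skb->csum = csum_add(skb->csum,
				     csum_partial(lse, MPLS_HLEN, 0));
}

/* Subtract a popped label stack entry from skb->csum. */
static void mpls_csum_pop(struct sk_buff *skb, const __be32 *lse)
{
	if (skb->ip_summed == CHECKSUM_COMPLETE)
		skb->csum = csum_sub(skb->csum,
				     csum_partial(lse, MPLS_HLEN, 0));
}
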
>
>
>>>
>>> TTL actions:
>>> As Jesse has mentioned before, there could be one Set action which
>>> should be able to handle the three different ttl-related actions
>>> added, e.g. OVS_ACTION_ATTR_COPY_TTL_IN can generate a set-ipv4
>>> action from userspace; OVS_ACTION_ATTR_COPY_TTL_OUT and
>>> OVS_ACTION_ATTR_DEC_MPLS_TTL can be set-mpls actions. This works for
>>> the single-label case. But ttl_in and ttl_out also need mpls inner
>>> packet parsing in flow_extract to get the exact flow in the kernel
>>> datapath.
>>>
>>
>> <rk> I had disagreed with Jesse on this approach and unfortunately
>> our discussion didn't continue since he was busy. My understanding is
>> that you use set actions when you are modifying multiple fields in a
>> single header, and that makes perfect sense to me. In the case of
>> mpls we are dealing with two headers, ip and mpls.
>
> It does save some time if you modify multiple header fields but that's not the main goal. The primary objective is to push as much complexity as possible to userspace and flow setup time. The kernel should just expose the simplest building blocks possible.
>
>> For example:
>>
>> copy_ttl_in could be mpls-outer-header to mpls-inner-header (in the
>> case of stacked labels) and mpls-header to ip for a single label.
>>
>> copy_ttl_out could be mpls-inner-header to mpls-outer-header (stacked
>> labels) and ip to mpls-header for a single label.
>>
>> dec_mpls_ttl is a single-header operation.
>>
>> Hence I think it should be left as is. Let me know if I have
>> misunderstood set operations.
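
<rk> For concreteness, here is a rough sketch of what Pravin's
translation could look like in userspace for the single-label
copy_ttl_in case: the ttl is read out of the outermost label stack
entry and emitted as an ordinary set-ipv4 action. This is only an
illustration (the ttl mask and the helper are made up for this sketch;
the nl_msg_* helpers are the existing ones in lib/netlink.h).

#include <arpa/inet.h>
#include <linux/openvswitch.h>  /* OVS_ACTION_ATTR_SET, struct ovs_key_ipv4 */
#include "netlink.h"
#include "ofpbuf.h"

#define MPLS_TTL_MASK 0x000000ff   /* TTL is the low 8 bits of the LSE */

/* copy_ttl_in, single label: emit set(ipv4(...)) carrying the LSE's TTL. */
static void
compose_copy_ttl_in(struct ofpbuf *actions,
                    const struct ovs_key_ipv4 *inner_ipv4, ovs_be32 top_lse)
{
    struct ovs_key_ipv4 ipv4 = *inner_ipv4;
    size_t offset;

    ipv4.ipv4_ttl = ntohl(top_lse) & MPLS_TTL_MASK;

    offset = nl_msg_start_nested(actions, OVS_ACTION_ATTR_SET);
    nl_msg_put_unspec(actions, OVS_KEY_ATTR_IPV4, &ipv4, sizeof ipv4);
    nl_msg_end_nested(actions, offset);
}
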
>
> I spent a fair amount of time thinking about this today and concluded that neither option is very attractive as is.
>
> If we go with what you have now, I'm fairly confident that we will regret it in the future.  The kernel code used to more directly implement various features and prior to upstreaming we broke many of them down.  I'm happy with the results of that but this time we won't have the benefit of revising things later. This is particularly bad because it deviates from our usual model of userspace controlling everything and here userspace won't even know what the flow looks like. The effects of this tend to metastasize because when userspace doesn't know what the packet looks like it can't implement things that it might otherwise be able to do and more and more ends up in the kernel. The other thing, which is specific to MPLS, is that there is no inherent way to know the type of the payload. Userspace is vastly more likely to have this information in the event that we want to do something with the inner packet.  In your patch the kernel is basically assuming that the type is IP (OpenFlow doesn't give us any additional information, but assuming IP doesn't seem like a good idea in general).

<rk> No, both kernel and userspace take care of ipv4/ipv6 and non-ip
as well. The motivation for MPLS is to be unaware of the underlying
l2/l3 protocol and allow fast switching based on just the top mpls
label.
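
The point is easier to see from the label stack entry itself: all 32
bits are label/TC/S/TTL and nothing in it identifies the payload, which
is why forwarding only needs the top label. A rough layout sketch (the
mask names here are just for illustration):

#include <stdint.h>
#include <arpa/inet.h>

/* One MPLS label stack entry is 32 bits on the wire:
 *   label (20 bits) | TC (3) | S, bottom of stack (1) | TTL (8)
 * There is no payload-type/ethertype field in it. */
#define MPLS_LABEL_MASK  0xfffff000u
#define MPLS_LABEL_SHIFT 12
#define MPLS_TC_MASK     0x00000e00u
#define MPLS_TC_SHIFT    9
#define MPLS_BOS_MASK    0x00000100u
#define MPLS_TTL_MASK    0x000000ffu

/* All a label switch needs for the lookup. */
static inline uint32_t mpls_lse_label(uint32_t lse_net)
{
    return (ntohl(lse_net) & MPLS_LABEL_MASK) >> MPLS_LABEL_SHIFT;
}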

>
> Using the approach that I had suggested largely avoids these problems, and if we could go that path by adding, say, the ability to match on another tag, then I think that would be reasonable. However, it still has a variation of the last problem, which is that although userspace may know what the inner type is, unless it can tell that to the kernel there is no way to get to the inner flow. This makes copying inwards to the IP header difficult. Perhaps we could add a mechanism for userspace to tell the kernel the type of the inner packet, although it might not know either.

<rk> Note that we are also increasing the size of the flow key and
flow here, which is not required in the first case. Is it good to
increase the memory footprint when the majority of flows handled by
ovs are non-MPLS? It can be justified if ovs is in the core and
handles only MPLS traffic.
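
To put a number on the footprint: matching on the label in the kernel
means growing the flow key along these lines, and every flow pays for
it whether or not it carries mpls. This is purely illustrative of the
shape of the change, not code from either patch:

#include <linux/types.h>

/* Hypothetical member that would have to be added to struct sw_flow_key
 * so that flow_extract can expose the label for matching. */
struct sw_flow_key_mpls_sketch {
	__be32 top_lse;		/* outermost label stack entry: +4 bytes per
				 * flow, more if inner labels or the inner
				 * packet type must also be carried */
};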

>
> Potentially a third approach is to add a more native way to recirculate through the pipeline (somewhat like a patch port but lighter weight and with a different intention). In addition to the TTL issues, it could be useful in situations where you want to parse deeper into the packet than the number of MPLS or vlan tags supported.  Often, people might want to do L3 operations after popping off an MPLS tag, and in that case you almost certainly end up doing something like this.

<rk> This is the case where the packet has a single mpls label and the
configured action is pop_mpls; in addition to the mpls label lookup,
an l3 lookup happens as well.
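
To sketch that in datapath terms (purely hypothetical; neither
attribute below exists today and the names are made up for
illustration): the first kernel flow matches on the label, pops it and
re-enters the pipeline, and a second flow then matches the exposed ip
packet and does the l3 work.

#include "netlink.h"
#include "ofpbuf.h"

/* Hypothetical action types, for illustration only. */
enum {
	HYPOTHETICAL_ACTION_POP_MPLS = 100,
	HYPOTHETICAL_ACTION_RECIRC   = 101,
};

/* Flow 1: pop the outermost label and send the packet back through
 * extraction/lookup so a second flow can handle it as plain l3. */
static void
compose_pop_and_recirc(struct ofpbuf *actions, ovs_be16 inner_ethertype)
{
	nl_msg_put_be16(actions, HYPOTHETICAL_ACTION_POP_MPLS, inner_ethertype);
	nl_msg_put_flag(actions, HYPOTHETICAL_ACTION_RECIRC);
}

/* Flow 2 is set up separately and carries the ordinary ipv4/ipv6
 * actions (e.g. set(ipv4(...)), output) for the recirculated packet. */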

>
> Ben, do you have any thoughts?
>
>> I am not sure whether we should take the inner mpls header parsing
>> approach in the kernel because it deviates from the spec. The only
>> deviation so far in the code vs. the spec is the handling of the ttl
>> and tos bits for the innermost mpls header, and it is done that way
>> because of a practical deployed use case.
>
> The kernel doesn't implement OpenFlow, so it doesn't matter what the spec says. The only thing that needs to be consistent is the interface that userspace exposes to the outside world. Everything else is an internal implementation detail.

<rk> I was referring to struct sw_flow_key, and at least I thought it
was motivated by OpenFlow.


