[ovs-dev] MPLS and VLAN QinQ patch

ravi kerur rkerur at gmail.com
Thu Jun 14 05:42:36 UTC 2012


On Wed, Jun 13, 2012 at 8:38 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Thu, Jun 14, 2012 at 9:37 AM, Ben Pfaff <blp at nicira.com> wrote:
>> On Wed, Jun 13, 2012 at 09:55:36PM +0900, Jesse Gross wrote:
>>> If we go with what you have now, I'm fairly confident that we will
>>>  regret it in the future.  The kernel code used to more directly
>>>  implement various features and prior to upstreaming we broke many
>>>  of them down.  I'm happy with the results of that but this time we
>>>  won't have the benefit of revising things later. This is
>>>  particularly bad because it deviates from our usual model of
>>>  userspace controlling everything and here userspace won't even know
>>>  what the flow looks like. The effects of this tend to metastasize
>>>  because when userspace doesn't know what the packet looks like it
>>>  can't implement things that it might otherwise be able to do and
>>>  more and more ends up in the kernel. The other thing, which is
>>>  specific to MPLS, is that there is no inherent way to know the type
>>>  of the payload. Userspace is vastly more likely to have this
>>>  information in the event that we want to do something with the
>>>  inner packet.  In your patch the kernel is basically assuming that
>>>  the type is IP (OpenFlow doesn't give us any additional information
>>>  but it doesn't seem like a good idea in general).
>>>
>>> Using the approach that I had suggested largely avoids these
>>> problems and if we could go that path by adding say, the ability to
>>> match on another tag, then I think that would be
>>> reasonable. However, it still has a variation of the last problem,
>>> which is that although it may know what the inner type is, unless it
>>> can tell that to the kernel there is no way to get to the inner
>>> flow. This makes copy inwards to IP header difficult. Perhaps we
>>> could add a mechanism for userspace to tell the kernel the type of
>>> the inner packet, although it might not know either.
>>>
>>> Potentially a third approach is to add a more native way to
>>> recirculate through the pipeline (somewhat like a patch port but
>>> lighter weight and with a different intention). In addition to the
>>> TTL issues it could be useful in situations where you want to parse
>>> deeper into the packet than the number of MPLS or vlan tags
>>> supported.  Oftentimes people might want to do L3 operations after
>>> popping off an MPLS tag and in that case you almost certainly end up
>>> doing something like this.
>>>
>>> Ben, do you have any thoughts?
>>
>> I think I understand options 1 and 3 but I wasn't able to quickly
>> figure out what option 2 is.
>
> Option 2 would be to do something similar to what we did with IP TTL
> where you have a set operation rather than copy in/out/decrement/etc.
> It gives userspace much more control because it always knows what the
> packet looks like.

<rk> this will increase the data structure size for both struct flow and
sw_flow_key, but it's doable. Also note that skbs don't carry fields for
MPLS offsets, so the offset needs to be calculated when modifying,
although the kernel code might get a little simpler since it only needs
to calculate the offset and update the label stack entry. I haven't
thought it through thoroughly, but I think that's the essence. Correct
me if I am wrong.
>
>> Anyway, I mostly like option 3.  One wrinkle that occurs to me (maybe
>> it's obvious?) is to separate defining the inner protocol from the
>> recirculation.  In other words, you'd have an action,
>> e.g. OVS_ACTION_ATTR_SET_MPLS_INNER_PROTO, that tells the datapath
>> what to find beyond the MPLS label(s).  Following that, you could
>> usefully tell the kernel module to copy ttl/in out, or you could
>> usefully do a separate recirculate action.
>
> I was thinking that the first pass would pop off the tags and then the
> second pass would process the inner packet.  Since when popping off
> the last tag with MPLS you already set the EtherType, nothing more
> needs to be done specific to MPLS.  Probably you would model these
> passes as connected to tables in userspace, so a common use case might
> be the first table does MPLS lookup/pop and the second tables does IP
> processing.  For that you just need a way to recirculate and know what
> pass you're on.
>
> It's somewhat more complicated to have userspace setup recirculation
> passes implicitly where it doesn't map directly to OpenFlow but in
> theory you can model it the same way for TTL copy or additional levels
> of tags.


<rk> note that this covers the pure-MPLS, egress-only case. One
disadvantage of this scenario is that customers with overlapping IP
addresses cannot be supported; that would not be the case for MPLS/VPN,
which doesn't need to parse both the MPLS and IP headers.
