[ovs-dev] [PATCH v2 3/3] datapath: add layer 3 flow/port support

Lori Jakab lojakab at cisco.com
Mon May 12 20:19:13 UTC 2014


On 5/10/14, 12:48 AM, Jesse Gross wrote:
> On Fri, Apr 25, 2014 at 2:55 AM, Lori Jakab <lojakab at cisco.com> wrote:
>> On 4/25/14, 5:19 AM, Jesse Gross wrote:
>>>>>>>>>> +noethernet:
>>>>>>>>>>              if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE,
>>>>>>>>>> output->eth.type))
>>>>>>>>>>                      goto nla_put_failure;
>>>>>>>>> Does it still make sense to send the EtherType? It's not present in
>>>>>>>>> the packet and I believe that the information is contained in the
>>>>>>>>> attributes that we do send (i.e. IPv4 or v6 attributes).
>>>>>>>>
>>>>>>>> We had a discussion about this in August last year:
>>>>>>>>
>>>>>>>>>>>>>> One such decision is how to handle the flow key.  I set all
>>>>>>>>>>>>>> fields
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> key->eth to 0, except the type, because we still need to know
>>>>>>>>>>>>>> what
>>>>>>>>>>>>>> kind
>>>>>>>>>>>>>> of L3 packet do we have.  Since a lot of code is accessing
>>>>>>>>>>>>>> key->eth.type, this is easier than having this information in a
>>>>>>>>>>>>>> different place, although it would be more elegant to set this
>>>>>>>>>>>>>> field
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> 0 as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can we use skb->protocol for the cases where we currently access
>>>>>>>>>>>>> the
>>>>>>>>>>>>> EtherType? Are there cases that you are particularly concerned
>>>>>>>>>>>>> about?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The only case I'm concerned about is in the action validation
>>>>>>>>>>>> code,
>>>>>>>>>>>> particularly the validate_set() and validate_tp_port() functions,
>>>>>>>>>>>> since
>>>>>>>>>>>> they
>>>>>>>>>>>> don't have access to the skb, and need to know the layer 3
>>>>>>>>>>>> protocol
>>>>>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>> flow.  In validate_set() we could relax the check for eth.type
>>>>>>>>>>>> for
>>>>>>>>>>>> OVS_KEY_ATTR_IPV4 and OVS_KEY_ATTR_IPV6, but I'm not sure what to
>>>>>>>>>>>> do
>>>>>>>>>>>> about
>>>>>>>>>>>> validate_tp_port().
>>>>>>>>>>>>
>>>>>>>>>>>> In general, I think it would be a good idea for the flow key to
>>>>>>>>>>>> have
>>>>>>>>>>>> a
>>>>>>>>>>>> field
>>>>>>>>>>>> specifying the layer 3 protocol, when an ethernet header is not
>>>>>>>>>>>> present.
>>>>>>>>>>>
>>>>>>>>>>> That makes sense to me.
>>>>>>>>>>
>>>>>>>>>> You mean that we keep eth.type as the L3 protocol field, or define
>>>>>>>>>> a
>>>>>>>>>> new
>>>>>>>>>> one, separate from the eth union?
>>>>>>>>> I think it's fine to keep using eth.type as the protocol field. I
>>>>>>>>> think we can probably some holes to move the is_layer3 flag into the
>>>>>>>>> non-tunnel portion of the flow though.
>>>>>>>>
>>>>>>>> Should we revisit this discussion?
>>>>>>> I was just referring to the Netlink encoding here. Can we populate the
>>>>>>> flow key in the kernel when we translate the flow?
>>>>>>
>>>>>> Not sure I understand the question.
>>>>>>
>>>>>> Going through the code, I see that omitting OVS_KEY_ATTR_ETHERTYPE
>>>>>> currently
>>>>>> means an 802.2 packet, if the mask is set to 0xffff.  Are you
>>>>>> suggesting
>>>>>> to
>>>>>> omit OVS_KEY_ATTR_ETHERTYPE for layer 3 packets both in the flow key
>>>>>> and
>>>>>> the
>>>>>> mask?
>>>>> All I was trying to say is that is that the Netlink and struct
>>>>> sw_flow_key representations don't necessarily have to be exactly the
>>>>> same as long as we can translate back and forth. I'm not sure that the
>>>>> previous discussion applies - this is more about Netlink and that
>>>>> seemed to apply to struct sw_flow_key.
>>>>
>>>> Ok, understand now.  But validation would become even more complicated if
>>>> we
>>>> didn't send the OVS_KEY_ATTR_ETHERTYPE over Netlink for layer 3 packets.
>>> I think we would basically have to scan for L3 attributes and use that
>>> to fill in the EtherType, right? Is there anything else?
>>>
>>> It would be nice to fully break the link to Ethernet and it seems like
>>> a half measure if we keep EtherType in. There have been various
>>> requests for non-Ethernet protocols in the past so it seems like it
>>> could get messy if we assume some elements of Ethernet.
>>
>> I had a discussion about this with Thomas Morin, who proposed the
>> IP-over-GRE patches and MPLS-over-GRE patches based on this work, and we
>> think it would still be useful to keep the EtherType, even in the Netlink
>> attributes, to be able to support generic packet types in the future.
>>
>> Ben was involved in EXT-112 at the ONF, which has the objective to "Support
>> non-Ethernet packets throughout the pipeline".  The current version of the
>> proposal, recently moved from incubation to prototyping, defines a new
>> "packet type" match field, which is used to determine the type of the first
>> header.  There are 5 namespaces that can be used to define the packet type,
>> one of which is EtherType.  With this match type, the canonical
>> representation for an IPv4 packet would be (namespace=1,ns_type=0x800),
>> namespace 1 being the EtherType.
> Actually, that's pretty much what I'm trying to avoid :) I don't
> really like that we have to introduce a new namespace for EtherType
> and then yet another new namespace for the type of the type.
>
>> Because of the potential future implementation of EXT-112 in OVS I think
>> keeping the EtherType around would be beneficial.
> I've been thinking about this (and also talked to Ben) and I'm still
> not convinced that keeping the EtherType explicitly in the netlink
> protocol really adds much of value or would really impede the
> implementation of EXT-112. I'm really specifically talking about the
> netlink protocol - I think that's fine if the internal implementation
> of userspace or kernel wants to keep it around for efficiency or
> simplicity but I don't think that it should leak out beyond the
> implementation.
>
> I think for the two main reasons for including an explicit type (in
> either OpenFlow or netlink) are to be able to enforce prerequisites
> and being able to determine whether a field should be wildcarded or
> doesn't apply. It's not actually needed to describe the protocol
> because in both cases TLVs are used that provide actual typing.
>
> One significant difference between OpenFlow and the OVS netlink
> protocol is that netlink is designed to always be reactive whereas
> OpenFlow is not necessarily. This means that when communicating with
> the kernel, the packet is always fully described as it came in through
> the key and the wildcarded fields are separate. This completely
> eliminates the issue with wildcard ambiguity, which I think is the
> more significant one. It does slightly reduce the ability to detect
> errors by loosening prerequisite checking but I think that is also
> less significant in the reactive model.
>
> Given that, I still think that we can consider a packet to be "rooted"
> in the type of the TLV for the lowest layer that appears. If there is
> more than one, we can just reject that as invalid.

Thanks Jesse for sharing your reasoning in detail.  You're right, not 
sending the Ethertype over Netlink doesn't preclude keeping it in the 
user space.  I'll change the implementation and see if I get 
prerequisite checking right.

-Lori

>
>>>>> I'm starting to think that the path we went down with the megaflow
>>>>> Netlink encoding is not particularly sustainable because it means that
>>>>> while the keys can do something fairly reasonable, the masks must
>>>>> always list all the protocols that it is not, such as Ethernet and
>>>>> EtherType in this case.
>>>>>
>>>>> I wonder if it actually makes more sense to switch over to something
>>>>> similar to what Jarno is planning for the actions - a key and mask
>>>>> paired together and then we can also fix some left over oddities from
>>>>> the exact match days.
>>>>
>>>> I wonder if we could refine this in later patches?
>>> My hope is that we can at least get some other opinions on this first.
>>
>> Sure, looking forward to that!
> OK, I'm retracting this idea - based on the above analysis, I think
> the current way we are doing things is fine and maybe actually
> preferable for this use case.



More information about the dev mailing list