[ovs-dev] [PATCH v4 3/3] datapath: add layer 3 flow/port support

Tue Aug 5 20:45:46 UTC 2014

Hi Jesse,

I'll be on vacation starting tomorrow, it looks like this won't be 
merged for at least another two weeks.

Replies on discussion points in-line below:

On 8/2/14, 4:11 AM, Jesse Gross wrote:
> On Fri, Aug 1, 2014 at 3:19 PM, Lori Jakab <lojakab at cisco.com> wrote:
>> On 8/1/14, 3:25 AM, Jesse Gross wrote:
>>> On Thu, Jul 31, 2014 at 5:45 AM, Lori Jakab <lojakab at cisco.com> wrote:
>>>> On 7/31/14, 10:09 AM, Jesse Gross wrote:
>>>>> On Fri, Jun 27, 2014 at 6:21 AM, Lorand Jakab <lojakab at cisco.com> wrote:
>>>>>> diff --git a/datapath/actions.c b/datapath/actions.c
>>>>>> index cb26ad5..20c66f5 100644
>>>>>> --- a/datapath/actions.c
>>>>>> +++ b/datapath/actions.c
>>>>>> +static int pop_eth(struct sk_buff *skb)
>>>>>> +{
>>>>>> +       skb_pull_rcsum(skb, skb_network_offset(skb));
>>>>>> +       skb_reset_mac_header(skb);
>>>>>> +       vlan_set_tci(skb, 0);
>>>>>> +
>>>>>> +       OVS_CB(skb)->is_layer3 = true;
>>>>> I think we should probably also set skb->mac_len to 0 here (and on
>>>>> push as well).
>>>>
>>>> OK.  I made the following changes:
>>>>
>>>> --- a/datapath/actions.c
>>>> +++ b/datapath/actions.c
>>>> @@ -245,6 +245,7 @@ static int pop_eth(struct sk_buff *skb)
>>>>    {
>>>>           skb_pull_rcsum(skb, skb_network_offset(skb));
>>>>           skb_reset_mac_header(skb);
>>>> +       skb->mac_len = 0;
>>>>           vlan_set_tci(skb, 0);
>>> I realized that setting the mac_len to zero isn't correct in the
>>> presence of MPLS. I think we can just use skb_reset_mac_len() here as
>>> well.
>>>
>>> This also means that we shouldn't pull up to the network offset since
>>> this will also pull off any MPLS labels, which I don't think that we
>>> want.
>>>
>>> Going further, what happens if packet comes in with multiple Ethernet
>>> headers on it (such as PBB)? This will pull off all of the headers but
>>> that doesn't seem right. If we were to only pull off one, what are the
>>> semantics with respect to VLANs?
>>
>> I have a feeling we may need to rename this action, since the my intention
>> actually was to remove everything up to the layer 3 header.  That includes
>> MPLS header(s) and any number of Ethernet headers.  The reason is, I need an
>> action that guarantees that the resulting packet is layer 3, and can be sent
>> through a layer 3 port.  I think you're right that the expectation from a
>> pop_eth() action would be what you describe above: if more then one Ethernet
>> header is present, remove the outer header only, keep MPLS headers.
>>
>> How about renaming this action to extract_l3_packet() or similar?
> I'm not excited about this. Thomas Morin was already talking about
> using this code to do MPLS over GRE, which would need exactly the
> semantics of popping an Ethernet header. I would like to find a way to
> do this now, otherwise it is likely that we will have to solve the
> same problem with another new action very soon.

Actually, that's what I'm proposing.  We do pop_eth() as you describe 
above, which can be used for MPLS over GRE among other things, and do 
the extract_l3_packet() action as well, for the cases when we need to 
send a packet out on an L3 port, regardless of how many L2 or L2.5 
headers we have.

>
>>>>>> +static void push_eth(struct sk_buff *skb, const struct
>>>>>> ovs_action_push_eth *ethh)
>>>>>> +{
>>>>>> +       skb_push(skb, ETH_HLEN);
>>>>>> +       skb_reset_mac_header(skb);
>>>>>> +
>>>>>> +       ether_addr_copy(eth_hdr(skb)->h_source,
>>>>>> ethh->addresses.eth_src);
>>>>>> +       ether_addr_copy(eth_hdr(skb)->h_dest, ethh->addresses.eth_dst);
>>>>>> +
>>>>>> +       eth_hdr(skb)->h_proto = ethh->eth_type;
>>>>>> +       skb->protocol = ethh->eth_type;
>>>>>> +
>>>>>> +       ovs_skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
>>>>>> +
>>>>>> +       OVS_CB(skb)->is_layer3 = false;
>>>>> What happens if there is something in skb->vlan_tci? I think the
>>>>> effect will be that it still on the outer Ethernet header, which
>>>>> doesn't seem correct.
>>>>
>>>> I missed that.  Addressed with:
>>>>
>>>> --- a/datapath/actions.c
>>>> +++ b/datapath/actions.c
>>>> @@ -258,6 +258,7 @@ static void push_eth(struct sk_buff *skb, const
>>>> struct
>>>> ovs_action_push_eth *ethh
>>>>           skb_push(skb, ETH_HLEN);
>>>>           skb_reset_mac_header(skb);
>>>>           skb_reset_mac_len(skb);
>>>> +       vlan_set_tci(skb, 0);
>>> I don't think it's right to throw away the vlan tag. My assumption is
>>> that it should be placed into the packet before we push on the
>>> Ethernet header (this actually should never happen since we disallow
>>> pushing multiple Ethernet headers but it still seems like the right
>>> thing to do).
>>
>> That's not what I would expect from a simple push_eth() action.  If someone
>> wanted to push a VLAN tag, that should be an explicit action.  So I would
>> just throw away whatever there is in skb->vlan_tci.
> The contents of skb->vlan_tci were already pushed explicitly (look at
> the implementation of push_vlan()). From a user's perspective, there
> is no difference - it is just an internal optimization to pull the tag
> out of the packet, so I can't see any justification for throwing away
> that tag.

OK.  I will remove that line.

>
>>>>>> diff --git a/datapath/datapath.h b/datapath/datapath.h
>>>>>> index fcd8e86..07ae0c8 100644
>>>>>> --- a/datapath/datapath.h
>>>>>> +++ b/datapath/datapath.h
>>>>>> @@ -107,6 +107,7 @@ struct ovs_skb_cb {
>>>>>>            struct sw_flow_key      *pkt_key;
>>>>>>            struct ovs_tunnel_info  *tun_info;
>>>>>>            struct vport    *input_vport;
>>>>>> +       bool is_layer3;
>>>>> We have run into a problem with ovs_skb_cb: it is currently full on
>>>>> kernels < 3.11 due to compatibility code.
>>>>
>>>> What can we do about it?
>>> It may actually be possible to remove 'flow' from ovs_skb_cb. It is
>>> really only used in ovs_execute_actions() and it should be easy enough
>>> to pass in the actions as an argument there.
>>
>> Should I propose a separate patch to do this?
> Sure.
>
>>>>>> @@ -600,8 +601,10 @@ static int ovs_key_from_nlattrs(struct
>>>>>> sw_flow_match
>>>>>> *match, u64 attrs,
>>>>>>                                    eth_key->eth_src, ETH_ALEN, is_mask);
>>>>>>                    SW_FLOW_KEY_MEMCPY(match, eth.dst,
>>>>>>                                    eth_key->eth_dst, ETH_ALEN, is_mask);
>>>>>> +               SW_FLOW_KEY_PUT(match, phy.noeth, false, is_mask);
>>>>>>                    attrs &= ~(1ULL << OVS_KEY_ATTR_ETHERNET);
>>>>>> -       }
>>>>>> +       } else if (!is_mask)
>>>>>> +               SW_FLOW_KEY_PUT(match, phy.noeth, true, is_mask);
>>>>> This will never set a mask for flows without an Ethernet header, which
>>>>> means that an IP flow will match things both with and without a
>>>>> header. Is that the intention? (It seems to me that it would be better
>>>>> to force a match on whatever form we are using - this could result in
>>>>> additional flow setups for mixed traffic but it seems unlikely that
>>>>> this wouldn't be forced already by something else, like the port
>>>>> number.)
>>>>
>>>> I removed the "if".
>>> I still have a question to some degree. If I have a packet that came
>>> in on a physical Ethernet interface and I wildcard the addresses,
>>
>> You mean the src/dst MAC addresses?
> Yes.
>
>>> should the resulting flow match a packet that came in on a pure L3
>>> port? It conceivable could but are there any issues that might come
>>> up? (in particular, are there any issues with multiple Ethernet
>>> headers, etc.?)
>>
>> I think if the flow key has wildcarded Ethernet addresses, the flow should
>> match only on L2 packets.
> In that case, I think we need to always unwildcard 'noeth' rather than
> making it conditional on whether you have a match on the MAC
> addresses.

OK.

>
>> BTW, for a packet with multiple Ethernet headers, is there already support
>> for matching on anything after the outer header?  Does that need
>> recirculation?
> There isn't currently any support on matching on multiple headers but
> it seems like a natural possibility if we introduce the capability to
> push/pop Ethernet headers. Once those are there, it seems like you
> could do it with recirculation.
>
>>> Somewhat related, what happens if someone specifies just an EtherType
>>> but no Ethernet addresses? Maybe we should just reject this?
>>
>> In a flow key?  I don't have a strong opinion on this, but I think I
>> wouldn't reject it.
> Is there any situation where it could be used (what meaning would it
> have)?

Not that I can think of right now.

> Normally we reject things that can never make sense.

If that's the policy, I'll change the implementation.