[ovs-dev] [PATCH v2 3/3] datapath: add layer 3 flow/port support

Lori Jakab lojakab at cisco.com
Thu Mar 20 11:37:51 UTC 2014


Hi Jesse,

I hope you had a great vacation.

Now that you're back, please take a look at suggested fixes before I 
formally submit v3 of the L3 patch for review.

On 1/6/14, 7:55 PM, Jesse Gross wrote:
> On Tue, Dec 24, 2013 at 9:02 AM, Lorand Jakab<lojakab at cisco.com>  wrote:
>> Implementation of the pop_eth and push_eth actions in the kernel, also
>> layer 3 flow support.  Jesse Gross provided feedback on a previous
>> version of this RFC patch, all of those comments are resolved here.
>>
>> Signed-off-by: Lorand Jakab<lojakab at cisco.com>
> Minor thing on the commit message: the history of the review process
> doesn't usually belong there since nobody really cares once the patch
> has been committed. You can put things like that using three dashes
> after the commit message, which will cause git to automatically trim
> them out when the patch is applied.

Right, I forgot that, will rephrase.

>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 30ea1d2..b90e715 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> +static int pop_eth(struct sk_buff *skb)
>> +{
>> +       skb_pull(skb, skb_network_offset(skb));
>> +       return 0;
>> +}
> I think there needs to be some additional validation in this area. If
> there's no Ethernet header then this will be a no-op but generally we
> reject such actions at flow installation time rather than ignoring at
> runtime. Furthermore, what does it mean for the validation of existing
> L2 actions like set Ethernet or push vlan, which assume that an
> Ethernet header is always present?
>
> Shouldn't we also update skb->csum?

How about this incremental?

diff --git a/datapath/actions.c b/datapath/actions.c
index 802abce..adddadc 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -145,6 +145,9 @@ static int set_eth_addr(struct sk_buff *skb,

  static int pop_eth(struct sk_buff *skb)
  {
+       if (skb->ip_summed == CHECKSUM_COMPLETE)
+               skb->csum = csum_sub(skb->csum, csum_partial(skb->data,
+                               skb_network_offset(skb), 0));
         skb_pull(skb, skb_network_offset(skb));
         return 0;
  }
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 7d193d8..8f0d5ab 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -1289,6 +1289,14 @@ static int validate_tp_port(const struct 
sw_flow_key *flow_key)
         return -EINVAL;
  }

+static int validate_eth_present(const struct sw_flow_key *flow_key)
+{
+       if (flow_key->phy.noeth)
+               return -EINVAL;
+
+       return 0;
+}
+
  void ovs_match_init(struct sw_flow_match *match,
                     struct sw_flow_key *key,
                     struct sw_flow_mask *mask)
@@ -1352,7 +1360,12 @@ static int validate_set(const struct nlattr *a,

         case OVS_KEY_ATTR_PRIORITY:
         case OVS_KEY_ATTR_SKB_MARK:
+               break;
+
         case OVS_KEY_ATTR_ETHERNET:
+               err = validate_eth_present(flow_key);
+               if (err)
+                       return err;
                 break;

         case OVS_KEY_ATTR_TUNNEL:
@@ -1507,6 +1520,9 @@ int ovs_nla_copy_actions(const struct nlattr *attr,


                 case OVS_ACTION_ATTR_POP_ETH:
+                       err = validate_eth_present(key);
+                       if (err)
+                               return err;
                         break;

                 case OVS_ACTION_ATTR_PUSH_ETH:
@@ -1516,6 +1532,9 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
                         break;

                 case OVS_ACTION_ATTR_PUSH_VLAN:
+                       err = validate_eth_present(key);
+                       if (err)
+                               return err;
                         vlan = nla_data(a);
                         if (vlan->vlan_tpid != htons(ETH_P_8021Q))
                                 return -EINVAL;

>> +static int push_eth(struct sk_buff *skb, const struct ovs_action_push_eth *ethh)
>> +{
>> +       int err;
>> +
>> +       skb_push(skb, ETH_HLEN);
>> +       skb_reset_mac_header(skb);
>> +
>> +       err = set_eth_addr(skb, &ethh->addresses);
>> +       if (unlikely(err))
>> +               return err;
> I would just do the memcpy directly here. set_eth_addr() has a bunch
> of extra things that make it not very efficient for what we are trying
> to do. memcpy would also remove the error path.

OK.

> Should we update skb->protocol here?

Right, will do that, thanks!

>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index eafcfd8..df2fb05 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>>   struct sw_flow_key {
>>          struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>> +       bool noeth;                     /* Packet has no Ethernet header */
>>          struct {
>>                  u32     priority;       /* Packet QoS priority. */
>>                  u32     skb_mark;       /* SKB mark. */
> I would put the noeth bool inside of struct phy so that we don't
> introduce additional holes in the flow structure. I think it also
> logically fits in there.

Will fix.

>> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
>> index 9b26528..8f71c49 100644
>> --- a/datapath/flow_netlink.c
>> +++ b/datapath/flow_netlink.c
>> @@ -521,6 +522,8 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  bool *exact_5tuple
>>                  SW_FLOW_KEY_MEMCPY(match, eth.dst,
>>                                  eth_key->eth_dst, ETH_ALEN, is_mask);
>>                  attrs &= ~(1ULL << OVS_KEY_ATTR_ETHERNET);
>> +       } else {
>> +               SW_FLOW_KEY_PUT(match, noeth, true, is_mask);
>>          }
> This macro sets not only the value but also ranges of the key that are
> significant. Therefore, we should explicitly set the value in all
> cases, rather than assuming the default is false.
>
> I think there's a potential for ambiguity here when megaflows are
> used. If there is no Ethernet attribute does that mean that it must be
> an L3 packet or does it mean that it is wildcarded as 'don-t care'? As
> currently implemented, I believe that it would result in the former
> but that would change behavior.

How about the following instead:

                 SW_FLOW_KEY_MEMCPY(match, eth.dst,
                                 eth_key->eth_dst, ETH_ALEN, is_mask);
+               SW_FLOW_KEY_PUT(match, phy.noeth, false, is_mask);
                 attrs &= ~(1ULL << OVS_KEY_ATTR_ETHERNET);
-       } else {
+       } else if (!is_mask)
                 SW_FLOW_KEY_PUT(match, phy.noeth, true, is_mask);
-       }

>> @@ -951,7 +954,7 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
>>                       const struct sw_flow_key *output, struct sk_buff *skb)
>>   {
>>          struct ovs_key_ethernet *eth_key;
>> -       struct nlattr *nla, *encap;
>> +       struct nlattr *nla, *encap = NULL;
> I think we can remove the explicit setting of encap to NULL later on
> in the code now, it's a little cleaner.

OK.

>> +noethernet:
>>          if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
>>                  goto nla_put_failure;
> Does it still make sense to send the EtherType? It's not present in
> the packet and I believe that the information is contained in the
> attributes that we do send (i.e. IPv4 or v6 attributes).

We had a discussion about this in August last year:

>>>>>> One such decision is how to handle the flow key.  I set all fields in
>>>>>> key->eth to 0, except the type, because we still need to know what kind
>>>>>> of L3 packet do we have.  Since a lot of code is accessing
>>>>>> key->eth.type, this is easier than having this information in a
>>>>>> different place, although it would be more elegant to set this field to
>>>>>> 0 as well.
>>>>>
>>>>>
>>>>> Can we use skb->protocol for the cases where we currently access the
>>>>> EtherType? Are there cases that you are particularly concerned about?
>>>>
>>>>
>>>> The only case I'm concerned about is in the action validation code,
>>>> particularly the validate_set() and validate_tp_port() functions, since
>>>> they
>>>> don't have access to the skb, and need to know the layer 3 protocol of
>>>> the
>>>> flow.  In validate_set() we could relax the check for eth.type for
>>>> OVS_KEY_ATTR_IPV4 and OVS_KEY_ATTR_IPV6, but I'm not sure what to do
>>>> about
>>>> validate_tp_port().
>>>>
>>>> In general, I think it would be a good idea for the flow key to have a
>>>> field
>>>> specifying the layer 3 protocol, when an ethernet header is not present.
>>>
>>> That makes sense to me.
>>
>> You mean that we keep eth.type as the L3 protocol field, or define a new
>> one, separate from the eth union?
> I think it's fine to keep using eth.type as the protocol field. I
> think we can probably some holes to move the is_layer3 flag into the
> non-tunnel portion of the flow though.

Should we revisit this discussion?

>> @@ -1543,6 +1552,13 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
>>                          break;
>>
>>
>> +               case OVS_ACTION_ATTR_POP_ETH:
>> +                       break;
>> +
>> +               case OVS_ACTION_ATTR_PUSH_ETH:
>> +                       /* TODO May need to validate eth_type? */
> I think it's not strictly required for correctness of operation in OVS
> (meaning it won't cause us to crash and anything else that looks at
> the packet should do validation first like any packet coming off the
> wire). Simon did some work in this area with MPLS (where it is
> required for correctness) and it's pretty complicated so I would be
> inclined to not do anything here unless we find a compelling reason.

I'll remove the comment.

>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
>> index 8737b63..2737cd2 100644
>> --- a/datapath/vport-gre.c
>> +++ b/datapath/vport-gre.c
>> @@ -112,6 +112,7 @@ static int gre_rcv(struct sk_buff *skb,
>>          key = key_to_tunnel_id(tpi->key, tpi->seq);
>>          ovs_flow_tun_key_init(&tun_key, ip_hdr(skb), key, filter_tnl_flags(tpi->flags));
>>
>> +       OVS_CB(skb)->is_layer3 = false;
>>          ovs_vport_receive(vport, skb, &tun_key);
> I would make is_layer3 an argument to ovs_vport_receive(). That way
> it's impossible to forget to initialize it (I think it's missing from
> vport-internal_dev.c).

Yeap, it was missing.  I'll follow your recommendation.

>> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
>> index c2698ae..b56eec7 100644
>> --- a/datapath/vport-lisp.c
>> +++ b/datapath/vport-lisp.c
> On the transmit path, we still assume that there might be an L2 header
> and pull it off. Can we enforce a restriction that it must be an L3
> packet at this point?

Good point, we should definitely do that.  I was thinking that we can 
use OVS_CB(skb)->pkt_key.phy.noeth in lisp_send() (and lisp_rcv()) for 
detecting packet type, is that ok?

> Conversely, what about L2 ports that get packets
> without an L2 header?

That should not normally happen, as I replied to Ben in the user space 
part.  However, in case it does, since it's unexpected behavior, should 
we drop?



More information about the dev mailing list