[ovs-dev] [PATCH v2 3/3] datapath: add layer 3 flow/port support
Lori Jakab
lojakab at cisco.com
Thu Mar 20 11:37:51 UTC 2014
Hi Jesse,
I hope you had a great vacation.
Now that you're back, please take a look at suggested fixes before I
formally submit v3 of the L3 patch for review.
On 1/6/14, 7:55 PM, Jesse Gross wrote:
> On Tue, Dec 24, 2013 at 9:02 AM, Lorand Jakab<lojakab at cisco.com> wrote:
>> Implementation of the pop_eth and push_eth actions in the kernel, also
>> layer 3 flow support. Jesse Gross provided feedback on a previous
>> version of this RFC patch, all of those comments are resolved here.
>>
>> Signed-off-by: Lorand Jakab<lojakab at cisco.com>
> Minor thing on the commit message: the history of the review process
> doesn't usually belong there since nobody really cares once the patch
> has been committed. You can put things like that using three dashes
> after the commit message, which will cause git to automatically trim
> them out when the patch is applied.
Right, I forgot that, will rephrase.
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 30ea1d2..b90e715 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> +static int pop_eth(struct sk_buff *skb)
>> +{
>> + skb_pull(skb, skb_network_offset(skb));
>> + return 0;
>> +}
> I think there needs to be some additional validation in this area. If
> there's no Ethernet header then this will be a no-op but generally we
> reject such actions at flow installation time rather than ignoring at
> runtime. Furthermore, what does it mean for the validation of existing
> L2 actions like set Ethernet or push vlan, which assume that an
> Ethernet header is always present?
>
> Shouldn't we also update skb->csum?
How about this incremental?
diff --git a/datapath/actions.c b/datapath/actions.c
index 802abce..adddadc 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -145,6 +145,9 @@ static int set_eth_addr(struct sk_buff *skb,
static int pop_eth(struct sk_buff *skb)
{
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_sub(skb->csum, csum_partial(skb->data,
+ skb_network_offset(skb), 0));
skb_pull(skb, skb_network_offset(skb));
return 0;
}
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 7d193d8..8f0d5ab 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -1289,6 +1289,14 @@ static int validate_tp_port(const struct
sw_flow_key *flow_key)
return -EINVAL;
}
+static int validate_eth_present(const struct sw_flow_key *flow_key)
+{
+ if (flow_key->phy.noeth)
+ return -EINVAL;
+
+ return 0;
+}
+
void ovs_match_init(struct sw_flow_match *match,
struct sw_flow_key *key,
struct sw_flow_mask *mask)
@@ -1352,7 +1360,12 @@ static int validate_set(const struct nlattr *a,
case OVS_KEY_ATTR_PRIORITY:
case OVS_KEY_ATTR_SKB_MARK:
+ break;
+
case OVS_KEY_ATTR_ETHERNET:
+ err = validate_eth_present(flow_key);
+ if (err)
+ return err;
break;
case OVS_KEY_ATTR_TUNNEL:
@@ -1507,6 +1520,9 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
case OVS_ACTION_ATTR_POP_ETH:
+ err = validate_eth_present(key);
+ if (err)
+ return err;
break;
case OVS_ACTION_ATTR_PUSH_ETH:
@@ -1516,6 +1532,9 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
break;
case OVS_ACTION_ATTR_PUSH_VLAN:
+ err = validate_eth_present(key);
+ if (err)
+ return err;
vlan = nla_data(a);
if (vlan->vlan_tpid != htons(ETH_P_8021Q))
return -EINVAL;
>> +static int push_eth(struct sk_buff *skb, const struct ovs_action_push_eth *ethh)
>> +{
>> + int err;
>> +
>> + skb_push(skb, ETH_HLEN);
>> + skb_reset_mac_header(skb);
>> +
>> + err = set_eth_addr(skb, ðh->addresses);
>> + if (unlikely(err))
>> + return err;
> I would just do the memcpy directly here. set_eth_addr() has a bunch
> of extra things that make it not very efficient for what we are trying
> to do. memcpy would also remove the error path.
OK.
> Should we update skb->protocol here?
Right, will do that, thanks!
>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index eafcfd8..df2fb05 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> struct sw_flow_key {
>> struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
>> + bool noeth; /* Packet has no Ethernet header */
>> struct {
>> u32 priority; /* Packet QoS priority. */
>> u32 skb_mark; /* SKB mark. */
> I would put the noeth bool inside of struct phy so that we don't
> introduce additional holes in the flow structure. I think it also
> logically fits in there.
Will fix.
>> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
>> index 9b26528..8f71c49 100644
>> --- a/datapath/flow_netlink.c
>> +++ b/datapath/flow_netlink.c
>> @@ -521,6 +522,8 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match, bool *exact_5tuple
>> SW_FLOW_KEY_MEMCPY(match, eth.dst,
>> eth_key->eth_dst, ETH_ALEN, is_mask);
>> attrs &= ~(1ULL << OVS_KEY_ATTR_ETHERNET);
>> + } else {
>> + SW_FLOW_KEY_PUT(match, noeth, true, is_mask);
>> }
> This macro sets not only the value but also ranges of the key that are
> significant. Therefore, we should explicitly set the value in all
> cases, rather than assuming the default is false.
>
> I think there's a potential for ambiguity here when megaflows are
> used. If there is no Ethernet attribute does that mean that it must be
> an L3 packet or does it mean that it is wildcarded as 'don-t care'? As
> currently implemented, I believe that it would result in the former
> but that would change behavior.
How about the following instead:
SW_FLOW_KEY_MEMCPY(match, eth.dst,
eth_key->eth_dst, ETH_ALEN, is_mask);
+ SW_FLOW_KEY_PUT(match, phy.noeth, false, is_mask);
attrs &= ~(1ULL << OVS_KEY_ATTR_ETHERNET);
- } else {
+ } else if (!is_mask)
SW_FLOW_KEY_PUT(match, phy.noeth, true, is_mask);
- }
>> @@ -951,7 +954,7 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
>> const struct sw_flow_key *output, struct sk_buff *skb)
>> {
>> struct ovs_key_ethernet *eth_key;
>> - struct nlattr *nla, *encap;
>> + struct nlattr *nla, *encap = NULL;
> I think we can remove the explicit setting of encap to NULL later on
> in the code now, it's a little cleaner.
OK.
>> +noethernet:
>> if (nla_put_be16(skb, OVS_KEY_ATTR_ETHERTYPE, output->eth.type))
>> goto nla_put_failure;
> Does it still make sense to send the EtherType? It's not present in
> the packet and I believe that the information is contained in the
> attributes that we do send (i.e. IPv4 or v6 attributes).
We had a discussion about this in August last year:
>>>>>> One such decision is how to handle the flow key. I set all fields in
>>>>>> key->eth to 0, except the type, because we still need to know what kind
>>>>>> of L3 packet do we have. Since a lot of code is accessing
>>>>>> key->eth.type, this is easier than having this information in a
>>>>>> different place, although it would be more elegant to set this field to
>>>>>> 0 as well.
>>>>>
>>>>>
>>>>> Can we use skb->protocol for the cases where we currently access the
>>>>> EtherType? Are there cases that you are particularly concerned about?
>>>>
>>>>
>>>> The only case I'm concerned about is in the action validation code,
>>>> particularly the validate_set() and validate_tp_port() functions, since
>>>> they
>>>> don't have access to the skb, and need to know the layer 3 protocol of
>>>> the
>>>> flow. In validate_set() we could relax the check for eth.type for
>>>> OVS_KEY_ATTR_IPV4 and OVS_KEY_ATTR_IPV6, but I'm not sure what to do
>>>> about
>>>> validate_tp_port().
>>>>
>>>> In general, I think it would be a good idea for the flow key to have a
>>>> field
>>>> specifying the layer 3 protocol, when an ethernet header is not present.
>>>
>>> That makes sense to me.
>>
>> You mean that we keep eth.type as the L3 protocol field, or define a new
>> one, separate from the eth union?
> I think it's fine to keep using eth.type as the protocol field. I
> think we can probably some holes to move the is_layer3 flag into the
> non-tunnel portion of the flow though.
Should we revisit this discussion?
>> @@ -1543,6 +1552,13 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
>> break;
>>
>>
>> + case OVS_ACTION_ATTR_POP_ETH:
>> + break;
>> +
>> + case OVS_ACTION_ATTR_PUSH_ETH:
>> + /* TODO May need to validate eth_type? */
> I think it's not strictly required for correctness of operation in OVS
> (meaning it won't cause us to crash and anything else that looks at
> the packet should do validation first like any packet coming off the
> wire). Simon did some work in this area with MPLS (where it is
> required for correctness) and it's pretty complicated so I would be
> inclined to not do anything here unless we find a compelling reason.
I'll remove the comment.
>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
>> index 8737b63..2737cd2 100644
>> --- a/datapath/vport-gre.c
>> +++ b/datapath/vport-gre.c
>> @@ -112,6 +112,7 @@ static int gre_rcv(struct sk_buff *skb,
>> key = key_to_tunnel_id(tpi->key, tpi->seq);
>> ovs_flow_tun_key_init(&tun_key, ip_hdr(skb), key, filter_tnl_flags(tpi->flags));
>>
>> + OVS_CB(skb)->is_layer3 = false;
>> ovs_vport_receive(vport, skb, &tun_key);
> I would make is_layer3 an argument to ovs_vport_receive(). That way
> it's impossible to forget to initialize it (I think it's missing from
> vport-internal_dev.c).
Yeap, it was missing. I'll follow your recommendation.
>> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
>> index c2698ae..b56eec7 100644
>> --- a/datapath/vport-lisp.c
>> +++ b/datapath/vport-lisp.c
> On the transmit path, we still assume that there might be an L2 header
> and pull it off. Can we enforce a restriction that it must be an L3
> packet at this point?
Good point, we should definitely do that. I was thinking that we can
use OVS_CB(skb)->pkt_key.phy.noeth in lisp_send() (and lisp_rcv()) for
detecting packet type, is that ok?
> Conversely, what about L2 ports that get packets
> without an L2 header?
That should not normally happen, as I replied to Ben in the user space
part. However, in case it does, since it's unexpected behavior, should
we drop?
More information about the dev
mailing list