[ovs-dev] [PATCH 2/2] datapath: Hash only the part of struct sw_flow_key populated by flow_extract().

Jesse Gross jesse at nicira.com
Fri Apr 29 18:31:13 UTC 2011


On Fri, Apr 29, 2011 at 1:22 AM, Andrew Evans <aevans at nicira.com> wrote:
> On 4/28/11 1:37 PM, Jesse Gross wrote:
>> On Mon, Apr 25, 2011 at 6:11 PM, Andrew Evans <aevans at nicira.com> wrote:
>>> @@ -293,7 +294,7 @@ void dp_process_received_packet(struct vport *p, struct sk_buff *skb)
>>>
>>>                /* Look up flow. */
>>>                flow_node = tbl_lookup(rcu_dereference(dp->table), &key,
>>> -                                       flow_hash(&key), flow_cmp);
>>> +                                      flow_hash(&key, key_len), flow_cmp);
>>
>> The key length is also useful for the flow comparison.  It's maybe not
>> quite as significant as the hash but it would still be good to get
>> that optimization as well.
>
> Ok. I considered that but stopped short when I realized that
> tbl_lookup() would have to change, which would ripple into the tunnel
> code. Here's an incremental for that:
> diff --git a/datapath/table.c b/datapath/table.c
> index 47fa016..4bc91d7 100644
> --- a/datapath/table.c
> +++ b/datapath/table.c
> @@ -204,8 +204,8 @@ static int search_bucket(const struct tbl_bucket *bucket, void *target, u32 hash
>  * Searches @table for an object identified by @target.  Returns the tbl_node
>  * contained in the object if successful, otherwise %NULL.
>  */
> -struct tbl_node *tbl_lookup(struct tbl *table, void *target, u32 hash,
> -                           int (*cmp)(const struct tbl_node *, void *))
> +struct tbl_node *tbl_lookup(struct tbl *table, void *target, int len, u32 hash,
> +                           int (*cmp)(const struct tbl_node *, void *, int))

Can you update the comment above this function for the new argument?

>>>  static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
>>> -                       int nh_len)
>>> +                       int *key_len, int nh_len)
>>>  {
>>>        struct icmp6hdr *icmp = icmp6_hdr(skb);
>>> +       int saved_key_len = *key_len;
>>>
>>>        /* The ICMPv6 type and code fields use the 16-bit transport port
>>>         * fields, so we need to store them in 16-bit network byte order.
>>>         */
>>>        key->ipv6.tp.src = htons(icmp->icmp6_type);
>>>        key->ipv6.tp.dst = htons(icmp->icmp6_code);
>>> +       *key_len = SW_FLOW_KEY_OFFSET(ipv6.tp.dst);
>>
>> Don't we need to store the key_len at this point, not before?  If we
>> jump to the invalid block then we don't get rid of type and code.
>
> It looks to me like we abort processing of this packet if an error is
> returned from this function. I guess there's no harm in reporting a
> partial result on error though.

Packet processing doesn't get aborted for any reason except a problem
with the system, such as a memory allocation failure.  If there is a
format error then we do not continue parsing the packet but instead go
with the headers that we already have, so we still need the length to
accurately reflect the flow.

Were you intending to fix this in the diff below?  I don't see how it
rolls back the length for invalid packets.

> (I would have used the max() macro to prevent moving key_len backward,
> but the embedded pointer type comparison generated warnings that I
> couldn't easily work around.)

There's also max_t() which allows you to specify a type and usually
solves those problems.

>>>  int flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
>>> -                bool *is_frag)
>>> +               int *key_len, bool *is_frag)
>>>  {
>>>        struct ethhdr *eth;
>>>
>>>        memset(key, 0, sizeof(*key));
>>>        key->tun_id = OVS_CB(skb)->tun_id;
>>>        key->in_port = in_port;
>>> +       *key_len = SW_FLOW_KEY_OFFSET(in_port);
>>
>> I would probably keep this in a local variable and then assign to the
>> argument on exit.  It's probably better to have a single point of
>> return anyways.
>
> Ok, I like that better. Here's an incremental diff. (I might have taken
> "single point of return" too literally -- please tell me if so.)

>                        if (nh_len == -EINVAL) {
>                                skb->transport_header = skb->network_header;
> -                               return 0;
> +                       } else {
> +                               error = nh_len;
>                        }

We can drop the braces now.

>>> @@ -675,12 +703,12 @@ int flow_from_nlattrs(struct sw_flow_key *swkey, const struct nlattr *attr)
>>>                        if (swkey->dl_type != htons(ETH_P_IPV6))
>>>                                return -EINVAL;
>>>                        ipv6_key = nla_data(nla);
>>> +                       swkey->nw_proto = ipv6_key->ipv6_proto;
>>> +                       swkey->nw_tos = ipv6_key->ipv6_tos;
>>>                        memcpy(&swkey->ipv6.src, ipv6_key->ipv6_src,
>>>                                        sizeof(swkey->ipv6.src));
>>>                        memcpy(&swkey->ipv6.dst, ipv6_key->ipv6_dst,
>>>                                        sizeof(swkey->ipv6.dst));
>>> -                       swkey->nw_proto = ipv6_key->ipv6_proto;
>>> -                       swkey->nw_tos = ipv6_key->ipv6_tos;
>>
>> Is there a reason why these two lines were rearranged?  I noticed it
>> before as well.
>
> I did it only to make the order of assignments match the order of the
> fields in the struct so I could more easily keep track of how much of
> the flow key had been populated. I'll change it back if you want me to.

OK, that's reasonable.

>>> @@ -744,6 +777,7 @@ int flow_from_nlattrs(struct sw_flow_key *swkey, const struct nlattr *attr)
>>>                        swkey->nw_proto = ntohs(arp_key->arp_op);
>>>                        memcpy(swkey->ipv4.arp.sha, arp_key->arp_sha, ETH_ALEN);
>>>                        memcpy(swkey->ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN);
>>> +                       *key_len = SW_FLOW_KEY_OFFSET(ipv4.arp.tha);
>>
>> Can't we just use SW_FLOW_KEY_OFFSET(ipv4.arp) rather than relying on
>> remembering the last element?
>
> We could. That does make sense, though I think that it might make it
> harder to notice the need to change the field name in the future if new
> fields are added or fields are rearranged. Do you think that's not worth
> worrying about?

I would just do in places where all the fields are always going to be
assigned at the same time, for example here with the ARP SHA and THA.
For that it seems like it would make it more robust against new fields
or reordering.



More information about the dev mailing list