[ovs-dev] [PATCH 2/3] datapath: Mega flow implementation

Jesse Gross jesse at nicira.com
Fri Jun 7 23:43:08 UTC 2013


OK, here's the second half:

On Wed, Jun 5, 2013 at 10:46 PM, Andy Zhou <azhou at nicira.com> wrote:
> diff --git a/datapath/flow.c b/datapath/flow.c
> index 7f897bd..fc8fb86 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
>  static struct kmem_cache *flow_cache;
> +static LIST_HEAD(mask_list);

We should add a comment indicating that this is protected by ovs_mutex.

> +static void update_desc__(struct sw_flow_match *match,
> +                         size_t offset, size_t size, bool is_mask)
> +{
> +       struct sw_flow_desc *desc = NULL;
> +       size_t start= offset;
> +       size_t end= offset + size;

Can you put spaces on both sides of the equals sign?

> +bool ovs_match_validate(const struct sw_flow_match *match,
> +               u64 key_attrs, u64 mask_attrs)

I think the halves of this function (expectations and prerequisites)
are basically two aspects of the same problem and should be handled
together. For example, if the ICMPv6 type is not
NDISC_NEIGHBOUR_SOLICITATION or NDSIC_NEIGHBOUR_ADVERTISEMENT then we
shouldn't have a corresponding attribute for that. Doing it in a
single pass would make it less likely that we miss something. I think
we could handle it this way:
 - When extracting match values, record attributes that are all
wildcards and remove then from the value attributes completely.
 - When computing expected attributes during validation, when we have
a protocol value check, also verify that the mask is exact match.
 - AND the key_attributes with the mask values so we get a list of
fields that have some matches in them.
 - Check that the attributes with matches is exactly the same as the
expected attributes (not a subset in either direction).

One downside of this is that it won't validate attributes that aren't
used for matching, which doesn't really matter to the kernel but if we
get an invalid flow then we won't detect it at setup time. Instead
we'll just silently return a different ID when we serialize back to
Netlink. To avoid this, we may want to continue to do a check on the
unmasked attributes.

> +static void flow_key_mask(struct sw_flow_key *dst,
> +                         const struct sw_flow_key *src,
> +                         const struct sw_flow_mask *mfm)
> +{
> +       u8 *m = (u8 *)&mfm->key + mfm->desc.start;
> +       u8 *s = (u8 *)src + mfm->desc.start;
> +       u8 *d = (u8 *)dst + mfm->desc.start;

You could improve the performance here by using longs instead of chars.

> +static bool __flow_cmp_key_mask(struct sw_flow *flow, u32 hash, u8 *key,
> +                               int key_start, int key_len,
> +                               struct sw_flow_mask *mfm)
> +{
> +       return (flow->hash == hash && (flow->mfm == mfm) &&
> +               !memcmp((u8 *)&flow->key + key_start, key, (key_len - key_start)));
> +}

Since the flows are non-overlapping, do we need to compare the masks?

> +struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *table,
> +                                   const struct sw_flow_key *flow_key,
> +                                   struct sw_flow_mask *mfm,
> +                                   int key_len)
> +{
> +       const struct sw_flow_key *key;
> +       struct sw_flow *flow;
> +       struct hlist_head *head;
> +       u8 *_key;
> +       int key_start = mfm->desc.start;
> +       u32 hash;
> +       struct sw_flow_key masked_key;
> +       int end_roundup = mfm->desc.start
> +                       + sw_flow_mask_roundup_size(mfm);

Do we have to round up the size at this point? It seems like the
individual components might be able to do this based on their own
needs (for example, ovs_flow_hash() already rounds up to 4 bytes).

> +
> +       if (end_roundup < key_len)
> +               key_len = end_roundup;

Can't we just always use end_roundup?

I'm not sure that it makes sense to calculate key_len in
ovs_flow_extract() anymore. At this point, it's only used as a quick
check to see whether a mask could potentially ever match. However,
it's a pretty coarse filter though and I suspect that we could come up
with something (if necessary) that is less complicated.

> +       flow_key_mask(&masked_key, flow_key, mfm);
> +       key = &masked_key;
> +       hash = ovs_flow_hash(key, key_start, key_len);
> +       _key = (u8 *) key + key_start;
>         head = find_bucket(table, hash);
>         hlist_for_each_entry_rcu(flow, head, hash_node[table->node_ver]) {
> -
> -               if (flow->hash == hash &&
> -                   !memcmp((u8 *)&flow->key + key_start, _key, key_len - key_start)) {
> +               if (__flow_cmp_key_mask(flow, hash, _key, key_start, key_len, mfm))

I'm not sure that we need to pass in both _key and key_start - they
are more or less equivalent.

>                         return flow;
> -               }
> +
>         }
>         return NULL;
>  }
>
> +

Extra blank line here.

>  void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
> -                        struct sw_flow_key *key, int key_len)
> +                        const struct sw_flow_key *key, int key_len)
>  {
> -       flow->hash = ovs_flow_hash(key, flow_key_start(key), key_len);
> -       memcpy(&flow->key, key, sizeof(flow->key));
> -       __flow_tbl_insert(table, flow);
> +       flow->user_id = *key;
> +       flow->id_hash = ovs_flow_hash(key, flow_key_start(key), key_len);
> +       flow_key_mask(&flow->key, &flow->user_id, flow->mfm);

We should annotate the definition of flow->mfm with __rcu so that
sparse would complain here (since we're not using ovsl_dereference,
which we should be).

> @@ -1012,37 +1159,48 @@ int ipv4_tun_from_nlattr(const struct nlattr *attr,
[...]
>         }
> +       if (is_mask) {
> +               SW_FLOW_KEY_PUT(match, tun_key.tun_flags, 0xffff, is_mask);
> +       }else {
> +               SW_FLOW_KEY_PUT(match, tun_key.tun_flags, tun_flags, is_mask);
> +       }
> +

Do we need to force the tunnel flags to be exact match? Can't we allow
userspace to specify which flags to mask using the usual logic?

You could also drop the braces above and add a blank line before the
if statement.

>  int ipv4_tun_to_nlattr(struct sk_buff *skb,
[...]
> +       if (tun_key == output) {

Similar to the other direction, I think it's OK to just directly
translate the flags even for masks.

> +static int __metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
> +               const struct nlattr **a, bool is_mask)

I'm not sure that the underscores at the beginning are necessary since
this isn't the internal version of any function.

> +/**
> + * ovs_match_from_nlattrs - parses Netlink attributes into a flow key and
> + * mask. In case the mask is specified (mask == NULL), the flow is treated

I think this should say if the mask is NOT specified.

> +int ovs_match_from_nlattrs(struct sw_flow_match *match,
> +                          const struct nlattr *key,
> +                          const struct nlattr *mask)
> +{
> +       const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
> +       const struct nlattr *m[OVS_KEY_ATTR_MAX + 1];
> +       const struct nlattr *encap;
> +       u64 key_attrs = 0;
> +       u64 mask_attrs = 0;
> +       bool encap_valid = false;
> +       int err;
> +
> +       err = parse_flow_nlattrs(key, a, &key_attrs);
> +       if (err)
> +               return err;
> +
> +       if (key_attrs & 1ULL << OVS_KEY_ATTR_ENCAP) {
> +               encap = a[OVS_KEY_ATTR_ENCAP];
> +               key_attrs &= ~(1ULL << OVS_KEY_ATTR_ENCAP);
> +               if (nla_len(encap)) {
> +                       __be16 tci = 0;
> +                       __be16 eth_type = 0; /* ETH_P_8021Q */
> +
> +                       if (a[OVS_KEY_ATTR_ETHERTYPE])
> +                               eth_type = nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]);

It might be good to mask out this EtherType because we've effectively
consumed it and are expecting a new one.

> +                       if (a[OVS_KEY_ATTR_VLAN])
> +                               tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);

We may want to start enforcing the condition that we don't have
duplicate attributes in parse_flow_nlattrs(). Otherwise, an inner vlan
(which we don't support currently) could replace the value from the
outer tag here and we would silently accept it.

> +                       if  ( (eth_type == htons(ETH_P_8021Q))
> +                                       && (tci & htons(VLAN_TAG_PRESENT)) ) {

I would probably move the check for VLAN_TAG_PRESENT to
ovs_key_from_nlattrs() and then just check for OVS_KEY_ATTR_VLAN here.

It's somewhat unusual in kernel style to have extra spaces around the
parentheses.

> +                               encap_valid = true;
> +                               err = parse_flow_nlattrs(encap, a, &key_attrs);
> +                       }
> +                       else
> +                               err = -EINVAL;

Similarly, it's normal to have the else on the same line as the closing brace.

> +       if (ovs_match_validate(match, key_attrs, mask_attrs) == false)
>                 return -EINVAL;

My usual comment about NULL/false checks.

> +struct sw_flow_mask *sw_flow_mask_alloc(void)
> +{
> +       struct sw_flow_mask *mfm;
> +
> +       mfm = kmalloc(sizeof *mfm, GFP_KERNEL);
> +       if (mfm) {
> +               kref_init(&mfm->kref);
> +               INIT_LIST_HEAD(&mfm->list);

You should have to init this member, since it's not a list head.

> +/**
> + * Set 'desc' fileds in the mask to the value of 'val'.
> + */
> +void sw_flow_mask_set(struct sw_flow_mask *mfm,
> +               struct sw_flow_desc *desc, u8 val)

Can this be static?

In ovs_flow_cmd_new_or_set(), I think we should have a check to make
sure that the mask is not changed when updating an existing flow.



More information about the dev mailing list