[ovs-dev] [PATCH 2/3] datapath: Mega flow implementation

Jesse Gross jesse at nicira.com
Fri Jun 7 02:01:30 UTC 2013


On Wed, Jun 5, 2013 at 10:46 PM, Andy Zhou <azhou at nicira.com> wrote:
> Add mega flow support in kernel datapath.
>
> Pravin has made significant contributions to this patch. Including
> the mega flow id look up scheme, API clean ups, and bug fixes.
>
> Co-authored-by: Pravin B Shelar <pshelar at nicira.com>
> Signed-off-by: Pravin B Shelar <pshelar at nicira.com>
> Signed-off-by: Andy Zhou <azhou at nicira.com>

I would probably just squash the previous patch and this one together
since they really come together and the first one is so small.

I received a sparse error after applying this patch:
  CHECK   /home/jgross/openvswitch/datapath/linux/flow.c
/home/jgross/openvswitch/datapath/linux/flow.c:179:40: warning:
restricted __be16 degrades to integer

It looks like a legitimate problem, although one that will not
actually affect things in practice because of the value.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 42af315..98d78a8 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -372,13 +372,13 @@ static size_t key_attr_size(void)
>  {
>         return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
>                 + nla_total_size(0)   /* OVS_KEY_ATTR_TUNNEL */
> -                 + nla_total_size(8)   /* OVS_TUNNEL_KEY_ATTR_ID */
> -                 + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
> -                 + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
> -                 + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TOS */
> -                 + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TTL */
> -                 + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
> -                 + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_CSUM */
> +               + nla_total_size(8)   /* OVS_TUNNEL_KEY_ATTR_ID */
> +               + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
> +               + nla_total_size(4)   /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
> +               + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TOS */
> +               + nla_total_size(1)   /* OVS_TUNNEL_KEY_ATTR_TTL */
> +               + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
> +               + nla_total_size(0)   /* OVS_TUNNEL_KEY_ATTR_CSUM */
>                 + nla_total_size(4)   /* OVS_KEY_ATTR_IN_PORT */
>                 + nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
>                 + nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */

These are actually intentionally indented to show a grouping of the
tunnel sub-attributes. Regardless, we should also try to keep
unrelated changes out of a patch.

> @@ -1138,14 +1142,27 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
[...]
> +    nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);
> +    if (!nla)
> +        goto nla_put_failure;
> +
> +    err = ovs_flow_to_nlattrs(&flow->key,
> +                              &flow->mfm->key, skb);
> +    if (err)
> +        goto error;
> +
> +    nla_nest_end(skb, nla);

It looks to me like we don't allocate any extra space in the buffer
for the mask when called ovs_flow_cmd_build_info().

The indentation here also uses spaces instead of tabs.

> @@ -1241,23 +1261,33 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
>         error = -EINVAL;
>         if (!a[OVS_FLOW_ATTR_KEY])
>                 goto error;
> -       error = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]);
> -       if (error)
> -               goto error;
> +
> +       mfm = sw_flow_mask_alloc();

We should make sure that all symbols that are shared across
compilation units are prefixed with ovs_ so that we don't pollute the
global kernel namespace.

I suspect that doing the allocation for the mask up front will result
in a lot of extra mallocs and frees on the flow setup path since most
of the time we will be hitting an existing match. Is there a way to
eliminate that?

> +       if (mfm == NULL)
> +               return -ENOMEM;

Just checking !mfm is probably more canonical.

> +       ovs_match_init(&match, &key, mfm);
> +       error = ovs_match_from_nlattrs(&match,
> +                       a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
> +       if (error) {
> +               goto err_kfree;
> +       }

Kernel style is not have brackets around since line if statements.

> @@ -1267,7 +1297,23 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
>                 goto err_unlock_ovs;
>
>         table = ovsl_dereference(dp->table);
> -       flow = ovs_flow_tbl_lookup(table, &key, key_len);
> +
> +       /* Deduplicate the mega flow mask. Note ovs_mutex is held. */
> +       existing_mfm = sw_flow_mask_find(mfm);
> +       if (existing_mfm) {
> +               flow = ovs_flow_tbl_lookup(table, &key, existing_mfm, key_len);
> +               if (flow) {
> +                       if (!ovs_flow_cmp_id(flow, &key, key_len)) {
> +                               error = -EINVAL;
> +                               goto err_unlock_ovs;
> +                       }

I believe this allows for inserting the same flow with multiple masks
but this isn't really good because the order of the masks (and
therefore priority) is somewhat arbitrary. If we just do an lookup
similar to how a packet is processed on the exact flow then it seems a
little more robust and means that we can eliminate the need to have a
second hash table.

> @@ -1294,14 +1340,17 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
>                 }
>                 clear_stats(flow);
>
> +               rcu_assign_pointer(flow->mfm, mfm);
>                 rcu_assign_pointer(flow->sf_acts, acts);
>
>                 /* Put flow in bucket. */
>                 ovs_flow_tbl_insert(table, flow, &key, key_len);
>
> +               if (existing_mfm == NULL)
> +                       sw_flow_mask_insert(mfm);

Same (minor) comment here about the NULL check.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index dba66cf..9fb5a1c 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -117,9 +119,13 @@ struct sw_flow_key {
>  struct sw_flow {
>         struct rcu_head rcu;
>         struct hlist_node hash_node[2];
> +       struct hlist_node hash_node_id[2];

I don't think there is a need to have two instances of hash_node_id
because it is only accessed under lock. The reason for having two in
packet lookup is to avoid a hiccup in traffic when we move a flow from
one table to another. However, there are no such concurrent accesses
in the lookup table.

> +struct sw_flow_desc {
> +       size_t start;
> +       size_t end;
> +};

I found the name "desc" to not be very intuitive - is the another name
that is more self-explanatory?

> +static inline u16 sw_flow_desc_roundup_size(const struct sw_flow_desc *desc)
> +{
> +       u16 actual = sw_flow_desc_actual_size(desc);
> +       u16 n32s = actual / sizeof(u32);
> +       u16 roundup;
> +
> +       roundup = (actual & 0x3) ? (n32s + 1) * sizeof(u32) : actual;
> +
> +       return roundup;
> +}

Is this the same as DIV_ROUND_UP?

> +bool ovs_match_validate(const struct sw_flow_match *match,
> +               u64 key_attrs, u64 mask_attrs);
> +

It looks like we don't use this function outside of flow.c, in which
case I think it could be made static.

>  struct flow_table {
>         struct flex_array *buckets;
> +       struct flex_array *buckets_ids;

As I look at this further, I've become more and more convinced that we
really only want a single hash table. Otherwise, there are a lot of
questions about sizing, the need to expand, rehash, etc.

> +struct sw_flow_mask {
> +       struct sw_flow_desc desc;
> +       struct sw_flow_key key;
> +       struct list_head list;
> +       struct rcu_head rcu;
> +       struct kref kref;

I probably wouldn't use kref here since the use of atomics makes it
look like there is some extra locking going on. In our case, the lock
is provided by ovs_mutex, so we could just use an int.

> +struct sw_flow_mask *sw_flow_mask_alloc(void);
> +void sw_flow_mask_add_ref(struct sw_flow_mask *);
> +int sw_flow_mask_del_ref(struct sw_flow_mask *);
> +void sw_flow_mask_insert(struct sw_flow_mask *);
> +void sw_flow_mask_set(struct sw_flow_mask *, struct sw_flow_desc *, u8 val);
> +struct sw_flow_mask *sw_flow_mask_find(const struct sw_flow_mask *);

All of these should be prefixed with "ovs_".

I'm still working on flow.c - in particular, I want to go through the
usage of key length and encap attributes carefully. I should have the
rest of the comments tomorrow.



More information about the dev mailing list