[ovs-dev] [PATCH 2/3] datapath: Mega flow implementation
Andy Zhou
azhou at nicira.com
Fri Jun 7 10:35:34 UTC 2013
Thanks for the review.
On Thu, Jun 6, 2013 at 7:01 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Wed, Jun 5, 2013 at 10:46 PM, Andy Zhou <azhou at nicira.com> wrote:
> > Add mega flow support in kernel datapath.
> >
> > Pravin has made significant contributions to this patch. Including
> > the mega flow id look up scheme, API clean ups, and bug fixes.
> >
> > Co-authored-by: Pravin B Shelar <pshelar at nicira.com>
> > Signed-off-by: Pravin B Shelar <pshelar at nicira.com>
> > Signed-off-by: Andy Zhou <azhou at nicira.com>
>
> I would probably just squash the previous patch and this one together
> since they really come together and the first one is so small.
>
> I received a sparse error after applying this patch:
> CHECK /home/jgross/openvswitch/datapath/linux/flow.c
> /home/jgross/openvswitch/datapath/linux/flow.c:179:40: warning:
> restricted __be16 degrades to integer
>
>
It looks like a legitimate problem, although one that will not
> actually affect things in practice because of the value.
>
Agreed, will fix
>
> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> > index 42af315..98d78a8 100644
> > --- a/datapath/datapath.c
> > +++ b/datapath/datapath.c
> > @@ -372,13 +372,13 @@ static size_t key_attr_size(void)
> > {
> > return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */
> > + nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */
> > - + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */
> > - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC
> */
> > - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST
> */
> > - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */
> > - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */
> > - + nla_total_size(0) /*
> OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
> > - + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */
> > + + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */
> > + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */
> > + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */
> > + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */
> > + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */
> > + + nla_total_size(0) /*
> OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */
> > + + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */
> > + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */
> > + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */
> > + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */
>
> These are actually intentionally indented to show a grouping of the
> tunnel sub-attributes. Regardless, we should also try to keep
> unrelated changes out of a patch.
>
Did not know this is intentional. Will roll back the change.
>
> > @@ -1138,14 +1142,27 @@ static int ovs_flow_cmd_fill_info(struct sw_flow
> *flow, struct datapath *dp,
> [...]
> > + nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);
> > + if (!nla)
> > + goto nla_put_failure;
> > +
> > + err = ovs_flow_to_nlattrs(&flow->key,
> > + &flow->mfm->key, skb);
> > + if (err)
> > + goto error;
> > +
> > + nla_nest_end(skb, nla);
>
> It looks to me like we don't allocate any extra space in the buffer
> for the mask when called ovs_flow_cmd_build_info().
>
Do you mean ovs_dp_cmd_msg_size() does not include the size of mask (and
key) attributes?
>
> The indentation here also uses spaces instead of tabs.
>
Will fix.
>
> > @@ -1241,23 +1261,33 @@ static int ovs_flow_cmd_new_or_set(struct
> sk_buff *skb, struct genl_info *info)
> > error = -EINVAL;
> > if (!a[OVS_FLOW_ATTR_KEY])
> > goto error;
> > - error = ovs_flow_from_nlattrs(&key, &key_len,
> a[OVS_FLOW_ATTR_KEY]);
> > - if (error)
> > - goto error;
> > +
> > + mfm = sw_flow_mask_alloc();
>
> We should make sure that all symbols that are shared across
> compilation units are prefixed with ovs_ so that we don't pollute the
> global kernel namespace.
>
Will fix.
>
> I suspect that doing the allocation for the mask up front will result
> in a lot of extra mallocs and frees on the flow setup path since most
> of the time we will be hitting an existing match. Is there a way to
> eliminate that?
>
Good point, I will implement this change.
>
> > + if (mfm == NULL)
> > + return -ENOMEM;
>
> Just checking !mfm is probably more canonical.
>
Will fix.
>
> > + ovs_match_init(&match, &key, mfm);
> > + error = ovs_match_from_nlattrs(&match,
> > + a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
> > + if (error) {
> > + goto err_kfree;
> > + }
>
> Kernel style is not have brackets around since line if statements.
>
Will fix.
>
> > @@ -1267,7 +1297,23 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff
> *skb, struct genl_info *info)
> > goto err_unlock_ovs;
> >
> > table = ovsl_dereference(dp->table);
> > - flow = ovs_flow_tbl_lookup(table, &key, key_len);
> > +
> > + /* Deduplicate the mega flow mask. Note ovs_mutex is held. */
> > + existing_mfm = sw_flow_mask_find(mfm);
> > + if (existing_mfm) {
> > + flow = ovs_flow_tbl_lookup(table, &key, existing_mfm,
> key_len);
> > + if (flow) {
> > + if (!ovs_flow_cmp_id(flow, &key, key_len)) {
> > + error = -EINVAL;
> > + goto err_unlock_ovs;
> > + }
>
> I believe this allows for inserting the same flow with multiple masks
> but this isn't really good because the order of the masks (and
> therefore priority) is somewhat arbitrary. If we just do an lookup
> similar to how a packet is processed on the exact flow then it seems a
> little more robust and means that we can eliminate the need to have a
> second hash table.
>
So just look for any existing flow with key only. The mask attribute is
only be used
for creating a new flow?
>
> > @@ -1294,14 +1340,17 @@ static int ovs_flow_cmd_new_or_set(struct
> sk_buff *skb, struct genl_info *info)
> > }
> > clear_stats(flow);
> >
> > + rcu_assign_pointer(flow->mfm, mfm);
> > rcu_assign_pointer(flow->sf_acts, acts);
> >
> > /* Put flow in bucket. */
> > ovs_flow_tbl_insert(table, flow, &key, key_len);
> >
> > + if (existing_mfm == NULL)
> > + sw_flow_mask_insert(mfm);
>
> Same (minor) comment here about the NULL check.
>
Will fix.
>
> > diff --git a/datapath/flow.h b/datapath/flow.h
> > index dba66cf..9fb5a1c 100644
> > --- a/datapath/flow.h
> > +++ b/datapath/flow.h
> > @@ -117,9 +119,13 @@ struct sw_flow_key {
> > struct sw_flow {
> > struct rcu_head rcu;
> > struct hlist_node hash_node[2];
> > + struct hlist_node hash_node_id[2];
>
> I don't think there is a need to have two instances of hash_node_id
> because it is only accessed under lock. The reason for having two in
> packet lookup is to avoid a hiccup in traffic when we move a flow from
> one table to another. However, there are no such concurrent accesses
> in the lookup table.
>
It seems we are going down the path of the removing the hash_node_id
entirely, right?
>
> > +struct sw_flow_desc {
> > + size_t start;
> > + size_t end;
> > +};
>
> I found the name "desc" to not be very intuitive - is the another name
> that is more self-explanatory?
>
How about sw_flow_key_range? Any other suggestions?
>
> > +static inline u16 sw_flow_desc_roundup_size(const struct sw_flow_desc
> *desc)
> > +{
> > + u16 actual = sw_flow_desc_actual_size(desc);
> > + u16 n32s = actual / sizeof(u32);
> > + u16 roundup;
> > +
> > + roundup = (actual & 0x3) ? (n32s + 1) * sizeof(u32) : actual;
> > +
> > + return roundup;
> > +}
>
> Is this the same as DIV_ROUND_UP?
>
I did not use it because DIV_ROUND_UP gives the roundup of u32, I needed
the roundup in bytes. However, I could rewrite this function using
DIV_ROUND_UP.
>
> > +bool ovs_match_validate(const struct sw_flow_match *match,
> > + u64 key_attrs, u64 mask_attrs);
> > +
>
> It looks like we don't use this function outside of flow.c, in which
> case I think it could be made static.
>
Agreed.
>
> > struct flow_table {
> > struct flex_array *buckets;
> > + struct flex_array *buckets_ids;
>
> As I look at this further, I've become more and more convinced that we
> really only want a single hash table. Otherwise, there are a lot of
> questions about sizing, the need to expand, rehash, etc.
>
> I agree single hash bucket will work.
> +struct sw_flow_mask {
> > + struct sw_flow_desc desc;
> > + struct sw_flow_key key;
> > + struct list_head list;
> > + struct rcu_head rcu;
> > + struct kref kref;
>
> I probably wouldn't use kref here since the use of atomics makes it
> look like there is some extra locking going on. In our case, the lock
> is provided by ovs_mutex, so we could just use an int.
>
Agreed, as long as packet path does not change the reference count.
>
> > +struct sw_flow_mask *sw_flow_mask_alloc(void);
> > +void sw_flow_mask_add_ref(struct sw_flow_mask *);
> > +int sw_flow_mask_del_ref(struct sw_flow_mask *);
> > +void sw_flow_mask_insert(struct sw_flow_mask *);
> > +void sw_flow_mask_set(struct sw_flow_mask *, struct sw_flow_desc *, u8
> val);
> > +struct sw_flow_mask *sw_flow_mask_find(const struct sw_flow_mask *);
>
> All of these should be prefixed with "ovs_".
>
Sure.
> I'm still working on flow.c - in particular, I want to go through the
> usage of key length and encap attributes carefully. I should have the
> rest of the comments tomorrow.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20130607/4aa14039/attachment-0003.html>
More information about the dev
mailing list