<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">Thanks for the review. <br><br>On Thu, Jun 6, 2013 at 7:01 PM, Jesse Gross <span dir="ltr"><<a href="mailto:jesse@nicira.com" target="_blank">jesse@nicira.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Wed, Jun 5, 2013 at 10:46 PM, Andy Zhou <<a href="mailto:azhou@nicira.com">azhou@nicira.com</a>> wrote:<br>
> Add mega flow support in kernel datapath.<br>
><br>
> Pravin has made significant contributions to this patch. Including<br>
> the mega flow id look up scheme, API clean ups, and bug fixes.<br>
><br>
> Co-authored-by: Pravin B Shelar <<a href="mailto:pshelar@nicira.com">pshelar@nicira.com</a>><br>
> Signed-off-by: Pravin B Shelar <<a href="mailto:pshelar@nicira.com">pshelar@nicira.com</a>><br>
> Signed-off-by: Andy Zhou <<a href="mailto:azhou@nicira.com">azhou@nicira.com</a>><br>
<br>
</div>I would probably just squash the previous patch and this one together<br>
since they really come together and the first one is so small.<br>
<br>
I received a sparse error after applying this patch:<br>
CHECK /home/jgross/openvswitch/datapath/linux/flow.c<br>
/home/jgross/openvswitch/datapath/linux/flow.c:179:40: warning:<br>
restricted __be16 degrades to integer<br>
<br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
It looks like a legitimate problem, although one that will not<br>
actually affect things in practice because of the value. <br></blockquote><div><br></div><div>Agreed, will fix <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> diff --git a/datapath/datapath.c b/datapath/datapath.c<br>
> index 42af315..98d78a8 100644<br>
> --- a/datapath/datapath.c<br>
> +++ b/datapath/datapath.c<br>
</div><div class="im">> @@ -372,13 +372,13 @@ static size_t key_attr_size(void)<br>
> {<br>
> return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */<br>
> + nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */<br>
> - + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */<br>
> - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */<br>
> - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */<br>
> - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */<br>
> - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */<br>
> - + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */<br>
> - + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */<br>
> + + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */<br>
> + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */<br>
> + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */<br>
> + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */<br>
> + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */<br>
> + + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */<br>
> + + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */<br>
> + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */<br>
> + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */<br>
> + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */<br>
<br>
</div>These are actually intentionally indented to show a grouping of the<br>
tunnel sub-attributes. Regardless, we should also try to keep<br>
unrelated changes out of a patch.<br></blockquote><div>Did not know this is intentional. Will roll back the change. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> @@ -1138,14 +1142,27 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,<br>
</div>[...]<br>
<div class="im">> + nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK);<br>
> + if (!nla)<br>
> + goto nla_put_failure;<br>
> +<br>
> + err = ovs_flow_to_nlattrs(&flow->key,<br>
> + &flow->mfm->key, skb);<br>
> + if (err)<br>
> + goto error;<br>
> +<br>
> + nla_nest_end(skb, nla);<br>
<br>
</div>It looks to me like we don't allocate any extra space in the buffer<br>
for the mask when called ovs_flow_cmd_build_info().<br></blockquote><div>Do you mean ovs_dp_cmd_msg_size() does not include the size of mask (and key) attributes? <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The indentation here also uses spaces instead of tabs.<br></blockquote><div>Will fix. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> @@ -1241,23 +1261,33 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)<br>
> error = -EINVAL;<br>
> if (!a[OVS_FLOW_ATTR_KEY])<br>
> goto error;<br>
> - error = ovs_flow_from_nlattrs(&key, &key_len, a[OVS_FLOW_ATTR_KEY]);<br>
> - if (error)<br>
> - goto error;<br>
> +<br>
> + mfm = sw_flow_mask_alloc();<br>
<br>
</div>We should make sure that all symbols that are shared across<br>
compilation units are prefixed with ovs_ so that we don't pollute the<br>
global kernel namespace.<br></blockquote><div>Will fix. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I suspect that doing the allocation for the mask up front will result<br>
in a lot of extra mallocs and frees on the flow setup path since most<br>
of the time we will be hitting an existing match. Is there a way to<br>
eliminate that?<br></blockquote><div>Good point, I will implement this change. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> + if (mfm == NULL)<br>
> + return -ENOMEM;<br>
<br>
</div>Just checking !mfm is probably more canonical.<br></blockquote><div>Will fix. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> + ovs_match_init(&match, &key, mfm);<br>
> + error = ovs_match_from_nlattrs(&match,<br>
> + a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);<br>
> + if (error) {<br>
> + goto err_kfree;<br>
> + }<br>
<br>
</div>Kernel style is not have brackets around since line if statements.<br></blockquote><div>Will fix. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> @@ -1267,7 +1297,23 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)<br>
> goto err_unlock_ovs;<br>
><br>
> table = ovsl_dereference(dp->table);<br>
> - flow = ovs_flow_tbl_lookup(table, &key, key_len);<br>
> +<br>
> + /* Deduplicate the mega flow mask. Note ovs_mutex is held. */<br>
> + existing_mfm = sw_flow_mask_find(mfm);<br>
> + if (existing_mfm) {<br>
> + flow = ovs_flow_tbl_lookup(table, &key, existing_mfm, key_len);<br>
> + if (flow) {<br>
> + if (!ovs_flow_cmp_id(flow, &key, key_len)) {<br>
> + error = -EINVAL;<br>
> + goto err_unlock_ovs;<br>
> + }<br>
<br>
</div>I believe this allows for inserting the same flow with multiple masks<br>
but this isn't really good because the order of the masks (and<br>
therefore priority) is somewhat arbitrary. If we just do an lookup<br>
similar to how a packet is processed on the exact flow then it seems a<br>
little more robust and means that we can eliminate the need to have a<br>
second hash table.<br></blockquote><div>So just look for any existing flow with key only. The mask attribute is only be used <br>for creating a new flow? <br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> @@ -1294,14 +1340,17 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)<br>
> }<br>
> clear_stats(flow);<br>
><br>
> + rcu_assign_pointer(flow->mfm, mfm);<br>
> rcu_assign_pointer(flow->sf_acts, acts);<br>
><br>
> /* Put flow in bucket. */<br>
> ovs_flow_tbl_insert(table, flow, &key, key_len);<br>
><br>
> + if (existing_mfm == NULL)<br>
> + sw_flow_mask_insert(mfm);<br>
<br>
</div>Same (minor) comment here about the NULL check.<br></blockquote><div>Will fix. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> diff --git a/datapath/flow.h b/datapath/flow.h<br>
> index dba66cf..9fb5a1c 100644<br>
> --- a/datapath/flow.h<br>
> +++ b/datapath/flow.h<br>
</div><div class="im">> @@ -117,9 +119,13 @@ struct sw_flow_key {<br>
> struct sw_flow {<br>
> struct rcu_head rcu;<br>
> struct hlist_node hash_node[2];<br>
> + struct hlist_node hash_node_id[2];<br>
<br>
</div>I don't think there is a need to have two instances of hash_node_id<br>
because it is only accessed under lock. The reason for having two in<br>
packet lookup is to avoid a hiccup in traffic when we move a flow from<br>
one table to another. However, there are no such concurrent accesses<br>
in the lookup table.<br></blockquote><div>It seems we are going down the path of the removing the hash_node_id entirely, right? <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> +struct sw_flow_desc {<br>
> + size_t start;<br>
> + size_t end;<br>
> +};<br>
<br>
</div>I found the name "desc" to not be very intuitive - is the another name<br>
that is more self-explanatory?<br></blockquote><div>How about sw_flow_key_range? Any other suggestions? <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> +static inline u16 sw_flow_desc_roundup_size(const struct sw_flow_desc *desc)<br>
> +{<br>
> + u16 actual = sw_flow_desc_actual_size(desc);<br>
> + u16 n32s = actual / sizeof(u32);<br>
> + u16 roundup;<br>
> +<br>
> + roundup = (actual & 0x3) ? (n32s + 1) * sizeof(u32) : actual;<br>
> +<br>
> + return roundup;<br>
> +}<br>
<br>
</div>Is this the same as DIV_ROUND_UP?<br></blockquote><div>I did not use it because DIV_ROUND_UP gives the roundup of u32, I needed the roundup in bytes. However, I could rewrite this function using DIV_ROUND_UP. <br>
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> +bool ovs_match_validate(const struct sw_flow_match *match,<br>
> + u64 key_attrs, u64 mask_attrs);<br>
> +<br>
<br>
</div>It looks like we don't use this function outside of flow.c, in which<br>
case I think it could be made static.<br></blockquote><div>Agreed. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> struct flow_table {<br>
> struct flex_array *buckets;<br>
> + struct flex_array *buckets_ids;<br>
<br>
</div>As I look at this further, I've become more and more convinced that we<br>
really only want a single hash table. Otherwise, there are a lot of<br>
questions about sizing, the need to expand, rehash, etc.<br>
<div class="im"><br></div></blockquote><div>I agree single hash bucket will work. <br><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
> +struct sw_flow_mask {<br>
> + struct sw_flow_desc desc;<br>
> + struct sw_flow_key key;<br>
> + struct list_head list;<br>
> + struct rcu_head rcu;<br>
> + struct kref kref;<br>
<br>
</div>I probably wouldn't use kref here since the use of atomics makes it<br>
look like there is some extra locking going on. In our case, the lock<br>
is provided by ovs_mutex, so we could just use an int.<br></blockquote><div>Agreed, as long as packet path does not change the reference count. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
> +struct sw_flow_mask *sw_flow_mask_alloc(void);<br>
> +void sw_flow_mask_add_ref(struct sw_flow_mask *);<br>
> +int sw_flow_mask_del_ref(struct sw_flow_mask *);<br>
> +void sw_flow_mask_insert(struct sw_flow_mask *);<br>
> +void sw_flow_mask_set(struct sw_flow_mask *, struct sw_flow_desc *, u8 val);<br>
> +struct sw_flow_mask *sw_flow_mask_find(const struct sw_flow_mask *);<br>
<br>
</div>All of these should be prefixed with "ovs_".<br>
</blockquote><div>Sure. <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm still working on flow.c - in particular, I want to go through the<br>
usage of key length and encap attributes carefully. I should have the<br>
rest of the comments tomorrow.<br>
</blockquote></div><br></div></div>