On Mon, Apr 12, 2010 at 2:30 PM, Ben Pfaff <span dir="ltr">&lt;<a href="mailto:blp@nicira.com">blp@nicira.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im">On Sun, Apr 11, 2010 at 02:09:18PM -0400, Jesse Gross wrote:<br>

&gt; On Thu, Apr 8, 2010 at 7:09 PM, Ben Pfaff &lt;<a href="mailto:blp@nicira.com">blp@nicira.com</a>&gt; wrote:<br>

&gt;<br>

&gt; &gt; I pushed a small change to the netdev library to the &quot;wdp&quot; branch this<br>

&gt; &gt; morning.  Since this is not a &quot;core&quot; branch (yet) and already has a<br>

&gt; &gt; bunch of badly broken stuff in it, I did not see a need to have it<br>

&gt; &gt; reviewed in advance, but it seems like a good idea to get it reviewed<br>

&gt; &gt; now.  Here&#39;s the commit.<br>

&gt;<br>

&gt; The code looks correct to me, though what do you think about calling it<br>

&gt; something like netdev_clone() instead?  netdev_reopen() isn&#39;t the most<br>

&gt; descriptive name to me.<br>

<br>

</div>What do you think of netdev_ref() or netdev_incref()?  To me, &quot;clone&quot;<br>

implies data copying.<br></blockquote><div><br></div><div>Either of those is fine.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">

&gt; Finally, I haven&#39;t really looked at the wdp branch so I don&#39;t know the<br>

&gt; context but do you think the performance benefits are worth it?  It took me<br>

&gt; a while to understand all the levels to which struct sk_buff in Linux can be<br>

&gt; shared or independent.  This isn&#39;t nearly as bad but this is now the second<br>

&gt; refcount that we have on netdevs.  How does this compare to struct wdp_port<br>

&gt; that contains the netdev?<br>

<br>

</div>netdev_(reopen/clone/ref) is used as one step as part of copying<br>

&quot;wdp_port&quot; objects around.  If we can&#39;t copy wdp_ports cheaply then we<br>

have to copy them expensively or avoid copying them at all.  We already<br>

rejected not copying them (see the ovs-hw discussion).  It probably<br>

doesn&#39;t matter whether it is expensive to copy wdp_ports, because they<br>

are not used anywhere performance critical.<br>

<br>

What I do like about using a refcount:<br></blockquote><div><br></div><div>I&#39;m not sure that I agree with all of your arguments:</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<br>

        * The netdev implementations themselves don&#39;t have to care.<br>

          It&#39;s transparent to them.  If we add a &quot;copy&quot; operation<br>

          etc. then presumably they would have to implement it.<br></blockquote><div><br></div><div>I think a copy could be implemented entirely in the netdev core.  It just needs to call the device open function and bump the netdev_dev refcount.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

        * It can&#39;t fail.  I guess that a copy operation could return an<br>

          error if something goes wrong.<br></blockquote><div><br></div><div>That is true.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

        * There are very few contexts where even the netdev core code<br>

          has to care (i.e. the patch is very short).<br></blockquote><div><br></div><div>Given a choice of where to put complexity, I&#39;d prefer to put it in the netdev core since it only has to be implemented once.  While the patch is short, I think it adds complexity to users of the netdev library because it implicitly adds a new type of netdev (a shared shallow copy) that the caller needs to be aware of.  It&#39;s also not just the caller that needs to know since after you pass the pointer around a few times it&#39;s hard to know where it came from.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<br>

I&#39;m leaning toward using the refcounts, as I guess you can see.  If you<br>

think my arguments are bogus, please let me know.<br></blockquote><div><br></div><div>To me the strongest argument in favor of refcounting here is when it is used to handle datapath miss events.  Potentially opening a netdev could be somewhat expensive and it might slow down flow setup times.  It also causes churn (allocating memory and file descriptors) during the steady state operation of the switch.  Though copying the wdp_port involves allocating memory anyways, independent of the netdev changes.</div>

<div><br></div><div>Of course all of this probably doesn&#39;t matter that much in practice because reading from a netdev in userspace is pretty rare, it&#39;s more just the principle.  If we do implement refcounting I would like to see a netdev_is_cloned (obviously whatever we decide to call it, clone just seems to stick in my mind) and then assert on that in the places where we actually read or even just check in netdev_recv().  The tap netdev always shared FDs so it should probably also hook into that mechanism.</div>

</div>