[ovs-dev] [netlink v4 47/52] datapath: Adopt Generic Netlink-compatible locking.

Jesse Gross jesse at nicira.com
Thu Jan 27 04:41:28 UTC 2011


On Wed, Jan 26, 2011 at 10:42 AM, Ben Pfaff <blp at nicira.com> wrote:
> On Wed, Jan 26, 2011 at 09:15:35AM -0800, Jesse Gross wrote:
>> On Tue, Jan 25, 2011 at 9:30 PM, Ben Pfaff <blp at nicira.com> wrote:
>> > On Tue, Jan 25, 2011 at 04:48:08PM -0800, Jesse Gross wrote:
>> >> On Tue, Jan 25, 2011 at 11:20 AM, Ben Pfaff <blp at nicira.com> wrote:
>> >> DPs somewhat straddle the line as far as being protected by genl vs
>> >> RTNL. ?We could just document that it is safe to access it with either
>> >> and both are required to modify but I'm tempted to just enforce
>> >> RTNL/RCU. ?There are only a few places that don't do this already, so
>> >> we could just take RCU in these cases.
>> >
>> > I guess I don't see the benefit to this change. ?RTNL is required to
>> > create and delete datapaths only because those operations create and
>> > destroy network devices as side effects. ?RTNL isn't needed for anything
>> > else related to datapaths (and I see now that I was overzealous about
>> > taking RTNL in set_datapath(); that isn't needed).
>>
>> I would actually much prefer that genl_lock is used to protect DPs (I
>> agree that the fact that it takes RTNL in the add/destroy cases isn't
>> that important because it's not protecting the dp data structure
>> itself).
>>
>> The primary concern that I have is with the things in dp_notify.c
>> because they only have RTNL and make modifications.  The worst
>> offender is NETDEV_UNREGISTER, which unfortunately is the most useful.
>>  First, dp_detach_port() is documented as requiring genl_lock and RTNL
>> but only RTNL is held here.  I believe that we currently access all of
>> the fields that it modifies while holding either RTNL or RCU but that
>> conflicts with DP being protected by genl_lock.
>
> Oh sure, I'm happy to mark those fields as protected by RTNL.
>
> I thought you were saying that any access to a datapath would have to be
> protected by RCU or RTNL, which would have meant a big change, since
> almost every odp_cmd_*() function in datapath.c would then need to run
> mostly under the RCU read lock.  But I see now that you are suggesting a
> much smaller change.

I actually was suggesting using RTNL exclusively originally.  I was
trying to reduce the amount of overlap between genl and RTNL and for
this RTNL was the only choice.  However, the more I looked at it, the
more I just wanted to avoid the problem entirely by minimizing the
pieces that are touched by dp_notifier.c.  So for now, that just means
documentation.

>
> I applied the following, plus some more from your later comments (see
> below).  The list_del_rcu() -> list_del() change was a leftover that I
> think we both missed from earlier.

Looks good.

>> One consolation is that it looks like dp->n_ports is write-only now so
>> we can drop it and avoid a problem there.
>
> Good spotting.  I added the following to commit "datapath: Drop port
> information from odp_stats", which dropped the last reader of n_ports.

Looks good.

>> The other notifications in dp_device_event() are at least read-only
>> from the perspective of the DP, so we could in theory hold
>> rcu_read_lock to take care of that, except that the sysfs functions
>> can sleep.  I'm pretty convinced at this point that the MTU stuff
>> should go away (handled by userspace if actually needed), which takes
>> care of that notifier and would allow getting rid of the port_list,
>> helping the unregister case.
>
> Are these unsafe holding RTNL?

No, it's not actually unsafe, it just interferes with my desire to
make the locking simpler.  If we say that the datapath is protected by
genl_lock/RCU then we should really be holding RCU here.  But we can't
do that here, so it just becomes a caveat.

Actually, I see that the comment at the top of datapath.c says that
datapaths are protected by RTNL, not genl.  I think that's not really
true, or at least we hope genl is the rule rather than the exception.

>> >> > +static void free_vport_rcu(struct rcu_head *rcu)
>> >> > +{
>> >> > + ? ? ? struct vport *vport = container_of(rcu, struct vport, rcu);
>> >> > +
>> >> > + ? ? ? if (vport->ops->flags & VPORT_F_GEN_STATS)
>> >> > + ? ? ? ? ? ? ? free_percpu(vport->percpu_stats);
>> >> > +
>> >> > + ? ? ? kobject_put(&vport->kobj);
>> >> > +}
>> >> > +
>> >> > ?/**
>> >> > ?* ? ? vport_free - uninitialize and free vport
>> >> > ?*
>> >> > @@ -212,21 +226,16 @@ struct vport *vport_alloc(int priv_size, const struct vport_ops *ops, const stru
>> >> > ?*/
>> >> > ?void vport_free(struct vport *vport)
>> >> > ?{
>> >> > - ? ? ? if (vport->ops->flags & VPORT_F_GEN_STATS)
>> >> > - ? ? ? ? ? ? ? free_percpu(vport->percpu_stats);
>> >> > -
>> >> > - ? ? ? kobject_put(&vport->kobj);
>> >> > + ? ? ? call_rcu(&vport->rcu, free_vport_rcu);
>> >> > ?}
>> >>
>> >> vport_free() is only called from the vport implementation after it has
>> >> waited for a grace period, so we don't need to wait for another one
>> >> here.
>> >
>> > Are we missing a synchronize_rcu() in internal_dev_destroy()? ?It's the
>> > only ->destroy callback where it isn't obvious to me that vport_free()
>> > is called only after a grace period.
>>
>> There's one in unregister_netdevice().  I suppose we should add a comment.
>
> OK, I added a comment:
>
>        unregister_netdevice(netdev_vport->dev);
>        /* unregister_netdevice() waits for an RCU grace period. */
>        vport_free(vport);

Looks good.

>
>> > Should we document that the ->destroy function must wait a grace period
>> > before destroying the vport?
>>
>> Sure.
>
> OK, I updated the comment:
>
>  * @destroy: Destroys a vport.  Must call vport_free() on the vport but not
>  * before an RCU grace period has elapsed.

Looks good.


>> Also, I noticed that the queue stuff is still there.  We should be
>> able to get rid of all of it now.
>
> OK, I added the following to commit "datapath: Convert upcalls and
> ODP_EXECUTE to use AF_NETLINK socket layer":

>  static int queue_control_packets(struct datapath *dp, struct sk_buff *skb,
>                                 const struct dp_upcall_info *upcall_info)

I suppose this function name does make much sense any more but
otherwise looks good.

>> The vport RCU callback is still here but I'm guessing you were just
>> waiting to hear back about it.
>
> Yes, I've reverted it now and added a comment, like so:

Looks good.




More information about the dev mailing list