[ovs-dev] [netlink 03/16] datapath: Merge "struct dp_port" into "struct vport".

Jesse Gross jesse at nicira.com
Fri Sep 24 02:11:43 UTC 2010

On Fri, Sep 10, 2010 at 3:55 PM, Ben Pfaff <blp at nicira.com> wrote:
> After the previous commit, which changed the datapath to always create and
> attach a vport at the same time, and to always detach and delete a vport
> at the same time, there is no longer any real distinction between a dp_port
> and a vport.  This commit, therefore, merges the two together to simplify
> code.  It might even improve performance, although I have not checked.
> I wasn't sure at first whether the merged structure should be "struct
> dp_port" or "struct vport".  I went with the latter since the "v" prefix
> sounds cool.
> Signed-off-by: Ben Pfaff <blp at nicira.com>

Thanks, this makes a lot of sense.  I wonder whether the name vport is
a good one since it is really just a port but it's short and easy to

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index b073a70..30117d2 100644
> @@ -382,31 +367,20 @@ static int new_dp_port(struct datapath *dp, struct odp_port *odp_port, int port_
>        if (IS_ERR(vport))
>                return PTR_ERR(vport);
> -       p = kzalloc(sizeof(*p), GFP_KERNEL);
> -       if (!p)
> -               return -ENOMEM;
> -
> -       p->port_no = port_no;
> -       p->dp = dp;
> -       p->vport = vport;
> -       atomic_set(&p->sflow_pool, 0);
> +       vport->dp = dp;
> +       vport->port_no = port_no;

You need a write memory barrier here.  Once the port is attached we
can start receiving packets from the vport.  There's no guarantee that
the cacheline containing the dp will make it before the pointer.  This
was previously done by the link from vport to dp_port.
rcu_assign_pointer() will do the job.

> @@ -538,7 +509,7 @@ void dp_process_received_packet(struct dp_port *p, struct sk_buff *skb)

You need a data dependency read memory barrier when going from vport
to dp.  rcu_dereference works.  This was previously done in

> diff --git a/datapath/dp_sysfs_if.c b/datapath/dp_sysfs_if.c
> index b40523a..ac86ac3 100644
>  #define to_brport_attr(_at) container_of(_at, struct brport_attribute, attr)
> -#define to_brport(obj) container_of(obj, struct dp_port, kobj)
> +#define to_brport(obj) container_of(obj, struct vport, kobj)

We should rename this to to_vport.

> diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c
> index a6e7262..b077a8a 100644
> -       dp_port = vport_get_dp_port(vport);
> -       if (dp_port)
> -               sprintf(info->bus_info, "%d.%d", dp_port->dp->dp_idx, dp_port->port_no);
> +       sprintf(info->bus_info, "%d.%d", vport->dp->dp_idx, vport->port_no);

There's a race here between device creation and dp getting assigned
(and this access is unlocked, I believe).

> diff --git a/datapath/vport-patch.c b/datapath/vport-patch.c
> index 3f2215c..5ea5e38 100644
> -       if (vport_get_dp_port(vport)) {
> -               hlist_del(&patch_vport->hash_node);
> -               rcu_assign_pointer(patch_vport->peer, vport_locate(patch_vport->peer_name));
> -               hlist_add_head(&patch_vport->hash_node, hash_bucket(patch_vport->peer_name));
> -       }
> +       hlist_del(&patch_vport->hash_node);
> +       rcu_assign_pointer(patch_vport->peer, vport_locate(patch_vport->peer_name));
> +       hlist_add_head(&patch_vport->hash_node, hash_bucket(patch_vport->peer_name));

This isn't quite right - the purpose of the vport_get_dp_port() call
isn't to actually get the dp_port (obviously it is never used).  It is
to distinguish between the create (unattached), modify (unattached),
and modify (attached) cases because the linking together of ports
happens at attach and we need to update it if we are modified and
already attached.

> diff --git a/datapath/vport.c b/datapath/vport.c
> index 2ae0053..2363802 100644
> @@ -47,13 +47,12 @@ static struct hlist_head *dev_table;
>  * one of these locks if you don't want the vport to be deleted out from under
>  * you.
>  *
> - * If you get a reference to a vport through a dp_port, it is protected
> + * If you get a reference to a vport through a datapath, it is protected
>  * by RCU and you need to hold rcu_read_lock instead when reading.
>  *
>  * If multiple locks are taken, the hierarchy is:
>  * 1. RTNL
> - * 2. DP
> - * 3. vport
> + * 2. vport
>  */
>  static DEFINE_MUTEX(vport_mutex);

It would be nice if we can simply this a little bit.  vport_mutex was
introduced because vports existed outside the context of a particular
datapath and therefore it wasn't possible to rely on dp_mutex to
protect them from disappearing.  However, now vports are also bound to
a single datapath so this is no longer the case.  Users of
vport_locate() are the only places where vport_mutex is the sole
source of protection.  They are:

* Places where we match a device name to the vport based on a
userspace request.  These are always scoped to a single datapath.  I
think we can convert the linked list in struct datapath to a hash
table and use that, protected by dp_mutex.
* Patch ports.  These actually cross datapaths and are somewhat
trickier.  However, we could create a global hash table of patch ports
inside of vport-patch.c, where it will avoid complicating other
things.  All changes here are protected by RTNL lock anyways.

If we dumped vport_mutex and the global vport table that would make
the locking much simpler.

>  /**
> - *     vport_attach - attach a vport to a datapath
> + *     vport_attach - notify a vport that it has been attached to a datapath
>  *
>  * @vport: vport to attach.
> - * @dp_port: Datapath port to attach the vport to.
>  *
> - * Attaches a vport to a specific datapath so that packets may be exchanged.
> - * Both ports must be currently unattached.  @dp_port must be successfully
> - * attached to a vport before it is connected to a datapath and must not be
> - * modified while connected.  RTNL lock and the appropriate DP mutex must be held.
> + * Performs vport-specific actions so that packets may be exchanged.  RTNL lock
> + * and the appropriate DP mutex must be held.
>  */
> -int vport_attach(struct vport *vport, struct dp_port *dp_port)
> +int vport_attach(struct vport *vport)

The value of attaching after creation seems pretty marginal at this
point.  The only thing that changes between vport_add() and
vport_attach() is the dp and port number are assigned.  I don't think
any vport cares about these so it would be easy to combine or make
additional arguments to the add call if necessary.

>  * @vport: vport to detach.
>  *
> - * Detaches a vport from a datapath.  May fail for a variety of reasons,
> - * including lack of memory.  RTNL lock and the appropriate DP mutex must be held.
> + * Performs vport-specific actions before a vport is detached from a datapath.
> + * May fail for a variety of reasons, including lack of memory.  RTNL lock and
> + * the appropriate DP mutex must be held.
>  */
>  int vport_detach(struct vport *vport)

Similar situation as vport_attach().  In this case the only difference
is the synchronize_rcu() call between vport_detach() and vport_del(),
which vports could do themselves without an issue.  It would also
simply things in tunnel.c since it would avoid some logic that is due
to the possibility that a port could be repeatedly attached and
detached without being reinitialized.  The only benefit that I see by
separating them is the possibility to reduce the number of
synchronize_rcu() calls: for example, on datapath deletion if we have
a lot of ports to delete we could detach all of them, do one
synchronize_rcu(), and then delete all of them.  On the other hand, if
the vport had direct control (and knowledge) it could probably replace
it with a call_rcu() instead and completely eliminate the

> diff --git a/datapath/vport.h b/datapath/vport.h
> index 8c9bb5f..18ea6d3 100644
> +/**
> + * struct vport - one port within a datapath
> + * @port_no: Index into @dp's @ports array.
> + * @dp: Datapath to which this port belongs.
> + * @kobj: Represents /sys/class/net/<devname>/brport.
> + * @linkname: The name of the link from /sys/class/net/<datapath>/brif to this
> + * &struct vport.  (We keep this around so that we can delete it if the
> + * device gets renamed.)  Set to the null string when no link exists.
> + * @node: Element in @dp's @port_list.
> + * @sflow_pool: Number of packets that were candidates for sFlow sampling,
> + * regardless of whether they were actually chosen and sent down to userspace.
> + */

While you're in here, it would be good to document the other members as well...

More information about the dev mailing list