[ovs-dev] [PATCH 1/3] netdev: Fully handle netdev lifecycle through refcounting.

Ben Pfaff blp at nicira.com
Wed Jan 13 00:02:14 UTC 2010


On Tue, Jan 12, 2010 at 05:29:13PM -0500, Jesse Gross wrote:
> This builds on earlier work that implemented netdev object refcounting.
> However, rather than requiring explicit create and destroy calls,
> these operations are now performed automatically based on the referenece
> count.  This is important because in certain situations it is not
> possible to know whether a netdev has already been created.  A
> workaround existed (which looked fairly similar to this paradigm) but
> introduced it's own issues.  This simplifies and unifies the API.

Thanks for doing this, Jesse.  Comments below, file-by-file.

----------------------------------------------------------------------

lib/netdev.c:

In create_device(), I think that this error message is bound to
eventually confuse a user.  If it tells me that "eth0" does not exist,
then my first thought is that there's something wrong with my system's
network stack, not that some bit of code neglected to set the
"may_create" flag:

    if (!options->may_create) {
        VLOG_WARN("attempted to open a device that does not exist: %s",
                  options->name);
        return ENODEV;
    }

Also in create_device(), I think that the first options->type here
should be options->name:

    VLOG_WARN("could not create netdev %s of unknown type: %s", options->type,
                                                                options->type);
    return EINVAL;

> static uint32_t
> shash_hash(const struct shash *shash)

Could this be renamed?  Its current name makes me think it should be in
lib/shash.c.

> {
>     int hash = 0;
>     struct shash_node *node;
>     uint32_t entry_hash;
>
>     SHASH_FOR_EACH(node, shash) {
>         entry_hash = hash_string(node->name, 0);
>         entry_hash ^= hash_string(node->data, 10);
>         hash ^= hash_int(entry_hash, 0);

There's no need to use hash_int() here, since 'entry_hash' is already
pseudorandom and scrambling it further doesn't help.

>     }
>
>     return hash;
> }

Some of the error paths in netdev_open() do not set *netdevp to NULL.
I'd suggest setting it to NULL unconditionally as the first action in
the function and then only ever setting it nonnull on success.  That
tends to be less error-prone in my experience.

> void
> netdev_dev_uninit(struct netdev_dev *netdev_dev)
> {
>     struct shash_node *node;
>     char *name = (char *)netdev_dev_get_name(netdev_dev);

I'd prefer the simpler "char *name = netdev_dev->name;" here.  Using a
cast makes it look like some kind of trickery is going on.

>     node = shash_find(&netdev_dev_shash, name);
>     assert(node);

This is kind of awkward.  Did you consider either using an hmap directly
(instead of an shash) or storing a code of the "shash_node *" in the
netdev_dev structure, so that you didn't have to do another search here?

>     shash_delete(&netdev_dev_shash, node);
>     free(name);
> }

Most netdev functions take a 'struct netdev', is there a reason why
netdev_reconfigure() takes a name instead:

> int
> netdev_reconfigure(const char *name, const struct shash *args)

I'm also a little nervous about assuming that if the hash is the same
then there's no reason to reconfigure at all:

>     if (netdev_dev->class->reconfigure) {
>         uint32_t args_hash = shash_hash(args);
>
>         if (netdev_dev->args_hash != args_hash) {
>             netdev_dev->args_hash = args_hash;

Also, if 'netdev_dev->class->reconfigure' returns an error code, we
still update the hash.  I'm not sure whether that's correct or not.

>             return netdev_dev->class->reconfigure(netdev_dev, args);
>         }
>     }

I think that netdev_close() got overlooked a bit:

> /* Closes and destroys 'netdev'. */
> void
> netdev_close(struct netdev *netdev)
> {
>     if (netdev) {
>         struct shash_node *node;
>         struct netdev_dev *netdev_dev;
>         char *name = (char *)netdev_get_name(netdev);
>
>         node = shash_find(&netdev_dev_shash, name);
>         assert(node);
>         netdev_dev = node->data;

Can't the previous 3 lines just be "netdev_dev = netdev->netdev_dev;"?
Or use netdev_get_obj(netdev)?

>         netdev_dev->ref_cnt--;
>         netdev_uninit(netdev);
>         netdev_get_obj(netdev)->class->close(netdev);

Isn't the netdev_get_obj(netdev) call here redundant with the netdev_dev
local variable?

>         /* If the reference count for the netdev device is zero, destroy it. */
>         if (!netdev_dev->ref_cnt) {
>             shash_delete(&netdev_dev_shash, node);
>             netdev_dev->class->destroy(netdev_dev);
>             free(name);

I think that the shash_delete() and free(name) could be done by calling
netdev_dev_uninit() instead (and then 'name' could be const).

>         }
>     }
> }

It's a slightly unusual style to initialize a refcount to 0, as in
netdev_dev_init(), and then increment it later to 1.

netdev_dev_uninit() and netdev_uninit() are exported in case a netdev
class needs to initialize and then uninitialize a netdev(_dev) without
letting the netdev library get a hold on it in the meantime.  But do you
foresee a situation where that would be necessary?  I think that
normally a class could simply call netdev(_dev)_init as its very last
step, so that this could not come up.

There are still several instances of "obj" in this file, even though the
struct was renamed.

----------------------------------------------------------------------

netdev-provider.h:

This comment now seems wrong, since there is no longer any
netdev_create() function:

    struct netdev_class {
        /* Type of netdevs in this class, e.g. "system", "tap", "gre", etc.
         *
         * One of the providers should supply a "system" type, since this is
         * the type assumed when a device name was not bound through the 
         * netdev_create() call.  The "system" type corresponds to an 
         * existing network device on the system. */

The commit deletes parameter names from function prototypes, even though
those parameter names are used in the comments that describe those
functions.

It's not your fault, but the get_vlan_vid function comment refers to a
'netdev_name' parameter even though its parameter is actually a netdev
these days.

----------------------------------------------------------------------

netdev-linux.c:

The cache_map now seems to be a subset of the netdev_dev_shash.  Is that
right, and if so would it make sense to have a way to avoid the
duplication?

Would it make sense to fold the members of struct netdev_linux_cache
into netdev_dev_linux?  We have a lot of structs floating around now, so
it might be nice to get rid of one.

> static int
> netdev_linux_create_system(const char *name, const char *type UNUSED,
>                     const struct shash *args)
> {
>     struct netdev_dev_linux *netdev_dev;
>     int error;
>
>     if (!shash_is_empty(args)) {
>         VLOG_WARN("arguments for system devices should be empty");
>     }

Could you include 'name' in the log message, please?

>     netdev_dev = xzalloc(sizeof *netdev_dev);
>
>     if (shash_is_empty(&cache_map)) {
>         error = rtnetlink_notifier_register(&netdev_linux_cache_notifier,
>                                             netdev_linux_cache_cb, NULL);
>         if (error) {
>             free(netdev_dev);
>             return error;
>         }
>     }

netdev_dev isn't used until after the "if" block, so you can move the
allocation past it and then skip the "free" if there's an error.

----------------------------------------------------------------------

ofproto.c:

Why did the init_ports() call move?  Why is it now called once per
program execution instead of once per ofproto?  (A vswitch might have
lots of ofprotos.)

----------------------------------------------------------------------

bridge.c:

> @@ -1075,10 +1114,11 @@ bridge_create(const char *name)
>      error = dpif_create_and_open(name, &br->dpif);
>      if (error) {
>          free(br);
>          return NULL;
>      }
> +    flush_ports(name, br->dpif);
>      dpif_flow_flush(br->dpif);
 
>      error = ofproto_create(name, &bridge_ofhooks, br, &br->ofproto);
>      if (error) {
>          VLOG_ERR("failed to create switch %s: %s", name, strerror(error));

Deleting all of the ports on creating a bridge is pretty nasty.  Why do
we have to do that?




More information about the dev mailing list