[ovs-dev] [netlink 02/16] datapath: Make adding and attaching a vport a single step.

Fri Sep 24 02:10:58 UTC 2010

On Fri, Sep 10, 2010 at 3:55 PM, Ben Pfaff <blp at nicira.com> wrote:
> For some time now, Open vSwitch datapaths have internally made a
> distinction between adding a vport and attaching it to a datapath.  Adding
> a vport just means to create it, as an entity detached from any datapath.
> Attaching it gives it a port number and a datapath.  Similarly, a vport
> could be detached and deleted separately.
>
> After some study, I think I understand why this distinction exists.  It is
> because ovs-vswitchd tries to open all the datapath ports before it tries
> to create them.  However, changing it to create them before it tries to
> open them is not difficult, so this commit does this.
>
> The bulk of this commit, however, changes the datapath interface to one
> that always creates a vport and attaches it to a datapath in a single step,
> and similarly detaches a vport and deletes it in a single step.

Thanks for unifying this.  It's definitely the right way to do it.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 5ee9157..b073a70 100644
> -       vport = vport_locate(odp_port->devname);
> -       if (!vport) {
> -               vport_lock();
> -
> -               if (odp_port->flags & ODP_PORT_INTERNAL)
> -                       vport = vport_add(odp_port->devname, "internal", NULL);
> -               else
> -                       vport = vport_add(odp_port->devname, "netdev", NULL);
> -
> -               vport_unlock();
> +       vport_lock();
> +       vport = vport_add(odp_port);
> +       vport_unlock();

It's great to get rid of this compatibility shim.  The kernel code looks good.

> diff --git a/include/openvswitch/datapath-protocol.h b/include/openvswitch/datapath-protocol.h
> index 5759f1e..95cab47 100644
> -#define ODP_VPORT_ADD           _IOR('O', 21, struct odp_vport_add)
> -#define ODP_VPORT_MOD           _IOR('O', 22, struct odp_vport_mod)
> -#define ODP_VPORT_DEL           _IO('O', 23)
> +#define ODP_VPORT_MOD           _IOR('O', 22, struct odp_port)
>  #define ODP_VPORT_STATS_GET     _IOWR('O', 24, struct odp_vport_stats_req)
>  #define ODP_VPORT_ETHER_GET     _IOWR('O', 25, struct odp_vport_ether)
>  #define ODP_VPORT_ETHER_SET     _IOW('O', 26, struct odp_vport_ether)

I think these names could be unified a little bit.  Now that the steps
to create and attach ports/vports have been merged it's not clear why
some refer to ports and some refer to vports.

> -#define ODP_PORT_INTERNAL (1 << 0) /* This port is simulated. */
> +#define VPORT_TYPE_SIZE     16
> +#define VPORT_CONFIG_SIZE     32

I'm not sure that 32 bytes is sufficient.  What happens if we had an
IPv6 tunnel?

>  struct odp_port {
>     char devname[16];           /* IFNAMSIZ */
> +    char type[VPORT_TYPE_SIZE];
>     uint16_t port;
> -    uint16_t flags;
> +    uint16_t reserved1;
>     uint32_t reserved2;
> +    uint32_t config[VPORT_CONFIG_SIZE / 4]; /* type-specific configuration */

Why is this an array of uint32_t with the size divided by 4 instead of
an array of uint8_t?

> diff --git a/include/openvswitch/tunnel.h b/include/openvswitch/tunnel.h
> index 3737975..bbe5e6f 100644
> --- a/include/openvswitch/tunnel.h
> +++ b/include/openvswitch/tunnel.h
> +/* This goes in the "config" member of struct odp_port for tunnel ("gre" and
> + * "capwap") vports. */
>  struct tnl_port_config {
>        __u32   flags;
>        __be32  saddr;

I would prefer if we didn't enumerate the tunnel types here.  We
shouldn't have to change the interface (even if it is just a comment)
when we add a new type.

> diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
> index 2c688e3..7f5397d 100644
>  dpif_linux_destroy(struct dpif *dpif_)
>  {
> -    struct odp_port *ports;
> -    size_t n_ports;
> -    int err;
> -    int i;
> -
> -    err = dpif_port_list(dpif_, &ports, &n_ports);
> -    if (err) {
> -        return err;
> -    }
> -
> -    for (i = 0; i < n_ports; i++) {
> -        if (ports[i].port != ODPP_LOCAL) {
> -            err = do_ioctl(dpif_, ODP_VPORT_DEL, ports[i].devname);
> -            if (err) {
> -                VLOG_WARN_RL(&error_rl, "%s: error deleting port %s (%s)",
> -                             dpif_name(dpif_), ports[i].devname, strerror(err));
> -            }
> -        }
> -    }
> -
> -    free(ports);
> -

This is great to get rid of this cleanup code.  I think there were
still a few corner cases before in the event of crashes.

>  static int
> -dpif_linux_port_add(struct dpif *dpif_, const char *devname, uint16_t flags,
> -                    uint16_t *port_no)
> +dpif_linux_port_add(struct dpif *dpif, const char *devname, const char *type,
> +                    const struct shash *args, uint16_t *port_nop)

I find the configuration to be somewhat inconsistent with respect to
what is done in the dpif and what is done in the netdev (i.e. initial
configuration is done here, reconfiguration is done through the
netdev).

What if we did this instead:
* All configuration is done through the netdev, like before.
* The netdev is created first and rather than passing in the name to
the dpif, a pointer to the netdev is passed instead.
* In general we would have a tighter coupling between netdevs and
dpifs.  I think this is almost certain to happen anyways as we branch
out into different, more specialized platforms since, for example, a
hardware switching chip isn't going to have interfaces for ports that
are separate from the datapath.
* In the case of vports, the netdev could just parse and store the
config but hold onto it until the port is added, allowing it to be
done in one step.

> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 323f364..fa50311 100644
> +    } else {
> +        VLOG_WARN("%s: unsupported vport type %s", devname, type);
> +        return EINVAL;
> +    }

I would probably just say "unsupported port type" here rather than
vport.  I'm actually not sure whether there is much value to calling
something a vport now that it includes all ports but at that very
least it would seem to only refer to the kernel layer and its
interfaces.

> +
> +    if (args && !shash_is_empty(args)) {
> +        VLOG_WARN("%s: arguments for %s vports should be empty",
> +                  devname, type);
> +    }

Same here.

> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 575b5b2..80712e3 100644
> +    if (type[0] == '\0' || !strcmp(type, "system")) {
> +        strncpy(port->type, "netdev", sizeof port->type);
> +    } else {

I wonder if we can require ports to have a type, that is, drop the
implicit empty equals "system" mapping.  Actual people will use
ovs-vsctl and that can keep the same interface.  For programmatic
access, I don't see much benefit in allowing the implicit behavior.

> diff --git a/lib/netdev-vport.h b/lib/netdev-vport.h
> index e03ea42..582eb7f 100644
> --- a/lib/netdev-vport.h
> +++ b/lib/netdev-vport.h
> +int netdev_vport_parse_config(struct odp_port *, const char *devname,
> +                              const char *type, const struct shash *args);
> +

This seems like a weird thing to me to split out, as I mentioned in
the comments about dpif-linux.c.

> diff --git a/utilities/ovs-dpctl.8.in b/utilities/ovs-dpctl.8.in
> index b7965bc..03420b7 100644
>  .RS
> +.IP "\fBtype=\fItype\fR"
> +Specifies the type of port to add.  The default type is \fBsystem\fR.
>  .IP "\fBinternal\fR"
> -Instead of attaching an existing \fInetdev\fR, creates an internal
> -port (analogous to the local port) with that name.
> +Equivalent to \fBtype=internal\fR.

Does anyone use this shortcut to create internal interfaces?  Can we
just drop it in favor of the more general method?

> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 598b001..4ab66ef 100644
> +                type = (iface_is_internal(br, if_name)
> +                        ? "internal"
> +                        : iface->cfg->type);
> +

Is it possible to require that internal interfaces actually have a
type of "internal"?

> +                shash_init(&options);
> +                for (i = 0; i < iface->cfg->n_options; i++) {
> +                    shash_add(&options, iface->cfg->key_options[i],
> +                              iface->cfg->value_options[i]);
> +                }

There's also a function set_up_iface(), which does a similar
collection of options.  It would be great to combine these (this
probably relates to the differences in creation vs. reconfigure as
well).