[ovs-dev] [bug 2462] [PATCH v2] datapath: Increase maximum number of datapath ports.

Jesse Gross jesse at nicira.com
Mon Feb 13 23:56:06 UTC 2012


On Sun, Feb 12, 2012 at 4:47 AM, Pravin B Shelar <pshelar at nicira.com> wrote:
> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 220c7dd..3248fb4 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> +static struct hlist_head *hash_bucket(const struct datapath *dp, int id)

This name is not very descriptive in the context of this large file.

> +{
> +       unsigned int hash = hash_32(id, DP_VPORT_HASH_BITS);

It's not clear to me that it's actually all that useful to hash the
port number since the low order bits are actually fairly well
distributed (Linux does this with ifindex lookups).  In common cases
where there are a relatively small numbers of ports hashing will
actually increase the likelihood of collision.

> +struct vport *ovs_vport_rcu(const struct datapath *dp, int id)
> +{

It seems that this function really belongs in vport.c along with the
functions to lookup a vport by name (especially with a vport prefix).

>  static void __dp_destroy(struct datapath *dp)
> +       for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++) {
> +               struct vport *vport;
> +               struct hlist_node *n;
> +
> +               hlist_for_each_entry_rcu(vport, n, &dp->ports[i], dp_hash_node)

We're holding genl/RTNL, not RCU here.

> @@ -1754,27 +1786,25 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
[...]
> +               for (port_no = 1; port_no < DP_MAX_PORTS_COUNT; port_no++) {
> +                       vport = ovs_vport_rtnl_protected(dp, port_no);
>                        if (!vport)
>                                break;
>                }
> +               BUG_ON(vport);

I'm not sure that this BUG_ON provides much in the way of value.

> @@ -1936,32 +1966,34 @@ static int ovs_vport_cmd_dump(struct sk_buff *skb, struct netlink_callback *cb)
>  {
>        struct ovs_header *ovs_header = genlmsg_data(nlmsg_data(cb->nlh));
>        struct datapath *dp;
> -       u32 port_no;
> -       int retval;
> +       u32 port_idx = 0, skip = cb->args[0];
> +       int i;
>
>        dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
>        if (!dp)
>                return -ENODEV;
>
>        rcu_read_lock();
> -       for (port_no = cb->args[0]; port_no < DP_MAX_PORTS; port_no++) {
> +       for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++) {

You could store the hash buck in cb->args as well, similar to
ovs_flow_cmd_dump(), to avoid looping over as many things.

> diff --git a/datapath/datapath.h b/datapath/datapath.h
> index b012a76..d76222f 100644
> --- a/datapath/datapath.h
> +++ b/datapath/datapath.h
> +#define DP_MAX_PORTS_COUNT     (1024 * 64)
> +#define DP_VPORT_HASH_BITS     10
> +#define DP_VPORT_HASH_BUCKETS  (2 << DP_VPORT_HASH_BITS)

This should be a 1 left shifted by the number of bits.

> +struct vport *ovs_vport_rcu(const struct datapath *dp, int id);

It doesn't look to me like this function actually checks that RCU is
held (hlist_for_each_entry_rcu() uses rcu_dereference_raw()).

> +static inline struct vport *ovs_vport_rtnl_check(const struct datapath *dp, int id)

How about ovs_vport_rtnl_rcu()?

> +       WARN_ON(!rcu_read_lock_held() && !rtnl_is_locked());

This should probably be WARN_ON_ONCE.

> +static inline struct vport *ovs_vport_rtnl_protected(const struct datapath *dp, int id)

How about ovs_vport_rtnl()?

> diff --git a/datapath/flow.c b/datapath/flow.c
> index 823c6b5..c15f6f0 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -203,10 +203,10 @@ struct sw_flow_actions *ovs_flow_actions_alloc(const struct nlattr *actions)
>        int actions_len = nla_len(actions);
>        struct sw_flow_actions *sfa;
>
> -       /* At least DP_MAX_PORTS actions are required to be able to flood a
> +       /* At least DP_MAX_PORTS_COUNT actions are required to be able to flood a
>         * packet to every port.  Factor of 2 allows for setting VLAN tags,
>         * etc. */
> -       if (actions_len > 2 * DP_MAX_PORTS * nla_total_size(4))
> +       if (actions_len > 2 * DP_MAX_PORTS_COUNT * nla_total_size(4))

I don't think that it's reasonable to support flooding to every port
(plus vlans) as the number of ports increases.  With 64k ports, that
would allow a 1M action list which seems like a bad idea and also has
a decent chance of failing.

> @@ -1190,8 +1188,6 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>                                break;
>
>                        case OVS_KEY_ATTR_IN_PORT:
> -                               if (nla_get_u32(nla) >= DP_MAX_PORTS)
> -                                       return -EINVAL;

We use USHRT_MAX as a port not present indicator here, so we can't
just lift the restrictions on ports numbers in a 32-bit space without
changing it.  In addition, in_port in struct sw_flow_key (and
everywhere else) is 16-bit so you can get truncation.

> diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
> index c06bb89..30cfe2f 100644
> --- a/lib/dpif-linux.c
> +++ b/lib/dpif-linux.c
> +static int
>  dpif_linux_port_add(struct dpif *dpif_, struct netdev *netdev,
>                     uint16_t *port_nop)
[...]
> +    request.port_no = UINT32_MAX;
> +    upcall_pid = dpif_linux_port_get_pid(dpif_, request.port_no);
> +    request.upcall_pid = &upcall_pid;
> +    error = dpif_linux_vport_transact(&request, &reply, &buf);

Having a larger range of port numbers does not by itself address the
issues that the port allocator was trying to solve, which is to
prevent port numbers from being rapidly reused.  Since the kernel will
use the smallest unused port number it will reuse port numbers unless
userspace requests an explicit one.  It also seems wasteful to ask the
kernel to allocate a port number and then have to go back to update
the port based on that port number.

OpenFlow port numbers are 16-bit but the ones allocated by the kernel
are 32-bit, which means that they will get truncated when returned by
this function.  Since the kernel only allows 16-bits of port numbers
and uses dense assignment this won't be an immediate problem.
However, since this behavior is neither documented nor something that
we want to maintain in the future, it's not something that we should
rely on.

Also, please remember that the solution that you come up with must
work with both the old and new versions of the kernel module.



More information about the dev mailing list