[ovs-dev] [PATCH] netdev-dpdk: add sflow support for vhost-user ports

Kavanagh, Mark B mark.b.kavanagh at intel.com
Wed Jun 22 11:57:07 UTC 2016


>
>Hi Mark,
>
>RFC 2863 standard describes in "Interface Numbering" chapter
>(https://tools.ietf.org/html/rfc2863#page-10) that if there's possibility to reuse ifindex
>after reinitialization then it should be reused.

[MK] Very interesting - thanks for the link.

>And in our case it's definitely possible. I
>also agree with you that limitation to 1024 vhostuser interfaces is a major problem here and
>I should avoid such arbitrary values. I think the solution might be to use hashmap or OVS
>list implementation or maybe even dynamic allocation of new vhost ifindex entries.

[MK] Do you think that the issues relating to ifIndex/ifNumber are only applicable to DPDK-based vhost and ring ports?
	i.e. does OVS currently handle kernel-based ifIndexes gracefully, or is this a larger issue that requires a more general solution?
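For reference, kernel-assigned ifindexes can be inspected with if_nametoindex(3): 0 is reserved for "no such interface", and valid indexes start at 1. A minimal probe (assuming a Linux host with a loopback device):

```c
#include <net/if.h>

/* Query the kernel-assigned ifindex for an interface name.  Returns 0 if
 * the interface does not exist; valid kernel ifindexes start at 1, which
 * is why the patch offsets DPDK ports by 1000/2000 - to stay clear of
 * values the kernel is likely to have handed out on a typical system. */
static unsigned int
kernel_ifindex(const char *name)
{
    return if_nametoindex(name);
}
```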
>
>I have also analyzed current vhost-cuse initialization and adding support for ifindex
>assignment to this type of port should be a trivial task and it could be easily introduced in
>V2 of this patch.

[MK] Okay great - I look forward to it.

>
>Please share your thoughts.
>
>Thanks,
>Przemek
>
>-----Original Message-----
>From: Kavanagh, Mark B
>Sent: Wednesday, May 11, 2016 11:56 AM
>To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
>Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user ports
>
>>
>>Hi Mark,
>>
>>Replies inline prefixed with [PL].
>>
>>Thanks,
>>Przemek
>>
>>-----Original Message-----
>>From: Kavanagh, Mark B
>>Sent: Thursday, May 5, 2016 2:19 PM
>>To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
>>Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user
>>ports
>>
>>Hi Przemek,
>>
>>Some additional comments/queries inline.
>>
>>Thanks again,
>>Mark
>>
>>>
>>>This patch adds sFlow support for DPDK vHost-user interfaces by
>>>assigning ifindex value. Values of ifindexes for vHost-user interfaces
>>>start with 2000 to avoid overlapping with kernel datapath interfaces.
>>>
>>>Patch also fixes issue with 'dpdk0' interface being ignored by sFlow
>>>agent, because of ifindex 0. Ifindex values for physical DPDK
>>>interfaces start with 1000 to avoid overlapping with kernel datapath interfaces.
>>>
>>>Signed-off-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>>---
>>> lib/netdev-dpdk.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 69 insertions(+), 1 deletion(-)
>>>
>>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>index 208c5f5..1b60c5a 100644
>>>--- a/lib/netdev-dpdk.c
>>>+++ b/lib/netdev-dpdk.c
>>>@@ -115,6 +115,18 @@ static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>>>  */
>>> #define VHOST_ENQ_RETRY_USECS 100
>>>
>>>+/* For DPDK ETH interfaces use ifindex values starting with 1000
>>>+ * to avoid overlapping with kernel-space interfaces.
>>>+ * Also starting with 0 would cause sFlow to ignore 'dpdk0' interface.
>>>+ */
>>>+#define DPDK_PORT_ID_TO_IFINDEX(port_id) ((port_id) + 1000)
>>>+
>>>+/* For DPDK vhost-user interfaces use ifindexes starting with 2000.
>>>+ */
>>>+#define VHOST_ID_TO_IFINDEX(port_id) ((port_id) + 2000)
>>>+
>>>+#define VHOST_IDS_MAX_LEN 1024
>>>+
>>> static const struct rte_eth_conf port_conf = {
>>>     .rxmode = {
>>>         .mq_mode = ETH_MQ_RX_RSS,
>>>@@ -149,6 +161,7 @@ enum dpdk_dev_type {
>>> static int rte_eal_init_ret = ENODEV;
>>>
>>> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>>>+static struct ovs_mutex vhost_mutex = OVS_MUTEX_INITIALIZER;
>>>
>>> /* Quality of Service */
>>>
>>>@@ -790,6 +803,50 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>>>     return err;
>>> }
>>>
>>>+/* Counter for VHOST interfaces as DPDK library doesn't provide mechanism
>>>+ * similar to rte_eth_dev_count for vhost-user sockets.
>>>+ */
>>>+static int vhost_counter OVS_GUARDED_BY(vhost_mutex) = 0;
>>>+
>>>+/* Array storing vhost_ids, so their ifindex can be reused after socket
>>>+ * recreation.
>>>+ */
>>>+static char vhost_ids[VHOST_IDS_MAX_LEN][PATH_MAX] OVS_GUARDED_BY(vhost_mutex);
>>>+
>>>+/* Simple lookup in vhost_ids array.
>>>+ * On success returns id of vhost_id stored in the array,
>>>+ * otherwise returns -1.
>>>+ */
>>>+static int
>>>+netdev_dpdk_lookup_vhost_id(struct netdev_dpdk *dev) OVS_REQUIRES(vhost_mutex) {
>>>+    for (int i = 0; i < vhost_counter; i++) {
>>>+        if (strncmp(vhost_ids[i], dev->vhost_id, strlen(dev->vhost_id)) == 0) {
>>>+            return i;
>>>+        }
>>>+    }
>>>+    return -1;
>>>+}
>>>+
>>>+/* Inserts vhost_id at the first free position in the vhost_ids array.
>>>+ */
>>>+static void
>>>+netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev) {
>>>+    ovs_mutex_lock(&vhost_mutex);
>>>+    if (netdev_dpdk_lookup_vhost_id(dev) < 0) {
>>>+        if (vhost_counter < VHOST_IDS_MAX_LEN) {
>>>+            strncpy(vhost_ids[vhost_counter++], dev->vhost_id,
>>>+                    strlen(dev->vhost_id));
>>>+        } else {
>>>+            VLOG_WARN("Could not assign ifindex to \"%s\" port. "
>>>+                      "List of vhost IDs list is full.",
>>>+                      dev->vhost_id);
>>>+        }
>>>+    }
>>>+    ovs_mutex_unlock(&vhost_mutex);
>>>+}
>>>+
>>> static int
>>> netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>> {
>>>@@ -825,6 +882,8 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>         err = vhost_construct_helper(netdev);
>>>     }
>>>
>>>+    netdev_dpdk_push_vhost_id(dev);
>>>+
>>>     ovs_mutex_unlock(&dpdk_mutex);
>>>     return err;
>>> }
>>>@@ -1773,9 +1832,18 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
>>> {
>>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>     int ifindex;
>>>+    int ret;
>>>
>>>     ovs_mutex_lock(&dev->mutex);
>>>-    ifindex = dev->port_id;
>>>+    if (dev->type == DPDK_DEV_ETH) {
>>>+        ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id);
>>>+    } else {
>>
>>
>>This 'else' statement is executed in both the case of vhost user
>>devices and also vhost cuse devices.
>>In the latter case, do the port numbers returned by VHOST_ID_TO_IFINDEX
>>make sense? (i.e. can sFlow 'talk' to vhost-cuse ports? If not, then this code has a bug).
>>
>>[PL] At this point for vhost cuse there's always '-1' returned from
>>lookup function, so in the nested if statement you'll always hit 'else'
>>and 0 will be returned as ifindex for vhost cuse devices.
>
>Agreed. But then all vhost-cuse devices share index of 0 - this is surely problematic?
>
>>Anyway, setting ifindexes for vhost cuse devices should be a trivial
>>task, and if vhost cuse is supported by sFlow I can handle it - but then
>>it will require changing the patch subject and message.
>
>Have you tested this patch against vhost-cuse? I'd be interested as to how it behaves.
>
>>
>>>+        if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
>>>+            ifindex = VHOST_ID_TO_IFINDEX(ret);
>>>+        } else {
>>>+            ifindex = 0;
>>
>>I don't think that 0 is an appropriate value to return here, since it
>>overlaps with a kernel interface's index.
>>
>>[PL] IMHO it's ok to use 0 here - it will be assigned to ifindex only
>>when lookup for vhost device failed.
>
>Have you seen Ben's comment on this? http://openvswitch.org/pipermail/dev/2016-
>May/070626.html
>
>>
>>>+        }
>>>+    }
>>>     ovs_mutex_unlock(&dev->mutex);
>>>
>>>     return ifindex;
>>>--
>>>2.1.0
>>
>>[MK] There's a gap in this code - when the vhost-user device is
>>destroyed, its entry should be removed from 'vhost_ids'.
>>
>>[PL] Actually the whole idea behind "caching" these vhost_ids is to
>>have them stored to survive destroy and recreation of vhost devices,
>>for instance when VM is hard-rebooted, to keep the same ifindex and not
>>assign a new one, which could be confusing (for example one could have
>>vhu1 port with ifindex 2001 visible in sFlow stats, but then
>>hard-reboots this VM and vhu1 gets ifindex 2002 - it's the same vhost
>>user socket, the same VM, but somehow it gets a new ifindex and a new entry in sFlow). But
>>if you think this can be improved, don't hesitate to share your ideas.
>
>I completely understand what your intention is here. There is still an inherent issue with
>the design though, in that once vhost ports have been added to 'vhost_ids', they are never
>removed.
>
>Consider the following scenario:
>- multiple vhost ports have been added to the bridge, such that vhost_counter ==
>VHOST_IDS_MAX_LEN (I'm not sure what the use case for adding 1024 ports to a bridge would
>be, but I digress)
>- one or more of the previously-added ports are no longer needed, so we bring down the VMs
>that they are attached to
>- we want to add another vhost port, described by a vhost_id that is not already present in
>'vhost_ids', and connect it to a VM
>=> Now we cannot add the additional port: vhost_counter was never decremented, so it is
>still equal to VHOST_IDS_MAX_LEN, the bounds check in netdev_dpdk_push_vhost_id fails, and
>the user is warned that their port cannot be added.


