[ovs-dev] [PATCH] netdev-dpdk: add sflow support for vhost-user ports

Lal, PrzemyslawX przemyslawx.lal at intel.com
Thu Jun 16 11:16:00 UTC 2016


Hi Mark,

The RFC 2863 standard states in its "Interface Numbering" section (https://tools.ietf.org/html/rfc2863#page-10) that if an ifindex can be reused after reinitialization, it should be, and in our case reuse is definitely possible. I also agree with you that the limit of 1024 vhost-user interfaces is a major problem here and that I should avoid such arbitrary values. I think the solution might be to use a hashmap, the OVS list implementation, or perhaps dynamic allocation of new vhost ifindex entries.
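
To make the hashmap idea concrete, here is a rough, untested sketch of what I have in mind (the names below are mine, not from the patch): key the map on the vhost socket name and hand out ifindexes monotonically, so a recreated socket keeps its old value and there is no fixed upper bound on the number of ports.

    #include <string.h>
    #include "hash.h"
    #include "hmap.h"
    #include "util.h"

    struct vhost_ifindex_node {
        struct hmap_node hmap_node;  /* In 'vhost_ifindexes'. */
        char *vhost_id;              /* Socket name, used as the lookup key. */
        int ifindex;                 /* Assigned once, then reused. */
    };

    static struct hmap vhost_ifindexes OVS_GUARDED_BY(vhost_mutex)
        = HMAP_INITIALIZER(&vhost_ifindexes);
    static int next_vhost_ifindex OVS_GUARDED_BY(vhost_mutex) = 2000;

    /* Returns the cached ifindex for 'vhost_id', allocating a new one on
     * first use, so the value survives socket destruction and recreation. */
    static int
    vhost_ifindex_lookup_or_insert(const char *vhost_id)
        OVS_REQUIRES(vhost_mutex)
    {
        struct vhost_ifindex_node *node;
        uint32_t hash = hash_string(vhost_id, 0);

        HMAP_FOR_EACH_WITH_HASH (node, hmap_node, hash, &vhost_ifindexes) {
            if (!strcmp(node->vhost_id, vhost_id)) {
                return node->ifindex;
            }
        }

        node = xmalloc(sizeof *node);
        node->vhost_id = xstrdup(vhost_id);
        node->ifindex = next_vhost_ifindex++;
        hmap_insert(&vhost_ifindexes, &node->hmap_node, hash);
        return node->ifindex;
    }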

I have also analyzed the current vhost-cuse initialization; adding support for ifindex assignment to this type of port would be a trivial task and could easily be introduced in v2 of this patch.
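
For illustration only (untested), I would expect the change to mirror what this patch already does for vhost-user, i.e. a call to netdev_dpdk_push_vhost_id() just before the cuse construct path releases dpdk_mutex. The surrounding lines below are approximate, quoted from memory of the current code:

         err = vhost_construct_helper(netdev);
    +    netdev_dpdk_push_vhost_id(dev);
         ovs_mutex_unlock(&dpdk_mutex);
         return err;
     }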

Please share your thoughts.

Thanks,
Przemek

-----Original Message-----
From: Kavanagh, Mark B 
Sent: Wednesday, May 11, 2016 11:56 AM
To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user ports

>
>Hi Mark,
>
>Replies inline prefixed with [PL].
>
>Thanks,
>Przemek
>
>-----Original Message-----
>From: Kavanagh, Mark B
>Sent: Thursday, May 5, 2016 2:19 PM
>To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
>Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user 
>ports
>
>Hi Przemek,
>
>Some additional comments/queries inline.
>
>Thanks again,
>Mark
>
>>
>>This patch adds sFlow support for DPDK vHost-user interfaces by
>>assigning them ifindex values. Ifindex values for vHost-user interfaces
>>start at 2000 to avoid overlapping with kernel datapath interfaces.
>>
>>The patch also fixes an issue where the 'dpdk0' interface was ignored by
>>the sFlow agent because of its ifindex of 0. Ifindex values for physical
>>DPDK interfaces start at 1000 to avoid overlapping with kernel datapath
>>interfaces.
>>
>>Signed-off-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>---
>> lib/netdev-dpdk.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 69 insertions(+), 1 deletion(-)
>>
>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>index 208c5f5..1b60c5a 100644
>>--- a/lib/netdev-dpdk.c
>>+++ b/lib/netdev-dpdk.c
>>@@ -115,6 +115,18 @@ static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>>  */
>> #define VHOST_ENQ_RETRY_USECS 100
>>
>>+/* For DPDK ETH interfaces, use ifindex values starting at 1000
>>+ * to avoid overlapping with kernel-space interfaces.
>>+ * Also, starting at 0 would cause sFlow to ignore the 'dpdk0' interface.
>>+ */
>>+#define DPDK_PORT_ID_TO_IFINDEX(port_id) ((port_id) + 1000)
>>+
>>+/* For DPDK vhost-user interfaces use ifindexes starting with 2000.
>>+ */
>>+#define VHOST_ID_TO_IFINDEX(port_id) ((port_id) + 2000)
>>+
>>+#define VHOST_IDS_MAX_LEN 1024
>>+
>> static const struct rte_eth_conf port_conf = {
>>     .rxmode = {
>>         .mq_mode = ETH_MQ_RX_RSS,
>>@@ -149,6 +161,7 @@ enum dpdk_dev_type {
>> static int rte_eal_init_ret = ENODEV;
>>
>> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>>+static struct ovs_mutex vhost_mutex = OVS_MUTEX_INITIALIZER;
>>
>> /* Quality of Service */
>>
>>@@ -790,6 +803,50 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>>     return err;
>> }
>>
>>+/* Counter for vhost interfaces, as the DPDK library doesn't provide a
>>+ * mechanism similar to rte_eth_dev_count() for vhost-user sockets.
>>+ */
>>+static int vhost_counter OVS_GUARDED_BY(vhost_mutex) = 0;
>>+
>>+/* Array storing vhost_ids, so their ifindexes can be reused after socket
>>+ * recreation.
>>+ */
>>+static char vhost_ids[VHOST_IDS_MAX_LEN][PATH_MAX]
>>+OVS_GUARDED_BY(vhost_mutex);
>>+
>>+/* Simple lookup in the vhost_ids array.
>>+ * On success returns the index at which the vhost_id is stored in the
>>+ * array, otherwise returns -1.
>>+ */
>>+static int
>>+netdev_dpdk_lookup_vhost_id(struct netdev_dpdk *dev)
>>+OVS_REQUIRES(vhost_mutex) {
>>+    for (int i = 0; i < vhost_counter; i++) {
>>+        if (!strcmp(vhost_ids[i], dev->vhost_id)) {
>>+            return i;
>>+        }
>>+    }
>>+    return -1;
>>+}
>>+
>>+/* Inserts vhost_id at the first free position in the vhost_ids array.
>>+ */
>>+static void
>>+netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev) {
>>+    ovs_mutex_lock(&vhost_mutex);
>>+    if (netdev_dpdk_lookup_vhost_id(dev) < 0) {
>>+        if (vhost_counter < VHOST_IDS_MAX_LEN) {
>>+            ovs_strzcpy(vhost_ids[vhost_counter++], dev->vhost_id,
>>+                        PATH_MAX);
>>+        } else {
>>+            VLOG_WARN("Could not assign ifindex to \"%s\" port. "
>>+                      "List of vhost IDs list is full.",
>>+                      dev->vhost_id);
>>+        }
>>+    }
>>+    ovs_mutex_unlock(&vhost_mutex);
>>+}
>>+
>> static int
>> netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>> {
>>@@ -825,6 +882,8 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>         err = vhost_construct_helper(netdev);
>>     }
>>
>>+    netdev_dpdk_push_vhost_id(dev);
>>+
>>     ovs_mutex_unlock(&dpdk_mutex);
>>     return err;
>> }
>>@@ -1773,9 +1832,18 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
>> {
>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>     int ifindex;
>>+    int ret;
>>
>>     ovs_mutex_lock(&dev->mutex);
>>-    ifindex = dev->port_id;
>>+    if (dev->type == DPDK_DEV_ETH) {
>>+        ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id);
>>+    } else {
>
>
>This 'else' statement is executed for both vhost-user devices and
>vhost-cuse devices.
>In the latter case, do the values returned by VHOST_ID_TO_IFINDEX make
>sense? (i.e. can sFlow 'talk' to vhost-cuse ports? If not, then this code has a bug.)
>
>[PL] At this point the lookup function always returns '-1' for vhost
>cuse, so in the nested if statement you'll always hit the 'else' branch
>and 0 will be returned as the ifindex for vhost-cuse devices.

Agreed. But then all vhost-cuse devices share an ifindex of 0 - surely that's problematic?

>Anyway, setting ifindexes for vhost-cuse devices should be a trivial
>task, and if vhost-cuse is supported by sFlow I can handle it - but
>then it will require changing the patch subject and commit message.

Have you tested this patch against vhost-cuse? I'd be interested to see how it behaves.

>
>>+        if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
>>+            ifindex = VHOST_ID_TO_IFINDEX(ret);
>>+        } else {
>>+            ifindex = 0;
>
>I don't think that 0 is an appropriate value to return here, since it 
>overlaps with a kernel interface's index.
>
>[PL] IMHO it's OK to use 0 here - it is assigned as the ifindex only
>when the lookup for the vhost device fails.

Have you seen Ben's comment on this? http://openvswitch.org/pipermail/dev/2016-May/070626.html 
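
For what it's worth, here is a minimal sketch of the alternative I would suggest, assuming the usual netdev convention that get_ifindex() reports failure as a negative errno value rather than a made-up index (untested):

        if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
            ifindex = VHOST_ID_TO_IFINDEX(ret);
        } else {
            /* Don't collide with the kernel's ifindex 0; report an
             * error and let the caller decide how to handle it. */
            ifindex = -ENODEV;
        }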

>
>>+        }
>>+    }
>>     ovs_mutex_unlock(&dev->mutex);
>>
>>     return ifindex;
>>--
>>2.1.0
>
>[MK] There's a gap in this code - when the vhost-user device is 
>destroyed, its entry should be removed from 'vhost_ids'.
>
>[PL] Actually the whole idea behind "caching" these vhost_ids is to
>have them survive the destruction and recreation of vhost devices, for
>instance when a VM is hard-rebooted, so that the same ifindex is kept
>instead of a new one being assigned, which could be confusing. For
>example, one could have a vhu1 port with ifindex 2001 visible in the
>sFlow stats, but after a hard reboot of that VM, vhu1 would get ifindex
>2002 - the same vhost-user socket and the same VM, yet it gets a new
>ifindex and a new entry in sFlow. But if you think this can be
>improved, don't hesitate to share your ideas.

I completely understand what your intention is here. There is still an inherent issue with the design though, in that once vhost ports have been added to 'vhost_ids', they are never removed.

Consider the following scenario:
- multiple vhost ports have been added to the bridge, such that vhost_counter = VHOST_IDS_MAX_LEN (I'm not sure what the use case for adding 1024 ports to a bridge would be, but I digress)
- one or more of the previously-added ports are no longer needed, so we bring the VMs that they are attached to down
- we want to add another vhost port, described by a vhost_id that is not already present in 'vhost_ids', and connect it to a VM.
=> Now we cannot add the additional port: vhost_counter was never decremented, so it is still equal to VHOST_IDS_MAX_LEN, the bounds check in netdev_dpdk_push_vhost_id fails, and the user is warned that their port cannot be added.
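
To be concrete, a hypothetical (untested) removal hook, called from the vhost-user destruct path when the port is deliberately deleted from the bridge (as opposed to a mere socket reconnect), might look like the sketch below. It empties the slot in place, so the positions - and hence the ifindexes - of the surviving entries are untouched; the insert path would then need to prefer empty slots over bumping vhost_counter:

    static void
    netdev_dpdk_pop_vhost_id(struct netdev_dpdk *dev)
    {
        int id;

        ovs_mutex_lock(&vhost_mutex);
        id = netdev_dpdk_lookup_vhost_id(dev);
        if (id >= 0) {
            /* Clear the slot but keep its position, so the remaining
             * entries keep their ifindexes and the slot becomes
             * available for reuse. */
            vhost_ids[id][0] = '\0';
        }
        ovs_mutex_unlock(&vhost_mutex);
    }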


