[ovs-dev] [PATCH] netdev-dpdk: add sflow support for vhost-user ports

Kavanagh, Mark B mark.b.kavanagh at intel.com
Wed May 11 09:55:41 UTC 2016


>
>Hi Mark,
>
>Replies inline prefixed with [PL].
>
>Thanks,
>Przemek
>
>-----Original Message-----
>From: Kavanagh, Mark B
>Sent: Thursday, May 5, 2016 2:19 PM
>To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
>Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user ports
>
>Hi Przemek,
>
>Some additional comments/queries inline.
>
>Thanks again,
>Mark
>
>>
>>This patch adds sFlow support for DPDK vhost-user interfaces by
>>assigning them ifindex values. Ifindex values for vhost-user
>>interfaces start at 2000 to avoid overlapping with kernel datapath
>>interfaces.
>>
>>The patch also fixes an issue where the 'dpdk0' interface was ignored
>>by the sFlow agent because of its ifindex of 0. Ifindex values for
>>physical DPDK interfaces start at 1000, again to avoid overlapping
>>with kernel datapath interfaces.
>>
>>Signed-off-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>---
>> lib/netdev-dpdk.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 69 insertions(+), 1 deletion(-)
>>
>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>index 208c5f5..1b60c5a 100644
>>--- a/lib/netdev-dpdk.c
>>+++ b/lib/netdev-dpdk.c
>>@@ -115,6 +115,18 @@ static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>>  */
>> #define VHOST_ENQ_RETRY_USECS 100
>>
>>+/* For DPDK ETH interfaces use ifindex values starting at 1000
>>+ * to avoid overlapping with kernel-space interfaces.
>>+ * Also, starting at 0 would cause sFlow to ignore the 'dpdk0' interface.
>>+ */
>>+#define DPDK_PORT_ID_TO_IFINDEX(port_id) ((port_id) + 1000)
>>+
>>+/* For DPDK vhost-user interfaces use ifindexes starting at 2000.
>>+ */
>>+#define VHOST_ID_TO_IFINDEX(port_id) ((port_id) + 2000)
>>+
>>+#define VHOST_IDS_MAX_LEN 1024
>>+
>> static const struct rte_eth_conf port_conf = {
>>     .rxmode = {
>>         .mq_mode = ETH_MQ_RX_RSS,
>>@@ -149,6 +161,7 @@ enum dpdk_dev_type {
>> static int rte_eal_init_ret = ENODEV;
>>
>> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>>+static struct ovs_mutex vhost_mutex = OVS_MUTEX_INITIALIZER;
>>
>> /* Quality of Service */
>>
>>@@ -790,6 +803,50 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>>     return err;
>> }
>>
>>+/* Counter for VHOST interfaces, as the DPDK library doesn't provide a
>>+ * mechanism similar to rte_eth_dev_count() for vhost-user sockets.
>>+ */
>>+static int vhost_counter OVS_GUARDED_BY(vhost_mutex) = 0;
>>+
>>+/* Array storing vhost_ids, so that their ifindexes can be reused after
>>+ * socket recreation.
>>+ */
>>+static char vhost_ids[VHOST_IDS_MAX_LEN][PATH_MAX]
>>+    OVS_GUARDED_BY(vhost_mutex);
>>+
>>+/* Simple lookup in the vhost_ids array.
>>+ * On success returns the index at which vhost_id is stored in the array,
>>+ * otherwise returns -1.
>>+ */
>>+static int
>>+netdev_dpdk_lookup_vhost_id(struct netdev_dpdk *dev)
>>+    OVS_REQUIRES(vhost_mutex)
>>+{
>>+    for (int i = 0; i < vhost_counter; i++) {
>>+        if (strncmp(vhost_ids[i], dev->vhost_id, strlen(dev->vhost_id)) == 0) {
>>+            return i;
>>+        }
>>+    }
>>+    return -1;
>>+}
>>+
>>+/* Inserts vhost_id at the first free position in the vhost_ids array. */
>>+static void
>>+netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev)
>>+{
>>+    ovs_mutex_lock(&vhost_mutex);
>>+    if (netdev_dpdk_lookup_vhost_id(dev) < 0) {
>>+        if (vhost_counter < VHOST_IDS_MAX_LEN) {
>>+            strncpy(vhost_ids[vhost_counter++], dev->vhost_id,
>>+                    strlen(dev->vhost_id));
>>+        } else {
>>+            VLOG_WARN("Could not assign ifindex to \"%s\" port. "
>>+                      "List of vhost IDs is full.",
>>+                      dev->vhost_id);
>>+        }
>>+    }
>>+    ovs_mutex_unlock(&vhost_mutex);
>>+}
>>+
>> static int
>> netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>> {
>>@@ -825,6 +882,8 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>         err = vhost_construct_helper(netdev);
>>     }
>>
>>+    netdev_dpdk_push_vhost_id(dev);
>>+
>>     ovs_mutex_unlock(&dpdk_mutex);
>>     return err;
>> }
>>@@ -1773,9 +1832,18 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
>> {
>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>     int ifindex;
>>+    int ret;
>>
>>     ovs_mutex_lock(&dev->mutex);
>>-    ifindex = dev->port_id;
>>+    if (dev->type == DPDK_DEV_ETH) {
>>+        ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id);
>>+    } else {
>
>
>This 'else' statement is executed both for vhost-user devices and for
>vhost-cuse devices.
>In the latter case, do the values returned by VHOST_ID_TO_IFINDEX make
>sense? (i.e. can sFlow 'talk' to vhost-cuse ports? If not, then this
>code has a bug).
>
>[PL] At this point the lookup function always returns '-1' for vhost-cuse,
>so in the nested if statement you'll always hit the 'else' branch and 0
>will be returned as the ifindex for vhost-cuse devices.

Agreed. But then all vhost-cuse devices share an ifindex of 0 - surely that's problematic?
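
If nothing else, I'd prefer the cuse case to be explicit in the code rather than falling out of a failed lookup. Something along these lines, perhaps - a sketch only, untested, and checking the netdev class name via netdev_get_type() is just one way of telling the two vhost variants apart. Note that I've also taken vhost_mutex around the lookup, since netdev_dpdk_lookup_vhost_id() is annotated OVS_REQUIRES(vhost_mutex) but is called without that lock in the patch as posted:

    ovs_mutex_lock(&dev->mutex);
    if (dev->type == DPDK_DEV_ETH) {
        ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id);
    } else if (!strcmp(netdev_get_type(netdev), "dpdkvhostcuse")) {
        /* vhost-cuse has no ifindex scheme yet; make that explicit
         * instead of letting every cuse port share a single value. */
        ifindex = 0;    /* Or better, an error value - see below. */
    } else {
        ovs_mutex_lock(&vhost_mutex);
        ret = netdev_dpdk_lookup_vhost_id(dev);
        ovs_mutex_unlock(&vhost_mutex);
        ifindex = ret >= 0 ? VHOST_ID_TO_IFINDEX(ret) : 0;
    }
    ovs_mutex_unlock(&dev->mutex);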

>Anyway, setting ifindexes for vhost-cuse devices should be a trivial task,
>and if vhost-cuse is supported by sFlow I can handle it - but that would
>require changing the patch subject and commit message.

Have you tested this patch against vhost-cuse? I'd be interested in how it behaves.

>
>>+        if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
>>+            ifindex = VHOST_ID_TO_IFINDEX(ret);
>>+        } else {
>>+            ifindex = 0;
>
>I don't think that 0 is an appropriate value to return here, since it overlaps with a kernel
>interface's index.
>
>[PL] IMHO it's OK to use 0 here - it is assigned to the ifindex only when
>the lookup for the vhost device has failed.

Have you seen Ben's comment on this? http://openvswitch.org/pipermail/dev/2016-May/070626.html 
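
For what it's worth, lib/netdev.c already returns a negative errno (-EOPNOTSUPP) when a netdev provider has no get_ifindex implementation at all, so a failed lookup here could report an error the same way instead of handing back a value that collides with the kernel's ifindex space. A sketch, assuming callers treat a negative return as an error (I haven't verified that the sFlow code does):

        if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
            ifindex = VHOST_ID_TO_IFINDEX(ret);
        } else {
            /* No cached entry for this vhost_id: report an error rather
             * than 0, which overlaps with a kernel interface's index. */
            ifindex = -ENODEV;
        }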

>
>>+        }
>>+    }
>>     ovs_mutex_unlock(&dev->mutex);
>>
>>     return ifindex;
>>--
>>2.1.0
>
>[MK] There's a gap in this code - when the vhost-user device is destroyed, its entry should
>be removed from 'vhost_ids'.
>
>[PL] Actually, the whole idea behind "caching" these vhost_ids is for them
>to survive the destruction and recreation of vhost devices - for instance
>when a VM is hard-rebooted - so that the same ifindex is kept instead of a
>new one being assigned, which could be confusing. For example, one could
>have a vhu1 port with ifindex 2001 visible in the sFlow stats; after a
>hard reboot of that VM, vhu1 would get ifindex 2002 - the same vhost-user
>socket and the same VM, but it would get a new ifindex and a new entry in
>sFlow. But if you think I can improve this, don't hesitate to share your
>ideas.

I completely understand what your intention is here. There is still an inherent issue with the design though, in that once vhost ports have been added to 'vhost_ids', they are never removed.

Consider the following scenario:
- multiple vhost ports have been added to the bridge, such that vhost_counter = VHOST_IDS_MAX_LEN (I'm not sure what the use case for adding 1024 ports to a bridge would be, but I digress)
- one or more of the previously-added ports are no longer needed, so we bring the VMs that they are attached to down
- we want to add another vhost port, described by a vhost_id that is not already present in 'vhost_ids', and connect it to a VM
=> Now, however, we cannot add the additional port, because vhost_counter was never decremented and is still equal to VHOST_IDS_MAX_LEN: the bounds check in netdev_dpdk_push_vhost_id fails, and the user is warned that the port cannot be added.
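
One possible compromise that keeps your "same ifindex after recreation" behaviour while still bounding the array: retain the cached IDs, but track whether each slot's port currently exists, and recycle an inactive slot only when the array is full. Very much a sketch, untested - the 'vhost_ids_active' array and netdev_dpdk_release_vhost_id() (which would be called from the vhost destruct path) are my additions. Note also that ovs_strlcpy() NUL-terminates, which the strncpy() call in the patch as posted does not:

    static bool vhost_ids_active[VHOST_IDS_MAX_LEN]
        OVS_GUARDED_BY(vhost_mutex);

    static void
    netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev)
    {
        int i;

        ovs_mutex_lock(&vhost_mutex);
        i = netdev_dpdk_lookup_vhost_id(dev);
        if (i < 0) {
            if (vhost_counter < VHOST_IDS_MAX_LEN) {
                i = vhost_counter++;
            } else {
                /* Array full: recycle the first slot whose port no longer
                 * exists.  That vhost_id loses its old ifindex, but new
                 * ports can still be added. */
                for (i = 0; i < VHOST_IDS_MAX_LEN; i++) {
                    if (!vhost_ids_active[i]) {
                        break;
                    }
                }
                if (i == VHOST_IDS_MAX_LEN) {
                    VLOG_WARN("Could not assign ifindex to \"%s\" port. "
                              "List of vhost IDs is full.", dev->vhost_id);
                    ovs_mutex_unlock(&vhost_mutex);
                    return;
                }
            }
            ovs_strlcpy(vhost_ids[i], dev->vhost_id, sizeof vhost_ids[i]);
        }
        vhost_ids_active[i] = true;
        ovs_mutex_unlock(&vhost_mutex);
    }

    /* Called from the vhost destruct path. */
    static void
    netdev_dpdk_release_vhost_id(struct netdev_dpdk *dev)
    {
        int i;

        ovs_mutex_lock(&vhost_mutex);
        i = netdev_dpdk_lookup_vhost_id(dev);
        if (i >= 0) {
            vhost_ids_active[i] = false;
        }
        ovs_mutex_unlock(&vhost_mutex);
    }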


