[ovs-dev] [PATCH] netdev-dpdk: add sflow support for vhost-user ports
Lal, PrzemyslawX
przemyslawx.lal at intel.com
Thu Jun 16 11:16:00 UTC 2016
Hi Mark,
RFC 2863 standard describes in "Interface Numbering" chapter (https://tools.ietf.org/html/rfc2863#page-10) that if there's possibility to reuse ifindex after reinitialization then it should be reused. And in our case it's definitely possible. I also agree with you that limitation to 1024 vhostuser interfaces is a major problem here and I should avoid such arbitrary values. I think the solution might be to use hashmap or OVS list implementation or maybe even dynamic allocation of new vhost ifindex entries.
I have also analyzed current vhost-cuse initialization and adding support for ifindex assignment to this type of port should be a trivial task and it could be easily introduced in V2 of this patch.
Please share your thoughts.
Thanks,
Przemek
-----Original Message-----
From: Kavanagh, Mark B
Sent: Wednesday, May 11, 2016 11:56 AM
To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user ports
>
>Hi Mark,
>
>Replies inline prefixed with [PL].
>
>Thanks,
>Przemek
>
>-----Original Message-----
>From: Kavanagh, Mark B
>Sent: Thursday, May 5, 2016 2:19 PM
>To: Lal, PrzemyslawX <przemyslawx.lal at intel.com>; dev at openvswitch.org
>Subject: RE: [PATCH] netdev-dpdk: add sflow support for vhost-user
>ports
>
>Hi Przemek,
>
>Some additional comments/queries inline.
>
>Thanks again,
>Mark
>
>>
>>This patch adds sFlow support for DPDK vHost-user interfaces by
>>assigning ifindex value. Values of ifindexes for vHost-user interfaces
>>start with 2000 to avoid overlapping with kernel datapath interfaces.
>>
>>Patch also fixes issue with 'dpdk0' interface being ignored by sFlow
>>agent, because of ifindex 0. Ifindexes values for physical DPDK
>>interfaces start with 1000 to avoid overlapping with kernel datapath interfaces.
>>
>>Signed-off-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>---
>> lib/netdev-dpdk.c | 70
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 69 insertions(+), 1 deletion(-)
>>
>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>208c5f5..1b60c5a 100644
>>--- a/lib/netdev-dpdk.c
>>+++ b/lib/netdev-dpdk.c
>>@@ -115,6 +115,18 @@ static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets
>>*/
>> */
>> #define VHOST_ENQ_RETRY_USECS 100
>>
>>+/* For DPDK ETH interfaces use ifindex values starting with 1000
>>+ * to avoid overlapping with kernel-space interfaces.
>>+ * Also starting with 0 would cause sFlow to ignore 'dpdk0' interface.
>>+ */
>>+#define DPDK_PORT_ID_TO_IFINDEX(port_id) ((port_id) + 1000)
>>+
>>+/* For DPDK vhost-user interfaces use ifindexes starting with 2000.
>>+ */
>>+#define VHOST_ID_TO_IFINDEX(port_id) ((port_id) + 2000)
>>+
>>+#define VHOST_IDS_MAX_LEN 1024
>>+
>> static const struct rte_eth_conf port_conf = {
>> .rxmode = {
>> .mq_mode = ETH_MQ_RX_RSS,
>>@@ -149,6 +161,7 @@ enum dpdk_dev_type { static int rte_eal_init_ret
>>= ENODEV;
>>
>> static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>>+static struct ovs_mutex vhost_mutex = OVS_MUTEX_INITIALIZER;
>>
>> /* Quality of Service */
>>
>>@@ -790,6 +803,50 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev)
>> return err;
>> }
>>
>>+/* Counter for VHOST interfaces as DPDK library doesn't provide
>>+mechanism
>>+ * similar to rte_eth_dev_count for vhost-user sockets.
>>+ */
>>+static int vhost_counter OVS_GUARDED_BY(vhost_mutex) = 0;
>>+
>>+/* Array storing vhost_ids, so their ifindex can be reused after
>>+socket
>>+ * recreation.
>>+ */
>>+static char vhost_ids[VHOST_IDS_MAX_LEN][PATH_MAX]
>>+OVS_GUARDED_BY(vhost_mutex);
>>+
>>+/* Simple lookup in vhost_ids array.
>>+ * On success returns id of vhost_id stored in the array,
>>+ * otherwise returns -1.
>>+ */
>>+static int
>>+netdev_dpdk_lookup_vhost_id(struct netdev_dpdk *dev)
>>+OVS_REQUIRES(vhost_mutex) {
>>+ for (int i = 0; i < vhost_counter; i++) {
>>+ if (strncmp(vhost_ids[i], dev->vhost_id, strlen(dev->vhost_id)) == 0) {
>>+ return i;
>>+ }
>>+ }
>>+ return -1;
>>+}
>>+
>>+/* Inserts vhost_id at the first free position in the vhost_ids array.
>>+ */
>>+static void
>>+netdev_dpdk_push_vhost_id(struct netdev_dpdk *dev) {
>>+ ovs_mutex_lock(&vhost_mutex);
>>+ if (netdev_dpdk_lookup_vhost_id(dev) < 0) {
>>+ if (vhost_counter < VHOST_IDS_MAX_LEN) {
>>+ strncpy(vhost_ids[vhost_counter++], dev->vhost_id,
>>+ strlen(dev->vhost_id));
>>+ } else {
>>+ VLOG_WARN("Could not assign ifindex to \"%s\" port. "
>>+ "List of vhost IDs list is full.",
>>+ dev->vhost_id);
>>+ }
>>+ }
>>+ ovs_mutex_unlock(&vhost_mutex);
>>+}
>>+
>> static int
>> netdev_dpdk_vhost_user_construct(struct netdev *netdev) { @@ -825,6
>>+882,8 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>> err = vhost_construct_helper(netdev);
>> }
>>
>>+ netdev_dpdk_push_vhost_id(dev);
>>+
>> ovs_mutex_unlock(&dpdk_mutex);
>> return err;
>> }
>>@@ -1773,9 +1832,18 @@ netdev_dpdk_get_ifindex(const struct netdev
>>*netdev) {
>> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> int ifindex;
>>+ int ret;
>>
>> ovs_mutex_lock(&dev->mutex);
>>- ifindex = dev->port_id;
>>+ if (dev->type == DPDK_DEV_ETH) {
>>+ ifindex = DPDK_PORT_ID_TO_IFINDEX(dev->port_id);
>>+ } else {
>
>
>This 'else' statement is executed in both the case of vhost user
>devices and also vhost cuse devices.
>In the latter case, do the port numbers returned by VHOST_ID_IFINDEX
>make sense? (i.e. can sFlow 'talk' to vhost-cuse ports? If not, then this code has a bug).
>
>[PL] At this point for vhost cuse there's always '-1' returned from
>lookup function, so in the nested if statement you'll always hit 'else'
>and 0 will be returned as ifindex for vhost cuse devices.
Agreed. But then all vhost-cuse devices share index of 0 - this is surely problematic?
>Anyway setting ifindexes for vhost cuse devices should be a trivial
>task and if vhost cuse is supported by sflow I can handle it - but then
>it will require changing patch subject and message.
Have you tested this patch against vhost-cuse? I'd be interested as to how it behaves.
>
>>+ if ((ret = netdev_dpdk_lookup_vhost_id(dev)) >= 0) {
>>+ ifindex = VHOST_ID_TO_IFINDEX(ret);
>>+ } else {
>>+ ifindex = 0;
>
>I don't think that 0 is an appropriate value to return here, since it
>overlaps with a kernel interface's index.
>
>[PL] IMHO it's ok to use 0 here - it will be assigned to ifindex only
>when lookup for vhost device failed.
Have you seen Ben's comment on this? http://openvswitch.org/pipermail/dev/2016-May/070626.html
>
>>+ }
>>+ }
>> ovs_mutex_unlock(&dev->mutex);
>>
>> return ifindex;
>>--
>>2.1.0
>
>[MK] There's a gap in this code - when the vhost-user device is
>destroyed, its entry should be removed from 'vhost_ids'.
>
>[PL] Actually the whole idea behind "caching" these vhost_ids is to
>have them stored to survive destroy and recreation of vhost devices,
>for instance when VM is hard-rebooted, to keep the same ifindex and not
>assign a new one, which could be confusing (for example one could have
>vhu1 port with ifindex 2001 visible in sFlow stats, but then
>hard-reboots this VM and vhu1 gets ifindex 2002 - it's the same vhost
>user socket, the same VM, but somehow it gets new ifindex and gets new entry in sflow). But if you think I can be improve this, don't hesitate to share your ideas.
I completely understand what your intention is here. There is still an inherent issue with the design though, in that once vhost ports have been added to 'vhost_ids', they are never removed.
Consider the following scenario:
- multiple vhost ports have been added to the bridge, such that vhost_counter = VHOST_IDS_MAX_LEN (I'm not sure what the use case for adding 1024 ports to a bridge would be, but I digress)
- one or more of the previously-added ports are no longer needed, so we bring the VMs that they are attached to down
- we want to add another vhost port, described by a vhost_id that is not already present in 'vhost_ids', and connect it to a VM => Now, however, we cannot add the additional port, due to the fact that vhost_counter was never decremented, such that it is still equal to VHOST_IDS_MAX_LEN: the bounds check in netdev_dpdk_push_vhost_id fails, and the user is warned that their port cannot be added.
More information about the dev
mailing list