[ovs-discuss] Mempool issue for OVS 2.9

Ilya Maximets i.maximets at samsung.com
Fri Jan 26 14:16:02 UTC 2018


On 26.01.2018 15:00, Stokes, Ian wrote:
> Hi All,
> 
> Recently an issue was raised regarding the move from a single shared mempool model that was in place up to OVS 2.8, to a mempool per port model introduced in 2.9.
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046021.html
> 
> The per-port mempool model was introduced on September 5th with commit d555d9bded to allow fine-grained control of memory usage on a per-port basis.
> 
> In the 'common/shared mempool' model, ports sharing a similar MTU would all share the same buffer mempool (the most common example being that all ports are created with a 1500B MTU by default, and as such share the same mbuf mempool).
> 
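To illustrate the shared model (a hypothetical sketch of the idea only, not the
actual pre-2.9 code; the names below are made up): pools sit on one list keyed
by NUMA socket and MTU, so every port with the same pair reuses the same pool
via a reference count.
-------------------------------------------------------------------------------
#include <rte_mempool.h>
#include "openvswitch/list.h"

/* Hypothetical sketch, not the actual pre-2.9 OVS code. */
struct shared_mp {
    struct rte_mempool *mp;
    int socket_id;
    int mtu;
    int refcount;
    struct ovs_list list_node;
};

static struct ovs_list shared_mp_list
    = OVS_LIST_INITIALIZER(&shared_mp_list);

/* Return an existing pool for this (socket, MTU) pair, if any. */
static struct shared_mp *
shared_mp_get(int socket_id, int mtu)
{
    struct shared_mp *smp;

    LIST_FOR_EACH (smp, list_node, &shared_mp_list) {
        if (smp->socket_id == socket_id && smp->mtu == mtu) {
            smp->refcount++;
            return smp;
        }
    }
    return NULL;    /* Caller would create and insert a new pool here. */
}
-------------------------------------------------------------------------------
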
> This approach had some drawbacks, however. For example, with the shared memory pool model a user could exhaust the shared memory pool (for instance by requesting a large number of RXQs for a port), which would cause the vSwitch to crash as any remaining ports would not have the required memory to function. This bug was discovered and reported to the community in late 2016: https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html.
> 
> The per port mempool patch aimed to avoid such issues by allocating a separate buffer mempool to each port.
> 
> An issue has been flagged on ovs-discuss whereby the memory dimensioned for a given number of ports on OvS 2.6-2.8 may be insufficient to support the same number of ports in OvS 2.9 under the per-port mempool model, unless extra memory is provisioned. The effect of this is use case dependent (number of ports, RXQs, MTU settings, number of PMDs etc.). The previous 'common-pool' model was rudimentary in estimating the mempool size and was marked as something to be improved upon. The memory allocation calculation for the per-port model was modified to take the possible configuration factors mentioned above into account.
> 
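For a rough sense of the per-port numbers involved (approximate, based on the
estimate used in dpdk_mp_create() with default 2048-descriptor queues): a port
with 2 RXQs and 3 TXQs is sized at about 2*2048 + 3*2048 + 2*32 + 16384 ~= 26.7K
mbufs, where previously several such ports could have drawn from one shared pool.
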
> It's unfortunate that this came to light so close to the release code freeze - but better late than never as it is a valid problem to be resolved.
> 
> I wanted to highlight some options to the community as I don't think the next steps should be taken in isolation due to the impact this feature has.
> 
> There are a number of possibilities for the 2.9 release.
> 
> (i) Revert the mempool per port patches and return to the shared mempool model. There are a number of features and refactoring in place on top of the change, so this will not be a simple revert. I'm currently investigating what exactly this would involve.

This looks like a bad thing to do. Shared memory pools have their own issues and
hide the real memory usage of each port. Also, reverting seems almost impossible;
we'd have to re-implement the shared model from scratch.

> (ii) Leave the per port mempool implementation as is and flag to users that memory requirements have increased. Extra memory may have to be provided on a per-use-case basis.

That's a good point. I prefer this option if possible.

> (iii) Reduce the amount of memory allocated per mempool per port. An RFC to this effect was submitted by Kevin, but on follow-up the feeling is that it does not resolve the issue adequately.

IMHO, we should not reduce the mempool size by hardcoding a small value. Different
installations have their own requirements because of different numbers of HW NICs
and CPU cores. I believe that 20-30K mbufs is a normal per-port size for now.
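For scale, with the default 1500B MTU each mbuf takes roughly 3 KB including
metadata and overhead, so 20-30K mbufs works out to roughly 60-90 MB of
hugepage memory per port.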

How about adding an upper limit for the mempool size, configurable at boot time?
See the code below for an example (not tested):
-------------------------------------------------------------------------------
diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index 3602180..ed54e06 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -54,3 +54,9 @@ dpdk_vhost_iommu_enabled(void)
 {
     return false;
 }
+
+uint32_t
+dpdk_mempool_size_limit(void)
+{
+    return UINT32_MAX;
+}
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 6710d10..219e412 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -43,6 +43,9 @@ static FILE *log_stream = NULL;       /* Stream for DPDK log redirection */
 static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
 static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
 
+/* Maximum number of mbufs in a single memory pool. */
+static uint32_t per_port_mempool_size_limit = UINT32_MAX;
+
 static int
 process_vhost_flags(char *flag, const char *default_val, int size,
                     const struct smap *ovs_other_config,
@@ -313,6 +316,7 @@ dpdk_init__(const struct smap *ovs_other_config)
     int err = 0;
     cpu_set_t cpuset;
     char *sock_dir_subcomponent;
+    uint64_t size_u64;
 
     log_stream = fopencookie(NULL, "w+", dpdk_log_func);
     if (log_stream == NULL) {
@@ -351,6 +355,14 @@ dpdk_init__(const struct smap *ovs_other_config)
     VLOG_INFO("IOMMU support for vhost-user-client %s.",
                vhost_iommu_enabled ? "enabled" : "disabled");
 
+    size_u64 = smap_get_ullong(ovs_other_config, "per-port-mempool-size-limit",
+                               UINT32_MAX);
+    if (size_u64 < UINT32_MAX) {
+        per_port_mempool_size_limit = size_u64;
+        VLOG_INFO("Mempool size for a single port limited to %"PRIu32" mbufs.",
+                  per_port_mempool_size_limit);
+    }
+
     argv = grow_argv(&argv, 0, 1);
     argc = 1;
     argv[0] = xstrdup(ovs_get_program_name());
@@ -494,6 +506,12 @@ dpdk_vhost_iommu_enabled(void)
     return vhost_iommu_enabled;
 }
 
+uint32_t
+dpdk_mempool_size_limit(void)
+{
+    return per_port_mempool_size_limit;
+}
+
 void
 dpdk_set_lcore_id(unsigned cpu)
 {
diff --git a/lib/dpdk.h b/lib/dpdk.h
index dc58d96..4d78895 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -38,5 +38,6 @@ void dpdk_init(const struct smap *ovs_other_config);
 void dpdk_set_lcore_id(unsigned cpu);
 const char *dpdk_get_vhost_sock_dir(void);
 bool dpdk_vhost_iommu_enabled(void);
+uint32_t dpdk_mempool_size_limit(void);
 
 #endif /* dpdk.h */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e834c7e..b2f1072 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -518,7 +518,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
     char mp_name[RTE_MEMPOOL_NAMESIZE];
     const char *netdev_name = netdev_get_name(&dev->up);
     int socket_id = dev->requested_socket_id;
-    uint32_t n_mbufs;
+    uint32_t n_mbufs, max_mbufs;
     uint32_t hash = hash_string(netdev_name, 0);
     struct rte_mempool *mp = NULL;
 
@@ -534,6 +534,15 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
               + MIN_NB_MBUF;
 
+    max_mbufs = dpdk_mempool_size_limit();
+    if (max_mbufs < n_mbufs) {
+        VLOG_WARN("Mempool size for port %s limited to %"PRIu32
+                  " mbufs. Original estimation was %"PRIu32" mbufs.",
+                  netdev_name, max_mbufs, n_mbufs);
+        /* Apply the configured limit to the estimated mempool size. */
+        n_mbufs = max_mbufs;
+    }
+
     ovs_mutex_lock(&dpdk_mp_mutex);
     do {
         /* Full DPDK memory pool name must be unique and cannot be

-------------------------------------------------------------------------------
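
With a patch like the above applied, the limit could then be set before DPDK
initialization with something like:

    ovs-vsctl --no-wait set Open_vSwitch . \
        other_config:per-port-mempool-size-limit=20000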

I think that the above approach is a good enough workaround, because only the
user knows how much memory he/she is able to allocate for OVS and how many
ports are expected.
I'd like the limit to be in megabytes, but I didn't find a good way to
estimate the total size of a mempool.
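
If a byte-based limit were wanted later, a rough conversion should still be
possible, e.g. something like the untested sketch below (the helper name and
its arguments are made up). It uses DPDK's rte_mempool_calc_obj_size() for the
per-object footprint but ignores ring and per-chunk overheads, so it could
only ever be approximate:
-------------------------------------------------------------------------------
#include <stdint.h>
#include <rte_mempool.h>

/* Untested, hypothetical helper: convert a per-port memory budget in bytes
 * into an approximate mbuf count for a given mbuf element size.  Ring and
 * per-chunk overheads are ignored, so this slightly overestimates. */
static uint32_t
mem_limit_to_mbufs(uint64_t limit_bytes, uint32_t mbuf_elt_size)
{
    struct rte_mempool_objsz objsz;
    uint32_t obj_size = rte_mempool_calc_obj_size(mbuf_elt_size, 0, &objsz);
    uint64_t n_mbufs = obj_size ? limit_bytes / obj_size : 0;

    return n_mbufs > UINT32_MAX ? UINT32_MAX : n_mbufs;
}
-------------------------------------------------------------------------------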

> (iv) Introduce a feature to allow users to configure mempools as shared or on a per-port basis. This would be the best of both worlds, but given the proximity to the 2.9 freeze I don't think it's feasible by the end of January.

This could probably solve some issues, but yes, it's definitely not for 2.9.
A good implementation and extensive testing would be required.

> 
> I'd appreciate people's input on this ASAP so we can reach a consensus on the next steps.
> 
> Thanks
> Ian

