[ovs-dev] [PATCH] dpdk: Support running PMD threads on cores > RTE_MAX_LCORE.

David Marchand david.marchand at redhat.com
Tue Dec 3 13:00:54 UTC 2019


On Tue, Dec 3, 2019 at 12:34 PM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> On 02.12.2019 17:03, David Marchand wrote:
> > Most DPDK components make the assumption that rte_lcore_id() returns
> > a valid lcore_id in the [0, RTE_MAX_LCORE) range (with the exception of
> > the LCORE_ID_ANY special value).
>
> Do you think that makes practical sense?  In real virtualization
> environments users usually use CPUs with lower numbers, and it's
> most likely possible to change the NUMA layout so that CPUs from all
> the nodes are enumerated starting from the low CPU numbers.

That is if you can reconfigure the NUMA layout.
How do you achieve this?
Bios/firmware configuration?
Kernel boot options (I did not find any)?

I imagine it would be a pain to enforce this on a fleet of servers
with different hw specs.


> It's also uncommon to use all the CPUs just for OVS on a big system
> with a huge number of cores.

Yes, obviously, you don't want to use all your cores for OVS :-).


>
> Another thing: can we really do this at the DPDK level?  Maybe it'll
> be a step toward dynamic registering/unregistering of non-EAL threads?

I'd like to ultimately get to a dynamic register mechanism.
But I first wanted to fix the existing code...


> Since you're wiping out the meaning of an lcore as a CPU core, it
> becomes just a thread_id from a DPDK point of view.  So, maybe the
> application could call a new API function instead of managing these
> strange mappings by itself?

... as it is unlikely we will backport an experimental API to previous
DPDK versions.


>
> One comment inline (not a full review).
>
> Best regards, Ilya Maximets.
>
> >
> > OVS does not currently check which value is set in
> > RTE_PER_LCORE(_lcore_id) which exposes us to potential crashes on DPDK
> > side.
> >
> > Introduce a lcore allocator in OVS for PMD threads and map them to
> > unused lcores from DPDK à la --lcores.
> >
> > The physical cores on which the PMD threads are running still
> > constitute important information when debugging, so keep those in
> > the PMD thread names, but add a new debug log when starting
> > them.
> >
> > Synchronize DPDK internals on NUMA and cpuset for the PMD threads by
> > registering them via the rte_thread_set_affinity() helper.
> >
> > Signed-off-by: David Marchand <david.marchand at redhat.com>
> > ---
> >  lib/dpdk-stub.c   |  8 +++++-
> >  lib/dpdk.c        | 69 +++++++++++++++++++++++++++++++++++++++++++----
> >  lib/dpdk.h        |  3 ++-
> >  lib/dpif-netdev.c |  3 ++-
> >  4 files changed, 75 insertions(+), 8 deletions(-)
> >
> > diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> > index c332c217c..90473bc8e 100644
> > --- a/lib/dpdk-stub.c
> > +++ b/lib/dpdk-stub.c
> > @@ -39,7 +39,13 @@ dpdk_init(const struct smap *ovs_other_config)
> >  }
> >
> >  void
> > -dpdk_set_lcore_id(unsigned cpu OVS_UNUSED)
> > +dpdk_init_thread_context(unsigned cpu OVS_UNUSED)
> > +{
> > +    /* Nothing */
> > +}
> > +
> > +void
> > +dpdk_uninit_thread_context(void)
> >  {
> >      /* Nothing */
> >  }
> > diff --git a/lib/dpdk.c b/lib/dpdk.c
> > index 21dd47e80..771baa413 100644
> > --- a/lib/dpdk.c
> > +++ b/lib/dpdk.c
> > @@ -33,6 +33,7 @@
> >
> >  #include "dirs.h"
> >  #include "fatal-signal.h"
> > +#include "id-pool.h"
> >  #include "netdev-dpdk.h"
> >  #include "netdev-offload-provider.h"
> >  #include "openvswitch/dynamic-string.h"
> > @@ -55,6 +56,9 @@ static bool dpdk_initialized = false; /* Indicates successful initialization
> >                                         * of DPDK. */
> >  static bool per_port_memory = false; /* Status of per port memory support */
> >
> > +static struct id_pool *lcore_id_pool;
> > +static struct ovs_mutex lcore_id_pool_mutex = OVS_MUTEX_INITIALIZER;
> > +
> >  static int
> >  process_vhost_flags(char *flag, const char *default_val, int size,
> >                      const struct smap *ovs_other_config,
> > @@ -346,7 +350,8 @@ dpdk_init__(const struct smap *ovs_other_config)
> >          }
> >      }
> >
> > -    if (args_contains(&args, "-c") || args_contains(&args, "-l")) {
> > +    if (args_contains(&args, "-c") || args_contains(&args, "-l") ||
> > +        args_contains(&args, "--lcores")) {
> >          auto_determine = false;
> >      }
> >
> > @@ -372,8 +377,8 @@ dpdk_init__(const struct smap *ovs_other_config)
> >               * thread affintity - default to core #0 */
> >              VLOG_ERR("Thread getaffinity failed. Using core #0");
> >          }
> > -        svec_add(&args, "-l");
> > -        svec_add_nocopy(&args, xasprintf("%d", cpu));
> > +        svec_add(&args, "--lcores");
> > +        svec_add_nocopy(&args, xasprintf("0@%d", cpu));
> >      }
> >
> >      svec_terminate(&args);
> > @@ -429,6 +434,23 @@ dpdk_init__(const struct smap *ovs_other_config)
> >          }
> >      }
> >
> > +    ovs_mutex_lock(&lcore_id_pool_mutex);
> > +    lcore_id_pool = id_pool_create(0, RTE_MAX_LCORE);
> > +    /* Empty the whole pool... */
> > +    for (uint32_t lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
> > +        uint32_t lcore_id;
> > +
> > +        id_pool_alloc_id(lcore_id_pool, &lcore_id);
> > +    }
> > +    /* ...and release the unused spots. */
> > +    for (uint32_t lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
> > +        if (rte_eal_lcore_role(lcore) != ROLE_OFF) {
> > +             continue;
> > +        }
> > +        id_pool_free_id(lcore_id_pool, lcore);
> > +    }
> > +    ovs_mutex_unlock(&lcore_id_pool_mutex);
> > +
> >      /* We are called from the main thread here */
> >      RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
> >
> > @@ -522,11 +544,48 @@ dpdk_available(void)
> >  }
> >
> >  void
> > -dpdk_set_lcore_id(unsigned cpu)
> > +dpdk_init_thread_context(unsigned cpu)
> >  {
> > +    cpu_set_t cpuset;
> > +    unsigned lcore;
> > +    int err;
> > +
> >      /* NON_PMD_CORE_ID is reserved for use by non pmd threads. */
> >      ovs_assert(cpu != NON_PMD_CORE_ID);
> > -    RTE_PER_LCORE(_lcore_id) = cpu;
> > +
> > +    ovs_mutex_lock(&lcore_id_pool_mutex);
> > +    if (lcore_id_pool == NULL || !id_pool_alloc_id(lcore_id_pool, &lcore)) {
> > +        lcore = NON_PMD_CORE_ID;
>
> If a PMD thread gets NON_PMD_CORE_ID, it will smash the non-PMD
> mempool caches, since it doesn't take 'non_pmd_mutex'.

Ok, I need to look at this again; I missed something.
I don't understand how you would create PMD threads with cpu !=
NON_PMD_CORE_ID without DPDK enabled, but still have them call the
mempool code.


--
David Marchand


