[ovs-dev] [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to db

Traynor, Kevin kevin.traynor at intel.com
Mon Dec 21 17:49:15 UTC 2015


Hi Aaron,

> -----Original Message-----
> From: Aaron Conole [mailto:aconole at redhat.com]
> Sent: Friday, December 18, 2015 6:28 PM
> To: dev at openvswitch.org
> Cc: Flavio Leitner; Traynor, Kevin
> Subject: [PATCH 2/5] netdev-dpdk: Convert initialization from cmdline to db
> 
> Existing DPDK integration is provided by use of command line options which
> must be split out and passed to librte in a special manner. However, this
> forces any configuration to be passed by way of a special DPDK flag, and
> interferes with ovs+dpdk packaging solutions.
> 
> This commit delays dpdk initialization until after the OVS database
> connection
> is established, and then initializes librte. It pulls all of the config data
> from the OVS database, and assembles a new argv/argc pair to be passed along.

FYI - There is some whitespace warnings showing up when applying the patchset.

> 
> Signed-off-by: Aaron Conole <aconole at redhat.com>
> ---
>  INSTALL.DPDK.md         |  60 ++++++++++++-----
>  lib/netdev-dpdk.c       | 172 +++++++++++++++++++++++++++++++++-------------
> --
>  lib/netdev-dpdk.h       |  19 ++++--
>  vswitchd/bridge.c       |   3 +
>  vswitchd/ovs-vswitchd.c |  25 ++-----
>  5 files changed, 181 insertions(+), 98 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..b9d92d0 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -143,22 +143,48 @@ Using the DPDK with ovs-vswitchd:
> 
>  5. Start vswitchd:
> 
> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
> -   argument. This needs to be first argument passed to vswitchd process.
> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
> -   for dpdk initialization.
> +   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
> +   other_config database. The recognized configuration options are listed.
> 
> +   * dpdk
> +   This is a bolean configuration option. A value of 'true' signals

typo boolean

> +   Open_vSwitch to initialize the DPDK EAL at startup. A set of nominal
> +   defaults are provided so that simply enabling this option will be
> sufficient
> +   to configure DPDK enabled ports.
> +
> +   * dpdk_lcore_mask
> +   This sets the core mask affinity of non-PMD threads that are spawned by
> the
> +   EAL. It will not impact the affinities of the bridge, or other Open
> vSwitch

I think we need to explain a bit more about the behavior of this param and the
default wrt num of cores and which ones are default. 

> +   userspace threads.
> +
> +   * dpdk_mem_channels
> +   This sets the number of memory spread channels in the CPU to be used by
> +   DPDK. It is purely an optimization flag.
> +
> +   * dpdk_hugepage_dir
> +   Directory where hugetlbfs is mounted
> +
> +   * cuse_dev_name
> +   Option to set the vhost_cuse character device name.
> +
> +   * vhost_sock_dir
> +   Option to set the path to the vhost_user unix socket files.
> +
> +   NOTE: Changing any of these options requires restarting the ovs-vswitchd
> +   application.
> +
>     ```
>     export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
> -   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
> +   ovs-vsctl set Open_vSwitch . other_config:dpdk=true
> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>     ```

To be consistent with the rest of the guide, we should leave in the command to
start vswitchd, even if there is nothing dpdk specific about it anymore. It will
save someone having to go find it.

> 
>     If allocated more than one GB hugepage (as for IVSHMEM), set amount and
>     use NUMA node 0 memory:
> 
>     ```
> -   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
> -   -- unix:$DB_SOCK --pidfile --detach
> +   ovs-vsctl set Open_vSwitch . other_config:dpdk_socket_mem="1024,0"
> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>     ```
> 
>  6. Add bridge & ports
> @@ -521,11 +547,12 @@ have arbitrary names.
>       `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
>       to your VM on the QEMU command line. More instructions on this can be
>       found in the next section "DPDK vhost-user VM configuration"
> -     Note: If you wish for the vhost-user sockets to be created in a
> -     directory other than `/usr/local/var/run/openvswitch`, you may specify
> -     another location on the ovs-vswitchd command line like so:
> +
> +  - If you wish for the vhost-user sockets to be created in a directory
> other
> +    than `/usr/local/var/run/openvswitch`, you may specify another location
> +    in the ovsdb like:
> 
> -      `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
> +    `./vswitchd/ovs-vsctl set Open_vSwitch .
> other_config:vhost_sock_dir=path`
> 
>  DPDK vhost-user VM configuration:
>  ---------------------------------
> @@ -638,14 +665,13 @@ DPDK vhost-cuse VM configuration:
> 
>  1. This step is only needed if using an alternative character device.
> 
> -   The new character device filename must be specified on the vswitchd
> -   commandline:
> +   The new character device filename must be specified in the ovsdb:
> 
> -        `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1
> ...`
> +   `./utilities/ovs-vsctl set Open_vSwitch . \
> +                          other_config:cuse_dev_name=my-vhost-net`
> 
> -   Note that the `--cuse_dev_name` argument and associated string must be
> the first
> -   arguments after `--dpdk` and come before the EAL arguments. In the
> example
> -   above, the character device to be used will be `/dev/my-vhost-net`.
> +   In the example above, the character device to be used will be
> +   `/dev/my-vhost-net`.
> 
>  2. This step is only needed if reusing the standard character device. It
> will
>     conflict with the kernel vhost character device so the user must first
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 9c302f0..2a81058 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -29,6 +29,7 @@
>  #include <stdio.h>
>  #include <sys/types.h>
>  #include <sys/stat.h>
> +#include <getopt.h>
> 
>  #include "dirs.h"
>  #include "dp-packet.h"
> @@ -49,6 +50,8 @@
>  #include "timeval.h"
>  #include "unixctl.h"
>  #include "openvswitch/vlog.h"
> +#include "smap.h"
> +#include "vswitch-idl.h"
> 
>  #include "rte_config.h"
>  #include "rte_mbuf.h"
> @@ -2107,10 +2110,14 @@ unlock_dpdk:
> 
>  static int
>  process_vhost_flags(char *flag, char *default_val, int size,
> -                    char **argv, char **new_val)
> +                    const struct ovsrec_open_vswitch *ovs_cfg,
> +                    char **new_val)
>  {
> +    const char *val;
>      int changed = 0;
> 
> +    val = smap_get(&ovs_cfg->other_config, flag);
> +
>      /* Depending on which version of vhost is in use, process the vhost-
> specific
>       * flag if it is provided on the vswitchd command line, otherwise resort
> to
>       * a default value.
> @@ -2120,9 +2127,9 @@ process_vhost_flags(char *flag, char *default_val, int
> size,
>       * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of
> the
>       * vhost-cuse character device.
>       */
> -    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
> +    if (val && (strlen(val) <= size)) {
>          changed = 1;
> -        *new_val = strdup(argv[2]);
> +        *new_val = strdup(val);
>          VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
>      } else {
>          VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
> @@ -2132,68 +2139,117 @@ process_vhost_flags(char *flag, char *default_val,
> int size,
>      return changed;
>  }
> 
> -int
> -dpdk_init(int argc, char **argv)
> +static char **
> +grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
> +{
> +    char **new_argv = realloc(*argv, sizeof(char *) * (cur_siz + grow_by));
> +    return new_argv;
> +}
> +
> +static int
> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
> +{
> +    struct dpdk_options_map {
> +        const char *ovs_configuration;
> +        const char *dpdk_option;
> +        bool default_enabled;
> +        const char *default_value;
> +    } opts[] = {
> +        {"dpdk_lcore_mask", "-c", true, "0x1"},
> +        {"dpdk_mem_channels", "-n", true, "4"},
> +        {"dpdk_alloc_mem", "-m", false, NULL},
> +        {"dpdk_socket_mem", "--socket-mem", true, "1024,0"},
> +        {"dpdk_hugepage_dir", "--huge-dir", false, NULL},
> +    };
> +    int i, ret = 1;
> +
> +    for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
> +        const char *lookup = smap_get(&ovs_cfg->other_config,
> +                                      opts[i].ovs_configuration);
> +        if(!lookup && opts[i].default_enabled)
> +            lookup = opts[i].default_value;
> +
> +        if(lookup) {
> +            char **newargv = grow_argv(argv, ret, 2);
> +
> +            if (!newargv) {
> +                ovs_abort(0, "grow_argv() failed to allocate memory.");
> +            }
> +
> +            *argv = newargv;
> +            (*argv)[ret++] = strdup(opts[i].dpdk_option);
> +            (*argv)[ret++] = strdup(lookup);
> +        }
> +    }
> +
> +    return ret;
> +}
> +
> +static char **dpdk_argv;
> +static int dpdk_argc;
> +
> +static void
> +deferred_argv_release(void)
>  {
>      int result;
> -    int base = 0;
> -    char *pragram_name = argv[0];
> -    int err;
> -    int isset;
> -    cpu_set_t cpuset;
> +    for(result = 0; result < dpdk_argc; ++result)
> +        free(dpdk_argv[result]);

coding stds - curly braces

> 
> -    if (argc < 2 || strcmp(argv[1], "--dpdk"))
> -        return 0;
> +    free(dpdk_argv);
> +}
> 
> -    /* Remove the --dpdk argument from arg list.*/
> -    argc--;
> -    argv++;
> +static void
> +__dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
> +{
> +    char **argv = NULL;
> +    int result;
> +    int argc;
> +    int err;
> +    cpu_set_t cpuset;
> 
> -    /* Reject --user option */
> -    int i;
> -    for (i = 0; i < argc; i++) {
> -        if (!strcmp(argv[i], "--user")) {
> -            VLOG_ERR("Can not mix --dpdk and --user options, aborting.");
> -        }
> +    if(!smap_get_bool(&ovs_cfg->other_config, "dpdk", false)) {
> +        return;
>      }
> 
> +    VLOG_INFO("DPDK Enabled, initializing");
> +
>  #ifdef VHOST_CUSE
> -    if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"),
> -                            PATH_MAX, argv, &cuse_dev_name)) {
> +    if (process_vhost_flags("cuse_dev_name", strdup("vhost-net"),
> +                            PATH_MAX, ovs_cfg, &cuse_dev_name)) {
>  #else
> -    if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()),
> -                            NAME_MAX, argv, &vhost_sock_dir)) {
> +    if (process_vhost_flags("vhost_sock_dir", strdup(ovs_rundir()),
> +                            NAME_MAX, ovs_cfg, &vhost_sock_dir)) {
>          struct stat s;
> -        int err;
> 
>          err = stat(vhost_sock_dir, &s);
>          if (err) {
> -            VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
> -                     vhost_sock_dir);
> -            return err;
> +            ovs_abort(0, "vhost-user sock directory '%s' does not exist.",
> +                      vhost_sock_dir);
>          }
>  #endif
> -        /* Remove the vhost flag configuration parameters from the argument
> -         * list, so that the correct elements are passed to the DPDK
> -         * initialization function
> -         */
> -        argc -= 2;
> -        argv += 2;    /* Increment by two to bypass the vhost flag arguments
> */
> -        base = 2;
>      }
> 
>      /* Get the main thread affinity */
>      CPU_ZERO(&cpuset);
>      err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t),
> &cpuset);
>      if (err) {
> -        VLOG_ERR("Thread getaffinity error %d.", err);
> -        return err;
> +        ovs_abort(0, "Thread getaffinity error %d.", err);
>      }
> 
> -    /* Keep the program name argument as this is needed for call to
> -     * rte_eal_init()
> -     */
> -    argv[0] = pragram_name;
> +    argv = grow_argv(&argv, 0, 1);
> +    if (!argv) {
> +        ovs_abort(0, "Unable to allocate an initial argv.");
> +    }
> +    argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
> +    argc = get_dpdk_args(ovs_cfg, &argv);
> +
> +    argv = grow_argv(&argv, argc, argc+1);
> +    if (!argv) {
> +        ovs_abort(0, "Unable to make final argv allocation.");
> +    }
> +    argv[argc] = 0;
> +
> +    optind = 1;
> 
>      /* Make sure things are initialized ... */
>      result = rte_eal_init(argc, argv);
> @@ -2204,21 +2260,33 @@ dpdk_init(int argc, char **argv)
>      /* Set the main thread affinity back to pre rte_eal_init() value */
>      err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
> &cpuset);
>      if (err) {
> -        VLOG_ERR("Thread setaffinity error %d", err);
> -        return err;
> +        ovs_abort(0, "Thread getaffinity error %d.", err);
>      }
> -
> +
> +    dpdk_argv = argv;
> +    dpdk_argc = argc;
> +
> +    atexit(deferred_argv_release);
> +
>      rte_memzone_dump(stdout);
>      rte_eal_init_ret = 0;
> 
> -    if (argc > result) {
> -        argv[result] = argv[0];
> -    }
> -
>      /* We are called from the main thread here */
>      RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
> +}
> +
> +void
> +dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
> +{
> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +    static bool initialized = false; /* optimization to bail out of the
> +                                        ovsthread_once_start lock early */
> 
> -    return result + 1 + base;
> +    if (!initialized && ovs_cfg && ovsthread_once_start(&once)) {

If you are just worried about grabbing the mutex, then we don't need the static
initialized - we could move once_done() and after that once->done should do the
same thing.

> +        initialized = true;
> +        __dpdk_init(ovs_cfg);
> +        ovsthread_once_done(&once);
> +    }
>  }
> 
>  static const struct netdev_class dpdk_class =
> @@ -2282,10 +2350,6 @@ netdev_dpdk_register(void)
>  {
>      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> 
> -    if (rte_eal_init_ret) {
> -        return;
> -    }
> -

Why did you remove this?

>      if (ovsthread_once_start(&once)) {
>          dpdk_common_init();
>          netdev_register_provider(&dpdk_class);
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index 646d3e2..6c1100a 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -22,7 +22,9 @@ struct dp_packet;
> 
>  #define NON_PMD_CORE_ID LCORE_ID_ANY
> 
> -int dpdk_init(int argc, char **argv);
> +struct ovsrec_open_vswitch;
> +
> +void dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg);
>  void netdev_dpdk_register(void);
>  void free_dpdk_buf(struct dp_packet *);
>  int pmd_thread_setaffinity_cpu(unsigned cpu);
> @@ -30,15 +32,20 @@ int pmd_thread_setaffinity_cpu(unsigned cpu);
>  #else
> 
>  #define NON_PMD_CORE_ID UINT32_MAX
> -
>  #include "util.h"
> 
> -static inline int
> -dpdk_init(int argc, char **argv)
> +static inline void
> +dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
>  {
> -    if (argc >= 2 && !strcmp(argv[1], "--dpdk")) {
> -        ovs_fatal(0, "DPDK support not built into this copy of Open
> vSwitch.");
> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +
> +    if (ovs_cfg && ovsthread_once_start(&once)) {
> +        if(smap_get_bool(&ovs_cfg->other_config, "dpdk", false)) {
> +            ovs_fatal(0, "DPDK not supported in this copy of Open
> vSwitch.");
> +        }
> +        ovsthread_once_done(&once);
>      }
> +
>      return 0;
>  }
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index f8afe55..22b1891 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -68,6 +68,7 @@
>  #include "sflow_api.h"
>  #include "vlan-bitmap.h"
>  #include "packets.h"
> +#include "lib/netdev-dpdk.h"
> 
>  VLOG_DEFINE_THIS_MODULE(bridge);
> 
> @@ -2922,6 +2923,8 @@ bridge_run(void)
>      }
>      cfg = ovsrec_open_vswitch_first(idl);
> 
> +    dpdk_init(cfg);
> +
>      /* Initialize the ofproto library.  This only needs to run once, but
>       * it must be done after the configuration is set.  If the
>       * initialization has already occurred, bridge_init_ofproto()
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index e78ecda..c448107 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -48,7 +48,6 @@
>  #include "openvswitch/vconn.h"
>  #include "openvswitch/vlog.h"
>  #include "lib/vswitch-idl.h"
> -#include "lib/netdev-dpdk.h"
> 
>  VLOG_DEFINE_THIS_MODULE(vswitchd);
> 
> @@ -71,13 +70,6 @@ main(int argc, char *argv[])
>      int retval;
> 
>      set_program_name(argv[0]);
> -    retval = dpdk_init(argc,argv);
> -    if (retval < 0) {
> -        return retval;
> -    }
> -
> -    argc -= retval;
> -    argv += retval;
> 
>      ovs_cmdl_proctitle_init(argc, argv);
>      service_start(&argc, &argv);
> @@ -166,7 +158,7 @@ parse_options(int argc, char *argv[], char
> **unixctl_pathp)
>          {"bootstrap-ca-cert", required_argument, NULL,
> OPT_BOOTSTRAP_CA_CERT},
>          {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY},
>          {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM},
> -        {"dpdk", required_argument, NULL, OPT_DPDK},
> +        {"dpdk", optional_argument, NULL, OPT_DPDK},
>          {NULL, 0, NULL, 0},
>      };
>      char *short_options =
> ovs_cmdl_long_options_to_short_options(long_options);
> @@ -219,7 +211,7 @@ parse_options(int argc, char *argv[], char
> **unixctl_pathp)
>              exit(EXIT_FAILURE);
> 
>          case OPT_DPDK:
> -            ovs_fatal(0, "--dpdk must be given at beginning of command
> line.");
> +            ovs_fatal(0, "Using --dpdk to configure DPDK is not
> supported.");
>              break;
> 
>          default:
> @@ -256,17 +248,8 @@ usage(void)
>      daemon_usage();
>      vlog_usage();
>      printf("\nDPDK options:\n"
> -           "  --dpdk [VHOST] [DPDK]     Initialize DPDK datapath.\n"
> -           "  where DPDK are options for initializing DPDK lib and VHOST
> is\n"
> -#ifdef VHOST_CUSE
> -           "  option to override default character device name used for\n"
> -           "  for use with userspace vHost\n"
> -           "    -cuse_dev_name NAME\n"
> -#else
> -           "  option to override default directory where vhost-user\n"
> -           "  sockets are created.\n"
> -           "    -vhost_sock_dir DIR\n"
> -#endif
> +           "Configuration of DPDK via command-line is removed from this\n"
> +           "version of Open vSwitch. DPDK is configured through ovsdb.\n"

cool - I think this message will be beneficial for anyone changing over to this code.

>             );
>      printf("\nOther options:\n"
>             "  --unixctl=SOCKET          override default control socket
> name\n"
> --
> 2.6.1.133.gf5b6079




More information about the dev mailing list