[ovs-dev] [PATCH 3/8] netdev-dpdk: Convert initialization from cmdline to db

Aaron Conole aconole at redhat.com
Fri Apr 1 03:22:44 UTC 2016


Existing DPDK integration is provided by use of command line options which
must be split out and passed to librte in a special manner. However, this
forces any configuration to be passed by way of a special DPDK flag, and
interferes with ovs+dpdk packaging solutions.

This commit delays dpdk initialization until after the OVS database
connection is established, at which point ovs initializes librte. It
pulls all of the config data from the OVS database, and assembles a
new argv/argc pair to be passed along.

Signed-off-by: Aaron Conole <aconole at redhat.com>
---
v2:
* Removed trailing whitespace
* Followed for() loop brace coding style
* Automatically enable DPDK when adding a DPDK enabled port
* Fixed an issue on startup when DPDK enabled ports are present
* Updated the documentation (including vswitch.xml) and documented all
  new parameters
* Dropped the premature initialization test

v3:
* Improved description language in INSTALL.DPDK.md
* Fixed the ovs-vsctl examples for DPDK
* Returned to the global dpdk-init (bullet 3 from v2)
* Fixed a build error when compiling without dpdk support enabled
* converted to xstrdup, for consistency after rebasing

v4:
* No change

v5:
* Adjust the ovs-dev script to account for the new dpdk configuration
* Update the ovs-vswitchd.8.in pointing to INSTALL.DPDK.md

v6:
* Remove excess whitespace addition
* Correct INSTALL.DPDK.md regarding when DPDK is initialized
* Used incorrect variable in the non-dpdk case for testing against
  dpdk

v7:
* Account for mutually exclusive options;

v8:
* ``make check`` testing revealed a number of flaws in the initialization
  which resulted in memory corruption
* Fixed the ovs-vswitchd startup during testing

v9:
* Re-arrange the added headers in netdev-dpdk.c to try and be alphabetical
* Convert '-c' and '-n' options to be default non-inserted

v10:
* Documentation adjustment in vswitch.xml explicitly stating these values
  are not runtime configurable.
* Wrapped vhost_cuse_dev in #ifdef

v11:
* Whitespace cleanup
* Numerous error path cleanups
* Removed ACKs and Tested-bys due to the error path changes

 FAQ.md                     |   6 +-
 INSTALL.DPDK.md            |  81 +++++++++---
 lib/automake.mk            |   4 +
 lib/netdev-dpdk.c          | 301 +++++++++++++++++++++++++++++++++------------
 lib/netdev-dpdk.h          |  13 +-
 lib/netdev-nodpdk.c        |  21 ++++
 tests/ofproto-macros.at    |   3 +-
 utilities/ovs-dev.py       |  13 +-
 vswitchd/bridge.c          |   3 +
 vswitchd/ovs-vswitchd.8.in |   6 +-
 vswitchd/ovs-vswitchd.c    |  25 +---
 vswitchd/vswitch.xml       | 131 +++++++++++++++++++-
 12 files changed, 463 insertions(+), 144 deletions(-)
 create mode 100644 lib/netdev-nodpdk.c

diff --git a/FAQ.md b/FAQ.md
index 04ffc84..0750956 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -432,9 +432,9 @@ A: Yes.  How you configure it depends on what you mean by "promiscuous
 
 A: Firstly, you must have a DPDK-enabled version of Open vSwitch.
 
-   If your version is DPDK-enabled it will support the --dpdk
-   argument on the command line and will display lines with
-   "EAL:..." during startup when --dpdk is supplied.
+   If your version is DPDK-enabled it will support the other_config:dpdk-init
+   configuration in the database and will display lines with
+   "EAL:..." during startup when other_config:dpdk-init is set to 'true'.
 
    Secondly, when adding a DPDK port, unlike a system port, the
    type for the interface must be specified. For example;
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 9ec8bf6..8397688 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -143,22 +143,64 @@ Using the DPDK with ovs-vswitchd:
 
 5. Start vswitchd:
 
-   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
-   argument. This needs to be first argument passed to vswitchd process.
-   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
-   for dpdk initialization.
+   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
+   other_config column. The recognized configuration options are listed.
+   Defaults will be provided for all values not explicitly set.
+
+   * dpdk-init
+   Specifies whether OVS should initialize and support DPDK ports. This is
+   a boolean, and defaults to false.
+
+   * dpdk-lcore-mask
+   Specifies the CPU cores on which dpdk lcore threads should be spawned.
+   The DPDK lcore threads are used for DPDK library tasks, such as
+   library internal message processing, logging, etc. Value should be in
+   the form of a hex string (so '0x123') similar to the 'taskset' mask
+   input.
+   If not specified, the value will be determined by choosing the lowest
+   CPU core from initial cpu affinity list. Otherwise, the value will be
+   passed directly to the DPDK library.
+   For performance reasons, it is best to set this to a single core on
+   the system, rather than allow lcore threads to float.
+
+   * dpdk-mem-channels
+   This sets the number of memory spread channels per CPU socket. It is purely
+   an optimization flag.
+
+   * dpdk-alloc-mem
+   This sets the total memory to preallocate from hugepages regardless of
+   processor socket. It is recommended to use dpdk-socket-mem instead.
+
+   * dpdk-socket-mem
+   Comma separated list of memory to pre-allocate from hugepages on specific
+   sockets.
+
+   * dpdk-hugepage-dir
+   Directory where hugetlbfs is mounted
+
+   * cuse-dev-name
+   Option to set the vhost_cuse character device name.
+
+   * vhost-sock-dir
+   Option to set the path to the vhost_user unix socket files.
+
+   NOTE: Changing any of these options requires restarting the ovs-vswitchd
+   application.
+
+   Open vSwitch can be started as normal. DPDK will be initialized as long
+   as the dpdk-init option has been set to 'true'.
 
    ```
    export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
-   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
+   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
    ```
 
    If allocated more than one GB hugepage (as for IVSHMEM), set amount and
    use NUMA node 0 memory:
 
    ```
-   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
-   -- unix:$DB_SOCK --pidfile --detach
+   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk_socket_mem="1024,0"
+   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
    ```
 
 6. Add bridge & ports
@@ -545,11 +587,13 @@ in the names.
      `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
      to your VM on the QEMU command line. More instructions on this can be
      found in the next section "DPDK vhost-user VM configuration"
-     Note: If you wish for the vhost-user sockets to be created in a
-     directory other than `/usr/local/var/run/openvswitch`, you may specify
-     another location on the ovs-vswitchd command line like so:
 
-      `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
+  - If you wish for the vhost-user sockets to be created in a directory other
+    than `/usr/local/var/run/openvswitch`, you may specify another location
+    in the ovsdb like:
+
+    `./vswitchd/ovs-vsctl --no-wait \
+        set Open_vSwitch . other_config:vhost-sock-dir=path`
 
 DPDK vhost-user VM configuration:
 ---------------------------------
@@ -697,14 +741,13 @@ DPDK vhost-cuse VM configuration:
 
 1. This step is only needed if using an alternative character device.
 
-   The new character device filename must be specified on the vswitchd
-   commandline:
+   The new character device filename must be specified in the ovsdb:
 
-        `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
+   `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
+                          other_config:cuse-dev-name=my-vhost-net`
 
-   Note that the `--cuse_dev_name` argument and associated string must be the first
-   arguments after `--dpdk` and come before the EAL arguments. In the example
-   above, the character device to be used will be `/dev/my-vhost-net`.
+   In the example above, the character device to be used will be
+   `/dev/my-vhost-net`.
 
 2. This step is only needed if reusing the standard character device. It will
    conflict with the kernel vhost character device so the user must first
@@ -817,8 +860,8 @@ steps.
 
         <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-net`
         device. If you have specificed a different name on the ovs-vswitchd
-        commandline using the "--cuse_dev_name" parameter, please specify that
-        filename instead.
+        database using the "other_config:cuse-dev-name" parameter, please
+        specify that filename instead.
 
      2. Disable SELinux or set to permissive mode
 
diff --git a/lib/automake.mk b/lib/automake.mk
index 7b829d0..3629a6f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -364,6 +364,10 @@ if DPDK_NETDEV
 lib_libopenvswitch_la_SOURCES += \
        lib/netdev-dpdk.c \
        lib/netdev-dpdk.h
+else
+lib_libopenvswitch_la_SOURCES += \
+       lib/netdev-nodpdk.c \
+       lib/netdev-dpdk.h
 endif
 
 if WIN32
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 3b954e5..152b66f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -29,6 +29,7 @@
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <getopt.h>
 
 #include "dirs.h"
 #include "dp-packet.h"
@@ -51,6 +52,7 @@
 #include "timeval.h"
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
+#include "vswitch-idl.h"
 
 #include "rte_config.h"
 #include "rte_mbuf.h"
@@ -107,7 +109,9 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
 #define OVS_VHOST_QUEUE_DISABLED    (-2) /* Queue was disabled by guest and not
                                           * yet mapped to another queue. */
 
+#ifdef VHOST_CUSE
 static char *cuse_dev_name = NULL;    /* Character device cuse_dev_name. */
+#endif
 static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
 
 /*
@@ -649,8 +653,15 @@ netdev_dpdk_cast(const struct netdev *netdev)
 static struct netdev *
 netdev_dpdk_alloc(void)
 {
-    struct netdev_dpdk *netdev = dpdk_rte_mzalloc(sizeof *netdev);
-    return &netdev->up;
+    struct netdev_dpdk *netdev;
+
+    if (!rte_eal_init_ret) { /* Only after successful initialization */
+        netdev = dpdk_rte_mzalloc(sizeof *netdev);
+        if (netdev) {
+            return &netdev->up;
+        }
+    }
+    return NULL;
 }
 
 static void
@@ -783,6 +794,10 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev_)
     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
     int err;
 
+    if (rte_eal_init_ret) {
+        return rte_eal_init_ret;
+    }
+
     ovs_mutex_lock(&dpdk_mutex);
     strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id));
     err = vhost_construct_helper(netdev_);
@@ -807,6 +822,10 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
         return EINVAL;
     }
 
+    if (rte_eal_init_ret) {
+        return rte_eal_init_ret;
+    }
+
     ovs_mutex_lock(&dpdk_mutex);
     /* Take the name of the vhost-user port and append it to the location where
      * the socket is to be created, then register the socket.
@@ -2250,28 +2269,12 @@ dpdk_vhost_class_init(void)
 static int
 dpdk_vhost_cuse_class_init(void)
 {
-    int err = -1;
-
-
-    /* Register CUSE device to handle IOCTLs.
-     * Unless otherwise specified on the vswitchd command line, cuse_dev_name
-     * is set to vhost-net.
-     */
-    err = rte_vhost_driver_register(cuse_dev_name);
-
-    if (err != 0) {
-        VLOG_ERR("CUSE device setup failure.");
-        return -1;
-    }
-
-    dpdk_vhost_class_init();
     return 0;
 }
 
 static int
 dpdk_vhost_user_class_init(void)
 {
-    dpdk_vhost_class_init();
     return 0;
 }
 
@@ -2281,8 +2284,6 @@ dpdk_common_init(void)
     unixctl_command_register("netdev-dpdk/set-admin-state",
                              "[netdev] up|down", 1, 2,
                              netdev_dpdk_set_admin_state, NULL);
-
-    ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
 }
 
 /* Client Rings */
@@ -2731,22 +2732,20 @@ static const struct dpdk_qos_ops egress_policer_ops = {
 
 static int
 process_vhost_flags(char *flag, char *default_val, int size,
-                    char **argv, char **new_val)
+                    const struct ovsrec_open_vswitch *ovs_cfg,
+                    char **new_val)
 {
+    const char *val;
     int changed = 0;
 
+    val = smap_get(&ovs_cfg->other_config, flag);
+
     /* Depending on which version of vhost is in use, process the vhost-specific
-     * flag if it is provided on the vswitchd command line, otherwise resort to
-     * a default value.
-     *
-     * For vhost-user: Process "-vhost_sock_dir" to set the custom location of
-     * the vhost-user socket(s).
-     * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of the
-     * vhost-cuse character device.
+     * flag if it is provided, otherwise resort to a default value.
      */
-    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
+    if (val && (strlen(val) <= size)) {
         changed = 1;
-        *new_val = xstrdup(argv[2]);
+        *new_val = xstrdup(val);
         VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
     } else {
         VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
@@ -2756,54 +2755,169 @@ process_vhost_flags(char *flag, char *default_val, int size,
     return changed;
 }
 
-int
-dpdk_init(int argc, char **argv)
+static char **
+grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
 {
-    int result;
-    int base = 0;
-    char *pragram_name = argv[0];
-    int err;
-    int isset;
-    cpu_set_t cpuset;
+    return xrealloc(*argv, sizeof(char *) * (cur_siz + grow_by));
+}
 
-    if (argc < 2 || strcmp(argv[1], "--dpdk"))
-        return 0;
+static void
+dpdk_option_extend(char ***argv, int argc, const char *option,
+                   const char *value)
+{
+    char **newargv = grow_argv(argv, argc, 2);
 
-    /* Remove the --dpdk argument from arg list.*/
-    argc--;
-    argv++;
+    *argv = newargv;
+    newargv[argc] = xstrdup(option);
+    newargv[argc+1] = xstrdup(value);
+}
 
-    /* Reject --user option */
-    int i;
-    for (i = 0; i < argc; i++) {
-        if (!strcmp(argv[i], "--user")) {
-            VLOG_ERR("Can not mix --dpdk and --user options, aborting.");
+static int
+construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg,
+                       char ***argv, const int initial_size)
+{
+    struct dpdk_options_map {
+        const char *ovs_configuration;
+        const char *dpdk_option;
+        bool default_enabled;
+        const char *default_value;
+    } opts[] = {
+        {"dpdk-lcore-mask", "-c", false, NULL},
+        {"dpdk-mem-channels", "-n", false, NULL},
+        {"dpdk-hugepage-dir", "--huge-dir", false, NULL},
+    };
+
+    int i, ret = initial_size;
+
+    /*First, construct from the flat-options (non-mutex)*/
+    for (i = 0; i < ARRAY_SIZE(opts); ++i) {
+        const char *lookup = smap_get(&ovs_cfg->other_config,
+                                      opts[i].ovs_configuration);
+        if (!lookup && opts[i].default_enabled) {
+            lookup = opts[i].default_value;
+        }
+
+        if (lookup) {
+            dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup);
+            ret += 2;
         }
     }
 
+    return ret;
+}
+
+#define MAX_DPDK_EXCL_OPTS 10
+
+static int
+construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg,
+                             char ***argv, const int initial_size)
+{
+    struct dpdk_exclusive_options_map {
+        const char *category;
+        const char *ovs_dpdk_options[MAX_DPDK_EXCL_OPTS];
+        const char *eal_dpdk_options[MAX_DPDK_EXCL_OPTS];
+        const char *default_value;
+        int default_option;
+    } excl_opts[] = {
+        {"memory type",
+         {"dpdk-alloc-mem", "dpdk-socket-mem", NULL,},
+         {"-m",             "--socket-mem",    NULL,},
+         "1024,0", 1
+        },
+    };
+
+    int i, ret = initial_size;
+    for (i = 0; i < ARRAY_SIZE(excl_opts); ++i) {
+        int found_opts = 0, scan, found_pos = -1;
+        const char *found_value;
+        struct dpdk_exclusive_options_map *popt = &excl_opts[i];
+
+        for (scan = 0; scan < MAX_DPDK_EXCL_OPTS
+                 && popt->ovs_dpdk_options[scan]; ++scan) {
+            const char *lookup = smap_get(&ovs_cfg->other_config,
+                                          popt->ovs_dpdk_options[scan]);
+            if (lookup && strlen(lookup)) {
+                found_opts++;
+                found_pos = scan;
+                found_value = lookup;
+            }
+        }
+
+        if (!found_opts) {
+            if (popt->default_option) {
+                found_pos = popt->default_option;
+                found_value = popt->default_value;
+            } else {
+                continue;
+            }
+        }
+
+        if (found_opts > 1) {
+            VLOG_ERR("Multiple defined options for %s. Please check your"
+                     " database settings and reconfigure if necessary.",
+                     popt->category);
+        }
+
+        dpdk_option_extend(argv, ret, popt->eal_dpdk_options[found_pos],
+                           found_value);
+        ret += 2;
+    }
+
+    return ret;
+}
+
+static int
+get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
+{
+    int i = construct_dpdk_options(ovs_cfg, argv, 1);
+    i = construct_dpdk_mutex_options(ovs_cfg, argv, i);
+    return i;
+}
+
+static char **dpdk_argv;
+static int dpdk_argc;
+
+static void
+deferred_argv_release(void)
+{
+    int result;
+    for (result = 0; result < dpdk_argc; ++result) {
+        free(dpdk_argv[result]);
+    }
+
+    free(dpdk_argv);
+}
+
+static void
+dpdk_init__(const struct ovsrec_open_vswitch *ovs_cfg)
+{
+    char **argv = NULL;
+    int result;
+    int argc;
+    int err;
+    cpu_set_t cpuset;
+
+    if (!smap_get_bool(&ovs_cfg->other_config, "dpdk-init", false)) {
+        VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
+        return;
+    }
+
+    VLOG_INFO("DPDK Enabled, initializing");
+
 #ifdef VHOST_CUSE
-    if (process_vhost_flags("-cuse_dev_name", xstrdup("vhost-net"),
-                            PATH_MAX, argv, &cuse_dev_name)) {
+    if (process_vhost_flags("cuse-dev-name", xstrdup("vhost-net"),
+                            PATH_MAX, ovs_cfg, &cuse_dev_name)) {
 #else
-    if (process_vhost_flags("-vhost_sock_dir", xstrdup(ovs_rundir()),
-                            NAME_MAX, argv, &vhost_sock_dir)) {
+    if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
+                            NAME_MAX, ovs_cfg, &vhost_sock_dir)) {
         struct stat s;
-        int err;
 
         err = stat(vhost_sock_dir, &s);
         if (err) {
-            VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
+            VLOG_ERR("vhost-user sock directory '%s' does not exist.",
                      vhost_sock_dir);
-            return err;
         }
 #endif
-        /* Remove the vhost flag configuration parameters from the argument
-         * list, so that the correct elements are passed to the DPDK
-         * initialization function
-         */
-        argc -= 2;
-        argv += 2;    /* Increment by two to bypass the vhost flag arguments */
-        base = 2;
     }
 
     /* Get the main thread affinity */
@@ -2811,13 +2925,16 @@ dpdk_init(int argc, char **argv)
     err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
     if (err) {
         VLOG_ERR("Thread getaffinity error %d.", err);
-        return err;
     }
 
-    /* Keep the program name argument as this is needed for call to
-     * rte_eal_init()
-     */
-    argv[0] = pragram_name;
+    argv = grow_argv(&argv, 0, 1);
+    argv[0] = xstrdup(ovs_get_program_name());
+    argc = get_dpdk_args(ovs_cfg, &argv);
+
+    argv = grow_argv(&argv, argc, 1);
+    argv[argc] = 0;
+
+    optind = 1;
 
     /* Make sure things are initialized ... */
     result = rte_eal_init(argc, argv);
@@ -2826,23 +2943,51 @@ dpdk_init(int argc, char **argv)
     }
 
     /* Set the main thread affinity back to pre rte_eal_init() value */
-    err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
-    if (err) {
-        VLOG_ERR("Thread setaffinity error %d", err);
-        return err;
+    if (!err) {
+        err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
+                                     &cpuset);
+        if (err) {
+            VLOG_ERR("Thread setaffinity error %d.", err);
+        }
     }
 
+    dpdk_argv = argv;
+    dpdk_argc = argc;
+
+    atexit(deferred_argv_release);
+
     rte_memzone_dump(stdout);
     rte_eal_init_ret = 0;
 
-    if (argc > result) {
-        argv[result] = argv[0];
-    }
-
     /* We are called from the main thread here */
     RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
 
-    return result + 1 + base;
+    ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
+
+#ifdef VHOST_CUSE
+    /* Register CUSE device to handle IOCTLs.
+     * Unless otherwise specified, cuse_dev_name is set to vhost-net.
+     */
+    err = rte_vhost_driver_register(cuse_dev_name);
+
+    if (err != 0) {
+        VLOG_ERR("CUSE device setup failure.");
+        return;
+    }
+#endif
+
+    dpdk_vhost_class_init();
+}
+
+void
+dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
+{
+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+    if (ovs_cfg && ovsthread_once_start(&once)) {
+        dpdk_init__(ovs_cfg);
+        ovsthread_once_done(&once);
+    }
 }
 
 static const struct netdev_class dpdk_class =
@@ -2906,10 +3051,6 @@ netdev_dpdk_register(void)
 {
     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
-    if (rte_eal_init_ret) {
-        return;
-    }
-
     if (ovsthread_once_start(&once)) {
         dpdk_common_init();
         netdev_register_provider(&dpdk_class);
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 646d3e2..b400055 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -4,6 +4,7 @@
 #include <config.h>
 
 struct dp_packet;
+struct ovsrec_open_vswitch;
 
 #ifdef DPDK_NETDEV
 
@@ -22,7 +23,7 @@ struct dp_packet;
 
 #define NON_PMD_CORE_ID LCORE_ID_ANY
 
-int dpdk_init(int argc, char **argv);
+void dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg);
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 int pmd_thread_setaffinity_cpu(unsigned cpu);
@@ -30,17 +31,9 @@ int pmd_thread_setaffinity_cpu(unsigned cpu);
 #else
 
 #define NON_PMD_CORE_ID UINT32_MAX
-
 #include "util.h"
 
-static inline int
-dpdk_init(int argc, char **argv)
-{
-    if (argc >= 2 && !strcmp(argv[1], "--dpdk")) {
-        ovs_fatal(0, "DPDK support not built into this copy of Open vSwitch.");
-    }
-    return 0;
-}
+void dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg);
 
 static inline void
 netdev_dpdk_register(void)
diff --git a/lib/netdev-nodpdk.c b/lib/netdev-nodpdk.c
new file mode 100644
index 0000000..28e637a
--- /dev/null
+++ b/lib/netdev-nodpdk.c
@@ -0,0 +1,21 @@
+#include <config.h>
+#include "netdev-dpdk.h"
+#include "smap.h"
+#include "ovs-thread.h"
+#include "openvswitch/vlog.h"
+#include "vswitch-idl.h"
+
+VLOG_DEFINE_THIS_MODULE(dpdk);
+
+void
+dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg OVS_UNUSED)
+{
+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+
+    if (ovs_cfg && ovsthread_once_start(&once)) {
+        if (smap_get_bool(&ovs_cfg->other_config, "dpdk-init", false)) {
+            VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
+        }
+        ovsthread_once_done(&once);
+    }
+}
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 2c52eca..0d631b2 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -281,7 +281,8 @@ m4_define([_OVS_VSWITCHD_START],
 /reconnect|INFO|/d
 /ofproto|INFO|using datapath ID/d
 /netdev_linux|INFO|.*device has unknown hardware address family/d
-/ofproto|INFO|datapath ID changed to fedcba9876543210/d']])
+/ofproto|INFO|datapath ID changed to fedcba9876543210/d
+/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
 ])
 
 # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
index c121706..a74b528 100755
--- a/utilities/ovs-dev.py
+++ b/utilities/ovs-dev.py
@@ -265,9 +265,11 @@ def run():
     cmd = [build + "/vswitchd/ovs-vswitchd"]
 
     if options.dpdk:
-        cmd.append("--dpdk")
-        cmd.extend(options.dpdk)
-        cmd.append("--")
+        _sh("ovs-vsctl --no-wait set Open_vSwitch %s " \
+            "other_config:dpdk-init=true" % root_uuid)
+    else:
+        _sh("ovs-vsctl --no-wait set Open_vSwitch %s " \
+            "other_config:dpdk-init=false" % root_uuid)
 
     if options.gdb:
         cmd = ["gdb", "--args"] + cmd
@@ -421,9 +423,8 @@ def main():
                      help="run ovs-vswitchd under gdb")
     group.add_option("--valgrind", dest="valgrind", action="store_true",
                      help="run ovs-vswitchd under valgrind")
-    group.add_option("--dpdk", dest="dpdk", action="callback",
-                     callback=parse_subargs,
-                     help="run ovs-vswitchd with dpdk subopts (ended by --)")
+    group.add_option("--dpdk", dest="dpdk", action="store_true",
+                     help="run ovs-vswitchd with dpdk")
     group.add_option("--clang", dest="clang", action="store_true",
                      help="Use binaries built by clang")
     group.add_option("--user", dest="user", action="store", default="",
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index cfe313b..e22e815 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -68,6 +68,7 @@
 #include "sflow_api.h"
 #include "vlan-bitmap.h"
 #include "packets.h"
+#include "lib/netdev-dpdk.h"
 
 VLOG_DEFINE_THIS_MODULE(bridge);
 
@@ -2927,6 +2928,8 @@ bridge_run(void)
     }
     cfg = ovsrec_open_vswitch_first(idl);
 
+    dpdk_init(cfg);
+
     /* Initialize the ofproto library.  This only needs to run once, but
      * it must be done after the configuration is set.  If the
      * initialization has already occurred, bridge_init_ofproto()
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index 601b7a1..3dacfc3 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -84,9 +84,9 @@ only allow privileged users, such as the superuser, to use it.
 unavailable or unsuccessful.
 .
 .SS "DPDK Options"
-.IP "\fB\-\-dpdk\fR"
-Initialize \fBovs\-vswitchd\fR DPDK datapath.  Refer to INSTALL.DPDK
-for details.
+For details on initializing the \fBovs\-vswitchd\fR DPDK datapath,
+refer to INSTALL.DPDK.md or \fBovs\-vswitchd.conf.db\fR(5) for
+details.
 .SS "Daemon Options"
 .ds DD \
 \fBovs\-vswitchd\fR detaches only after it has connected to the \
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index e78ecda..c448107 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -48,7 +48,6 @@
 #include "openvswitch/vconn.h"
 #include "openvswitch/vlog.h"
 #include "lib/vswitch-idl.h"
-#include "lib/netdev-dpdk.h"
 
 VLOG_DEFINE_THIS_MODULE(vswitchd);
 
@@ -71,13 +70,6 @@ main(int argc, char *argv[])
     int retval;
 
     set_program_name(argv[0]);
-    retval = dpdk_init(argc,argv);
-    if (retval < 0) {
-        return retval;
-    }
-
-    argc -= retval;
-    argv += retval;
 
     ovs_cmdl_proctitle_init(argc, argv);
     service_start(&argc, &argv);
@@ -166,7 +158,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
         {"bootstrap-ca-cert", required_argument, NULL, OPT_BOOTSTRAP_CA_CERT},
         {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY},
         {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM},
-        {"dpdk", required_argument, NULL, OPT_DPDK},
+        {"dpdk", optional_argument, NULL, OPT_DPDK},
         {NULL, 0, NULL, 0},
     };
     char *short_options = ovs_cmdl_long_options_to_short_options(long_options);
@@ -219,7 +211,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
             exit(EXIT_FAILURE);
 
         case OPT_DPDK:
-            ovs_fatal(0, "--dpdk must be given at beginning of command line.");
+            ovs_fatal(0, "Using --dpdk to configure DPDK is not supported.");
             break;
 
         default:
@@ -256,17 +248,8 @@ usage(void)
     daemon_usage();
     vlog_usage();
     printf("\nDPDK options:\n"
-           "  --dpdk [VHOST] [DPDK]     Initialize DPDK datapath.\n"
-           "  where DPDK are options for initializing DPDK lib and VHOST is\n"
-#ifdef VHOST_CUSE
-           "  option to override default character device name used for\n"
-           "  for use with userspace vHost\n"
-           "    -cuse_dev_name NAME\n"
-#else
-           "  option to override default directory where vhost-user\n"
-           "  sockets are created.\n"
-           "    -vhost_sock_dir DIR\n"
-#endif
+           "Configuration of DPDK via command-line is removed from this\n"
+           "version of Open vSwitch. DPDK is configured through ovsdb.\n"
            );
     printf("\nOther options:\n"
            "  --unixctl=SOCKET          override default control socket name\n"
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 7d6976f..c350247 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -171,11 +171,63 @@
         </p>
       </column>
 
+      <column name="other_config" key="n-dpdk-rxqs"
+              type='{"type": "integer", "minInteger": 1}'>
+        <p>
+          Specifies the maximum number of rx queues to be created for each dpdk
+          interface.  If not specified or specified to 0, one rx queue will
+          be created for each dpdk interface by default.
+        </p>
+      </column>
+
+      <column name="other_config" key="dpdk-init"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable runtime support for
+          DPDK ports. The vswitch must have compile-time support for DPDK as
+          well.
+        </p>
+        <p>
+          The default value is <code>false</code>. Changing this value requires
+          restarting the daemon
+        </p>
+        <p>
+          If this value is <code>false</code> at startup, any dpdk ports which
+          are configured in the bridge will fail due to memory errors.
+        </p>
+      </column>
+
+      <column name="other_config" key="dpdk-lcore-mask"
+              type='{"type": "integer", "minInteger": 1}'>
+        <p>
+          Specifies the CPU cores where dpdk lcore threads should be spawned.
+          The DPDK lcore threads are used for DPDK library tasks, such as
+          library internal message processing, logging, etc. Value should be in
+          the form of a hex string (so '0x123') similar to the 'taskset' mask
+          input.
+        </p>
+        <p>
+          The lowest order bit corresponds to the first CPU core.  A set bit
+          means the corresponding core is available and an lcore thread will be
+          created and pinned to it.  If the input does not cover all cores,
+          those uncovered cores are considered not set.
+        </p>
+        <p>
+          For performance reasons, it is best to set this to a single core on
+          the system, rather than allow lcore threads to float.
+        </p>
+        <p>
+          If not specified, the value will be determined by choosing the lowest
+          CPU core from initial cpu affinity list. Otherwise, the value will be
+          passed directly to the DPDK library.
+        </p>
+      </column>
+
       <column name="other_config" key="pmd-cpu-mask">
         <p>
           Specifies CPU mask for setting the cpu affinity of PMD (Poll
           Mode Driver) threads.  Value should be in the form of hex string,
-          similar to the dpdk EAL '-c COREMASK' option input or the 'taskset'
+          similar to the dpdk-lcore-mask option input or the 'taskset'
           mask input.
         </p>
         <p>
@@ -190,6 +242,83 @@
         </p>
       </column>
 
+      <column name="other_config" key="dpdk-mem-channels"
+              type='{"type": "integer", "minInteger": 1}'>
+        <p>
+          Specifies the number of NUMA memory spread channels in the CPU to
+          be used by DPDK. It is purely optimization.
+        </p>
+        <p>
+          The default value is 4. Changing this value requires restarting the
+          daemon.
+        </p>
+      </column>
+
+      <column name="other_config" key="dpdk-alloc-mem"
+              type='{"type": "integer", "minInteger": 0}'>
+        <p>
+          Specifies the amount of memory to preallocate from the hugepage pool,
+          regardless of socket. It is recommended that dpdk-socket-mem is used
+          instead.
+        </p>
+        <p>
+          If not specified, the value is 0. Changing this value requires
+          restarting the daemon.
+        </p>
+      </column>
+
+      <column name="other_config" key="dpdk-socket-mem"
+              type='{"type": "string"}'>
+        <p>
+          Specifies the amount of memory to preallocate from the hugepage pool,
+          on a per-socket basis.
+        </p>
+        <p>
+          The specifier is a comma-separated string, in ascending order of CPU
+          socket (ex: 1024,2048,4096,8192 would set socket 0 to preallocate
+          1024MB, socket 1 to preallocate 2048MB, etc.)
+        </p>
+        <p>
+          If not specified, the default value is 1024,0. Changing this value
+          requires restarting the daemon.
+        </p>
+      </column>
+
+      <column name="other_config" key="dpdk-hugepage-dir"
+              type='{"type": "string"}'>
+        <p>
+          Specifies the path to the hugetlbfs mount point.
+        </p>
+        <p>
+          If not specified, this will be guessed by the DPDK library (default
+          is /dev/hugepages). Changing this value requires restarting the
+          daemon.
+        </p>
+      </column>
+
+      <column name="other_config" key="cuse-dev-name"
+              type='{"type": "string"}'>
+        <p>
+          Specifies the name of the vhost-cuse character device to open for
+          vhost-cuse support.
+        </p>
+        <p>
+          The default is vhost-net. Changing this value requires restarting
+          the daemon.
+        </p>
+      </column>
+
+      <column name="other_config" key="vhost-sock-dir"
+              type='{"type": "string"}'>
+        <p>
+          Specifies the path to the vhost-user unix domain socket files.
+        </p>
+        <p>
+          Defaults to the working directory of the application. Changing this
+          value requires restarting the daemon.
+        </p>
+      </column>
+
       <column name="other_config" key="n-handler-threads"
               type='{"type": "integer", "minInteger": 1}'>
         <p>
-- 
2.5.5



More information about the dev mailing list