[ovs-dev] [PATCH v10 2/6] netdev-dpdk: Convert initialization from cmdline to db

Daniele Di Proietto diproiettod at vmware.com
Mon Mar 28 21:59:13 UTC 2016


I still have some comment:

dpdk-mem-channels: This is not required by DPDK anymore, so I still
think that's not necessary and could be removed.  If someone want

I think we shouldn't abort if we fail something during the initialization.
I know that rte_eal_init() can still abort, but I want to avoid it as
much as possible in OVS (It's still ok to abort for memory
allocation failure, as we do in the rest of OVS).  I've commented inline
when ovs_abort() is used.

Since we only use the "other_config" member of "struct
ovsrec_open_vswitch",
I would avoid passing the whole structure to dpdk_init().  We can avoid
including "vswitch-idl.h".

On 09/03/2016 09:38, "Aaron Conole" <aconole at redhat.com> wrote:

>Existing DPDK integration is provided by use of command line options which
>must be split out and passed to librte in a special manner. However, this
>forces any configuration to be passed by way of a special DPDK flag, and
>interferes with ovs+dpdk packaging solutions.
>
>This commit delays dpdk initialization until after the OVS database
>connection is established, at which point ovs initializes librte. It
>pulls all of the config data from the OVS database, and assembles a
>new argv/argc pair to be passed along.
>
>Signed-off-by: Aaron Conole <aconole at redhat.com>
>Tested-by: Sean K Mooney <sean.k.mooney at intel.com>
>Tested-by: RobertX Wojciechowicz <robertx.wojciechowicz at intel.com>
>Tested-by: Kevin Traynor <kevin.traynor at intel.com>
>Acked-by: Panu Matilainen <pmatilai at redhat.com>
>Acked-by: Kevin Traynor <kevin.traynor at intel.com>
>Acked-by: Flavio Leitner <fbl at sysclose.org>
>---
>v2:
>* Removed trailing whitespace
>* Followed for() loop brace coding style
>* Automatically enable DPDK when adding a DPDK enabled port
>* Fixed an issue on startup when DPDK enabled ports are present
>* Updated the documentation (including vswitch.xml) and documented all
>  new parameters
>* Dropped the premature initialization test
>
>v3:
>* Improved description language in INSTALL.DPDK.md
>* Fixed the ovs-vsctl examples for DPDK
>* Returned to the global dpdk-init (bullet 3 from v2)
>* Fixed a build error when compiling without dpdk support enabled
>* converted to xstrdup, for consistency after rebasing
>
>v4:
>* No change
>
>v5:
>* Adjust the ovs-dev script to account for the new dpdk configuration
>* Update the ovs-vswitchd.8.in pointing to INSTALL.DPDK.md
>
>v6:
>* Remove excess whitespace addition
>* Correct INSTALL.DPDK.md regarding when DPDK is initialized
>* Used incorrect variable in the non-dpdk case for testing against
>  dpdk
>
>v7:
>* Account for mutually exclusive options;
>
>v8:
>* ``make check`` testing revealed a number of flaws in the initialization
>  which resulted in memory corruption
>* Fixed the ovs-vswitchd startup during testing
>
>v9:
>* Re-arrange the added headers in netdev-dpdk.c to try and be alphabetical
>* Convert '-c' and '-n' options to be default non-inserted
>
>v10:
>* Documentation adjustment in vswitch.xml explicitly stating these values
>  are not runtime configurable.
>* Wrapped vhost_cuse_dev in #ifdef
>
> FAQ.md                     |   6 +-
> INSTALL.DPDK.md            |  81 +++++++++---
> lib/netdev-dpdk.c          | 308
>+++++++++++++++++++++++++++++++++------------
> lib/netdev-dpdk.h          |  22 ++--
> tests/ofproto-macros.at    |   3 +-
> utilities/ovs-dev.py       |  11 +-
> vswitchd/bridge.c          |   3 +
> vswitchd/ovs-vswitchd.8.in |   5 +-
> vswitchd/ovs-vswitchd.c    |  25 +---
> vswitchd/vswitch.xml       | 131 ++++++++++++++++++-
> 10 files changed, 454 insertions(+), 141 deletions(-)
>
>diff --git a/FAQ.md b/FAQ.md
>index 8bd7ab9..018e6ae 100644
>--- a/FAQ.md
>+++ b/FAQ.md
>@@ -431,9 +431,9 @@ A: Yes.  How you configure it depends on what you
>mean by "promiscuous
> 
> A: Firstly, you must have a DPDK-enabled version of Open vSwitch.
> 
>-   If your version is DPDK-enabled it will support the --dpdk
>-   argument on the command line and will display lines with
>-   "EAL:..." during startup when --dpdk is supplied.
>+   If your version is DPDK-enabled it will support the
>other_config:dpdk-init
>+   configuration in the database and will display lines with
>+   "EAL:..." during startup when other_config:dpdk-init is set to 'true'.
> 
>    Secondly, when adding a DPDK port, unlike a system port, the
>    type for the interface must be specified. For example;
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index 1fc1b66..613764f 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -143,22 +143,64 @@ Using the DPDK with ovs-vswitchd:
> 
> 5. Start vswitchd:
> 
>-   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
>-   argument. This needs to be first argument passed to vswitchd process.
>-   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
>-   for dpdk initialization.
>+   DPDK configuration arguments can be passed to vswitchd via
>Open_vSwitch
>+   other_config database. The recognized configuration options are

"other_config" is a column in the "Open_vSwitch" table

> 
>listed.
>+   Defaults will be provided for all values not explicitly set.
>+
>+   * dpdk-init
>+   Specifies whether OVS should initialize and support DPDK ports. This
>is
>+   a boolean, and defaults to false.
>+
>+   * dpdk-lcore-mask
>+   Specifies the CPU cores on which dpdk lcore threads should be spawned.
>+   The DPDK lcore threads are used for DPDK library tasks, such as
>+   library internal message processing, logging, etc. Value should be in
>+   the form of a hex string (so '0x123') similar to the 'taskset' mask
>+   input.
>+   If not specified, the value will be determined by choosing the lowest
>+   CPU core from initial cpu affinity list. Otherwise, the value will be
>+   passed directly to the DPDK library.
>+   For performance reasons, it is best to set this to a single core on
>+   the system, rather than allow lcore threads to float.
>+
>+   * dpdk-mem-channels
>+   This sets the number of memory spread channels per CPU socket. It is
>purely
>+   an optimization flag.
>+
>+   * dpdk-alloc-mem
>+   This sets the total memory to preallocate from hugepages regardless of
>+   processor socket. It is recommended to use dpdk-socket-mem instead.
>+
>+   * dpdk-socket-mem
>+   Comma separated list of memory to pre-allocate from hugepages on
>specific
>+   sockets.
>+
>+   * dpdk-hugepage-dir
>+   Directory where hugetlbfs is mounted
>+
>+   * cuse-dev-name
>+   Option to set the vhost_cuse character device name.
>+
>+   * vhost-sock-dir
>+   Option to set the path to the vhost_user unix socket files.
>+
>+   NOTE: Changing any of these options requires restarting the
>ovs-vswitchd
>+   application.
>+
>+   Open vSwitch can be started as normal. DPDK will be initialized as
>long
>+   as the dpdk-init option has been set to 'true'.
> 
>    ```
>    export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
>-   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
>+   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>    ```
> 
>    If allocated more than one GB hugepage (as for IVSHMEM), set amount
>and
>    use NUMA node 0 memory:
> 
>    ```
>-   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
>-   -- unix:$DB_SOCK --pidfile --detach
>+   ovs-vsctl --no-wait set Open_vSwitch .
>other_config:dpdk_socket_mem="1024,0"
>+   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>    ```
> 
> 6. Add bridge & ports
>@@ -545,11 +587,13 @@ in the names.
>      `/usr/local/var/run/openvswitch/vhost-user-1`, which you must
>provide
>      to your VM on the QEMU command line. More instructions on this can
>be
>      found in the next section "DPDK vhost-user VM configuration"
>-     Note: If you wish for the vhost-user sockets to be created in a
>-     directory other than `/usr/local/var/run/openvswitch`, you may
>specify
>-     another location on the ovs-vswitchd command line like so:
> 
>-      `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
>+  - If you wish for the vhost-user sockets to be created in a directory
>other
>+    than `/usr/local/var/run/openvswitch`, you may specify another
>location
>+    in the ovsdb like:
>+
>+    `./vswitchd/ovs-vsctl --no-wait \
>+        set Open_vSwitch . other_config:vhost-sock-dir=path`
> 
> DPDK vhost-user VM configuration:
> ---------------------------------
>@@ -697,14 +741,13 @@ DPDK vhost-cuse VM configuration:
> 
> 1. This step is only needed if using an alternative character device.
> 
>-   The new character device filename must be specified on the vswitchd
>-   commandline:
>+   The new character device filename must be specified in the ovsdb:
> 
>-        `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c
>0x1 ...`
>+   `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
>+                          other_config:cuse-dev-name=my-vhost-net`
> 
>-   Note that the `--cuse_dev_name` argument and associated string must
>be the first
>-   arguments after `--dpdk` and come before the EAL arguments. In the
>example
>-   above, the character device to be used will be `/dev/my-vhost-net`.
>+   In the example above, the character device to be used will be
>+   `/dev/my-vhost-net`.
> 
> 2. This step is only needed if reusing the standard character device. It
>will
>    conflict with the kernel vhost character device so the user must first
>@@ -817,8 +860,8 @@ steps.
> 
>         <my-vhost-device> refers to "vhost-net" if using the
>`/dev/vhost-net`
>         device. If you have specificed a different name on the
>ovs-vswitchd
>-        commandline using the "--cuse_dev_name" parameter, please
>specify that
>-        filename instead.
>+        database using the "other_config:cuse-dev-name" parameter, please
>+        specify that filename instead.
> 
>      2. Disable SELinux or set to permissive mode
> 
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index d44cf46..7e8e72e 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -29,6 +29,7 @@
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
>+#include <getopt.h>
> 
> #include "dirs.h"
> #include "dp-packet.h"
>@@ -51,6 +52,7 @@
> #include "timeval.h"
> #include "unixctl.h"
> #include "openvswitch/vlog.h"
>+#include "vswitch-idl.h"
> 
> #include "rte_config.h"
> #include "rte_mbuf.h"
>@@ -65,6 +67,8 @@ static struct vlog_rate_limit rl =
>VLOG_RATE_LIMIT_INIT(5, 20);
> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> #define OVS_VPORT_DPDK "ovs_dpdk"
> 
>+#define MAX_DPDK_EXCL_OPTS 10
>+

Please move this define closer to where it's used

> /*
>  * need to reserve tons of extra space in the mbufs so we can align the
>  * DMA addresses to 4KB.
>@@ -104,7 +108,9 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
>ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> 
> #define OVS_VHOST_MAX_QUEUE_NUM 1024  /* Maximum number of vHost TX
>queues. */
> 
>+#ifdef VHOST_CUSE
> static char *cuse_dev_name = NULL;    /* Character device cuse_dev_name.
>*/
>+#endif
> static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets
>*/
> 
> /*
>@@ -646,8 +652,15 @@ netdev_dpdk_cast(const struct netdev *netdev)
> static struct netdev *
> netdev_dpdk_alloc(void)
> {
>-    struct netdev_dpdk *netdev = dpdk_rte_mzalloc(sizeof *netdev);
>-    return &netdev->up;
>+    struct netdev_dpdk *netdev;
>+
>+    if (!rte_eal_init_ret) { /* Only after successful initialization */
>+        netdev = dpdk_rte_mzalloc(sizeof *netdev);
>+        if (netdev) {
>+            return &netdev->up;
>+        }
>+    }
>+    return NULL;
> }
> 
> static void
>@@ -780,6 +793,10 @@ netdev_dpdk_vhost_cuse_construct(struct netdev
>*netdev_)
>     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>     int err;
> 
>+    if (rte_eal_init_ret) {
>+        return rte_eal_init_ret;
>+    }
>+
>     ovs_mutex_lock(&dpdk_mutex);
>     strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id));
>     err = vhost_construct_helper(netdev_);
>@@ -804,6 +821,10 @@ netdev_dpdk_vhost_user_construct(struct netdev
>*netdev_)
>         return EINVAL;
>     }
> 
>+    if (rte_eal_init_ret) {
>+        return rte_eal_init_ret;
>+    }
>+
>     ovs_mutex_lock(&dpdk_mutex);
>     /* Take the name of the vhost-user port and append it to the
>location where
>      * the socket is to be created, then register the socket.
>@@ -2227,28 +2248,12 @@ dpdk_vhost_class_init(void)
> static int
> dpdk_vhost_cuse_class_init(void)
> {
>-    int err = -1;
>-
>-
>-    /* Register CUSE device to handle IOCTLs.
>-     * Unless otherwise specified on the vswitchd command line,
>cuse_dev_name
>-     * is set to vhost-net.
>-     */
>-    err = rte_vhost_driver_register(cuse_dev_name);
>-
>-    if (err != 0) {
>-        VLOG_ERR("CUSE device setup failure.");
>-        return -1;
>-    }
>-
>-    dpdk_vhost_class_init();
>     return 0;
> }
> 
> static int
> dpdk_vhost_user_class_init(void)
> {
>-    dpdk_vhost_class_init();
>     return 0;
> }
> 
>@@ -2258,8 +2263,6 @@ dpdk_common_init(void)
>     unixctl_command_register("netdev-dpdk/set-admin-state",
>                              "[netdev] up|down", 1, 2,
>                              netdev_dpdk_set_admin_state, NULL);
>-
>-    ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
> }
> 
> /* Client Rings */
>@@ -2708,22 +2711,20 @@ static const struct dpdk_qos_ops
>egress_policer_ops = {
> 
> static int
> process_vhost_flags(char *flag, char *default_val, int size,
>-                    char **argv, char **new_val)
>+                    const struct ovsrec_open_vswitch *ovs_cfg,
>+                    char **new_val)
> {
>+    const char *val;
>     int changed = 0;
> 
>+    val = smap_get(&ovs_cfg->other_config, flag);
>+
>     /* Depending on which version of vhost is in use, process the
>vhost-specific
>-     * flag if it is provided on the vswitchd command line, otherwise
>resort to
>-     * a default value.
>-     *
>-     * For vhost-user: Process "-vhost_sock_dir" to set the custom
>location of
>-     * the vhost-user socket(s).
>-     * For vhost-cuse: Process "-cuse_dev_name" to set the custom name
>of the
>-     * vhost-cuse character device.
>+     * flag if it is provided, otherwise resort to a default value.
>      */
>-    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
>+    if (val && (strlen(val) <= size)) {
>         changed = 1;
>-        *new_val = xstrdup(argv[2]);
>+        *new_val = xstrdup(val);
>         VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
>     } else {
>         VLOG_INFO("No %s provided - defaulting to %s", flag,
>default_val);
>@@ -2733,68 +2734,194 @@ process_vhost_flags(char *flag, char
>*default_val, int size,
>     return changed;
> }
> 
>-int
>-dpdk_init(int argc, char **argv)
>+static char **
>+grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
> {
>-    int result;
>-    int base = 0;
>-    char *pragram_name = argv[0];
>-    int err;
>-    int isset;
>-    cpu_set_t cpuset;
>+    char **new_argv = realloc(*argv, sizeof(char *) * (cur_siz +
>grow_by));

please use xrealloc(), it will abort if the memory allocation fails.

>+    return new_argv;
>+}
> 
>-    if (argc < 2 || strcmp(argv[1], "--dpdk"))
>-        return 0;
>+static void
>+dpdk_option_extend(char ***argv, int argc, const char *option,
>+                   const char *value)
>+{
>+    char **newargv = grow_argv(argv, argc, 2);
> 
>-    /* Remove the --dpdk argument from arg list.*/
>-    argc--;
>-    argv++;
>+    if (!newargv) {
>+        ovs_abort(0, "grow_argv() failed to allocate memory.");
>+    }

No need for this check if we use xrealloc() above.

> 
>-    /* Reject --user option */
>-    int i;
>-    for (i = 0; i < argc; i++) {
>-        if (!strcmp(argv[i], "--user")) {
>-            VLOG_ERR("Can not mix --dpdk and --user options, aborting.");
>+    *argv = newargv;
>+    (*argv)[argc]   = xstrdup(option);
>+    (*argv)[argc+1] = xstrdup(value);

I would use newargv instead on (*argv)

There's no need to align "=" vertically.

>+}
>+
>+static int
>+construct_dpdk_options(const struct ovsrec_open_vswitch *ovs_cfg,
>+                       char ***argv, const int initial_size)
>+{
>+    struct dpdk_options_map {
>+        const char *ovs_configuration;
>+        const char *dpdk_option;
>+        bool default_enabled;
>+        const char *default_value;
>+    } opts[] = {
>+        {"dpdk-lcore-mask", "-c", false, NULL},
>+        {"dpdk-mem-channels", "-n", false, "4"},
>+        {"dpdk-hugepage-dir", "--huge-dir", false, NULL},
>+    };
>+
>+    int i, ret = initial_size;
>+
>+    /*First, construct from the flat-options (non-mutex)*/
>+    for(i = 0; i < ARRAY_SIZE(opts); ++i) {

Space between for and (

>+        const char *lookup = smap_get(&ovs_cfg->other_config,
>+                                      opts[i].ovs_configuration);
>+        if(!lookup && opts[i].default_enabled)

Space between if and (

Braces are required even for single statements.

>+            lookup = opts[i].default_value;
>+
>+        if(lookup) {

Space between if and (

>+            dpdk_option_extend(argv, ret, opts[i].dpdk_option, lookup);
>+            ret += 2;
>         }
>     }
> 
>+    return ret;
>+}
>+
>+static int
>+construct_dpdk_mutex_options(const struct ovsrec_open_vswitch *ovs_cfg,
>+                             char ***argv, const int initial_size)
>+{
>+    struct dpdk_exclusive_options_map {
>+        const char *category;
>+        const char *ovs_dpdk_options[MAX_DPDK_EXCL_OPTS];
>+        const char *eal_dpdk_options[MAX_DPDK_EXCL_OPTS];
>+        const char *default_value;
>+        int default_option;
>+    } excl_opts[] = {
>+        {"memory type",
>+         {"dpdk-alloc-mem", "dpdk-socket-mem", NULL,},
>+         {"-m",             "--socket-mem",    NULL,},
>+         "1024,0", 1
>+        },
>+    };
>+
>+    int i, ret = initial_size;
>+    for (i = 0; i < ARRAY_SIZE(excl_opts); ++i) {
>+        int found_opts = 0, scan, found_pos = -1;
>+        const char *found_value;
>+        struct dpdk_exclusive_options_map *popt = &excl_opts[i];
>+
>+        for (scan = 0; scan < MAX_DPDK_EXCL_OPTS &&

please put the newline before the &&

>+                 popt->ovs_dpdk_options[scan]; ++scan) {
>+            const char *lookup = smap_get(&ovs_cfg->other_config,
>+                                          popt->ovs_dpdk_options[scan]);
>+            if (lookup && strlen(lookup)) {
>+                found_opts++;
>+                found_pos = scan;
>+                found_value = lookup;
>+            }
>+        }
>+
>+        if (!found_opts) {
>+            if (popt->default_option) {
>+                found_pos = popt->default_option;
>+                found_value = popt->default_value;
>+            } else {
>+                continue;
>+            }
>+        }
>+
>+        if (found_opts > 1) {
>+            VLOG_ERR("Multiple defined options for %s. Please check your"
>+                     " database settings.", popt->category);
>+            ovs_abort(0, "Unable to cope with DPDK settings.");

Again, I'm not sure it's a good idea to abort.

>+        }
>+
>+        dpdk_option_extend(argv, ret, popt->eal_dpdk_options[found_pos],
>+                           found_value);
>+        ret += 2;
>+    }
>+
>+    return ret;
>+}
>+
>+static int
>+get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
>+{
>+    int i = construct_dpdk_options(ovs_cfg, argv, 1);
>+    i = construct_dpdk_mutex_options(ovs_cfg, argv, i);
>+    return i;
>+}
>+
>+static char **dpdk_argv;
>+static int dpdk_argc;
>+
>+static void
>+deferred_argv_release(void)
>+{
>+    int result;
>+    for (result = 0; result < dpdk_argc; ++result) {
>+        free(dpdk_argv[result]);
>+    }
>+
>+    free(dpdk_argv);
>+}
>+
>+static void
>+__dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)

CodingStyle.md suggests to use "__" as a suffix.

>+{
>+    char **argv = NULL;
>+    int result;
>+    int argc;
>+    int err;
>+    cpu_set_t cpuset;
>+
>+    if(!smap_get_bool(&ovs_cfg->other_config, "dpdk-init", false)) {

space between if and (

>+        VLOG_INFO("DPDK Disabled - to change this requires a
>restart.\n");
>+        return;
>+    }
>+
>+    VLOG_INFO("DPDK Enabled, initializing");
>+
> #ifdef VHOST_CUSE
>-    if (process_vhost_flags("-cuse_dev_name", xstrdup("vhost-net"),
>-                            PATH_MAX, argv, &cuse_dev_name)) {
>+    if (process_vhost_flags("cuse-dev-name", xstrdup("vhost-net"),
>+                            PATH_MAX, ovs_cfg, &cuse_dev_name)) {
> #else
>-    if (process_vhost_flags("-vhost_sock_dir", xstrdup(ovs_rundir()),
>-                            NAME_MAX, argv, &vhost_sock_dir)) {
>+    if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
>+                            NAME_MAX, ovs_cfg, &vhost_sock_dir)) {
>         struct stat s;
>-        int err;
> 
>         err = stat(vhost_sock_dir, &s);
>         if (err) {
>-            VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
>-                     vhost_sock_dir);
>-            return err;
>+            ovs_abort(0, "vhost-user sock directory '%s' does not
>exist.",
>+                      vhost_sock_dir);

I'd rather just log this without aborting.

>         }
> #endif
>-        /* Remove the vhost flag configuration parameters from the
>argument
>-         * list, so that the correct elements are passed to the DPDK
>-         * initialization function
>-         */
>-        argc -= 2;
>-        argv += 2;    /* Increment by two to bypass the vhost flag
>arguments */
>-        base = 2;
>     }
> 
>     /* Get the main thread affinity */
>     CPU_ZERO(&cpuset);
>     err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t),
>&cpuset);
>     if (err) {
>-        VLOG_ERR("Thread getaffinity error %d.", err);
>-        return err;
>+        ovs_abort(0, "Thread getaffinity error %d.", err);

Just logging the error and avoid restoring the cpu mask later is probably
fine

>     }
> 
>-    /* Keep the program name argument as this is needed for call to
>-     * rte_eal_init()
>-     */
>-    argv[0] = pragram_name;
>+    argv = grow_argv(&argv, 0, 1);
>+    if (!argv) {
>+        ovs_abort(0, "Unable to allocate an initial argv.");
>+    }

No need to abort here if we use xrealloc() in grow_argv()

>+    argv[0] = xstrdup("ovs"); /* TODO use prctl to get process name */

I guess we can use ovs_get_program_name()

>+    argc = get_dpdk_args(ovs_cfg, &argv);
>+
>+    argv = grow_argv(&argv, argc, 1);
>+    if (!argv) {
>+        ovs_abort(0, "Unable to make final argv allocation.");
>+    }

No need to abort here if we use xrealloc() in grow_argv()


>+    argv[argc] = 0;
>+
>+    optind = 1;
> 
>     /* Make sure things are initialized ... */
>     result = rte_eal_init(argc, argv);
>@@ -2805,21 +2932,46 @@ dpdk_init(int argc, char **argv)
>     /* Set the main thread affinity back to pre rte_eal_init() value */
>     err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
>&cpuset);
>     if (err) {
>-        VLOG_ERR("Thread setaffinity error %d", err);
>-        return err;
>+        ovs_abort(0, "Thread getaffinity error %d.", err);

I would just log this.

>     }
> 
>+    dpdk_argv = argv;
>+    dpdk_argc = argc;
>+
>+    atexit(deferred_argv_release);
>+
>     rte_memzone_dump(stdout);
>     rte_eal_init_ret = 0;
> 
>-    if (argc > result) {
>-        argv[result] = argv[0];
>-    }
>-
>     /* We are called from the main thread here */
>     RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
> 
>-    return result + 1 + base;
>+    ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL);
>+
>+#ifdef VHOST_CUSE
>+    /* Register CUSE device to handle IOCTLs.
>+     * Unless otherwise specified, cuse_dev_name is set to vhost-net.
>+     */
>+    err = rte_vhost_driver_register(cuse_dev_name);
>+
>+    if (err != 0) {
>+        VLOG_ERR("CUSE device setup failure.");
>+        return -1;

This function now returns void

>+    }
>+#endif
>+
>+    dpdk_vhost_class_init();
>+}
>+
>+void
>+dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
>+{
>+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>+
>+    if(ovs_cfg && ovsthread_once_start(&once)) {
>+        __dpdk_init(ovs_cfg);
>+        ovsthread_once_done(&once);
>+    }
> }
> 
> static const struct netdev_class dpdk_class =
>@@ -2883,10 +3035,6 @@ netdev_dpdk_register(void)
> {
>     static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> 
>-    if (rte_eal_init_ret) {
>-        return;
>-    }
>-
>     if (ovsthread_once_start(&once)) {
>         dpdk_common_init();
>         netdev_register_provider(&dpdk_class);
>diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
>index 646d3e2..b1e96f7 100644
>--- a/lib/netdev-dpdk.h
>+++ b/lib/netdev-dpdk.h
>@@ -2,6 +2,9 @@
> #define NETDEV_DPDK_H
> 
> #include <config.h>
>+#include "smap.h"
>+#include "ovs-thread.h"
>+#include "vswitch-idl.h"
> 
> struct dp_packet;
> 
>@@ -22,7 +25,9 @@ struct dp_packet;
> 
> #define NON_PMD_CORE_ID LCORE_ID_ANY
> 
>-int dpdk_init(int argc, char **argv);
>+struct ovsrec_open_vswitch;
>+
>+void dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg);
> void netdev_dpdk_register(void);
> void free_dpdk_buf(struct dp_packet *);
> int pmd_thread_setaffinity_cpu(unsigned cpu);
>@@ -30,16 +35,19 @@ int pmd_thread_setaffinity_cpu(unsigned cpu);
> #else
> 
> #define NON_PMD_CORE_ID UINT32_MAX
>-
> #include "util.h"
> 
>-static inline int
>-dpdk_init(int argc, char **argv)
>+static inline void
>+dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
> {
>-    if (argc >= 2 && !strcmp(argv[1], "--dpdk")) {
>-        ovs_fatal(0, "DPDK support not built into this copy of Open
>vSwitch.");
>+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>+
>+    if (ovs_cfg && ovsthread_once_start(&once)) {
>+        if(smap_get_bool(&ovs_cfg->other_config, "dpdk-init", false)) {
>+            ovs_fatal(0, "DPDK not supported in this copy of Open
>vSwitch.");

Since this is coming from the database I think we shouldn't abort.

>+        }
>+        ovsthread_once_done(&once);
>     }
>-    return 0;
> }
> 
> static inline void
>diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
>index 18114d9..d1f95b5 100644
>--- a/tests/ofproto-macros.at
>+++ b/tests/ofproto-macros.at
>@@ -282,7 +282,8 @@ m4_define([_OVS_VSWITCHD_START],
> /reconnect|INFO|/d
> /ofproto|INFO|using datapath ID/d
> /netdev_linux|INFO|.*device has unknown hardware address family/d
>-/ofproto|INFO|datapath ID changed to fedcba9876543210/d']])
>+/ofproto|INFO|datapath ID changed to fedcba9876543210/d
>+/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
> ])
> 
> # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
>diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
>index c121706..f2368c2 100755
>--- a/utilities/ovs-dev.py
>+++ b/utilities/ovs-dev.py
>@@ -265,9 +265,9 @@ def run():
>     cmd = [build + "/vswitchd/ovs-vswitchd"]
> 
>     if options.dpdk:
>-        cmd.append("--dpdk")
>-        cmd.extend(options.dpdk)
>-        cmd.append("--")
>+        _sh("ovs-vsctl --no-wait set Open_vSwitch %s
>other_config:dpdk-init=true" % root_uuid)
>+    else:
>+        _sh("ovs-vsctl --no-wait set Open_vSwitch %s
>other_config:dpdk-init=false" % root_uuid)
> 
>     if options.gdb:
>         cmd = ["gdb", "--args"] + cmd
>@@ -421,9 +421,8 @@ def main():
>                      help="run ovs-vswitchd under gdb")
>     group.add_option("--valgrind", dest="valgrind", action="store_true",
>                      help="run ovs-vswitchd under valgrind")
>-    group.add_option("--dpdk", dest="dpdk", action="callback",
>-                     callback=parse_subargs,
>-                     help="run ovs-vswitchd with dpdk subopts (ended by
>--)")
>+    group.add_option("--dpdk", dest="dpdk", action="store_true",
>+                     help="run ovs-vswitchd with dpdk")
>     group.add_option("--clang", dest="clang", action="store_true",
>                      help="Use binaries built by clang")
>     group.add_option("--user", dest="user", action="store", default="",
>diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>index 25a0663..1fb288e 100644
>--- a/vswitchd/bridge.c
>+++ b/vswitchd/bridge.c
>@@ -68,6 +68,7 @@
> #include "sflow_api.h"
> #include "vlan-bitmap.h"
> #include "packets.h"
>+#include "lib/netdev-dpdk.h"
> 
> VLOG_DEFINE_THIS_MODULE(bridge);
> 
>@@ -2924,6 +2925,8 @@ bridge_run(void)
>     }
>     cfg = ovsrec_open_vswitch_first(idl);
> 
>+    dpdk_init(cfg);
>+
>     /* Initialize the ofproto library.  This only needs to run once, but
>      * it must be done after the configuration is set.  If the
>      * initialization has already occurred, bridge_init_ofproto()
>diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
>index b628f2f..794c3df 100644
>--- a/vswitchd/ovs-vswitchd.8.in
>+++ b/vswitchd/ovs-vswitchd.8.in
>@@ -84,9 +84,8 @@ only allow privileged users, such as the superuser, to
>use it.
> unavailable or unsuccessful.
> .
> .SS "DPDK Options"
>-.IP "\fB\-\-dpdk\fR"
>-Initialize \fBovs\-vswitchd\fR DPDK datapath.  Refer to INSTALL.DPDK
>-for details.
>+For details on initializing the \fBovs\-vswitchd\fR DPDK datapath,
>+refer to INSTALL.DPDK.md

I would also point to the ovs-vswitchd.conf.db manpage

> .SS "Daemon Options"
> .ds DD \
> \fBovs\-vswitchd\fR detaches only after it has connected to the \
>diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
>index e78ecda..c448107 100644
>--- a/vswitchd/ovs-vswitchd.c
>+++ b/vswitchd/ovs-vswitchd.c
>@@ -48,7 +48,6 @@
> #include "openvswitch/vconn.h"
> #include "openvswitch/vlog.h"
> #include "lib/vswitch-idl.h"
>-#include "lib/netdev-dpdk.h"
> 
> VLOG_DEFINE_THIS_MODULE(vswitchd);
> 
>@@ -71,13 +70,6 @@ main(int argc, char *argv[])
>     int retval;
> 
>     set_program_name(argv[0]);
>-    retval = dpdk_init(argc,argv);
>-    if (retval < 0) {
>-        return retval;
>-    }
>-
>-    argc -= retval;
>-    argv += retval;
> 
>     ovs_cmdl_proctitle_init(argc, argv);
>     service_start(&argc, &argv);
>@@ -166,7 +158,7 @@ parse_options(int argc, char *argv[], char
>**unixctl_pathp)
>         {"bootstrap-ca-cert", required_argument, NULL,
>OPT_BOOTSTRAP_CA_CERT},
>         {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY},
>         {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM},
>-        {"dpdk", required_argument, NULL, OPT_DPDK},
>+        {"dpdk", optional_argument, NULL, OPT_DPDK},
>         {NULL, 0, NULL, 0},
>     };
>     char *short_options =
>ovs_cmdl_long_options_to_short_options(long_options);
>@@ -219,7 +211,7 @@ parse_options(int argc, char *argv[], char
>**unixctl_pathp)
>             exit(EXIT_FAILURE);
> 
>         case OPT_DPDK:
>-            ovs_fatal(0, "--dpdk must be given at beginning of command
>line.");
>+            ovs_fatal(0, "Using --dpdk to configure DPDK is not
>supported.");
>             break;
> 
>         default:
>@@ -256,17 +248,8 @@ usage(void)
>     daemon_usage();
>     vlog_usage();
>     printf("\nDPDK options:\n"
>-           "  --dpdk [VHOST] [DPDK]     Initialize DPDK datapath.\n"
>-           "  where DPDK are options for initializing DPDK lib and VHOST
>is\n"
>-#ifdef VHOST_CUSE
>-           "  option to override default character device name used
>for\n"
>-           "  for use with userspace vHost\n"
>-           "    -cuse_dev_name NAME\n"
>-#else
>-           "  option to override default directory where vhost-user\n"
>-           "  sockets are created.\n"
>-           "    -vhost_sock_dir DIR\n"
>-#endif
>+           "Configuration of DPDK via command-line is removed from
>this\n"
>+           "version of Open vSwitch. DPDK is configured through ovsdb.\n"
>            );
>     printf("\nOther options:\n"
>            "  --unixctl=SOCKET          override default control socket
>name\n"
>diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>index f7b7b9c..0aae391 100644
>--- a/vswitchd/vswitch.xml
>+++ b/vswitchd/vswitch.xml
>@@ -167,11 +167,63 @@
>         </p>
>       </column>
> 
>+      <column name="other_config" key="n-dpdk-rxqs"
>+              type='{"type": "integer", "minInteger": 1}'>
>+        <p>
>+          Specifies the maximum number of rx queues to be created for
>each dpdk
>+          interface.  If not specified or specified to 0, one rx queue
>will
>+          be created for each dpdk interface by default.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="dpdk-init"
>+              type='{"type": "boolean"}'>
>+        <p>
>+          Set this value to <code>true</code> to enable runtime support
>for
>+          DPDK ports. The vswitch must have compile-time support for
>DPDK as
>+          well.
>+        </p>
>+        <p>
>+          The default value is <code>false</code>. Changing this value 
>requires
>+          restarting the forwarding path.

I would say "daemon" instead of "forwarding path" (also below, multiple 
times)

>+        </p>
>+        <p>
>+          If this value is <code>false</code> at startup, any dpdk ports 
>which
>+          are configured in the bridge will fail due to memory errors.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="dpdk-lcore-mask"
>+              type='{"type": "integer", "minInteger": 1}'>
>+        <p>
>+          Specifies the CPU cores on which dpdk lcore threads should be 
>spawned.
>+          The DPDK lcore threads are used for DPDK library tasks, such as
>+          library internal message processing, logging, etc. Value 
>should be in
>+          the form of a hex string (so '0x123') similar to the 'taskset' 
>mask
>+          input.
>+        </p>
>+        <p>
>+          The lowest order bit corresponds to the first CPU core.  A set 
>bit
>+          means the corresponding core is available and a pmd thread 
>will be
>+          created and pinned to it.  If the input does not cover all

dpdk-lcore-mask doesn't influence pmd threads.


> 
>cores,
>+          those uncovered cores are considered not set.
>+        </p>
>+        <p>
>+          For performance reasons, it is best to set this to a single 
>core on
>+          the system, rather than allow lcore threads to float.
>+        </p>
>+        <p>
>+          If not specified, the value will be determined by choosing the
> 
>lowest
>+          CPU core from initial cpu affinity list. Otherwise, the value 
>will be
>+          passed directly to the DPDK library.
>+        </p>
>+      </column>
>+
>       <column name="other_config" key="pmd-cpu-mask">
>         <p>
>           Specifies CPU mask for setting the cpu affinity of PMD (Poll
>           Mode Driver) threads.  Value should be in the form of hex 
>string,
>-          similar to the dpdk EAL '-c COREMASK' option input or the 
>'taskset'
>+          similar to the dpdk-lcore-mask option input or the 'taskset'
>           mask input.
>         </p>
>         <p>
>@@ -186,6 +238,83 @@
>         </p>
>       </column>
> 
>+      <column name="other_config" key="dpdk-mem-channels"
>+              type='{"type": "integer", "minInteger": 1}'>
>+        <p>
>+          Specifies the number of NUMA memory spread channels in the CPU 
>to
>+          be used by DPDK. It is purely optimization.
>+        </p>
>+        <p>
>+          The default value is 4. Changing this value requires 
>restarting the
>+          forwarding path.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="dpdk-alloc-mem"
>+              type='{"type": "integer", "minInteger": 0}'>
>+        <p>
>+          Specifies the amount of memory to preallocate from the 
>hugepage pool,
>+          regardless of socket. It is recommended that dpdk-socket-mem 
>is used
>+          instead.
>+        </p>
>+        <p>
>+          If not specified, the value is 0. Changing this value requires
>+          restarting the forwarding path.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="dpdk-socket-mem"
>+              type='{"type": "string"}'>
>+        <p>
>+          Specifies the amount of memory to preallocate from the 
>hugepage pool,
>+          on a per-socket basis.
>+        </p>
>+        <p>
>+          The specifier is a comma-separated string, in ascending order 
>of CPU
>+          socket (ex: 1024,2048,4096,8192 would set socket 0 to 
>preallocate
>+          1024MB, socket 1 to preallocate 2048MB, etc.)
>+        </p>
>+        <p>
>+          If not specified, the default value is 1024,0. Changing this 
>value
>+          requires restarting the forwarding path.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="dpdk-hugepage-dir"
>+              type='{"type": "string"}'>
>+        <p>
>+          Specifies the path to the hugetlbfs mount point.
>+        </p>
>+        <p>
>+          If not specified, this will be guessed by the DPDK library 
>(default
>+          is /dev/hugepages). Changing this value requires restarting the
>+          forwarding path.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="cuse-dev-name"
>+              type='{"type": "string"}'>
>+        <p>
>+          Specifies the name of the vhost-cuse character device to open 
>for
>+          vhost-cuse support.
>+        </p>
>+        <p>
>+          The default is vhost-net. Changing this value requires 
>restarting
>+          the forwarding path.
>+        </p>
>+      </column>
>+
>+      <column name="other_config" key="vhost-sock-dir"
>+              type='{"type": "string"}'>
>+        <p>
>+          Specifies the path to the vhost-user unix domain socket files.
>+        </p>
>+        <p>
>+          The default is the working directory of the application. 
>Changing this
>+          value requires restarting the forwarding path.
>+        </p>
>+      </column>
>+
>       <column name="other_config" key="n-handler-threads"
>               type='{"type": "integer", "minInteger": 1}'>
>         <p>
>-- 
>2.5.0




More information about the dev mailing list