[ovs-dev] [PATCH v2 2/3] netdev-dpdk: Convert initialization from cmdline to db

Aaron Conole aconole at redhat.com
Tue Jan 12 14:46:26 UTC 2016


"Traynor, Kevin" <kevin.traynor at intel.com> writes:
>> -----Original Message-----
>> From: Aaron Conole [mailto:aconole at redhat.com]
>> Sent: Monday, January 4, 2016 9:47 PM
>> To: dev at openvswitch.org; Flavio Leitner; Traynor, Kevin
>> Subject: [PATCH v2 2/3] netdev-dpdk: Convert initialization from cmdline to
>> db
>> 
>> Existing DPDK integration is provided by use of command line options which
>> must be split out and passed to librte in a special manner. However, this
>> forces any configuration to be passed by way of a special DPDK flag, and
>> interferes with ovs+dpdk packaging solutions.
>> 
>> This commit delays dpdk initialization until the first DPDK netdev is added
>> to the bridge, at which point ovs initializes librte. It pulls all of
>> the config data from the OVS database, and assembles a new argv/argc
>> pair to be passed along.
>> 
>> Signed-off-by: Aaron Conole <aconole at redhat.com>
>> ---
>> v2:
>> * Removed trailing whitespace
>> * Followed for() loop brace coding style
>> * Automatically enable DPDK when adding a DPDK enabled port
>> * Fixed an issue on startup when DPDK enabled ports are present
>> * Updated the documentation (including vswitch.xml) and documented all
>>   new parameters
>> * Dropped the premature initialization test
>
> Hi, mostly very minor comments below,

Kevin, thanks for the review!

>
>> 
>> INSTALL.DPDK.md         |  75 ++++++++++++----
>>  lib/netdev-dpdk.c       | 224 +++++++++++++++++++++++++++++++++++-----------
>> --
>>  lib/netdev-dpdk.h       |  19 ++--
>>  vswitchd/bridge.c       |   3 +
>>  vswitchd/ovs-vswitchd.c |  25 +-----
>>  vswitchd/vswitch.xml    | 102 +++++++++++++++++++++-
>>  6 files changed, 341 insertions(+), 107 deletions(-)
>> 
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 96b686c..2dd2120 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -143,22 +143,59 @@ Using the DPDK with ovs-vswitchd:
>> 
>>  5. Start vswitchd:
>> 
>> -   DPDK configuration arguments can be passed to vswitchd via `--dpdk`
>> -   argument. This needs to be first argument passed to vswitchd process.
>> -   dpdk arg -c is ignored by ovs-dpdk, but it is a required parameter
>> -   for dpdk initialization.
>> +   DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
>> +   other_config database. The recognized configuration options are listed.
>
> 'Defaults will be provided for all values not set.'

Thanks, will add this.

>
>> +
>> +   * dpdk-lcore-mask
>> +   Specifies the CPU cores on which dpdk lcore threads should be spawned.
>> +   The DPDK lcore threads are used for DPDK library tasks, such as
>> +   library internal message processing, logging, etc. Value should be in
>> +   the form of a hex string (so '0x123') similar to the 'taskset' mask
>> +   input.
>
> 'CPU cores' and '0x123' imply it will be multiple cores in this case - better
> to change to CPU core and 0x1
>
> Also, I don't think the 0x prefix is accepted based on using pmd-cpu-mask.

Hrrm.. okay. I used the documentation on pmd-cpu-mask as my guideline,
so if that's broken it's a bug against pmd-cpu-mask as well (it's
documented as a <hex-string>).

>> +   If not specified, the value will be determined by choosing the lowest
>> +   CPU core from initial cpu affinity list. Otherwise, the value will be
>> +   passed directly to the DPDK library.
>> +   For performance reasons, it is best to set this to a single core on
>> +   the system, rather than allow lcore threads to float.
>
> I suppose it depends on the system load and vswitch usage as to which will give
> better performance but I agree setting a dedicated core is at least more
> deterministic and less risky, so statement is probably fine.

Well, I used the advice in rings as my guideline here. I think it's
correct to say that the workload is important. Okay, I'll think if
there's a better comment to use here.

>> +
>> +   * dpdk-mem-channels
>> +   This sets the number of memory spread channels per CPU socket. It is
>> purely
>> +   an optimization flag.
>> +
>> +   * dpdk-alloc-mem
>> +   This sets the total memory to preallocate from hugepages regardless of
>> +   processor socket. It is recommended to use dpdk-socket-mem instead.
>> +
>> +   * dpdk-socket-mem
>> +   Comma separated list of memory to pre-allocate from hugepages on specific
>> +   sockets.
>> +
>> +   * dpdk-hugepage-dir
>> +   Directory where hugetlbfs is mounted
>> +
>> +   * cuse-dev-name
>> +   Option to set the vhost_cuse character device name.
>> +
>> +   * vhost-sock-dir
>> +   Option to set the path to the vhost_user unix socket files.
>> +
>> +   NOTE: Changing any of these options requires restarting the ovs-vswitchd
>> +   application.
>> +
>> +   Open vSwitch can be started as normal. DPDK will not be initialized until
>> +   the first DPDK-enabled port is added to the bridge.
>> 
>>     ```
>>     export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
>> -   ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
>> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>>     ```
>> 
>>     If allocated more than one GB hugepage (as for IVSHMEM), set amount and
>>     use NUMA node 0 memory:
>> 
>>     ```
>> -   ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
>> -   -- unix:$DB_SOCK --pidfile --detach
>> +   ovs-vsctl --no-wait set Open_vSwitch .
>> other_config:dpdk_socket_mem="1024,0"
>> +   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>>     ```
>> 
>>  6. Add bridge & ports
>> @@ -521,11 +558,12 @@ have arbitrary names.
>>       `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
>>       to your VM on the QEMU command line. More instructions on this can be
>>       found in the next section "DPDK vhost-user VM configuration"
>> -     Note: If you wish for the vhost-user sockets to be created in a
>> -     directory other than `/usr/local/var/run/openvswitch`, you may specify
>> -     another location on the ovs-vswitchd command line like so:
>> 
>> -      `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
>> +  - If you wish for the vhost-user sockets to be created in a directory
>> other
>> +    than `/usr/local/var/run/openvswitch`, you may specify another location
>> +    in the ovsdb like:
>> +
>> +    `./vswitchd/ovs-vsctl set Open_vSwitch . other_config:vhost-sock-
>> dir=path`
>
> --no-wait

d'oh!

>> 
>>  DPDK vhost-user VM configuration:
>>  ---------------------------------
>> @@ -638,14 +676,13 @@ DPDK vhost-cuse VM configuration:
>> 
>>  1. This step is only needed if using an alternative character device.
>> 
>> -   The new character device filename must be specified on the vswitchd
>> -   commandline:
>> +   The new character device filename must be specified in the ovsdb:
>> 
>> -        `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1
>> ...`
>> +   `./utilities/ovs-vsctl set Open_vSwitch . \
>> +                          other_config:cuse-dev-name=my-vhost-net`
>
> --no-wait

d'oh!

>> -   Note that the `--cuse_dev_name` argument and associated string must be
>> the first
>> -   arguments after `--dpdk` and come before the EAL arguments. In the
>> example
>> -   above, the character device to be used will be `/dev/my-vhost-net`.
>> +   In the example above, the character device to be used will be
>> +   `/dev/my-vhost-net`.
>> 
>>  2. This step is only needed if reusing the standard character device. It
>> will
>>     conflict with the kernel vhost character device so the user must first
>> @@ -758,8 +795,8 @@ steps.
>> 
>>          <my-vhost-device> refers to "vhost-net" if using the `/dev/vhost-
>> net`
>>          device. If you have specificed a different name on the ovs-vswitchd
>> -        commandline using the "--cuse_dev_name" parameter, please specify
>> that
>> -        filename instead.
>> +        database using the "other_config:cuse-dev-name" parameter, please
>> +        specify that filename instead.
>> 
>>       2. Disable SELinux or set to permissive mode
>> 
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index b42712f..2ce9f71 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -29,6 +29,7 @@
>>  #include <stdio.h>
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>> +#include <getopt.h>
>> 
>>  #include "dirs.h"
>>  #include "dp-packet.h"
>> @@ -49,6 +50,8 @@
>>  #include "timeval.h"
>>  #include "unixctl.h"
>>  #include "openvswitch/vlog.h"
>> +#include "smap.h"
>> +#include "vswitch-idl.h"
>> 
>>  #include "rte_config.h"
>>  #include "rte_mbuf.h"
>> @@ -138,6 +141,10 @@ enum dpdk_dev_type {
>>      DPDK_DEV_VHOST = 1,
>>  };
>> 
>> +static const struct ovsrec_open_vswitch *ovs_last_cfg;
>> +
>> +void dpdk_init(void);
>> +
>>  static int rte_eal_init_ret = ENODEV;
>> 
>>  static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
>> @@ -551,8 +558,20 @@ netdev_dpdk_cast(const struct netdev *netdev)
>>  static struct netdev *
>>  netdev_dpdk_alloc(void)
>>  {
>> -    struct netdev_dpdk *netdev = dpdk_rte_mzalloc(sizeof *netdev);
>> -    return &netdev->up;
>> +    struct netdev_dpdk *netdev;
>> +
>> +    if (rte_eal_init_ret) {
>> +        dpdk_init();
>
> This may be gone from the next patch, but there is protection inside
> and outside of dpdk_init(); to prevent it executing twice. There might
> be a way to get rid of one of them while still getting the return values
> you want?

I wanted to skip the dpdk_init path altogether, but you're probably
right. Anyway, you're correct that it will need to be re-done from the
patch series.

>> +        if (rte_eal_init_ret) {
>> +            return NULL;
>> +        }
>> +    }
>> +
>> +    netdev = dpdk_rte_mzalloc(sizeof *netdev);
>> +    if (netdev) {
>> +        return &netdev->up;
>> +    }
>> +    return NULL;
>>  }
>> 
>>  static void
>> @@ -657,7 +676,10 @@ vhost_construct_helper(struct netdev *netdev_)
>> OVS_REQUIRES(dpdk_mutex)
>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> 
>>      if (rte_eal_init_ret) {
>> -        return rte_eal_init_ret;
>> +        dpdk_init();
>> +        if (rte_eal_init_ret) {
>> +            return rte_eal_init_ret;
>> +        }
>>      }
>> 
>>      rte_spinlock_init(&netdev->vhost_tx_lock);
>> @@ -670,6 +692,13 @@ netdev_dpdk_vhost_cuse_construct(struct netdev *netdev_)
>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>      int err;
>> 
>> +    if (rte_eal_init_ret) {
>> +        dpdk_init();
>> +        if (rte_eal_init_ret) {
>> +            return rte_eal_init_ret;
>> +        }
>> +    }
>> +
>>      ovs_mutex_lock(&dpdk_mutex);
>>      strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id));
>>      err = vhost_construct_helper(netdev_);
>> @@ -683,6 +712,13 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>      int err;
>> 
>> +    if (rte_eal_init_ret) {
>> +        dpdk_init();
>> +        if (rte_eal_init_ret) {
>> +            return rte_eal_init_ret;
>> +        }
>> +    }
>> +
>>      ovs_mutex_lock(&dpdk_mutex);
>>      /* Take the name of the vhost-user port and append it to the location
>> where
>>       * the socket is to be created, then register the socket.
>> @@ -707,7 +743,10 @@ netdev_dpdk_construct(struct netdev *netdev)
>>      int err;
>> 
>>      if (rte_eal_init_ret) {
>> -        return rte_eal_init_ret;
>> +        dpdk_init();
>> +        if (rte_eal_init_ret) {
>> +            return rte_eal_init_ret;
>> +        }
>>      }
>> 
>>      /* Names always start with "dpdk" */
>> @@ -1776,6 +1815,8 @@ new_device(struct virtio_net *dev)
>>      struct netdev_dpdk *netdev;
>>      bool exists = false;
>> 
>> +
>> +
>>      ovs_mutex_lock(&dpdk_mutex);
>>      /* Add device to the vhost port with the same name as that passed down.
>> */
>>      LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
>> @@ -2023,7 +2064,10 @@ netdev_dpdk_ring_construct(struct netdev *netdev)
>>      int err = 0;
>> 
>>      if (rte_eal_init_ret) {
>> -        return rte_eal_init_ret;
>> +        dpdk_init();
>> +        if (rte_eal_init_ret) {
>> +            return rte_eal_init_ret;
>> +        }
>>      }
>> 
>>      ovs_mutex_lock(&dpdk_mutex);
>> @@ -2111,10 +2155,14 @@ unlock_dpdk:
>> 
>>  static int
>>  process_vhost_flags(char *flag, char *default_val, int size,
>> -                    char **argv, char **new_val)
>> +                    const struct ovsrec_open_vswitch *ovs_cfg,
>> +                    char **new_val)
>>  {
>> +    const char *val;
>>      int changed = 0;
>> 
>> +    val = smap_get(&ovs_cfg->other_config, flag);
>> +
>>      /* Depending on which version of vhost is in use, process the vhost-
>> specific
>>       * flag if it is provided on the vswitchd command line, otherwise resort
>> to
>>       * a default value.
>> @@ -2124,9 +2172,9 @@ process_vhost_flags(char *flag, char *default_val, int
>> size,
>>       * For vhost-cuse: Process "-cuse_dev_name" to set the custom name of
>> the
>>       * vhost-cuse character device.
>>       */
>> -    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
>> +    if (val && (strlen(val) <= size)) {
>>          changed = 1;
>> -        *new_val = strdup(argv[2]);
>> +        *new_val = strdup(val);
>>          VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
>>      } else {
>>          VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
>> @@ -2136,68 +2184,115 @@ process_vhost_flags(char *flag, char *default_val,
>> int size,
>>      return changed;
>>  }
>> 
>> -int
>> -dpdk_init(int argc, char **argv)
>> +static char **
>> +grow_argv(char ***argv, size_t cur_siz, size_t grow_by)
>>  {
>> -    int result;
>> -    int base = 0;
>> -    char *pragram_name = argv[0];
>> -    int err;
>> -    int isset;
>> -    cpu_set_t cpuset;
>> -
>> -    if (argc < 2 || strcmp(argv[1], "--dpdk"))
>> -        return 0;
>> +    char **new_argv = realloc(*argv, sizeof(char *) * (cur_siz + grow_by));
>> +    return new_argv;
>> +}
>> 
>> -    /* Remove the --dpdk argument from arg list.*/
>> -    argc--;
>> -    argv++;
>> +static int
>> +get_dpdk_args(const struct ovsrec_open_vswitch *ovs_cfg, char ***argv)
>> +{
>> +    struct dpdk_options_map {
>> +        const char *ovs_configuration;
>> +        const char *dpdk_option;
>> +        bool default_enabled;
>> +        const char *default_value;
>> +    } opts[] = {
>> +        {"dpdk-lcore-mask", "-c", true, "0x1"},
>> +        /* XXX: DPDK 2.2.0 support, the true should become false for -n */
>> +        {"dpdk-mem-channels", "-n", true, "4"},
>> +        {"dpdk-alloc-mem", "-m", false, NULL},
>> +        {"dpdk-socket-mem", "--socket-mem", true, "1024,0"},
>> +        {"dpdk-hugepage-dir", "--huge-dir", false, NULL},
>> +    };
>> +    int i, ret = 1;
>> +
>> +    for(i = 0; i < (sizeof(opts) / sizeof(opts[0])); ++i) {
>> +        const char *lookup = smap_get(&ovs_cfg->other_config,
>> +                                      opts[i].ovs_configuration);
>> +        if(!lookup && opts[i].default_enabled)
>> +            lookup = opts[i].default_value;
>> +
>> +        if(lookup) {
>> +            char **newargv = grow_argv(argv, ret, 2);
>> +
>> +            if (!newargv) {
>> +                ovs_abort(0, "grow_argv() failed to allocate memory.");
>> +            }
>> 
>> -    /* Reject --user option */
>> -    int i;
>> -    for (i = 0; i < argc; i++) {
>> -        if (!strcmp(argv[i], "--user")) {
>> -            VLOG_ERR("Can not mix --dpdk and --user options, aborting.");
>> +            *argv = newargv;
>> +            (*argv)[ret++] = strdup(opts[i].dpdk_option);
>> +            (*argv)[ret++] = strdup(lookup);
>>          }
>>      }
>> 
>> +    return ret;
>> +}
>> +
>> +static char **dpdk_argv;
>> +static int dpdk_argc;
>> +
>> +static void
>> +deferred_argv_release(void)
>> +{
>> +    int result;
>> +    for (result = 0; result < dpdk_argc; ++result) {
>> +        free(dpdk_argv[result]);
>> +    }
>> +
>> +    free(dpdk_argv);
>> +}
>> +
>> +static void
>> +__dpdk_init(const struct ovsrec_open_vswitch *ovs_cfg)
>> +{
>> +    char **argv = NULL;
>> +    int result;
>> +    int argc;
>> +    int err;
>> +    cpu_set_t cpuset;
>> +
>> +    VLOG_INFO("DPDK Enabled, initializing");
>> +
>>  #ifdef VHOST_CUSE
>> -    if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"),
>> -                            PATH_MAX, argv, &cuse_dev_name)) {
>> +    if (process_vhost_flags("cuse-dev-name", strdup("vhost-net"),
>> +                            PATH_MAX, ovs_cfg, &cuse_dev_name)) {
>>  #else
>> -    if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()),
>> -                            NAME_MAX, argv, &vhost_sock_dir)) {
>> +    if (process_vhost_flags("vhost-sock-dir", strdup(ovs_rundir()),
>> +                            NAME_MAX, ovs_cfg, &vhost_sock_dir)) {
>>          struct stat s;
>> -        int err;
>> 
>>          err = stat(vhost_sock_dir, &s);
>>          if (err) {
>> -            VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
>> -                     vhost_sock_dir);
>> -            return err;
>> +            ovs_abort(0, "vhost-user sock directory '%s' does not exist.",
>> +                      vhost_sock_dir);
>>          }
>>  #endif
>> -        /* Remove the vhost flag configuration parameters from the argument
>> -         * list, so that the correct elements are passed to the DPDK
>> -         * initialization function
>> -         */
>> -        argc -= 2;
>> -        argv += 2;    /* Increment by two to bypass the vhost flag arguments
>> */
>> -        base = 2;
>>      }
>> 
>>      /* Get the main thread affinity */
>>      CPU_ZERO(&cpuset);
>>      err = pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t),
>> &cpuset);
>>      if (err) {
>> -        VLOG_ERR("Thread getaffinity error %d.", err);
>> -        return err;
>> +        ovs_abort(0, "Thread getaffinity error %d.", err);
>>      }
>> 
>> -    /* Keep the program name argument as this is needed for call to
>> -     * rte_eal_init()
>> -     */
>> -    argv[0] = pragram_name;
>> +    argv = grow_argv(&argv, 0, 1);
>> +    if (!argv) {
>> +        ovs_abort(0, "Unable to allocate an initial argv.");
>> +    }
>> +    argv[0] = strdup("ovs"); /* TODO use prctl to get process name */
>> +    argc = get_dpdk_args(ovs_cfg, &argv);
>> +
>> +    argv = grow_argv(&argv, argc, argc+1);
>> +    if (!argv) {
>> +        ovs_abort(0, "Unable to make final argv allocation.");
>> +    }
>> +    argv[argc] = 0;
>> +
>> +    optind = 1;
>> 
>>      /* Make sure things are initialized ... */
>>      result = rte_eal_init(argc, argv);
>> @@ -2208,21 +2303,36 @@ dpdk_init(int argc, char **argv)
>>      /* Set the main thread affinity back to pre rte_eal_init() value */
>>      err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t),
>> &cpuset);
>>      if (err) {
>> -        VLOG_ERR("Thread setaffinity error %d", err);
>> -        return err;
>> +        ovs_abort(0, "Thread getaffinity error %d.", err);
>>      }
>> 
>> +    dpdk_argv = argv;
>> +    dpdk_argc = argc;
>> +
>> +    atexit(deferred_argv_release);
>> +
>>      rte_memzone_dump(stdout);
>>      rte_eal_init_ret = 0;
>> 
>> -    if (argc > result) {
>> -        argv[result] = argv[0];
>> -    }
>> -
>>      /* We are called from the main thread here */
>>      RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
>> +}
>> 
>> -    return result + 1 + base;
>> +void
>> +dpdk_config(const struct ovsrec_open_vswitch *ovs_cfg)
>> +{
>> +    ovs_last_cfg = ovs_cfg;
>
> I think there would need to be locking around ovs_last_cfg as we
> could also be in the middle of initialization here.

This won't be an issue with the next series I post.

>> +}
>> +
>> +void
>> +dpdk_init(void)
>> +{
>> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +
>> +    if(ovs_last_cfg && ovsthread_once_start(&once)) {
>> +        __dpdk_init(ovs_last_cfg);
>> +        ovsthread_once_done(&once);
>> +    }
>>  }
>> 
>>  static const struct netdev_class dpdk_class =
>> @@ -2286,10 +2396,6 @@ netdev_dpdk_register(void)
>>  {
>>      static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> 
>> -    if (rte_eal_init_ret) {
>> -        return;
>> -    }
>> -
>>      if (ovsthread_once_start(&once)) {
>>          dpdk_common_init();
>>          netdev_register_provider(&dpdk_class);
>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
>> index 646d3e2..e699f37 100644
>> --- a/lib/netdev-dpdk.h
>> +++ b/lib/netdev-dpdk.h
>> @@ -22,7 +22,9 @@ struct dp_packet;
>> 
>>  #define NON_PMD_CORE_ID LCORE_ID_ANY
>> 
>> -int dpdk_init(int argc, char **argv);
>> +struct ovsrec_open_vswitch;
>> +
>> +void dpdk_config(const struct ovsrec_open_vswitch *ovs_cfg);
>>  void netdev_dpdk_register(void);
>>  void free_dpdk_buf(struct dp_packet *);
>>  int pmd_thread_setaffinity_cpu(unsigned cpu);
>> @@ -30,15 +32,20 @@ int pmd_thread_setaffinity_cpu(unsigned cpu);
>>  #else
>> 
>>  #define NON_PMD_CORE_ID UINT32_MAX
>> -
>>  #include "util.h"
>> 
>> -static inline int
>> -dpdk_init(int argc, char **argv)
>> +static inline void
>> +dpdk_config(const struct ovsrec_open_vswitch *ovs_cfg)
>>  {
>> -    if (argc >= 2 && !strcmp(argv[1], "--dpdk")) {
>> -        ovs_fatal(0, "DPDK support not built into this copy of Open
>> vSwitch.");
>> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +
>> +    if (ovs_cfg && ovsthread_once_start(&once)) {
>> +        if(smap_get_bool(&ovs_cfg->other_config, "dpdk", false)) {
>> +            ovs_fatal(0, "DPDK not supported in this copy of Open
>> vSwitch.");
>
> "dpdk" db entry is not in this version of the patchset. 

Good catch... I should have tested it more thoroughly.

> Might be best for this function to just do nothing now. A print doesn't
> even seem to make sense as this will be called for OVS without DPDK.

Okay.

>> +        }
>> +        ovsthread_once_done(&once);
>>      }
>> +
>>      return 0;
>>  }
>> 
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index f8afe55..f7aab07 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -68,6 +68,7 @@
>>  #include "sflow_api.h"
>>  #include "vlan-bitmap.h"
>>  #include "packets.h"
>> +#include "lib/netdev-dpdk.h"
>> 
>>  VLOG_DEFINE_THIS_MODULE(bridge);
>> 
>> @@ -2922,6 +2923,8 @@ bridge_run(void)
>>      }
>>      cfg = ovsrec_open_vswitch_first(idl);
>> 
>> +    dpdk_config(cfg);
>> +
>>      /* Initialize the ofproto library.  This only needs to run once, but
>>       * it must be done after the configuration is set.  If the
>>       * initialization has already occurred, bridge_init_ofproto()
>> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
>> index e78ecda..c448107 100644
>> --- a/vswitchd/ovs-vswitchd.c
>> +++ b/vswitchd/ovs-vswitchd.c
>> @@ -48,7 +48,6 @@
>>  #include "openvswitch/vconn.h"
>>  #include "openvswitch/vlog.h"
>>  #include "lib/vswitch-idl.h"
>> -#include "lib/netdev-dpdk.h"
>> 
>>  VLOG_DEFINE_THIS_MODULE(vswitchd);
>> 
>> @@ -71,13 +70,6 @@ main(int argc, char *argv[])
>>      int retval;
>> 
>>      set_program_name(argv[0]);
>> -    retval = dpdk_init(argc,argv);
>> -    if (retval < 0) {
>> -        return retval;
>> -    }
>> -
>> -    argc -= retval;
>> -    argv += retval;
>> 
>>      ovs_cmdl_proctitle_init(argc, argv);
>>      service_start(&argc, &argv);
>> @@ -166,7 +158,7 @@ parse_options(int argc, char *argv[], char
>> **unixctl_pathp)
>>          {"bootstrap-ca-cert", required_argument, NULL,
>> OPT_BOOTSTRAP_CA_CERT},
>>          {"enable-dummy", optional_argument, NULL, OPT_ENABLE_DUMMY},
>>          {"disable-system", no_argument, NULL, OPT_DISABLE_SYSTEM},
>> -        {"dpdk", required_argument, NULL, OPT_DPDK},
>> +        {"dpdk", optional_argument, NULL, OPT_DPDK},
>>          {NULL, 0, NULL, 0},
>>      };
>>      char *short_options =
>> ovs_cmdl_long_options_to_short_options(long_options);
>> @@ -219,7 +211,7 @@ parse_options(int argc, char *argv[], char
>> **unixctl_pathp)
>>              exit(EXIT_FAILURE);
>> 
>>          case OPT_DPDK:
>> -            ovs_fatal(0, "--dpdk must be given at beginning of command
>> line.");
>> +            ovs_fatal(0, "Using --dpdk to configure DPDK is not
>> supported.");
>>              break;
>> 
>>          default:
>> @@ -256,17 +248,8 @@ usage(void)
>>      daemon_usage();
>>      vlog_usage();
>>      printf("\nDPDK options:\n"
>> -           "  --dpdk [VHOST] [DPDK]     Initialize DPDK datapath.\n"
>> -           "  where DPDK are options for initializing DPDK lib and VHOST
>> is\n"
>> -#ifdef VHOST_CUSE
>> -           "  option to override default character device name used for\n"
>> -           "  for use with userspace vHost\n"
>> -           "    -cuse_dev_name NAME\n"
>> -#else
>> -           "  option to override default directory where vhost-user\n"
>> -           "  sockets are created.\n"
>> -           "    -vhost_sock_dir DIR\n"
>> -#endif
>> +           "Configuration of DPDK via command-line is removed from this\n"
>> +           "version of Open vSwitch. DPDK is configured through ovsdb.\n"
>>             );
>>      printf("\nOther options:\n"
>>             "  --unixctl=SOCKET          override default control socket
>> name\n"
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index ce0dbc1..355da95 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -176,11 +176,38 @@
>>          </p>
>>        </column>
>> 
>> -      <column name="other_config" key="pmd-cpu-mask">
>> +      <column name="other_config" key="dpdk-lcore-mask"
>> +              type='{"type": "integer", "minInteger": 1}'>
>> +        <p>
>> +          Specifies the CPU cores on which dpdk lcore threads should be
>> spawned.
>> +          The DPDK lcore threads are used for DPDK library tasks, such as
>> +          library internal message processing, logging, etc. Value should be
>> in
>> +          the form of a hex string (so '0x123') similar to the 'taskset'
>> mask
>> +          input.
>> +        </p>
>> +        <p>
>> +          The lowest order bit corresponds to the first CPU core.  A set bit
>> +          means the corresponding core is available and a pmd thread will be
>> +          created and pinned to it.  If the input does not cover all cores,
>> +          those uncovered cores are considered not set.
>> +        </p>
>> +        <p>
>> +          For performance reasons, it is best to set this to a single core
>> on
>> +          the system, rather than allow lcore threads to float.
>> +        </p>
>> +        <p>
>> +          If not specified, the value will be determined by choosing the
>> lowest
>> +          CPU core from initial cpu affinity list. Otherwise, the value will
>> be
>> +          passed directly to the DPDK library.
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="pmd-cpu-mask"
>> +              type='{"type": "integer", "minInteger": 1}'>
>>          <p>
>>            Specifies CPU mask for setting the cpu affinity of PMD (Poll
>>            Mode Driver) threads.  Value should be in the form of hex string,
>> -          similar to the dpdk EAL '-c COREMASK' option input or the
>> 'taskset'
>> +          similar to the dpdk-lcore-mask option input or the 'taskset'
>>            mask input.
>>          </p>
>>          <p>
>> @@ -195,6 +222,77 @@
>>          </p>
>>        </column>
>> 
>> +      <column name="other_config" key="dpdk-mem-channels"
>> +              type='{"type": "integer", "minInteger": 1}'>
>> +        <p>
>> +          Specifies the number of NUMA memory spread channels in the CPU to
>> +          be used by DPDK. It is purely optimization.
>> +        </p>
>> +        <p>
>> +          The default value is 4.
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="dpdk-alloc-mem"
>> +              type='{"type": "integer", "minInteger": 0}'>
>> +        <p>
>> +          Specifies the amount of memory to preallocate from the hugepage
>> pool,
>> +          regardless of socket. It is recommended that dpdk-socket-mem is
>> used
>> +          instead.
>> +        </p>
>> +        <p>
>> +          If not specified, the value is 0.
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="dpdk-socket-mem"
>> +              type='{"type": "string"}'>
>> +        <p>
>> +          Specifies the amount of memory to preallocate from the hugepage
>> pool,
>> +          on a per-socket basis.
>> +        </p>
>> +        <p>
>> +          The specifier is a comma-separated string, in ascending order of
>> CPU
>> +          socket (ex: 1024,2048,4096,8192 would set socket 0 to preallocate
>> +          1024MB, socket 1 to preallocate 2048MB, etc.)
>> +        </p>
>> +        <p>
>> +          If not specified, the default value is 1024,0.
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="dpdk-hugepage-dir"
>> +              type='{"type": "string"}'>
>> +        <p>
>> +          Specifies the path to the hugetlbfs mount point.
>> +        </p>
>> +        <p>
>> +          If not specified, this will be guessed by the DPDK library
>> (default
>> +          is /dev/hugepages).
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="cuse-dev-name"
>> +              type='{"type": "string"}'>
>> +        <p>
>> +          Specifies the name of the vhost-cuse character device to open for
>> +          vhost-cuse support.
>> +        </p>
>> +        <p>
>> +          The default is vhost-net
>> +        </p>
>> +      </column>
>> +
>> +      <column name="other_config" key="vhost-sock-dir"
>> +              type='{"type": "string"}'>
>> +        <p>
>> +          Specifies the path to the vhost-user unix domain socket files.
>> +        </p>
>> +        <p>
>> +          The default is the working directory of the application.
>> +        </p>
>> +      </column>
>> +
>>        <column name="other_config" key="n-handler-threads"
>>                type='{"type": "integer", "minInteger": 1}'>
>>          <p>
>> --
>> 2.6.1.133.gf5b6079



More information about the dev mailing list