[ovs-dev] [RFC 1/3] dpdk: Unify passing of DPDK EAL arguments.

Ilya Maximets i.maximets at samsung.com
Tue Feb 5 15:43:08 UTC 2019


Hi Flavio.
Thanks for taking a look.

On 05.02.2019 15:38, Flavio Leitner wrote:
> 
> Hi Ilya,
> 
> Thanks for the patch and I think we knew about that pain when we
> exposed the very first parameter. I still remember Aaron struggling
> to find the best path forward. Time flies and here we are.
> 
> The problem is that this change is not friendly from the community
> perspective to its users. That is like an exposed API which we should
> avoid break as much as possible.
> 
> For instance, there are users (OpenStack) with support life cycle of
> 5+ years that cannot keep up with this kind of change but they expect
> to be able to update to newer OVS.

Sure, there are users that wants stable API that will never change.
But this is really impossible in practice. I'm working with OpenStack
too and will have to fixup deployment tools with these changes. BTW, from
the whole OpenStack only few deployment projects (like TripleO) will need
to make changes and these changes are not very big.

> 
> One idea is to freeze whatever we have now and update the documentation
> to recommend the new way. We give like a couple OVS releases and only
> then ignore (or remove?) those parameters.

Yes, In cover letter I proposed these patches to be applied one per release.
And current (first) patch does not remove the functionality, only docs.
Users still will be able to use old interface, but will have warnings
in the log. In the next release cycle we'll start ignore the values
while still printing the warnings. This should give enough time for adaptation.
If you feel that we need more time, we could apply the second patch to 2.14
(or whatever number will be in 2 releases from now).


> 
> IMO in the end we are moving the problem from one place to another
> because even with a single string, OVS users will be caught off guard
> when DPDK changes. Yes, less pain to OVS community in the sense that
> we don't have to add/remove/deprecate stuff, but it is still a bad
> user experience regardless, which is not what OVS is known for.

Unfortunately, DPDK was never user-friendly enough. It tries, but still not.
We're keeping few sane defaults like providing default lcore and setting the
socket-limit if needed. And we'll try to do that in the future. The thing
this patch tries to eliminate is the dependency tracking between different
EAL arguments, i.e. duplicating the work with rte_eal_init() and having too
many configuration knobs with similar meanings what does not work at the
same time.

Right now there are no critical arguments that user must provide.
So, IMHO, having special configs for them is really unnecessary.

> 
> Thanks,
> fbl
> 
> 
> On Fri, Feb 01, 2019 at 04:11:12PM +0300, Ilya Maximets wrote:
>> This patch deprecates the usage of current configuration knobs for
>> DPDK EAL arguments:
>>
>>   - dpdk-alloc-mem
>>   - dpdk-socket-mem
>>   - dpdk-socket-limit
>>   - dpdk-lcore-mask
>>   - dpdk-hugepage-dir
>>   - dpdk-extra
>>
>> All above configuration options replaced with single 'dpdk-options'
>> which has same format as old 'dpdk-extra', i.e. just a string with
>> all the DPDK EAL arguments.
>>
>> All the documentation about old configuration knobs removed. Users
>> could still use an old interface, but the deprecation warning will be
>> printed. In case 'dpdk-options' provided, values of old options will
>> be ignored. New users will start using new 'dpdk-options' as this is
>> the only documented way providing EAL arguments.
>>
>> Rationale:
>>
>> From release to release DPDK becomes more complex. New EAL arguments
>> appears, old arguments becomes deprecated. Some changes their meaning
>> and behaviour. It's not easy task to manage all this and current
>> code in OVS that tries to manage DPDK options is not flexible/extendable
>> enough even to track all the dependencies between options in DPDK 18.11.
>> For example, '--socket-mem' changed its meaning, '--legacy-mem' and
>> '--socket-limit' appeared. But '--legacy-mem' and '--socket-limit'
>> could not be used at the same time and, also, we want to provide
>> good default value for '--socket-limit' to keep the consistent
>> behaviour of OVS memory usage. And this will be worse in the future.
>>
>> Another point is that even now we have mutually exclusive database
>> configs in OVS and we have to handle them. i.e we're providing
>> 'dpdk-alloc-mem' and 'dpdk-socket-mem' at the same time. Additionally,
>> user allowed to configure same options in 'dpdk-extra'. So, we have
>> similar/equal options in a three places in ovsdb and we need to
>> choose which one to use. It's pretty easy for user to get into
>> inconsistent database configuration, because users could update
>> one field without removing others, and OVS has to resolve all the
>> conflicts somehow constructing the EAL args. But we do not know in
>> practice, if the resulted arguments are the arguments that user wanted
>> to see or just forget to remove the higher priority knob.
>>
>> Next point is that we have 'dpdk-lcore-mask' (-c) which is actually not
>> so user-friendly as '-l', but we have no option for it. Will we create
>> additional 'dpdk-lcore-list' ? I guess, no, because it's not really
>> important thing. But users will stuck with this not so user-friendly
>> masks.
>>
>> Another thing is that OVS is not really need to check the consistency
>> of the EAL arguments because DPDK could check it instead. DPDK will
>> always check the args and it will do it better. There is no real
>> need to duplicate this functionality.
>>
>> Keeping all the options in a single string 'dpdk-options' allows:
>>
>>   * To have only one place for all the configs, i.e. it will be harder
>>     for user to forget to remove something from the single string while
>>     updating it.
>>   * Not to check the consistency between different configs, DPDK could
>>     check the cmdline itself.
>>   * Not to document every DPDK EAL argument in OVS. We can just
>>     describe few of them and point to DPDK docs for more information.
>>   * OVS still able to provide some minimal default arguments.
>>     Thanks to DPDK 18.11 we only need to specify an lcore. All other
>>     arguments are not necessary. (We still also providing default
>>     --socket-limit in case --socket-mem passed without it and without
>>     --legacy-mem)
>>
>> Usage example:
>>
>>   ovs-vsctl set Open_vSwitch . \
>>       other_config:dpdk-options="-l 1 -n 4 --socket-mem 1024,1024"
>>
>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>> ---
>>  Documentation/intro/install/dpdk.rst |  40 +++++---
>>  NEWS                                 |   7 +-
>>  lib/dpdk.c                           |  49 +++++++++-
>>  utilities/ovs-dev.py                 |   2 +-
>>  vswitchd/vswitch.xml                 | 132 ++++++---------------------
>>  5 files changed, 108 insertions(+), 122 deletions(-)
>>
>> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
>> index 344d2b3a6..2243fde53 100644
>> --- a/Documentation/intro/install/dpdk.rst
>> +++ b/Documentation/intro/install/dpdk.rst
>> @@ -235,32 +235,44 @@ listed below. Defaults will be provided for all values not explicitly set.
>>    A value of ``try`` will imply that the ovs-vswitchd process should
>>    continue running even if the EAL initialization fails.
>>  
>> -``dpdk-lcore-mask``
>> -  Specifies the CPU cores on which dpdk lcore threads should be spawned and
>> -  expects hex string (eg '0x123').
>> +``dpdk-options``
>> +  Specifies the string of EAL command line arguments for DPDK.
>> +  For example, ``other_config:dpdk-options="-l 0 --socket-mem 1024,1024"``.
>> +  Important arguments that could be passed there are:
>>  
>> -``dpdk-socket-mem``
>> -  Comma separated list of memory to pre-allocate from hugepages on specific
>> -  sockets. If not specified, 1024 MB will be set for each numa node by
>> -  default.
>> +  - ``-c`` or ``-l``
>>  
>> -``dpdk-hugepage-dir``
>> -  Directory where hugetlbfs is mounted
>> +    Specifies the CPU cores on which dpdk lcore threads should be spawned.
>> +    ``-c`` expects hex string (eg ``0x123``) while ``-l`` expects core list
>> +    (eg ``0-5,8``).
>> +
>> +  - ``--socket-mem``
>> +
>> +    Comma separated list of memory to pre-allocate from hugepages on specific
>> +    sockets.
>> +
>> +  - ``--huge-dir``
>> +
>> +    Directory where hugetlbfs is mounted
>> +
>> +  - See DPDK documentation for the full list:
>> +
>> +    https://doc.dpdk.org/guides-18.11/linux_gsg/linux_eal_parameters.html
>>  
>>  ``vhost-sock-dir``
>>    Option to set the path to the vhost-user unix socket files.
>>  
>> -If allocating more than one GB hugepage, you can configure the
>> -amount of memory used from any given NUMA nodes. For example, to use 1GB from
>> -NUMA node 0 and 0GB for all other NUMA nodes, run::
>> +You can configure the amount of memory preallocated from any given NUMA nodes.
>> +For example, to preallocate ``1GB`` from NUMA node ``0`` and ``0GB`` for all
>> +other NUMA nodes, run::
>>  
>>      $ ovs-vsctl --no-wait set Open_vSwitch . \
>> -        other_config:dpdk-socket-mem="1024,0"
>> +        other_config:dpdk-options="<...> --socket-mem 1024,0"
>>  
>>  or::
>>  
>>      $ ovs-vsctl --no-wait set Open_vSwitch . \
>> -        other_config:dpdk-socket-mem="1024"
>> +        other_config:dpdk-options="<...> --socket-mem 1024"
>>  
>>  .. note::
>>    Changing any of these options requires restarting the ovs-vswitchd
>> diff --git a/NEWS b/NEWS
>> index a64b9d94d..1c09e9d57 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,8 +1,11 @@
>>  Post-v2.11.0
>>  ---------------------
>>     - DPDK:
>> -     * New option 'other_config:dpdk-socket-limit' to limit amount of
>> -       hugepage memory that can be used by DPDK.
>> +     * New option 'other_config:dpdk-options' to pass all the
>> +       DPDK EAL argumants in a single string.
>> +     * Following config options deprecated:
>> +       'dpdk-alloc-mem', 'dpdk-socket-mem', 'dpdk-socket-limit',
>> +       'dpdk-lcore-mask', 'dpdk-hugepage-dir' and 'dpdk-extra'.
>>  
>>  
>>  v2.11.0 - xx xxx xxxx
>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>> index 53b74fba4..0c37f231d 100644
>> --- a/lib/dpdk.c
>> +++ b/lib/dpdk.c
>> @@ -91,6 +91,41 @@ args_contains(const struct svec *args, const char *value)
>>      return false;
>>  }
>>  
>> +static bool
>> +report_deprecated_configs(const struct smap *ovs_other_config, bool ignore)
>> +{
>> +    struct option {
>> +        const char *opt;
>> +        const char *replacement;
>> +    } options[] = {
>> +        { "dpdk-alloc-mem",    "-m"             },
>> +        { "dpdk-socket-mem",   "--socket-mem"   },
>> +        { "dpdk-socket-limit", "--socket-limit" },
>> +        { "dpdk-lcore-mask",   "-c"             },
>> +        { "dpdk-hugepage-dir", "--huge-dir"     },
>> +        { "dpdk-extra",        ""               }
>> +    };
>> +    int i;
>> +    bool found = false;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(options); i++) {
>> +        const char *value = smap_get(ovs_other_config, options[i].opt);
>> +
>> +        if (value) {
>> +            VLOG_WARN("Detected deprecated '%s' config. Use '%s %s'"
>> +                      " in 'dpdk-options' instead.%s",
>> +                      options[i].opt, options[i].replacement, value,
>> +                      ignore ? " Value ignored." : "");
>> +            found = true;
>> +        }
>> +    }
>> +    if (found) {
>> +        VLOG_WARN("Deprecated options will be "
>> +                  "silently ignored in the future.");
>> +    }
>> +    return found;
>> +}
>> +
>>  static void
>>  construct_dpdk_options(const struct smap *ovs_other_config, struct svec *args)
>>  {
>> @@ -270,8 +305,10 @@ dpdk_init__(const struct smap *ovs_other_config)
>>      char **argv = NULL;
>>      int result;
>>      bool auto_determine = true;
>> +    bool deprecated_found;
>>      int err = 0;
>>      cpu_set_t cpuset;
>> +    const char *dpdk_options;
>>      struct svec args = SVEC_EMPTY_INITIALIZER;
>>  
>>      log_stream = fopencookie(NULL, "w+", dpdk_log_func);
>> @@ -317,7 +354,15 @@ dpdk_init__(const struct smap *ovs_other_config)
>>                per_port_memory ? "enabled" : "disabled");
>>  
>>      svec_add(&args, ovs_get_program_name());
>> -    construct_dpdk_args(ovs_other_config, &args);
>> +    dpdk_options = smap_get(ovs_other_config, "dpdk-options");
>> +
>> +    deprecated_found = report_deprecated_configs(ovs_other_config,
>> +                                                 dpdk_options ? true : false);
>> +    if (dpdk_options) {
>> +        svec_parse_words(&args, dpdk_options);
>> +    } else if (deprecated_found) {
>> +        construct_dpdk_args(ovs_other_config, &args);
>> +    }
>>  
>>      if (!args_contains(&args, "--legacy-mem")
>>          && !args_contains(&args, "--socket-limit")) {
>> @@ -357,7 +402,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>                  }
>>              }
>>          } else {
>> -            /* User did not set dpdk-lcore-mask and unable to get current
>> +            /* User did not set lcore mask and unable to get current
>>               * thread affintity - default to core #0 */
>>              VLOG_ERR("Thread getaffinity error %d. Using core #0", err);
>>          }
>> diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py
>> index 248d22ab9..d10b40c87 100755
>> --- a/utilities/ovs-dev.py
>> +++ b/utilities/ovs-dev.py
>> @@ -283,7 +283,7 @@ def run():
>>          _sh("ovs-vsctl --no-wait set Open_vSwitch %s "
>>              "other_config:dpdk-init=true" % root_uuid)
>>          _sh("ovs-vsctl --no-wait set Open_vSwitch %s other_config:"
>> -            "dpdk-extra=\"%s\"" % (root_uuid, ' '.join(options.dpdk)))
>> +            "dpdk-options=\"%s\"" % (root_uuid, ' '.join(options.dpdk)))
>>      else:
>>          _sh("ovs-vsctl --no-wait set Open_vSwitch %s "
>>              "other_config:dpdk-init=false" % root_uuid)
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 1db4e8694..c30265d9e 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -234,56 +234,6 @@
>>          </p>
>>        </column>
>>  
>> -      <column name="other_config" key="dpdk-init"
>> -              type='{"type": "string",
>> -                     "enum": ["set", ["false", "true", "try"]]}'>
>> -        <p>
>> -          Set this value to <code>true</code> or <code>try</code> to enable
>> -          runtime support for DPDK ports. The vswitch must have compile-time
>> -          support for DPDK as well.
>> -        </p>
>> -        <p>
>> -          A value of <code>true</code> will cause the ovs-vswitchd process to
>> -          abort if DPDK cannot be initialized. A value of <code>try</code>
>> -          will allow the ovs-vswitchd process to continue running even if DPDK
>> -          cannot be initialized.
>> -        </p>
>> -        <p>
>> -          The default value is <code>false</code>. Changing this value requires
>> -          restarting the daemon
>> -        </p>
>> -        <p>
>> -          If this value is <code>false</code> at startup, any dpdk ports which
>> -          are configured in the bridge will fail due to memory errors.
>> -        </p>
>> -      </column>
>> -
>> -      <column name="other_config" key="dpdk-lcore-mask"
>> -              type='{"type": "integer", "minInteger": 1}'>
>> -        <p>
>> -          Specifies the CPU cores where dpdk lcore threads should be spawned.
>> -          The DPDK lcore threads are used for DPDK library tasks, such as
>> -          library internal message processing, logging, etc. Value should be in
>> -          the form of a hex string (so '0x123') similar to the 'taskset' mask
>> -          input.
>> -        </p>
>> -        <p>
>> -          The lowest order bit corresponds to the first CPU core.  A set bit
>> -          means the corresponding core is available and an lcore thread will be
>> -          created and pinned to it.  If the input does not cover all cores,
>> -          those uncovered cores are considered not set.
>> -        </p>
>> -        <p>
>> -          For performance reasons, it is best to set this to a single core on
>> -          the system, rather than allow lcore threads to float.
>> -        </p>
>> -        <p>
>> -          If not specified, the value will be determined by choosing the lowest
>> -          CPU core from initial cpu affinity list. Otherwise, the value will be
>> -          passed directly to the DPDK library.
>> -        </p>
>> -      </column>
>> -
>>        <column name="other_config" key="pmd-cpu-mask">
>>          <p>
>>            Specifies CPU mask for setting the cpu affinity of PMD (Poll
>> @@ -303,78 +253,54 @@
>>          </p>
>>        </column>
>>  
>> -      <column name="other_config" key="dpdk-alloc-mem"
>> -              type='{"type": "integer", "minInteger": 0}'>
>> +      <column name="other_config" key="dpdk-init"
>> +              type='{"type": "string",
>> +                     "enum": ["set", ["false", "true", "try"]]}'>
>>          <p>
>> -          Specifies the amount of memory to preallocate from the hugepage pool,
>> -          regardless of socket. It is recommended that dpdk-socket-mem is used
>> -          instead.
>> +          Set this value to <code>true</code> or <code>try</code> to enable
>> +          runtime support for DPDK ports. The vswitch must have compile-time
>> +          support for DPDK as well.
>>          </p>
>> -      </column>
>> -
>> -      <column name="other_config" key="dpdk-socket-mem"
>> -              type='{"type": "string"}'>
>>          <p>
>> -          Specifies the amount of memory to preallocate from the hugepage pool,
>> -          on a per-socket basis.
>> +          A value of <code>true</code> will cause the ovs-vswitchd process to
>> +          abort if DPDK cannot be initialized. A value of <code>try</code>
>> +          will allow the ovs-vswitchd process to continue running even if DPDK
>> +          cannot be initialized.
>>          </p>
>>          <p>
>> -          The specifier is a comma-separated string, in ascending order of CPU
>> -          socket. E.g. On a four socket system 1024,0,2048 would set socket 0
>> -          to preallocate 1024MB, socket 1 to preallocate 0MB, socket 2 to
>> -          preallocate 2048MB and socket 3 (no value given) to preallocate 0MB.
>> +          The default value is <code>false</code>. Changing this value requires
>> +          restarting the daemon
>>          </p>
>>          <p>
>> -          If dpdk-socket-mem and dpdk-alloc-mem are not specified, dpdk-socket-mem
>> -          will be used and the default value is 1024 for each numa node. If
>> -          dpdk-socket-mem and dpdk-alloc-mem are specified at same time,
>> -          dpdk-socket-mem will be used as default. Changing this value
>> -          requires restarting the daemon.
>> +          If this value is <code>false</code> at startup, any dpdk ports which
>> +          are configured in the bridge will fail due to memory errors.
>>          </p>
>>        </column>
>>  
>> -      <column name="other_config" key="dpdk-socket-limit"
>> +      <column name="other_config" key="dpdk-options"
>>                type='{"type": "string"}'>
>>          <p>
>> -          Limits the maximum amount of memory that can be used from the
>> -          hugepage pool, on a per-socket basis.
>> +          Specifies EAL command line arguments for DPDK. For example,
>> +          <code>"-l 1 --socket-mem 1024,1024 -n 4"</code>
>>          </p>
>>          <p>
>> -          The specifier is a comma-separated list of memory limits per socket.
>> -          <code>0</code> will disable the limit for a particular socket.
>> -        </p>
>> -        <p>
>> -          If not specified, OVS will configure limits equal to the amount of
>> -          preallocated memory specified by <ref column="other_config"
>> -          key="dpdk-socket-mem"/> or <code>--socket-mem</code> in
>> -          <ref column="other_config" key="dpdk-extra"/>. If none of the above
>> -          options specified or <code>--legacy-mem</code> provided in
>> -          <ref column="other_config" key="dpdk-extra"/>, limits will not be
>> -          applied.
>>            Changing this value requires restarting the daemon.
>>          </p>
>> -      </column>
>> -
>> -      <column name="other_config" key="dpdk-hugepage-dir"
>> -              type='{"type": "string"}'>
>> -        <p>
>> -          Specifies the path to the hugetlbfs mount point.
>> -        </p>
>> -        <p>
>> -          If not specified, this will be guessed by the DPDK library (default
>> -          is /dev/hugepages). Changing this value requires restarting the
>> -          daemon.
>> -        </p>
>> -      </column>
>> -
>> -      <column name="other_config" key="dpdk-extra"
>> -              type='{"type": "string"}'>
>>          <p>
>> -          Specifies additional eal command line arguments for DPDK.
>> +          If <code>-l</code> and <code>-c</code> options are not specified,
>> +          the default value will be provided by choosing the lowest CPU core
>> +          from initial cpu affinity list.
>> +          For performance reasons, it is best to set lcore set/list to a single
>> +          core on the system, rather than allow lcore threads to float.
>> +          CPU cores proveded by <code>-l</code> and <code>-c</code> options
>> +          only used by DPDK internal threads. To manage affinity of PMD threads
>> +          use <ref column="other_config" key="pmd-cpu-mask"/>.
>>          </p>
>>          <p>
>> -          The default is empty. Changing this value requires restarting the
>> -          daemon
>> +          if <code>--socket-mem</code> option provided without
>> +          <code>--legacy-mem</code> and <code>--socket-limit</code>, OVS will
>> +          provide <code>--socket-limit</code> equal to the amount of
>> +          preallocated memory specified by <code>--socket-mem</code>.
>>          </p>
>>        </column>
>>  
>> -- 
>> 2.17.1
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> 
> 


More information about the dev mailing list