[ovs-dev] [v14 05/11] dpif-netdev: Add command to switch dpif implementation.
Flavio Leitner
fbl at sysclose.org
Wed Jul 7 14:40:21 UTC 2021
Hi,
Please see my comments below.
On Thu, Jul 01, 2021 at 04:06:13PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren <harry.van.haaren at intel.com>
>
> This commit adds a new command to allow the user to switch
> the active DPIF implementation at runtime. A probe function
> is executed before switching the DPIF implementation, to ensure
> the CPU is capable of running the ISA required. For example, the
> below code will switch to the AVX512 enabled DPIF assuming
> that the runtime CPU is capable of running AVX512 instructions:
>
> $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
>
> A new configuration flag is added to allow selection of the
> default DPIF. This is useful for running the unit-tests against
> the available DPIF implementations, without modifying each unit test.
>
> The design of the testing & validation for ISA optimized DPIF
> implementations is based around the work already upstream for DPCLS.
> Note however that a DPCLS lookup has no state or side-effects, allowing
> the auto-validator implementation to perform multiple lookups and
> provide consistent statistic counters.
>
> The DPIF component does have state, so running two implementations in
> parallel and comparing output is not a valid testing method, as there
> are changes in DPIF statistic counters (side effects). As a result, the
> DPIF is tested directly against the unit-tests.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> Co-authored-by: Cian Ferriter <cian.ferriter at intel.com>
> Signed-off-by: Cian Ferriter <cian.ferriter at intel.com>
>
> ---
>
> v14:
> - Change command name to dpif-impl-set
> - Fix the order of includes to what is layed out in the coding-style.rst
> - Use bool not int to capture return value of dpdk_get_cpu_has_isa()
> - Use an enum to index DPIF impls array.
> - Hide more of the dpif impl details from lib/dpif-netdev.c.
> - Fix comment on *dp_netdev_input_func() typedef.
> - Rename dp_netdev_input_func func to input_func.
> - Remove the datapath or dp argument from the dpif-impl-set CMD.
> - Set the DPIF function pointer atomically.
>
> v13:
> - Add Docs items about the switch DPIF command here rather than in
> later commit.
> - Document operation in manpages as well as rST.
> - Minor code refactoring to address review comments.
> ---
> Documentation/topics/dpdk/bridge.rst | 34 ++++++++
> acinclude.m4 | 15 ++++
> configure.ac | 1 +
> lib/automake.mk | 1 +
> lib/dpif-netdev-avx512.c | 14 +++
> lib/dpif-netdev-private-dpif.c | 122 +++++++++++++++++++++++++++
> lib/dpif-netdev-private-dpif.h | 47 +++++++++++
> lib/dpif-netdev-private-thread.h | 10 ---
> lib/dpif-netdev-unixctl.man | 3 +
> lib/dpif-netdev.c | 74 ++++++++++++++--
> 10 files changed, 306 insertions(+), 15 deletions(-)
> create mode 100644 lib/dpif-netdev-private-dpif.c
>
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 526d5c959..06d1f943c 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -214,3 +214,37 @@ implementation ::
>
> Compile OVS in debug mode to have `ovs_assert` statements error out if
> there is a mis-match in the DPCLS lookup implementation.
> +
> +Datapath Interface Performance
> +------------------------------
> +
> +The datapath interface (DPIF) or dp_netdev_input() is responsible for taking
> +packets through the major components of the userspace datapath; such as
> +miniflow_extract, EMC, SMC and DPCLS lookups, and a lot of the performance
> +stats associated with the datapath.
> +
> +Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to
> +improve performance.
> +
> +By default, dpif_scalar is used. The DPIF implementation can be selected by
> +name ::
> +
> + $ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
> + DPIF implementation set to dpif_avx512.
> +
> + $ ovs-appctl dpif-netdev/dpif-impl-set dpif_scalar
> + DPIF implementation set to dpif_scalar.
> +
> +Running Unit Tests with AVX512 DPIF
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Since the AVX512 DPIF is disabled by default, a compile time option is
> +available in order to test it with the OVS unit test suite. When building with
> +a CPU that supports AVX512, use the following configure option ::
> +
> + $ ./configure --enable-dpif-default-avx512
> +
> +The following line should be seen in the configure output when the above option
> +is used ::
> +
> + checking whether DPIF AVX512 is default implementation... yes
> diff --git a/acinclude.m4 b/acinclude.m4
> index 15a54d636..5fbcd9872 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -30,6 +30,21 @@ AC_DEFUN([OVS_CHECK_DPCLS_AUTOVALIDATOR], [
> fi
> ])
>
> +dnl Set OVS DPIF default implementation at configure time for running the unit
> +dnl tests on the whole codebase without modifying tests per DPIF impl
> +AC_DEFUN([OVS_CHECK_DPIF_AVX512_DEFAULT], [
> + AC_ARG_ENABLE([dpif-default-avx512],
> + [AC_HELP_STRING([--enable-dpif-default-avx512], [Enable DPIF AVX512 implementation as default.])],
> + [dpifavx512=yes],[dpifavx512=no])
> + AC_MSG_CHECKING([whether DPIF AVX512 is default implementation])
> + if test "$dpifavx512" != yes; then
> + AC_MSG_RESULT([no])
> + else
> + OVS_CFLAGS="$OVS_CFLAGS -DDPIF_AVX512_DEFAULT"
> + AC_MSG_RESULT([yes])
> + fi
> +])
> +
> dnl OVS_ENABLE_WERROR
> AC_DEFUN([OVS_ENABLE_WERROR],
> [AC_ARG_ENABLE(
> diff --git a/configure.ac b/configure.ac
> index c077034d4..e45685a6c 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -185,6 +185,7 @@ OVS_ENABLE_WERROR
> OVS_ENABLE_SPARSE
> OVS_CTAGS_IDENTIFIERS
> OVS_CHECK_DPCLS_AUTOVALIDATOR
> +OVS_CHECK_DPIF_AVX512_DEFAULT
> OVS_CHECK_BINUTILS_AVX512
>
> AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 660cd07f0..49f42c2a3 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -116,6 +116,7 @@ lib_libopenvswitch_la_SOURCES = \
> lib/dpif-netdev-private-dfc.c \
> lib/dpif-netdev-private-dfc.h \
> lib/dpif-netdev-private-dpcls.h \
> + lib/dpif-netdev-private-dpif.c \
> lib/dpif-netdev-private-dpif.h \
> lib/dpif-netdev-private-flow.h \
> lib/dpif-netdev-private-hwol.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index d013fea1f..cb252617d 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -24,6 +24,7 @@
> #include "dpif-netdev-perf.h"
> #include "dpif-netdev-private.h"
>
> +#include <errno.h>
> #include <immintrin.h>
>
> #include "dp-packet.h"
> @@ -57,6 +58,19 @@ struct dpif_userdata {
> struct pkt_flow_meta pkt_meta[NETDEV_MAX_BURST];
> };
>
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void)
> +{
> + bool avx512f_available = dpdk_get_cpu_has_isa("x86_64", "avx512f");
> + bool bmi2_available = dpdk_get_cpu_has_isa("x86_64", "bmi2");
> +
> + if (!avx512f_available || !bmi2_available) {
> + return -ENOTSUP;
> + }
> +
> + return 0;
> +}
> +
> int32_t
> dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-dpif.c b/lib/dpif-netdev-private-dpif.c
> new file mode 100644
> index 000000000..da3511f51
> --- /dev/null
> +++ b/lib/dpif-netdev-private-dpif.c
> @@ -0,0 +1,122 @@
> +/*
> + * Copyright (c) 2021 Intel Corporation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +
> +#include "dpif-netdev-private-dpif.h"
> +#include "dpif-netdev-private-thread.h"
> +
> +#include <errno.h>
> +#include <string.h>
> +
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/vlog.h"
> +#include "util.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_netdev_impl);
> +
> +enum dpif_netdev_impl_info_idx {
> + DPIF_NETDEV_IMPL_SCALAR,
> + DPIF_NETDEV_IMPL_AVX512
> +};
> +
> +/* Actual list of implementations goes here. */
> +static struct dpif_netdev_impl_info_t dpif_impls[] = {
> + /* The default scalar C code implementation. */
> + [DPIF_NETDEV_IMPL_SCALAR] = { .input_func = dp_netdev_input,
> + .probe = NULL,
> + .name = "dpif_scalar", },
> +
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> + /* Only available on x86_64 bit builds with SSE 4.2 used for OVS core. */
> + [DPIF_NETDEV_IMPL_AVX512] = { .input_func = dp_netdev_input_outer_avx512,
> + .probe = dp_netdev_input_outer_avx512_probe,
> + .name = "dpif_avx512", },
> +#endif
> +};
> +
> +static dp_netdev_input_func default_dpif_func;
> +
> +dp_netdev_input_func
> +dp_netdev_impl_get_default(void)
> +{
> + /* For the first call, this will be NULL. Compute the compile time default.
> + */
> + if (!default_dpif_func) {
> + int dpif_idx = 0;
That should be DPIF_NETDEV_IMPL_SCALAR.
> +
> +/* Configure-time overriding to run test suite on all implementations. */
> +#if (__x86_64__ && HAVE_AVX512F && HAVE_LD_AVX512_GOOD && __SSE4_2__)
> +#ifdef DPIF_AVX512_DEFAULT
> + ovs_assert(dpif_impls[DPIF_NETDEV_IMPL_AVX512].input_func
> + == dp_netdev_input_outer_avx512);
This assert() makes little sense now. It's not possible to change
the dpif_impls at runtime, and if we change the code we will notice
the problem only at runtime. Wouldn't it make more sense to make it
generic like below?
#ifdef DPIF_AVX512_DEFAULT
dp_netdev_input_func_probe probe;
/* Check if the compiled default is compatible. */
probe = dpif_impls[DPIF_NETDEV_IMPL_AVX512].probe;
if (!probe || !probe()) {
dpif_idx = DPIF_NETDEV_IMPL_AVX512;
}
#endif
> + if (!dp_netdev_input_outer_avx512_probe()) {
> + dpif_idx = DPIF_NETDEV_IMPL_AVX512;
> + };
> +#endif
> +#endif
> +
> + VLOG_INFO("Default DPIF implementation is %s.\n",
> + dpif_impls[dpif_idx].name);
> + default_dpif_func = dpif_impls[dpif_idx].input_func;
> + }
> +
> + return default_dpif_func;
> +}
> +
> +int32_t
> +dp_netdev_impl_set_default_by_name(const char *name)
> +{
> + dp_netdev_input_func new_default;
> +
> + int32_t err = dp_netdev_impl_get_by_name(name, &new_default);
> +
> + if (!err) {
> + default_dpif_func = new_default;
> + }
> +
> + return err;
> +
> +}
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func)
That one should be static and removed from the
lib/dpif-netdev-private-dpif.h.
> +{
> + ovs_assert(name);
> + ovs_assert(out_func);
> +
> + uint32_t i;
> +
> + for (i = 0; i < ARRAY_SIZE(dpif_impls); i++) {
> + if (strcmp(dpif_impls[i].name, name) == 0) {
> + /* Probe function is optional - so check it is set before exec. */
> + if (dpif_impls[i].probe) {
> + int probe_err = dpif_impls[i].probe();
> + if (probe_err) {
> + *out_func = NULL;
> + return probe_err;
> + }
> + }
> + *out_func = dpif_impls[i].input_func;
> + return 0;
> + }
> + }
> +
> + return -EINVAL;
> +}
> diff --git a/lib/dpif-netdev-private-dpif.h b/lib/dpif-netdev-private-dpif.h
> index bbd719b22..0e58153f4 100644
> --- a/lib/dpif-netdev-private-dpif.h
> +++ b/lib/dpif-netdev-private-dpif.h
> @@ -23,7 +23,54 @@
> struct dp_netdev_pmd_thread;
> struct dp_packet_batch;
>
> +/* Typedef for DPIF functions.
> + * Returns whether all packets were processed successfully.
> + */
> +typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> + struct dp_packet_batch *packets,
> + odp_port_t port_no);
> +
> +/* Probe a DPIF implementation. This allows the implementation to validate CPU
> + * ISA availability. Returns -ENOTSUP if not available, returns 1 if valid to
> + * use.
Returns 0 if valid to use?
> + */
> +typedef int32_t (*dp_netdev_input_func_probe)(void);
> +
> +/* Structure describing each available DPIF implementation. */
> +struct dpif_netdev_impl_info_t {
> + /* Function pointer to execute to have this DPIF implementation run. */
> + dp_netdev_input_func input_func;
> + /* Function pointer to execute to check the CPU ISA is available to run.
> + * May be NULL, which implies that it is always valid to use.
Please reword to make sure setting to NULL is required:
/* Function pointer to execute to check the CPU ISA is available to
* run. If not necessary, it must be set to NULL which implies that
* it is always valid to use. */
> + */
> + dp_netdev_input_func_probe probe;
> + /* Name used to select this DPIF implementation. */
> + const char *name;
> +};
> +
> +/* This function checks all available DPIF implementations, and selects the
> + * returns the function pointer to the one requested by "name".
> + */
> +int32_t
> +dp_netdev_impl_get_by_name(const char *name, dp_netdev_input_func *out_func);
That one doesn't need to be exposed as I mentioned before.
> +
> +/* Returns the default DPIF which is first ./configure selected, but can be
> + * overridden at runtime. */
> +dp_netdev_input_func dp_netdev_impl_get_default(void);
> +
> +/* Overrides the default DPIF with the user set DPIF. */
> +int32_t dp_netdev_impl_set_default_by_name(const char *name);
> +
> /* Available DPIF implementations below. */
> +int32_t
> +dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> + struct dp_packet_batch *packets,
> + odp_port_t in_port);
> +
> +/* AVX512 enabled DPIF implementation and probe functions. */
> +int32_t
> +dp_netdev_input_outer_avx512_probe(void);
> +
> int32_t
> dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets,
> diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
> index 63b99220b..ba79c4a0a 100644
> --- a/lib/dpif-netdev-private-thread.h
> +++ b/lib/dpif-netdev-private-thread.h
> @@ -50,16 +50,6 @@ struct dp_netdev_pmd_thread_ctx {
> bool smc_enable_db;
> };
>
> -/* Forward declaration for typedef. */
> -struct dp_netdev_pmd_thread;
> -
> -/* Typedef for DPIF functions.
> - * Returns a bitmask of packets to handle, possibly including upcall/misses.
> - */
> -typedef int32_t (*dp_netdev_input_func)(struct dp_netdev_pmd_thread *pmd,
> - struct dp_packet_batch *packets,
> - odp_port_t port_no);
> -
> /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate
> * the performance overhead of interrupt processing. Therefore netdev can
> * not implement rx-wait for these devices. dpif-netdev needs to poll
> diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
> index 858d491df..76cc949f9 100644
> --- a/lib/dpif-netdev-unixctl.man
> +++ b/lib/dpif-netdev-unixctl.man
> @@ -226,3 +226,6 @@ recirculation (only in balance-tcp mode).
> When this is the case, the above command prints the load-balancing information
> of the bonds configured in datapath \fIdp\fR showing the interface associated
> with each bucket (hash).
> +.
> +.IP "\fBdpif-netdev/dpif-impl-set\fR \fIdpif_impl\fR"
> +Sets the DPIF to be used to \fIdpif_impl\fR. By default "dpif_scalar" is used.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e0c8f055d..19917c7c5 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -470,8 +470,6 @@ static void dp_netdev_execute_actions(struct dp_netdev_pmd_thread *pmd,
> const struct flow *flow,
> const struct nlattr *actions,
> size_t actions_len);
> -static int32_t dp_netdev_input(struct dp_netdev_pmd_thread *,
> - struct dp_packet_batch *, odp_port_t port_no);
> static void dp_netdev_recirculate(struct dp_netdev_pmd_thread *,
> struct dp_packet_batch *);
>
> @@ -982,6 +980,66 @@ dpif_netdev_subtable_lookup_set(struct unixctl_conn *conn, int argc,
> ds_destroy(&reply);
> }
>
> +static void
> +dpif_netdev_impl_set(struct unixctl_conn *conn, int argc OVS_UNUSED,
> + const char *argv[], void *aux OVS_UNUSED)
> +{
> + /* This function requires just one parameter, the DPIF name. */
> + const char *dpif_name = argv[1];
> + struct shash_node *node;
> +
> + static const char *error_description[2] = {
> + "Unknown DPIF implementation",
> + "CPU doesn't support the required instruction for",
> + };
> +
> + ovs_mutex_lock(&dp_netdev_mutex);
> + int32_t err = dp_netdev_impl_set_default_by_name(dpif_name);
> +
> + if (err) {
> + struct ds reply = DS_EMPTY_INITIALIZER;
> + ds_put_format(&reply, "DPIF implementation not available: %s %s.\n",
> + error_description[ (err == -ENOTSUP) ], dpif_name);
> + const char *reply_str = ds_cstr(&reply);
> + unixctl_command_reply(conn, reply_str);
That should be unixctl_command_reply_error(conn, reply_str)
> + VLOG_INFO("%s", reply_str);
> + ds_destroy(&reply);
> + ovs_mutex_unlock(&dp_netdev_mutex);
> + return;
> + }
> +
> + SHASH_FOR_EACH (node, &dp_netdevs) {
> + struct dp_netdev *dp = node->data;
> +
> + /* Get PMD threads list, required to get DPCLS instances. */
> + size_t n;
> + struct dp_netdev_pmd_thread **pmd_list;
> + sorted_poll_thread_list(dp, &pmd_list, &n);
> +
> + for (size_t i = 0; i < n; i++) {
> + struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> + if (pmd->core_id == NON_PMD_CORE_ID) {
> + continue;
> + }
> +
> + /* Initialize DPIF function pointer to the newly configured
> + * default. */
> + dp_netdev_input_func default_func = dp_netdev_impl_get_default();
> + atomic_uintptr_t *pmd_func = (void *) &pmd->netdev_input_func;
> + atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
> + };
> + }
> + ovs_mutex_unlock(&dp_netdev_mutex);
> +
> + /* Reply with success to command. */
> + struct ds reply = DS_EMPTY_INITIALIZER;
> + ds_put_format(&reply, "DPIF implementation set to %s.\n", dpif_name);
> + const char *reply_str = ds_cstr(&reply);
> + unixctl_command_reply(conn, reply_str);
> + VLOG_INFO("%s", reply_str);
> + ds_destroy(&reply);
> +}
> +
> static void
> dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, int argc,
> const char *argv[], void *aux OVS_UNUSED)
> @@ -1204,6 +1262,10 @@ dpif_netdev_init(void)
> unixctl_command_register("dpif-netdev/subtable-lookup-prio-get", "",
> 0, 0, dpif_netdev_subtable_lookup_get,
> NULL);
> + unixctl_command_register("dpif-netdev/dpif-impl-set",
> + "dpif_implementation_name",
> + 1, 1, dpif_netdev_impl_set,
> + NULL);
> return 0;
> }
>
> @@ -6106,8 +6168,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> hmap_init(&pmd->send_port_cache);
> cmap_init(&pmd->tx_bonds);
>
> - /* Initialize the DPIF function pointer to the default scalar version. */
> - pmd->netdev_input_func = dp_netdev_input;
> + /* Initialize DPIF function pointer to the default configured version. */
> + dp_netdev_input_func default_func = dp_netdev_impl_get_default();
> + atomic_uintptr_t *pmd_func = (void *) &pmd->netdev_input_func;
> + atomic_init(pmd_func, (uintptr_t) default_func);
>
> /* init the 'flow_cache' since there is no
> * actual thread created for NON_PMD_CORE_ID. */
> @@ -7078,7 +7142,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
> }
> }
>
> -static int32_t
> +int32_t
> dp_netdev_input(struct dp_netdev_pmd_thread *pmd,
> struct dp_packet_batch *packets,
> odp_port_t port_no)
> --
> 2.32.0
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
--
fbl
More information about the dev
mailing list