[ovs-dev] [PATCH v6 5/6] dpif-lookup: add avx512 gather implementation.

Van Haaren, Harry harry.van.haaren at intel.com
Fri Jul 10 17:11:21 UTC 2020


> -----Original Message-----
> From: Stokes, Ian <ian.stokes at intel.com>
> Sent: Friday, July 10, 2020 4:51 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; ovs-dev at openvswitch.org
> Cc: i.maximets at ovn.org; u9012063 at gmail.com; fiezzi at redhat.com
> Subject: Re: [PATCH v6 5/6] dpif-lookup: add avx512 gather implementation.
> 
> 
> 
> On 7/2/2020 6:42 PM, Harry van Haaren wrote:
> > This commit adds an AVX-512 dpcls lookup implementation.
> > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> > operations in parallel.
> >
> > To run this implementation, the "avx512f" and "bmi2" ISAs are
> > required. These ISA checks are performed at runtime while
> > probing the subtable implementation. If a CPU does not provide
> > both "avx512f" and "bmi2", then this code does not execute.
> >
> > The avx512 code is built as a seperate static library, with added
> > CFLAGS to enable the required ISA features. By building only this
> > static library with avx512 enabled, it is ensured that the main OVS
> > core library is *not* using avx512, and that OVS continues to run
> > as before on CPUs that do not support avx512.
> >
> > The approach taken in this implementation is to use the
> > gather instruction to access the packet miniflow, allowing
> > any miniflow blocks to be loaded into an AVX-512 register.
> > This maximises the usefulness of the register, and hence this
> > implementation handles any subtable with up to miniflow 8 bits.
> >
> > Note that specialization of these avx512 lookup routines
> > still provides performance value, as the hashing of the
> > resulting data is performed in scalar code, and compile-time
> > loop unrolling occurs when specialized to miniflow bits.
> >
> > This commit checks at configure time if the assembling in use
> > has a known bug in assembling AVX512 code. If this bug is present,
> > all AVX512 code is disabled. Checking the version string of the binutils
> > or assembler is not a good method to detect the issue, as backported fixes
> > would not be reflected.
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> 
> Thanks for this Harry,
> 
> I've spent some time testing this on both AVX512 enabled and non-enabled
> systems and can confirm the performance increase between scalar, generic
> and avx512 which is great too see.
> 
> One thing I noticed was that with the AVX512 system, there is a
> dependency on how the CFLAGS are passed when configuring and compiling
> OVS in order to enable AVX512 lookup.
> 
> In my testing passing the CFLAGS="CFLAGS="-g -Ofast -march=native" with
> configure seems to work fine and I could see the AVX512 lookup available.
> 
> However when testing with an older script and the same CFLAG was passed
> along with the make command instead of at configure, then AVX512  lookup
> would not be available.
> 
> Depending on how users configure and compile this may not be an issue
> but thought it worth flagging  as there does seem to be a dependency
> that could be missed at compilation (but from the configure logs all
> looked well).
> 
> At the minimum I think it might be worth documenting in the docs,
> possibly in the datapath performance section you add later in the series
> but also maybe in the compiler optimizations section.

Sure, can add command examples in those sections yes.

> > +    /* Disabling AVX512 at compile time, as compile time requirements not met.
> > +     * This could be due to a number of reasons:
> > +     *  1) core OVS is not compiled with SSE4.2 instruction set.
> > +     *     The SSE42 instructions are required to use CRC32 ISA for high-
> > +     *     performance hashing. Consider ./configure of OVS with -msse42 (or
> > +     *     newer) to enable CRC32 hashing and higher performance.
> > +     *  2) The assembler in binutils versions 2.30 and 2.31 has bugs in AVX512
> > +     *     assembly. Compile time probes check for this assembler issue, and
> > +     *     disable the HAVE_LD_AVX512_GOOD check if an issue is detected.
> > +     *     Please upgrade binutils, or backport this binutils fix commit:
> > +     *     2069ccaf8dc28ea699bd901fdd35d90613e4402a
> > +     */
> 
> I wonder if the above info regarding what to check for would be useful
> in the documentation section?

Sure, will add some detail in the docs, and point to the source code for
technical details.

> >   int32_t
> > diff --git a/lib/dpif-netdev-lookup.h b/lib/dpif-netdev-lookup.h
> > index 61f44b9e8..bd72aa29b 100644
> > --- a/lib/dpif-netdev-lookup.h
> > +++ b/lib/dpif-netdev-lookup.h
> > @@ -42,6 +42,10 @@ dpcls_subtable_autovalidator_probe(uint32_t
> u0_bit_count,
> >   dpcls_subtable_lookup_func
> >   dpcls_subtable_generic_probe(uint32_t u0_bit_count, uint32_t u1_bit_count);
> >
> > +/* Probe function for AVX-512 gather implementation */
> > +dpcls_subtable_lookup_func
> > +dpcls_subtable_avx512_gather_probe(uint32_t u0_bit_cnt, uint32_t u1_bit_cnt);
> > +
> >
> >   /* Subtable registration and iteration helpers */
> >   struct dpcls_subtable_lookup_info_t {
> > diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4
> > index add3aabcc..36c25d5fc 100644
> > --- a/m4/openvswitch.m4
> > +++ b/m4/openvswitch.m4
> > @@ -404,6 +404,36 @@ AC_DEFUN([OVS_CHECK_SPHINX],
> >      AC_ARG_VAR([SPHINXBUILD])
> >      AM_CONDITIONAL([HAVE_SPHINX], [test "$SPHINXBUILD" != none])])
> >
> > +dnl Checks for binutils/assembler known issue with AVX512.
> > +dnl Due to backports, we probe assembling a reproducer instaed of checking
> Minor typo above "instead"

Fixed in v7.

<snip patch>

> I've checked the unit tests with the auto validator as well myself and
> looks good with all tests passing, really neat feature.
> 
> Regards
> Ian

Thanks for review, -Harry


More information about the dev mailing list