[ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

Van Haaren, Harry harry.van.haaren at intel.com
Wed Jun 30 16:54:22 UTC 2021


> -----Original Message-----
> From: Eelco Chaudron <echaudro at redhat.com>
> Sent: Wednesday, June 30, 2021 3:35 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>
> Cc: Amber, Kumar <kumar.amber at intel.com>; dev at openvswitch.org;
> i.maximets at ovn.org; Flavio Leitner <fbl at sysclose.org>; Stokes, Ian
> <ian.stokes at intel.com>
> Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized
> miniflow extract
> 
> 
> 
> On 30 Jun 2021, at 15:30, Van Haaren, Harry wrote:
> 
> >> -----Original Message-----
> >> From: Eelco Chaudron <echaudro at redhat.com>
> >> Sent: Wednesday, June 30, 2021 2:12 PM
> >> To: Amber, Kumar <kumar.amber at intel.com>; Van Haaren, Harry
> >> <harry.van.haaren at intel.com>
> >> Cc: dev at openvswitch.org; i.maximets at ovn.org; Flavio Leitner
> <fbl at sysclose.org>;
> >> Stokes, Ian <ian.stokes at intel.com>
> >> Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized
> >> miniflow extract
> >>
> >> This patch was an interesting patch to review and being reminded about
> endianness,
> >> and this site,
> >>
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_maskz
> >> _permutexvar_epi8&expand=4315, got me through it ;)
> >
> > Hah, yes the Intrinsics Guide is very useful for reading/investigating what/how
> instructions can do.
> > Its... almost always open in a browser in some tab here! :)
> >
> >
> >> Some comments below...
> >>
> >> //Eelco
> >
> > Thanks for review, I'll snip away large chunks of code to reduce verbosity.
> >
> > Regards, -Harry
> >
> >
> >> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> >>
> >>> From: Harry van Haaren <harry.van.haaren at intel.com>
> >
> > <snip>
> >
> >>> +/* AVX512-BW level permutex2var_epi8 emulation. */
> >>> +static inline __m512i
> >>> +__attribute__((target("avx512bw")))
> >>
> >> Are these targets universal enough for all supported compilers, if not we might
> need
> >> to move them to individual macros in compile.h.
> >
> > Yes, these are the standard gcc/clang etc compiler -m <isa level> switches.
> >
> > Search for "-mavx512bw" on e.g. this GCC page, lists them all;
> > https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> >
> > If a compiler does not understand them, we will have to #ifdef that compiler out,
> > as it just doesn't support the ISA.
> 
> Guess my concern is with the windows/Microsoft compiler, as I have no windows
> setup, I can not verify this.

Me neither. Flavio you mentioned a windows compiler issue on the DPIF patchset,
would you test compile here please?


> >>> +/* Static const instances of profiles. These are compile-time constants,
> >>> + * and are specialized into individual miniflow-extract functions.
> >>> + */
> >>> +static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
> >>> +{
> >>> +    [PROFILE_ETH_IPV4_UDP] = {
> >>> +        .probe_mask.u8_data = { PATTERN_ETHERTYPE_MASK
> PATTERN_IPV4_MASK
> >> },
> >>> +        .probe_data.u8_data = { PATTERN_ETHERTYPE_IPV4
> PATTERN_IPV4_UDP},
> >>> +
> >>> +        .store_shuf.u8_data = { PATTERN_IPV4_UDP_SHUFFLE },
> >>> +        .store_kmsk = PATTERN_IPV4_UDP_KMASK,
> >>> +
> >>> +        .mf_bits = { 0x18a0000000000000, 0x0000000000040401},
> >>
> >> I did some manual translation from these bits, to parts of the flow structure they
> >> represent, but it was not something fun to do. Maybe you still have your notes
> and
> >> could add some to the code? It might help debugging?
> >
> > Agree that these bits are "arbitrary" to some degree, they're offsets into the
> miniflow
> > datastructure, with each bit representing 8-bytes of data.
> >
> > These are derived from the output of the autovalidator, which prints "good" and
> "test"
> > values.
> 
> Nice forgot about that part ;)

The autovalidator strikes again :)

> > <snip>
> >
> >> As we are explicitly manual defining the mf_bits I think we also need to update
> the
> >> comment in the “struct flow” definition to reflect that if the order change these
> >> specific functions need updating also.
> >
> > There's an "ABI Macro" in that struct, we can throw one of those build-time asserts
> into here
> > too to be "extra sure", but this would be caught by running MFEX autovalidation
> unit tests.
> 
> Guess they will but not sure if the dpdk test is part of the standard tests. Anyway, this
> is the comment I think should be updated:
> https://github.com/openvswitch/ovs/blob/e5b5008acdf08e90874f5b4da09ffe162fc
> 762aa/include/openvswitch/flow.h#L97

Will include a build-time check in the AVX512 MFEX to fail a build if struct flow is
updated in future. Autovalidator would again catch any mis-matches, but nice to
know it at build-time too.


> > <snip>
> >
> >>> +/* Generic loop to process any mfex profile. This code is specialized into
> >>> + * multiple actual MFEX implementation functions. Its marked ALWAYS_INLINE
> >>> + * to ensure the compiler specializes each instance. The code is marked "hot"
> >>> + * to inform the compiler this is a hotspot in the program, encouraging
> >>> + * inlining of callee functions such as the permute calls.
> >>> + */
> >>> +static inline uint32_t ALWAYS_INLINE
> >>> +__attribute__ ((hot))
> >>
> >> Do we need to move this to a macro in compiler.h as OVS_HOT to make sure it’s
> not
> >> causing issues on other compilers like windows, etc?
> >
> > I'm not sure, we could I suppose, I'm not strongly for or against. Today this
> > patchset doesn't modify compiler.h at all, perhaps cleaner to update in a later
> patch,
> > and consider other functions for tagging with OVS_HOT too in that patchset?
> >
> > <snip>
> 
> I do not have a strong preference either. It looks like this is the only patch/place
> using it, and as you suggested, we could do it in a follow-up patch if we start using it
> in more places.

OK, lets avoid scope-creep in this patchset, not including.


> >>> +        /* Copy known dp packet offsets to the dp_packet instance. */
> >>> +        memcpy(&packet->l2_pad_size, &profile->dp_pkt_offs,
> >>> +               sizeof(uint16_t) * 4);
> >>> +
> >>
> >> Here we copy four fields to the packet structure (l2_pad_size, l2_5_ofs, l3_ofs,
> >> l4_ofs). I think we should add some static_assert to make sure the order of these
> >> fields do not change.
> >
> > Yes, I think Flavio had a similar comment in one of the reviews. Good point,
> > has been addressed with BUILD_ASSERT_DELC() and offsets into struct by Amber.
> 
> ACK.
> > <snip to end>

Thanks!


More information about the dev mailing list