[ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

Van Haaren, Harry harry.van.haaren at intel.com
Wed Jun 30 13:30:08 UTC 2021


> -----Original Message-----
> From: Eelco Chaudron <echaudro at redhat.com>
> Sent: Wednesday, June 30, 2021 2:12 PM
> To: Amber, Kumar <kumar.amber at intel.com>; Van Haaren, Harry
> <harry.van.haaren at intel.com>
> Cc: dev at openvswitch.org; i.maximets at ovn.org; Flavio Leitner <fbl at sysclose.org>;
> Stokes, Ian <ian.stokes at intel.com>
> Subject: Re: [ovs-dev] [v4 10/12] dpif-netdev/mfex: Add AVX512 based optimized
> miniflow extract
> 
> This patch was an interesting patch to review and being reminded about endianness,
> and this site,
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_maskz
> _permutexvar_epi8&expand=4315, got me through it ;)

Hah, yes the Intrinsics Guide is very useful for reading/investigating what/how instructions can do.
Its... almost always open in a browser in some tab here! :)


> Some comments below...
> 
> //Eelco

Thanks for review, I'll snip away large chunks of code to reduce verbosity.

Regards, -Harry


> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
> 
> > From: Harry van Haaren <harry.van.haaren at intel.com>

<snip>

> > +/* AVX512-BW level permutex2var_epi8 emulation. */
> > +static inline __m512i
> > +__attribute__((target("avx512bw")))
> 
> Are these targets universal enough for all supported compilers, if not we might need
> to move them to individual macros in compile.h.

Yes, these are the standard gcc/clang etc compiler -m <isa level> switches.

Search for "-mavx512bw" on e.g. this GCC page, lists them all;
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html 

If a compiler does not understand them, we will have to #ifdef that compiler out,
as it just doesn't support the ISA.


> > +/* Static const instances of profiles. These are compile-time constants,
> > + * and are specialized into individual miniflow-extract functions.
> > + */
> > +static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
> > +{
> > +    [PROFILE_ETH_IPV4_UDP] = {
> > +        .probe_mask.u8_data = { PATTERN_ETHERTYPE_MASK PATTERN_IPV4_MASK
> },
> > +        .probe_data.u8_data = { PATTERN_ETHERTYPE_IPV4 PATTERN_IPV4_UDP},
> > +
> > +        .store_shuf.u8_data = { PATTERN_IPV4_UDP_SHUFFLE },
> > +        .store_kmsk = PATTERN_IPV4_UDP_KMASK,
> > +
> > +        .mf_bits = { 0x18a0000000000000, 0x0000000000040401},
> 
> I did some manual translation from these bits, to parts of the flow structure they
> represent, but it was not something fun to do. Maybe you still have your notes and
> could add some to the code? It might help debugging?

Agree that these bits are "arbitrary" to some degree, they're offsets into the miniflow
datastructure, with each bit representing 8-bytes of data.

These are derived from the output of the autovalidator, which prints "good" and "test"
values.

<snip>

> As we are explicitly manual defining the mf_bits I think we also need to update the
> comment in the “struct flow” definition to reflect that if the order change these
> specific functions need updating also.

There's an "ABI Macro" in that struct, we can throw one of those build-time asserts into here
too to be "extra sure", but this would be caught by running MFEX autovalidation unit tests.


<snip>

> > +/* Generic loop to process any mfex profile. This code is specialized into
> > + * multiple actual MFEX implementation functions. Its marked ALWAYS_INLINE
> > + * to ensure the compiler specializes each instance. The code is marked "hot"
> > + * to inform the compiler this is a hotspot in the program, encouraging
> > + * inlining of callee functions such as the permute calls.
> > + */
> > +static inline uint32_t ALWAYS_INLINE
> > +__attribute__ ((hot))
> 
> Do we need to move this to a macro in compiler.h as OVS_HOT to make sure it’s not
> causing issues on other compilers like windows, etc?

I'm not sure, we could I suppose, I'm not strongly for or against. Today this
patchset doesn't modify compiler.h at all, perhaps cleaner to update in a later patch,
and consider other functions for tagging with OVS_HOT too in that patchset? 

<snip>

> > +        /* Copy known dp packet offsets to the dp_packet instance. */
> > +        memcpy(&packet->l2_pad_size, &profile->dp_pkt_offs,
> > +               sizeof(uint16_t) * 4);
> > +
> 
> Here we copy four fields to the packet structure (l2_pad_size, l2_5_ofs, l3_ofs,
> l4_ofs). I think we should add some static_assert to make sure the order of these
> fields do not change.

Yes, I think Flavio had a similar comment in one of the reviews. Good point,
has been addressed with BUILD_ASSERT_DELC() and offsets into struct by Amber.

<snip to end>


More information about the dev mailing list