[ovs-dev] [PATCH] dpif-netdev-lookup: Allow AVX512 dpcls without building core OVS with SSE4.2.

Ilya Maximets i.maximets at ovn.org
Mon Mar 22 12:54:46 UTC 2021

On 3/22/21 12:45 PM, Van Haaren, Harry wrote:
> Hi Ilya,
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets at ovn.org>
>> Sent: Monday, March 22, 2021 10:59 AM
>> To: ovs-dev at openvswitch.org; Van Haaren, Harry
>> <harry.van.haaren at intel.com>
>> Cc: Stokes, Ian <ian.stokes at intel.com>; Timothy Redaelli
>> <tredaelli at redhat.com>; Flavio Leitner <fbl at sysclose.org>; Ilya Maximets
>> <i.maximets at ovn.org>
>> Subject: [PATCH] dpif-netdev-lookup: Allow AVX512 dpcls without building core
>> OVS with SSE4.2.
>> It's only required to build lib/dpif-netdev-lookup-avx512-gather.c
>> with SSE4.2.  All hash functions in lib/hash.h are defined as
>> 'static inline', so they will use fast crc instructions while building
>> this module.  The minimal requirement for core OVS could be left
>> intact.
>> With this change support for SSE4.2 is checked in ./configure and
>> -msse4.2  passed to build libopenvswitchavx512.  The rest of OVS is
>> built with default flags provided by user.  So, AVX512 dpcls could
>> be enabled in a build with default CFLAGS and the rest of the OVS
>> could run on older CPUs or in virtual environments without support
>> for sse4.2 instructions.
>> Same for popcnt.
>> Since the whole lib/dpif-netdev-lookup-avx512-gather.c is built with
>> special instruction sets enabled it's not safe to call any function
>> from where unless we're sure that these ISA is supported by current
>> CPU.  For this reason runtime ISA checks moved to common initialization
>> code and performed only once.  avx512-gather implementation will
>> not be registered and will not be available for a user if current
>> CPU doesn't support it.  Log in dpdk-stub lowered to DBG level because
>> now it will be printed always on startup and it's not really useful
>> because user will not have ability to enable avx512 implementation.
>> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> Thanks for taking a look at the DPCLS AVX512 optimizations & functionality.
> From the commit message I understand "what" you're changing, but I failed
> to understand a clear "why"?
> From initial review, it seems to do a few things:
> 1) Rework CPU flags (sse4.2,popcnt) from compile-time checks to runtime

Some compile time checks are still there, but moved around a little.

> 2) Rework DPCLS impl from static registration to runtime/dynamic
> Is the big-picture goal to avoid compiling core OVS with SSE4.2/popcnt enabled?
> Perhaps a better question - what is the big picture goal?

Yes, the big-picture goal was to avoid compiling core OVS with SSE4.2,
i.e. being able to run the same binary on old or virtual hardware and
being able to enable avx512 optimizations while running on high-end hardware.

> Regarding 1) the requirement for SSE4.2 for CRC32Q instruction based hashing,
> I think you have mistook the reason for *requiring* core OVS to be built with
> SSE4.2 enabled. The hashing done in OVS core and DPCLS must be identical.
> Allowing OVS core to use murmurhash (no SSE4.2), and DPCLS using CRC32 (yes SSE4.2)
> means that upcall hash values, and datapath hash values don't match. For this
> reason, it is invalid to mix-and-match SSE4.2 support in *compliation units* across
> OVS.

Thanks for pointing that out.  I missed that completely.  Basically, I forgot
about this.  We should probably add this information to code comments as it's
easy to miss that fact.  Or do we have it already and I just messed it?
Would be nice to have a build assertion.  I'm not sure how to implement it, though.

The thing that we need is that hash computation in netdev_flow_mask/key_init()
needs to match with hash in dpif-netdev-lookup implementation.
And this is likely achievable if we will create a helper static inline function
implemented somewhere inside dpif-netdev-lookup.h to calculate flow hash.
And since miniflow structure already exists at the point of hash calculation
we can choose the implementation the same way as we're choosing lookup
But, yes, this will be a bigger change.

> To be clear, this is a side-effect of compile-time ISA-based hash function selection,
> and has actually nothing to do with DPCLS per-se: it just shows up here first as it
> is the first location ISA specialization is being done in OVS.
> I don't see issues with 2) reworking registration, it can be a separate patch to
> the 1) CPU-flag rework topic. Although related, it does not depend on it I think?
> All in all, I'm OK with the registration rework if split into a separate patch.

It could be that rework is required as a bug fix, because current runtime
checks are done inside the code that already built with higher ISA enabled,
so it might trigger illegal instruction exception during the check, in theory.

> Regarding topic 1) SSE4.2 ISA-changes, I think it is not valid.
>> ---
>> I only tested the build, didn't check in runtime.
> You state this wasn't runtime checked - likely issues would have shown up (DPCLS
> misses on installed keys, resulting in every packet getting an upcall) if it was tested.
> What needs to be done to ensure you have access to a machine with AVX512?

In practice, I just have no enough time for proper runtime checking.
Writing this patch was pretty fast, OTOH. :)

> Regards, -Harry
> <snip patch contents, as high-level review didn't investigate details>

More information about the dev mailing list