[ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount instruction.

Thu Jun 24 12:52:49 UTC 2021

> -----Original Message-----
> From: Flavio Leitner <fbl at sysclose.org>
> Sent: Thursday, June 24, 2021 1:18 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>
> Cc: Ferriter, Cian <cian.ferriter at intel.com>; ovs-dev at openvswitch.org;
> i.maximets at ovn.org; Amber, Kumar <kumar.amber at intel.com>
> Subject: Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount
> instruction.
> 
> 
> Hi Harry,
> 
> On Thu, Jun 24, 2021 at 11:07:59AM +0000, Van Haaren, Harry wrote:
> > > -----Original Message-----
> > > From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Flavio Leitner
> > > Sent: Thursday, June 24, 2021 4:57 AM
> > > To: Ferriter, Cian <cian.ferriter at intel.com>
> > > Cc: ovs-dev at openvswitch.org; i.maximets at ovn.org
> > > Subject: Re: [ovs-dev] [v13 12/12] dpcls-avx512: Enable avx512 vector popcount
> > > instruction.
> > >
> > > On Thu, Jun 17, 2021 at 05:18:25PM +0100, Cian Ferriter wrote:
> > > > From: Harry van Haaren <harry.van.haaren at intel.com>

<snip some previous discussion detail away>

> > I do like the idea of toolchain supporting ISA options a bit more, there is
> > so much compute performance available that is not widely used today.
> > Such an effort industry wide would be very beneficial to all for improving
> > performance, but would be a pretty large undertaking too... outside the
> > scope of this patchset! :)
> 
> Yeah, it is. I mean, if the toolchain is not ready yet and we think
> worth the benefits considering that most probably fewer people will
> be able to contribute or maintain, then I see no other way to solve
> the issue.

So the toolchain is "ready" in that we have a path to enable CPU ISA, and
see the benefits. We can dream about future toolchains, and how those might
improve our workflow in future, but pragmatically the approach here is the
best-known-method based on available tools today. DPDK uses the same
techniques (Function pointer, CPUID based ISA check, and plug in ISA if available).
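
To make that technique concrete, here is a minimal sketch of the pattern, not the
actual OVS or DPDK code. All names (popcount_scalar, popcount_isa, popcount_select)
are hypothetical, and it assumes gcc or clang on x86-64 for the ISA path, using
the compiler's __builtin_cpu_supports() as the CPUID check. Other platforms fall
back to the scalar version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: clear one set bit per iteration. */
static uint64_t
popcount_scalar(const uint64_t *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t v = blocks[i];
        while (v) {
            v &= v - 1;   /* Clear the lowest set bit. */
            total++;
        }
    }
    return total;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_ISA_IMPL 1
/* ISA version: POPCNT is enabled for this one function only, so the rest
 * of the binary still runs on CPUs without it. */
static __attribute__((target("popcnt"))) uint64_t
popcount_isa(const uint64_t *blocks, size_t n)
{
    assert(blocks != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (uint64_t) __builtin_popcountll(blocks[i]);
    }
    return total;
}
#endif

/* Function pointer type shared by all implementations. */
typedef uint64_t (*popcount_fn)(const uint64_t *blocks, size_t n);

/* Runtime CPUID-based check: plug in the ISA version only if available. */
static popcount_fn
popcount_select(void)
{
#ifdef HAVE_ISA_IMPL
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt")) {
        return popcount_isa;
    }
#endif
    return popcount_scalar;
}
```

The caller would run popcount_select() once at startup, store the returned
pointer, and call through it on the hot path, so the CPUID check is not repeated
per packet.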

Improving the toolchain would only solve the problem of letting the compiler use the
CPU ISA. It would not solve the problem of the compiler being unable to understand
the data-movement & processing well enough to reason about it and auto-vectorize.

> Do you think improving the toolchain is a larger commitment than
> manually improving applications? A quick look on gcc gave me the
> impression that it does support at least some basic vector
> optimization capabilities.

Yes - you raise a good point, "basic vector optimization capabilities" are present
in various compilers (gcc and clang/llvm are what I test with). The matrix-multiply
problem that is often used to showcase compiler auto-vectorization is an extremely
well-bounded and simple task in terms of understanding the work to be done.
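
As a rough illustration of that difference, here is a sketch with two hypothetical
functions (matmul and gather_by_mask, neither taken from OVS). The first has fixed
loop bounds and no data-dependent control flow, which is the shape auto-vectorizers
tend to handle. In the second, the trip count and the load addresses depend on the
input data. Whether a given compiler actually vectorizes either one depends on its
version and flags, so treat this as a description of loop shapes, not of results.
It assumes gcc or clang for __builtin_ctzll().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIM 4

/* Well bounded: fixed trip counts, regular strides, no branches on data.
 * Matrices are stored row-major in flat arrays of DIM * DIM floats. */
static void
matmul(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < DIM; i++) {
        for (size_t j = 0; j < DIM; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < DIM; k++) {
                acc += a[i * DIM + k] * b[k * DIM + j];
            }
            out[i * DIM + j] = acc;
        }
    }
}

/* Data dependent: the number of iterations and which table entries are
 * loaded are only known once the mask value is known at runtime.
 * The table must hold 64 entries, one per possible bit position. */
static uint64_t
gather_by_mask(const uint64_t *table, uint64_t mask)
{
    assert(table != NULL);
    uint64_t sum = 0;
    while (mask) {
        unsigned idx = (unsigned) __builtin_ctzll(mask);  /* Lowest set bit. */
        sum += table[idx];
        mask &= mask - 1;                                 /* Clear that bit. */
    }
    return sum;
}
```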

Our emails crossed paths; there's more detail here about matrix multiply & basic vectorization.
https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384377.html

> > I'll admit to being a bit of an ISA fan, but there's some magical instructions
> > that can do stuff in 1x instruction that otherwise take large amounts of
> > shifts & loops. Did I hear somebody ask for examples..??
> 
> Out of curiosity, which tool are you using (if you are) to measure
> the improvements at cycles level? vtune?

I use the Linux Perf tooling for performance measurements, along with OVS's
own per-packet cycle count reporting. Hardware performance measurement (as used
by Linux Perf and VTune) provides all the info that's required.

For those not measuring performance at the function/ASM level, run the following
command and view the performance in your terminal:    perf top -C <pmd_core> -b

Based on that, focus on the areas where lots of cycles are spent, and investigate
alternative SIMD-based implementations for that same functionality, making use
of the CPU ISA. That's the general workflow :)

For those particularly interested, I gave a "Measure Software Performance of Data Plane Applications"
talk at DPDK Userspace in 2019 covering the workflow/method: https://www.youtube.com/watch?v=ZmwOKR5JyPk

<snip lots of ISA details>

> > I'll stop promoting ISA here, but am happy to continue detailed discussions, or break out
> > conversations about specific areas of compute in OVS if there's appetite for that! Feel free
> > to email to OVS Mailing list (with me on CC please :) or email directly OK too.
> 
> I am definitely learning more about it and I appreciated your
> longer reply.

As you may notice, this is an area I'm passionate about. If there's specific interest,
I can volunteer to try to cover a "measuring OVS's SW datapath performance" talk at a
future OVS conference.

Regards, -Harry