[ovs-dev] [v13 09/12] dpif-netdev/dpcls-avx512: Enable 16 block processing.

Flavio Leitner fbl at sysclose.org
Thu Jun 24 03:39:23 UTC 2021


On Thu, Jun 17, 2021 at 05:18:22PM +0100, Cian Ferriter wrote:
> From: Harry van Haaren <harry.van.haaren at intel.com>
> 
> This commit implements larger subtable searches in AVX-512. A
> limitation of the previous implementation was that at most 8 blocks
> of miniflow data could be matched on (so a subtable with 8 blocks was
> handled in AVX-512, but a subtable with 9 or more blocks would fall
> back to the scalar/generic implementation). This patch removes that
> limitation: subtables of up to 16 blocks can now be matched on.
> 
> From an implementation perspective, the key to enabling 16 blocks
> instead of 8 was to do the bitmask calculation up front, and then use
> the pre-calculated bitmasks for two passes of the "blocks gather"
> routine. The bitmasks need to be shifted for k-mask usage in the
> upper (8-15) block range, but this is relatively trivial. This
> approach also makes it easier to expand to 24 blocks in the future,
> if desired.
> 
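
A simplified scalar model of the "compute the masks once, gather twice" idea may help when reading the patch. It is a sketch under stated assumptions, not the OVS code: `blocks_gather_pass()` and `blocks_gather_16()` are hypothetical names, each 8-lane pass emulates an AVX-512 zero-masked load in plain C, and a single 16-bit block mask stands in for the pre-calculated bitmasks. The upper half of the mask is shifted down by 8 so it can serve as the k-mask for the second pass.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 8-lane masked gather pass: lane i of
 * dst receives src[i] when bit i of kmask is set, otherwise zero. This
 * mimics AVX-512 zero-masking, but is plain C so it runs on any CPU. */
static void
blocks_gather_pass(const uint64_t *src, uint8_t kmask, uint64_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i] = ((kmask >> i) & 1) ? src[i] : 0;
    }
}

/* Hypothetical 16-block gather: the mask is computed once, up front,
 * then split into one 8-bit k-mask per pass. */
static void
blocks_gather_16(const uint64_t src[16], uint16_t block_mask,
                 uint64_t dst[16])
{
    uint8_t lo_mask = block_mask & 0xFF;        /* Blocks 0-7. */
    uint8_t hi_mask = (block_mask >> 8) & 0xFF; /* Blocks 8-15, shifted
                                                 * down for k-mask use. */

    blocks_gather_pass(&src[0], lo_mask, &dst[0]); /* 1st pass. */
    blocks_gather_pass(&src[8], hi_mask, &dst[8]); /* 2nd pass. */
}
```

For example, with a block mask of `0x8101` only blocks 0, 8 and 15 are kept; every other lane is zeroed.
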
> The 2nd iteration, which handles more than 8 blocks, is placed
> behind a conditional branch that checks the total number of bits.
> This helps the specialized versions of the function whose miniflow
> fingerprint has 8 blocks or fewer, as the compiler can statically
> strip that code out of those functions. In specialized functions
> that do require more than 8 blocks, the branch is removed and the
> 2nd blocks gather routine is executed unconditionally.
> 
> Lastly, the _any() flavour keeps the conditional branch. The branch
> predictor may mispredict occasionally, but within a burst it will
> likely predict most packets correctly (particularly towards the
> middle and end of the burst).
> 
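
A minimal sketch of how the branch can be folded away, assuming the specialized lookups are thin wrappers that pass a compile-time constant block count into a shared `static inline` body. `lookup_impl()`, `lookup_5()`, `lookup_12()` and `lookup_any()` are hypothetical names, and counting gather passes stands in for the real per-pass work.

```c
#include <stdint.h>

/* Hypothetical shared body: returns how many gather passes would run,
 * standing in for the real work so the effect of the branch is visible. */
static inline uint32_t
lookup_impl(const uint32_t n_blocks)
{
    uint32_t passes = 1;    /* Blocks 0-7: always gathered. */
    if (n_blocks > 8) {     /* Blocks 8-15: only when needed. */
        passes++;
    }
    return passes;
}

/* Specialized variants: n_blocks is a compile-time constant, so after
 * inlining the compiler can fold the branch away entirely. */
static uint32_t lookup_5(void)  { return lookup_impl(5); }
static uint32_t lookup_12(void) { return lookup_impl(12); }

/* Generic variant: n_blocks is only known at runtime, so the branch
 * remains and is left to the branch predictor. */
static uint32_t lookup_any(uint32_t n_blocks) { return lookup_impl(n_blocks); }
```

With optimization enabled, a compiler can typically reduce `lookup_5()` to a single pass and `lookup_12()` to two passes with no branch at all, while `lookup_any()` keeps the `n_blocks > 8` test at runtime.
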
> The code has been run with the unit tests under autovalidation and
> passes all cases. Unit test coverage has also been checked to ensure
> that the 16-block code paths are executed.
> 
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> 
> ---

The changes look good to me. I also introduced errors into the
first 8 blocks and into the second 8 blocks, and in both cases
autovalidation failed as expected.

Acked-by: Flavio Leitner <fbl at sysclose.org>

