[ovs-dev] [PATCH v1] util: implement count_1bits with Neon intrinsics or gcc built-in for aarch64.

Ben Pfaff blp at ovn.org
Thu Jun 13 17:51:30 UTC 2019


On Thu, Jun 13, 2019 at 06:38:07PM +0800, Yanqin Wei wrote:
> The userspace datapath traverses miniflow values many times. In this
> process, the 'count_1bits' operation on 'Flowmap' significantly impacts
> performance. On Arm, this function used the portable implementation
> because gcc for Arm does not support the popcnt feature.
> On aarch64, however, the VCNT Neon instruction can accelerate 'count_1bits'.
> From gcc-7 on, the built-in function is implemented with a Neon instruction.
> With this patch, count_1bits is implemented with the gcc built-in
> from gcc-7 on, and with Neon intrinsics in gcc-6.
> A performance test was run on two aarch64 machines. In the NIC2NIC test, the
> one-tuple dpcls lookup case achieves around 4% throughput improvement and
> the 10-tuple (average) case achieves around 5% improvement.
> 
> Tested-by: Malvika Gupta <malvika.gupta at arm.com>
> Signed-off-by: Yanqin Wei <Yanqin.Wei at arm.com>

Thanks!  I applied this to master.
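
For readers without the patch at hand, the selection logic described in the
quoted commit message can be sketched roughly as below. This is an
illustration only, not the applied patch: the names sketch_count_1bits and
sketch_count_1bits_portable are hypothetical, the preprocessor conditions are
an assumption about how the gcc-6 versus gcc-7 split might be expressed, and
the fallback loop stands in for whatever portable implementation the tree
actually uses.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Portable fallback (assumed here for illustration): clear the lowest
 * set bit until no bits remain, counting the iterations. */
static unsigned int
sketch_count_1bits_portable(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical wrapper that picks an implementation at compile time. */
static unsigned int
sketch_count_1bits(uint64_t x)
{
#if defined(__aarch64__) && defined(__ARM_NEON) \
    && defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 7
    /* Older gcc on aarch64: VCNT counts the set bits in each of the
     * eight bytes, then vaddv_u8 sums the per-byte counts. */
    return vaddv_u8(vcnt_u8(vcreate_u8(x)));
#elif defined(__GNUC__)
    /* gcc built-in; per the commit message, gcc-7 and later emit the
     * Neon instruction for it on aarch64. */
    return __builtin_popcountll(x);
#else
    return sketch_count_1bits_portable(x);
#endif
}
```

All three branches should return the same value for any input, so the
portable version can serve as a reference when checking the others.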

