[ovs-dev] [PATCH] util: Better count_1bits().

Ben Pfaff blp at nicira.com
Fri Dec 6 00:44:58 UTC 2013


On Thu, Dec 05, 2013 at 04:36:26PM -0800, Jarno Rajahalme wrote:
> Inline, use another well-known algorithm for 64-bit builds, and use
> builtins when they are known to be fast at compile time.  A 32-bit
> version of the alternate algorithm is slower than the existing
> implementation, so the old one is used for 32-bit builds.  Inline
> assembler would be a bit faster on a 32-bit i7 build, but we use the
> GCC builtin for portability.
> 
> It should be stressed that builds for specific CPUs do not work on
> other CPUs, and that the OVS build system or runtime does not
> currently support CPU detection.
> 
> Speed improvement vs. existing implementation / vs. GCC 4.7
> __builtin_popcountll():
> 
> i386:         64%  (inlining)                         / 380%
> i386 on i7:   240% (inlining + builtin)               / 820%
> x86_64:       59%  (inlining + different algorithm)   / 190%
> x86_64 on i7: 370% (inlining + builtin)               / 0%
> 
> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>

Wow.

How did you measure the benefit of inlining?
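
For readers following the thread, the approach in the quoted description
(an inline function that uses the GCC builtin only when the target CPU is
known at compile time to have a fast population count, and otherwise falls
back to a bit-parallel algorithm on 64-bit values) might look roughly like
the sketch below.  This is not the code from the patch: the helper names
count_1bits_swar() and count_1bits_check() are hypothetical, the fallback
is the widely used SWAR popcount (which may or may not be the "well-known
algorithm" the patch uses), and testing the GCC-defined __POPCNT__ macro
is an assumption about how "known to be fast at compile time" could be
detected.

```c
#include <assert.h>
#include <stdint.h>

/* Portable bit-parallel (SWAR) population count for a 64-bit value.
 * Hypothetical helper for illustration, not taken from the patch. */
static inline unsigned int
count_1bits_swar(uint64_t x)
{
    const uint64_t h55 = UINT64_C(0x5555555555555555);
    const uint64_t h33 = UINT64_C(0x3333333333333333);
    const uint64_t h0f = UINT64_C(0x0f0f0f0f0f0f0f0f);
    const uint64_t h01 = UINT64_C(0x0101010101010101);

    x -= (x >> 1) & h55;                /* Count of each 2-bit field. */
    x = (x & h33) + ((x >> 2) & h33);   /* Count of each 4-bit field. */
    x = (x + (x >> 4)) & h0f;           /* Count of each byte. */
    return (x * h01) >> 56;             /* Sum of all bytes, in top byte. */
}

/* Uses the builtin only when the compiler targets a CPU with a popcount
 * instruction (GCC defines __POPCNT__ with -mpopcnt or -msse4.2);
 * assumed detection scheme, the patch may test something else. */
static inline unsigned int
count_1bits(uint64_t x)
{
#if defined(__GNUC__) && defined(__POPCNT__)
    return __builtin_popcountll(x);
#else
    return count_1bits_swar(x);
#endif
}

/* Small sanity check on a few values whose bit counts are obvious. */
static void
count_1bits_check(void)
{
    assert(count_1bits(0) == 0);
    assert(count_1bits(1) == 1);
    assert(count_1bits(UINT64_C(0xf0f0)) == 8);
    assert(count_1bits(UINT64_MAX) == 64);
    assert(count_1bits_swar(UINT64_MAX) == 64);
}
```

As the quoted description warns, a binary built with the builtin path
enabled would only run on CPUs that have the instruction, since nothing
here performs runtime CPU detection.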


