[ovs-dev] [PATCH 10/18] lib/cfm: Use relaxed atomics and optimize cfm_should_process_flow().

Thu Aug 28 18:28:41 UTC 2014

On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote:
> The atomics here do not synchronize the state of any other variables,
> so we can use relaxed atomics.
> 
> cfm_should_process_flow() is rearranged to set the megaflow mask bits
> only if necessary, and to avoid the atomic operation on non-BFD
> packets.
> 
> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
> +    /* Most packets are not CFM. */
> +    if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) {
> +        return false;
> +    }

The common recommendation is to htons the constant instead of ntohsing
the variable, to generate better code.  Out of curiosity, I decided to
find out whether this is still needed with modern compilers, so I
built the following with GCC and Clang:

    #include <netinet/in.h>
    #include <stdbool.h>

    bool
    is_ip(unsigned short int x)
    {
        return ntohs(x) == 0x0800;
    }

    bool
    is_ip2(unsigned short int x)
    {
        return x == htons(0x0800);
    }

With GCC 4.7.2:

    00000000 <is_ip>:
       0:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
       5:	66 c1 c8 08          	ror    $0x8,%ax
       9:	66 3d 00 08          	cmp    $0x800,%ax
       d:	0f 94 c0             	sete   %al
      10:	c3                   	ret    

    00000020 <is_ip2>:
      20:	66 83 7c 24 04 08    	cmpw   $0x8,0x4(%esp)
      26:	0f 94 c0             	sete   %al
      29:	c3                   	ret    

With Clang 3.5.0:

    00000000 <is_ip>:
       0:	66 8b 44 24 04       	mov    0x4(%esp),%ax
       5:	66 c1 c0 08          	rol    $0x8,%ax
       9:	0f b7 c0             	movzwl %ax,%eax
       c:	3d 00 08 00 00       	cmp    $0x800,%eax
      11:	0f 94 c0             	sete   %al
      14:	0f b6 c0             	movzbl %al,%eax
      17:	c3                   	ret    

    00000020 <is_ip2>:
      20:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
      25:	83 f8 08             	cmp    $0x8,%eax
      28:	0f 94 c0             	sete   %al
      2b:	0f b6 c0             	movzbl %al,%eax
      2e:	c3                   	ret    

so it's still worthwhile.  (I was surprised to see Clang generate
worse code overall.  It seems to regard a bool as a 32-bit value, at
least for the purpose of returning from a function.)

> +    atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> +
> +    /* Atomic read is always a memory access, do some work
> +     * before we need the result. */

I believe an atomic_read_relaxed() is just a compiler barrier.  That
means that unless cfm_should_process_flow() is getting inlined
somewhere (I doubt it--it has no callers within cfm.c), there isn't
any way that the memory access could be optimized out anyway.  So,
this optimization and comment seems a little silly to me.