[ovs-dev] [PATCH 10/18] lib/cfm: Use relaxed atomics and optimize cfm_should_process_flow().

Ben Pfaff blp at nicira.com
Thu Aug 28 18:29:31 UTC 2014


On Thu, Aug 28, 2014 at 11:28:41AM -0700, Ben Pfaff wrote:
> On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote:
> > The atomics here do not synchronize the state of any other variables,
> > so we can use relaxed atomics.
> > 
> > cfm_should_process_flow() is rearranged to set the megaflow mask bits
> > only if necessary, and to avoid the atomic operation on non-BFD
> > packets.
> > 
> > Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
> > +    /* Most packets are not CFM. */
> > +    if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) {
> > +        return false;
> > +    }
> 
> The common recommendation is to htons the constant instead of ntohsing
> the variable, to generate better code.  Out of curiosity, I decided to
> find out whether this is still needed with modern compilers, so I
> built the following with GCC and Clang:
> 
>     #include <netinet/in.h>
>     #include <stdbool.h>
> 
>     bool
>     is_ip(unsigned short int x)
>     {
>         return ntohs(x) == 0x0800;
>     }
> 
>     bool
>     is_ip2(unsigned short int x)
>     {
>         return x == htons(0x0800);
>     }
> 
> With GCC 4.7.2:
> 
>     00000000 <is_ip>:
>        0:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
>        5:	66 c1 c8 08          	ror    $0x8,%ax
>        9:	66 3d 00 08          	cmp    $0x800,%ax
>        d:	0f 94 c0             	sete   %al
>       10:	c3                   	ret    
> 
>     00000020 <is_ip2>:
>       20:	66 83 7c 24 04 08    	cmpw   $0x8,0x4(%esp)
>       26:	0f 94 c0             	sete   %al
>       29:	c3                   	ret    
> 
> With Clang 3.5.0:
> 
>     00000000 <is_ip>:
>        0:	66 8b 44 24 04       	mov    0x4(%esp),%ax
>        5:	66 c1 c0 08          	rol    $0x8,%ax
>        9:	0f b7 c0             	movzwl %ax,%eax
>        c:	3d 00 08 00 00       	cmp    $0x800,%eax
>       11:	0f 94 c0             	sete   %al
>       14:	0f b6 c0             	movzbl %al,%eax
>       17:	c3                   	ret    
> 
>     00000020 <is_ip2>:
>       20:	0f b7 44 24 04       	movzwl 0x4(%esp),%eax
>       25:	83 f8 08             	cmp    $0x8,%eax
>       28:	0f 94 c0             	sete   %al
>       2b:	0f b6 c0             	movzbl %al,%eax
>       2e:	c3                   	ret    
> 
> so it's still worthwhile.  (I was surprised to see Clang generate
> worse code overall.  It seems to regard a bool as a 32-bit value, at
> least for the purpose of returning from a function.)
> 
> > +    atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> > +
> > +    /* Atomic read is always a memory access, do some work
> > +     * before we need the result. */
> 
> I believe an atomic_read_relaxed() is just a compiler barrier.  That
> means that unless cfm_should_process_flow() is getting inlined
> somewhere (I doubt it--it has no callers within cfm.c), there isn't
> any way that the memory access could be optimized out anyway.  So,
> this optimization and comment seems a little silly to me.

Oh, also:
Acked-by: Ben Pfaff <blp at nicira.com>
since none of this really matters.



More information about the dev mailing list