[ovs-dev] [PATCH 10/18] lib/cfm: Use relaxed atomics and optimize cfm_should_process_flow().
Ben Pfaff
blp at nicira.com
Thu Aug 28 18:28:41 UTC 2014
On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote:
> The atomics here do not synchronize the state of any other variables,
> so we can use relaxed atomics.
>
> cfm_should_process_flow() is rearranged to set the megaflow mask bits
> only if necessary, and to avoid the atomic operation on non-BFD
> packets.
>
> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
> + /* Most packets are not CFM. */
> + if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) {
> + return false;
> + }
The common recommendation is to htons the constant instead of ntohsing
the variable, to generate better code. Out of curiosity, I decided to
find out whether this is still needed with modern compilers, so I
built the following with GCC and Clang:
#include <netinet/in.h>
#include <stdbool.h>
bool
is_ip(unsigned short int x)
{
return ntohs(x) == 0x0800;
}
bool
is_ip2(unsigned short int x)
{
return x == htons(0x0800);
}
With GCC 4.7.2:
00000000 <is_ip>:
0: 0f b7 44 24 04 movzwl 0x4(%esp),%eax
5: 66 c1 c8 08 ror $0x8,%ax
9: 66 3d 00 08 cmp $0x800,%ax
d: 0f 94 c0 sete %al
10: c3 ret
00000020 <is_ip2>:
20: 66 83 7c 24 04 08 cmpw $0x8,0x4(%esp)
26: 0f 94 c0 sete %al
29: c3 ret
With Clang 3.5.0:
00000000 <is_ip>:
0: 66 8b 44 24 04 mov 0x4(%esp),%ax
5: 66 c1 c0 08 rol $0x8,%ax
9: 0f b7 c0 movzwl %ax,%eax
c: 3d 00 08 00 00 cmp $0x800,%eax
11: 0f 94 c0 sete %al
14: 0f b6 c0 movzbl %al,%eax
17: c3 ret
00000020 <is_ip2>:
20: 0f b7 44 24 04 movzwl 0x4(%esp),%eax
25: 83 f8 08 cmp $0x8,%eax
28: 0f 94 c0 sete %al
2b: 0f b6 c0 movzbl %al,%eax
2e: c3 ret
so it's still worthwhile. (I was surprised to see Clang generate
worse code overall. It seems to regard a bool as a 32-bit value, at
least for the purpose of returning from a function.)
> + atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> +
> + /* Atomic read is always a memory access, do some work
> + * before we need the result. */
I believe an atomic_read_relaxed() is just a compiler barrier. That
means that unless cfm_should_process_flow() is getting inlined
somewhere (I doubt it--it has no callers within cfm.c), there isn't
any way that the memory access could be optimized out anyway. So,
this optimization and comment seems a little silly to me.
More information about the dev
mailing list