[ovs-dev] [PATCH 5/7] dpif-netdev: Use memcpy() to initialize pkt_metadata.

Pravin Shelar pshelar at nicira.com
Fri May 15 19:50:54 UTC 2015


On Thu, Apr 23, 2015 at 11:40 AM, Daniele Di Proietto
<diproiettod at vmware.com> wrote:
> Initializing the dp_packet's metadata can be a hot spot, especially
> for very simple pipelines.  Therefore improving the code here can
> sometimes make a difference.
>
> Using memcpy instead of a plain assignment helps GCC and clang generate
> faster code. Here's a comparison of the compiler generated code (GCC 4.8)
> with or without this commit.
>
> BEFORE (assignment)                 |     AFTER(memcpy)
>
> c8:  add    $0x8,%r8                |   d8:  mov    (%rsi),%r8
>      mov    (%rcx),%r9              |        mov    (%rdx),%rdi
>      mov    (%rbx),%r11d            |        add    $0x1,%ecx
>      mov    %r10,%rcx               |        add    $0x8,%rsi
>      cmp    %rsi,%r8                |        cmp    -0x870(%rbp),%ecx
>      lea    0x88(%r9),%rdi          |        mov    %rdi,0x88(%r8)
>      rep    stos %rax,%es:(%rdi)    |        mov    0x8(%rdx),%rdi
>      mov    %r11d,0xb8(%r9)         |        lea    0x88(%r8),%rax
>      mov    %r8,%rcx                |        mov    %rdi,0x90(%r8)
>      jne    c8                      |        mov    0x10(%rdx),%rdi
>                                     |        mov    %rdi,0x98(%r8)
>                                     |        mov    0x18(%rdx),%rdi
>                                     |        mov    %rdi,0xa0(%r8)
>                                     |        mov    0x20(%rdx),%r8
>                                     |        mov    %r8,0x20(%rax)
>                                     |        mov    0x28(%rdx),%r8
>                                     |        mov    %r8,0x28(%rax)
>                                     |        mov    0x30(%rdx),%r8
>                                     |        mov    %r8,0x30(%rax)
>                                     |        jl     d8
>
> The old code uses a 'rep stos' and fetches the 'port_no' value from
> the 'port' member at every iteration ('mov (%rbx),%r11d'), while the
> new code uses a series of mov operations to accomplish everything.
>
> I can measure a throughput improvement of ~7% on a single flow phy-phy test
> with 64-byte UDP packets.
>
> The improvement has been observed on an Intel Xeon Sandy Bridge (2012)
> and on an Intel Xeon Westmere (2010).
>
> Signed-off-by: Daniele Di Proietto <diproiettod at vmware.com>
> ---
>  lib/dpif-netdev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f1d65f5..7d55997 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -2507,13 +2507,16 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>      error = netdev_rxq_recv(rxq, packets, &cnt);
>      cycles_count_end(pmd, PMD_CYCLES_POLLING);
>      if (!error) {
> +        const struct pkt_metadata md = PKT_METADATA_INITIALIZER(port->port_no);
This change looks good. But I think we can improve it even more by
replacing port->port_no with a pkt_metadata stored in the port, so that
we do not need to initialize this structure on every packet receive.
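
A minimal sketch of that idea, for illustration only. The struct and
function names below (pkt_metadata_sketch, packet_sketch, port_sketch,
port_init, process_batch) are hypothetical, simplified stand-ins and not
the real OVS definitions. The assumption is that the port keeps a
metadata template filled in once at port creation, and the receive path
only does a fixed-size memcpy() per packet:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct pkt_metadata. */
struct pkt_metadata_sketch {
    uint32_t recirc_id;
    uint32_t dp_hash;
    uint32_t skb_priority;
    uint32_t pkt_mark;
    uint32_t in_port;
    uint64_t tunnel[4];
};

/* Hypothetical stand-in for a packet that carries its own metadata. */
struct packet_sketch {
    unsigned char data[64];
    struct pkt_metadata_sketch md;
};

/* Hypothetical stand-in for the port: the template is stored here
 * instead of being rebuilt from port_no in the receive path. */
struct port_sketch {
    uint32_t port_no;
    struct pkt_metadata_sketch md_template;
};

/* Build the template once, when the port is created. */
static void
port_init(struct port_sketch *port, uint32_t port_no)
{
    port->port_no = port_no;
    memset(&port->md_template, 0, sizeof port->md_template);
    port->md_template.in_port = port_no;
}

/* Receive path: one fixed-size copy per packet, no per-batch setup. */
static void
process_batch(const struct port_sketch *port,
              struct packet_sketch **packets, int cnt)
{
    assert(port != NULL && packets != NULL);
    for (int i = 0; i < cnt; i++) {
        memcpy(&packets[i]->md, &port->md_template,
               sizeof port->md_template);
    }
}
```

Because the copy size is a compile-time constant, the compiler is free
to expand the memcpy() into plain loads and stores, as in the AFTER
listing quoted above. Whether that happens depends on the compiler and
the flags used.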
