[ovs-dev] [v12 15/16] dpif-netdev: Optimize dp output action.

Stokes, Ian ian.stokes at intel.com
Tue Jun 15 17:04:19 UTC 2021


> From: Harry van Haaren <harry.van.haaren at intel.com>
> 
> This commit optimizes the output action, by enabling the compiler to
> optimize the code better through reducing code complexity.
> 
> The core concept of this optimization is that the array-length checks
> have already been performed above the copying code, so can be removed.
> Removing of the per-packet length checks allows the compiler to auto-vectorize
> the stores using SIMD registers.
> 
> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> 

Thanks for the patch Cian/Harry.

Comments inline.
> ---
> 
> v8: Add NEWS entry.
> ---
>  NEWS              |  1 +
>  lib/dpif-netdev.c | 23 ++++++++++++++++++-----
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index d04dac746..a7ec34af1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,6 +17,7 @@ Post-v2.15.0
>       * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if
> the
>         CPU supports it. This enhances performance by using the native vpopcount
>         instructions, instead of the emulated version of vpopcount.
> +     * Optimize dp_netdev_output by enhancing compiler optimization potential.
>     - ovs-ctl:
>       * New option '--no-record-hostname' to disable hostname configuration
>         in ovsdb on startup.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 04a4f71cb..f46269ae3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7290,12 +7290,25 @@ dp_execute_output_action(struct
> dp_netdev_pmd_thread *pmd,
>          pmd->n_output_batches++;
>      }
> 
> -    struct dp_packet *packet;
> -    DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
> -        p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] =
> -            pmd->ctx.last_rxq;
> -        dp_packet_batch_add(&p->output_pkts, packet);
> +    /* The above checks ensure that there is enough space in the output batch.
> +     * Using dp_packet_batch_add() has a branch to check if the batch is full.
> +     * This branch reduces the compiler's ability to optimize efficiently. The
> +     * below code implements packet movement between batches without
> checks,
> +     * with the required semantics of output batch perhaps containing packets.
> +     */

What is the performance gain recorded with this approach with compiler optimizations?

> +    int batch_size = dp_packet_batch_size(packets_);
> +    int out_batch_idx = dp_packet_batch_size(&p->output_pkts);
> +    struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
> +    struct dp_packet_batch *output_batch = &p->output_pkts;
> +
> +    for (int i = 0; i < batch_size; i++) {
> +        struct dp_packet *packet = packets_->packets[i];
> +        p->output_pkts_rxqs[out_batch_idx] = rxq;
> +        output_batch->packets[out_batch_idx] = packet;
> +        out_batch_idx++;
>      }
> +    output_batch->count += batch_size;

So I guess the counter argument here is that the previous approach uses the dp_packet standard functions.

There has been work over time to ensure that dp_packets are handled in a similar way using the dp_packet api.

This I think reverses the position. That's why I'd like to get a sense of the performance gain as I think it's important to justify breaking from code conformance.

Has there been thought to modifying the dp_packet_batch_add ?

Regards
Ian
> +
>      return true;
>  }
> 
> --
> 2.31.1
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list