[ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

Balazs Nemeth bnemeth at redhat.com
Fri Jun 25 12:09:26 UTC 2021


Ilya,

Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
is called once per batch. I don't expect any issue with that.
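
For reference, roughly what I have in mind (an untested sketch against this
series; the hoisted 'flow_api_enabled' local is my own naming):

    /* In dfc_processing(): evaluate the offload state once per batch,
     * instead of once per packet inside the loop. */
    const bool flow_api_enabled = netdev_is_flow_api_enabled();

    DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
        ...
        if (flow_api_enabled && *recirc_depth_get() == 0) {
            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
                /* Packet was dropped during recovery; don't process
                 * it further. */
                continue;
            }
        }
        ...
    }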

Regards,
Balazs

On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets <i.maximets at ovn.org> wrote:

> On 6/25/21 1:04 PM, Ferriter, Cian wrote:
> > Hi Eli,
> >
> > I have some concerns about how we plug the packet state recover logic
> > into dfc_processing() here. My concerns are inline below.
> >
> > I'm concerned that we are hurting performance in the partial HWOL case,
> > as this patchset is introducing new direct (non-inline) function calls
> > per packet to the software datapath. We can reduce the performance
> > impact in this area; see my specific suggestions below.
> >
> > Thanks,
> > Cian
> >
> >> -----Original Message-----
> >> From: Eli Britstein <elibr at nvidia.com>
> >> Sent: Wednesday 23 June 2021 16:53
> >> To: dev at openvswitch.org; Ilya Maximets <i.maximets at ovn.org>
> >> Cc: Gaetan Rivet <gaetanr at nvidia.com>; Majd Dibbiny <majd at nvidia.com>;
> >> Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>;
> >> Hemal Shah <hemal.shah at broadcom.com>; Ivan Malov <Ivan.Malov at oktetlabs.ru>;
> >> Ferriter, Cian <cian.ferriter at intel.com>; Eli Britstein <elibr at nvidia.com>;
> >> Finn, Emma <emma.finn at intel.com>; Kovacevic, Marko <marko.kovacevic at intel.com>
> >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
> >>
> >> Recover the packet if it was partially processed by the HW. Fall back
> >> to flow lookup by mark association.
> >>
> >> Signed-off-by: Eli Britstein <elibr at nvidia.com>
> >> Reviewed-by: Gaetan Rivet <gaetanr at nvidia.com>
> >> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
> >> Tested-by: Emma Finn <emma.finn at intel.com>
> >> Tested-by: Marko Kovacevic <marko.kovacevic at intel.com>
> >> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> >> ---
> >>  lib/dpif-netdev.c | 45 +++++++++++++++++++++++++++++++++++++++++----
> >>  1 file changed, 41 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index 8fa7eb6d4..d5b7d5b73 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
> >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
> >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
> >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
> >>
> >>  /* Protects against changes to 'dp_netdevs'. */
> >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
> >>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
> >>  }
> >>
> >> +static struct tx_port * pmd_send_port_cache_lookup(
> >> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> >> +
> >> +static inline int
> >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> >> +                  odp_port_t port_no,
> >> +                  struct dp_packet *packet,
> >> +                  struct dp_netdev_flow **flow)
> >> +{
> >> +    struct tx_port *p;
> >> +    uint32_t mark;
> >> +
> >
> >
> > Start of full HWOL recovery block
> >
> >> +    /* Restore the packet if HW processing was terminated before
> >> +     * completion. */
> >> +    p = pmd_send_port_cache_lookup(pmd, port_no);
> >> +    if (OVS_LIKELY(p)) {
> >> +        int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> >> +
> >> +        if (err && err != EOPNOTSUPP) {
> >> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
> >> +            return -1;
> >> +        }
> >> +    }
> >
> > End of full HWOL recovery block
> >
> > IIUC, the above is only relevant to full HWOL, where the packet is
> > partially processed by the HW. In a partial HWOL testcase, we see a
> > performance drop as a result of the above code: with the patchset
> > applied, performance is 0.94x what it was before.
>
> While reviewing the patch set I noticed this part too, but
> this code was tested twice by Intel engineers, so I figured
> that it doesn't hurt performance of partial offloading.
>
> In general, it should be easy to re-order partial and full
> offloading checks like this (didn't test):
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index c5ab35d2a..36a5976f2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>      struct tx_port *p;
>      uint32_t mark;
>
> +    /* If no mark, no flow to find. */
> +    if (dp_packet_has_flow_mark(packet, &mark)) {
> +        *flow = mark_to_flow_find(pmd, mark);
> +        return 0;
> +    }
> +
> +    *flow = NULL;
> +
>      /* Restore the packet if HW processing was terminated before
>       * completion. */
>      p = pmd_send_port_cache_lookup(pmd, port_no);
>      if (OVS_LIKELY(p)) {
> @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>              return -1;
>          }
>      }
> -
> -    /* If no mark, no flow to find. */
> -    if (!dp_packet_has_flow_mark(packet, &mark)) {
> -        *flow = NULL;
> -        return 0;
> -    }
> -
> -    *flow = mark_to_flow_find(pmd, mark);
>      return 0;
>  }
> ---
>
> If everyone agrees, I can do this change.
>
> >
> > We should branch over this code in the partial HWOL scenario, where we
> > don't need the calls to pmd_send_port_cache_lookup() and
> > netdev_hw_miss_packet_recover(). We want this branch to be transparent to
> > the user. Since both full and partial HWOL fall under the
> > other_config:hw-offload=true switch, we might need a configure-time check
> > of NIC capabilities, or something similar, to branch over the full HWOL
> > packet recovery code. Does this make sense?
>
> A compile-time check of capabilities doesn't make sense, as in the
> vast majority of cases it's unknown which HW the package will run on.
> The code should be as generic as possible.
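>
> If the cost of the calls themselves is the concern, a runtime check
> fits better than a build-time one, e.g. remembering the first
> EOPNOTSUPP per port so the recovery attempt is skipped on ports that
> don't support it. A minimal sketch (untested; the
> 'hw_miss_recover_supported' flag is made up and would need to be
> added to 'struct tx_port', initialized to true):
>
>     p = pmd_send_port_cache_lookup(pmd, port_no);
>     if (OVS_LIKELY(p) && p->hw_miss_recover_supported) {
>         int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>
>         if (err == EOPNOTSUPP) {
>             /* This port can't recover packets; avoid the call for
>              * subsequent packets. */
>             p->hw_miss_recover_supported = false;
>         } else if (err) {
>             COVERAGE_INC(datapath_drop_hw_miss_recover);
>             return -1;
>         }
>     }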
>
> >
> >
> > perf top showing cycles spent per function in my partial HWOL scenario.
> > We can see netdev_hw_miss_packet_recover(),
> > netdev_offload_dpdk_hw_miss_packet_recover() and
> > netdev_is_flow_api_enabled() taking cycles:
> >     28.79%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_input__
> >     13.72%  pmd-c01/id:8  ovs-vswitchd        [.] parse_tcp_flags
> >     11.07%  pmd-c01/id:8  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
> >     10.94%  pmd-c01/id:8  ovs-vswitchd        [.] i40e_xmit_fixed_burst_vec_avx2
> >      7.48%  pmd-c01/id:8  ovs-vswitchd        [.] cmap_find
> >      4.94%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_hw_miss_packet_recover
> >      4.77%  pmd-c01/id:8  ovs-vswitchd        [.] dp_execute_output_action
> >      3.81%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
> >      3.40%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_send
> >      2.49%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_dpdk_eth_send
> >      1.16%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
> >      0.90%  pmd-c01/id:8  ovs-vswitchd        [.] pmd_perf_end_iteration
> >      0.75%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_process_rxq_port
> >      0.68%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_is_flow_api_enabled
> >      0.55%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_offload_dpdk_hw_miss_packet_recover
> >
> >> +
> >> +    /* If no mark, no flow to find. */
> >> +    if (!dp_packet_has_flow_mark(packet, &mark)) {
> >> +        *flow = NULL;
> >> +        return 0;
> >> +    }
> >> +
> >> +    *flow = mark_to_flow_find(pmd, mark);
> >> +    return 0;
> >> +}
> >> +
> >>  /* Try to process all ('cnt') the 'packets' using only the datapath flow cache
> >>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
> >>   * miniflow is copied into 'keys' and the packet pointer is moved at the
> >> @@ -7106,7 +7140,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
> >>
> >>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
> >>          struct dp_netdev_flow *flow;
> >> -        uint32_t mark;
> >>
> >>          if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
> >>              dp_packet_delete(packet);
> >> @@ -7125,9 +7158,13 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
> >>              pkt_metadata_init(&packet->md, port_no);
> >>          }
> >>
> >> -        if ((*recirc_depth_get() == 0) &&
> >> -            dp_packet_has_flow_mark(packet, &mark)) {
> >> -            flow = mark_to_flow_find(pmd, mark);
> >> +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
> >
> > Here we have a per-packet call to netdev_is_flow_api_enabled(). I think
> > that netdev_is_flow_api_enabled() should be inlined if it's going to be
> > called per packet. We can see from the above "perf top" output that it
> > isn't inlined, since it shows up as a separate function.
>
> I'd consider "inlining" and moving a lot of stuff to headers harmful
> for the code base, as it makes the code less readable and it's really
> hard to preserve this kind of thing during code modification.
> It's much better to fix the logic instead of hammering the code with
> blind low-level optimizations.
>
> For this particular issue we already have a solution here:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnemeth@redhat.com/
> In short, we only need to check once per batch.  I think Balazs
> will be able to rebase his patch on top of this series, including
> the check for netdev_is_flow_api_enabled().
>
> Balazs, will you?
>
> Best regards, Ilya Maximets.
>
>

