[ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.

Eli Britstein elibr at nvidia.com
Fri Jun 25 13:56:55 UTC 2021


On 6/25/2021 3:09 PM, Balazs Nemeth wrote:
>
> Ilya,
>
> Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
> is called once per batch. I don't expect any issue with that.

Thanks Balazs.
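
For readers following along, here is a rough sketch of what checking once per
batch could look like inside dfc_processing(). This is only an illustrative
fragment, not Balazs's actual patch; the hoisted local variable name
'flow_api_enabled' is my own:

    /* Query the offload API state once per batch instead of per packet. */
    bool flow_api_enabled = netdev_is_flow_api_enabled();

    DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
        struct dp_netdev_flow *flow = NULL;
        ...
        if (flow_api_enabled && *recirc_depth_get() == 0) {
            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
                /* The packet was dropped while recovering; skip it. */
                continue;
            }
        }
        ...
    }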

>
> Regards,
> Balazs
>
> On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets <i.maximets at ovn.org> wrote:
>
>     On 6/25/21 1:04 PM, Ferriter, Cian wrote:
>     > Hi Eli,
>     >
>     > I have some concerns about how we plug the packet state recover
>     logic into dfc_processing() here. My concerns are inline below.
>     >
>     > I'm concerned that we are hurting performance in the partial
>     HWOL case, as this patchset is introducing new direct (non-inline)
>     function calls per packet to the software datapath. We can reduce
>     performance impact in this area, see specific suggestions below.
>     >
>     > Thanks,
>     > Cian
>     >
>     >> -----Original Message-----
>     >> From: Eli Britstein <elibr at nvidia.com>
>     >> Sent: Wednesday 23 June 2021 16:53
>     >> To: dev at openvswitch.org; Ilya Maximets <i.maximets at ovn.org>
>     >> Cc: Gaetan Rivet <gaetanr at nvidia.com>; Majd Dibbiny <majd at nvidia.com>; Sriharsha Basavapatna
>     >> <sriharsha.basavapatna at broadcom.com>; Hemal Shah <hemal.shah at broadcom.com>; Ivan Malov
>     >> <Ivan.Malov at oktetlabs.ru>; Ferriter, Cian <cian.ferriter at intel.com>; Eli Britstein <elibr at nvidia.com>;
>     >> Finn, Emma <emma.finn at intel.com>; Kovacevic, Marko <marko.kovacevic at intel.com>
>     >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
>     >>
>     >> Recover the packet if it was partially processed by the HW. Fallback to
>     >> lookup flow by mark association.
>     >>
>     >> Signed-off-by: Eli Britstein <elibr at nvidia.com>
>     >> Reviewed-by: Gaetan Rivet <gaetanr at nvidia.com>
>     >> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
>     >> Tested-by: Emma Finn <emma.finn at intel.com>
>     >> Tested-by: Marko Kovacevic <marko.kovacevic at intel.com>
>     >> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
>     >> ---
>     >>  lib/dpif-netdev.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>     >>  1 file changed, 41 insertions(+), 4 deletions(-)
>     >>
>     >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     >> index 8fa7eb6d4..d5b7d5b73 100644
>     >> --- a/lib/dpif-netdev.c
>     >> +++ b/lib/dpif-netdev.c
>     >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>     >>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>     >>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>     >>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>     >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>     >>
>     >>  /* Protects against changes to 'dp_netdevs'. */
>     >>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>     >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>     >>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>     >>  }
>     >>
>     >> +static struct tx_port * pmd_send_port_cache_lookup(
>     >> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>     >> +
>     >> +static inline int
>     >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>     >> +                  odp_port_t port_no,
>     >> +                  struct dp_packet *packet,
>     >> +                  struct dp_netdev_flow **flow)
>     >> +{
>     >> +    struct tx_port *p;
>     >> +    uint32_t mark;
>     >> +
>     >
>     >
>     > Start of full HWOL recovery block
>     >
>     >> +    /* Restore the packet if HW processing was terminated before completion. */
>     >> +    p = pmd_send_port_cache_lookup(pmd, port_no);
>     >> +    if (OVS_LIKELY(p)) {
>     >> +        int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>     >> +
>     >> +        if (err && err != EOPNOTSUPP) {
>     >> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
>     >> +            return -1;
>     >> +        }
>     >> +    }
>     >
>     > End of full HWOL recovery block
>     >
>     > IIUC, the above is only relevant to full HWOL where the packet
>     is partially processed by the HW. In a partial HWOL testcase, we
>     see a performance drop as a result of the above code. The
>     performance after the patchset is applied is 0.94x what the
>     performance was before.
>
Generally speaking, adding new code to the datapath is likely to have some 
degradation effect that cannot be avoided completely.

I think performance optimizations for partial offloads (or for the SW 
datapath in general, even w/o offloads) can be done on top of this series, 
like the one suggested by Balazs above.


>     While reviewing the patch set I noticed this part too, but
>     this code was tested twice by Intel engineers, so I figured
>     that it doesn't hurt performance of partial offloading.
>
>     In general, it should be easy to re-order partial and full
>     offloading checks like this (didn't test):
>
>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     index c5ab35d2a..36a5976f2 100644
>     --- a/lib/dpif-netdev.c
>     +++ b/lib/dpif-netdev.c
>     @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>          struct tx_port *p;
>          uint32_t mark;
>
>     +    /* If no mark, no flow to find. */
>     +    if (dp_packet_has_flow_mark(packet, &mark)) {
>     +        *flow = mark_to_flow_find(pmd, mark);
>     +        return 0;
>     +    }
>     +
>     +    *flow = NULL;
>     +
>          /* Restore the packet if HW processing was terminated before completion. */
>          p = pmd_send_port_cache_lookup(pmd, port_no);
>          if (OVS_LIKELY(p)) {
>     @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>                  return -1;
>              }
>          }
>     -
>     -    /* If no mark, no flow to find. */
>     -    if (!dp_packet_has_flow_mark(packet, &mark)) {
>     -        *flow = NULL;
>     -        return 0;
>     -    }
>     -
>     -    *flow = mark_to_flow_find(pmd, mark);
>          return 0;
>      }
>     ---
>
>     If everyone agrees, I can do this change.
>
No, this is wrong.

The mlx5 PMD uses a mark (set internally) for the recovery. Reordered like 
this, the code would discover a mark (the PMD's internal one), but wouldn't 
find any flow associated with it.
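
To make the ordering constraint explicit, the recovery call has to run before 
any mark is interpreted, exactly as in the patch hunk above. An annotated 
sketch (the code mirrors the patch; the comments are mine):

    /* 1. Let the PMD recover packets whose HW processing was terminated
     *    early.  For mlx5 this relies on a mark that the PMD set internally,
     *    so it must happen before OVS interprets any mark itself. */
    p = pmd_send_port_cache_lookup(pmd, port_no);
    if (OVS_LIKELY(p)) {
        int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);

        if (err && err != EOPNOTSUPP) {
            COVERAGE_INC(datapath_drop_hw_miss_recover);
            return -1;
        }
    }

    /* 2. Only now is a remaining mark a valid OVS flow mark (partial HWOL). */
    if (!dp_packet_has_flow_mark(packet, &mark)) {
        *flow = NULL;
        return 0;
    }
    *flow = mark_to_flow_find(pmd, mark);
    return 0;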

>
>     >
>     > We should branch over this code in the partial HWOL scenario,
>     where we don't need the calls to pmd_send_port_cache_lookup()
>     and netdev_hw_miss_packet_recover(). We want this branch to be
>     transparent to the user. Since both full and partial HWOL fall
>     under the other_config:hw-offload=true switch, we might need a
>     configure-time check of NIC capabilities, or something similar,
>     to branch over the full HWOL packet recovery code. Does this
>     make sense?
>
>     A compile-time check of capabilities doesn't make sense, as in the
>     vast majority of cases it's unknown which HW the package will run on.
>     Code should be as generic as possible.
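
If skipping the recovery call on ports that can never need it ever becomes 
necessary, a run-time (not compile-time) variant of Cian's idea could cache a 
probe result per port. A purely hypothetical sketch; the 
'hw_miss_recover_supported' field and the setup-time probe do not exist in 
the tree:

    struct tx_port {
        ...
        bool hw_miss_recover_supported;   /* Hypothetical cached capability,
                                           * probed once at port setup. */
    };

    /* In dp_netdev_hw_flow(): skip the per-packet recovery call for ports
     * that were probed as not supporting it. */
    p = pmd_send_port_cache_lookup(pmd, port_no);
    if (OVS_LIKELY(p) && p->hw_miss_recover_supported) {
        int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);

        if (err && err != EOPNOTSUPP) {
            COVERAGE_INC(datapath_drop_hw_miss_recover);
            return -1;
        }
    }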
>
>     >
>     >
>     > perf top showing cycles spent per function in my partial HWOL
>     scenario. We can see netdev_hw_miss_packet_recover(),
>     netdev_offload_dpdk_hw_miss_packet_recover() and
>     netdev_is_flow_api_enabled() taking cycles:
>     >     28.79%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_input__
>     >     13.72%  pmd-c01/id:8  ovs-vswitchd        [.] parse_tcp_flags
>     >     11.07%  pmd-c01/id:8  ovs-vswitchd        [.] i40e_recv_pkts_vec_avx2
>     >     10.94%  pmd-c01/id:8  ovs-vswitchd        [.] i40e_xmit_fixed_burst_vec_avx2
>     >      7.48%  pmd-c01/id:8  ovs-vswitchd        [.] cmap_find
>     >      4.94%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_hw_miss_packet_recover
>     >      4.77%  pmd-c01/id:8  ovs-vswitchd        [.] dp_execute_output_action
>     >      3.81%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
>     >      3.40%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_send
>     >      2.49%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_dpdk_eth_send
>     >      1.16%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
>     >      0.90%  pmd-c01/id:8  ovs-vswitchd        [.] pmd_perf_end_iteration
>     >      0.75%  pmd-c01/id:8  ovs-vswitchd        [.] dp_netdev_process_rxq_port
>     >      0.68%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_is_flow_api_enabled
>     >      0.55%  pmd-c01/id:8  ovs-vswitchd        [.] netdev_offload_dpdk_hw_miss_packet_recover
>     >
>     >> +
>     >> +    /* If no mark, no flow to find. */
>     >> +    if (!dp_packet_has_flow_mark(packet, &mark)) {
>     >> +        *flow = NULL;
>     >> +        return 0;
>     >> +    }
>     >> +
>     >> +    *flow = mark_to_flow_find(pmd, mark);
>     >> +    return 0;
>     >> +}
>     >> +
>     >>  /* Try to process all ('cnt') the 'packets' using only the datapath flow cache
>     >>   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>     >>   * miniflow is copied into 'keys' and the packet pointer is moved at the
>     >> @@ -7106,7 +7140,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>     >>
>     >>      DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>     >>          struct dp_netdev_flow *flow;
>     >> -        uint32_t mark;
>     >>
>     >>          if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
>     >>              dp_packet_delete(packet);
>     >> @@ -7125,9 +7158,13 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>     >>              pkt_metadata_init(&packet->md, port_no);
>     >>          }
>     >>
>     >> -        if ((*recirc_depth_get() == 0) &&
>     >> -            dp_packet_has_flow_mark(packet, &mark)) {
>     >> -            flow = mark_to_flow_find(pmd, mark);
>     >> +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
>     >
>     > Here we have a per packet call to netdev_is_flow_api_enabled().
>     I think that netdev_is_flow_api_enabled() should be inlined if
>     it's going to be called per packet. We can see from the above
>     "perf top" that it isn't inlined since it shows up as a separate
>     function.
>
>     I'd consider "inlining" and moving a lot of stuff to headers harmful
>     for the code base, as it makes it less readable and it's really hard
>     to preserve this kind of thing during code modification.
>     It's much better to fix the logic instead of hammering the code with
>     blind low-level optimizations.
>
>     For this particular issue we already have a solution here:
>     https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnemeth@redhat.com/
>     In short, we only need to check once per batch.  I think, Balazs
>     will be able to re-base his patch on top of this series including
>     the check for netdev_is_flow_api_enabled().
>
>     Balazs, will you?
>
>     Best regards, Ilya Maximets.
>

