[ovs-dev] [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
Ilya Maximets
i.maximets at ovn.org
Fri Jun 25 15:25:26 UTC 2021
On 6/25/21 3:56 PM, Eli Britstein wrote:
>
> On 6/25/2021 3:09 PM, Balazs Nemeth wrote:
>> *External email: Use caution opening links or attachments*
>>
>>
>> Ilya,
>>
>> Sure, I'll rebase my patch and also ensure netdev_is_flow_api_enabled()
>> is called once per batch. I don't expect any issue with that.
Thanks, Balazs! The series is applied now, so you can work on updated patch.
>
> Thanks Balazs.
>
>>
>> Regards,
>> Balazs
>>
>> On Fri, Jun 25, 2021 at 2:00 PM Ilya Maximets <i.maximets at ovn.org <mailto:i.maximets at ovn.org>> wrote:
>>
>> On 6/25/21 1:04 PM, Ferriter, Cian wrote:
>> > Hi Eli,
>> >
>> > I have some concerns about how we plug the packet state recover logic into dfc_processing() here. My concerns are inline below.
>> >
>> > I'm concerned that we are hurting performance in the partial HWOL case, as this patchset is introducing new direct (non-inline) function calls per packet to the software datapath. We can reduce performance impact in this area, see specific suggestions below.
>> >
>> > Thanks,
>> > Cian
>> >
>> >> -----Original Message-----
>> >> From: Eli Britstein <elibr at nvidia.com <mailto:elibr at nvidia.com>>
>> >> Sent: Wednesday 23 June 2021 16:53
>> >> To: dev at openvswitch.org <mailto:dev at openvswitch.org>; Ilya Maximets <i.maximets at ovn.org <mailto:i.maximets at ovn.org>>
>> >> Cc: Gaetan Rivet <gaetanr at nvidia.com <mailto:gaetanr at nvidia.com>>; Majd Dibbiny <majd at nvidia.com <mailto:majd at nvidia.com>>; Sriharsha Basavapatna
>> >> <sriharsha.basavapatna at broadcom.com <mailto:sriharsha.basavapatna at broadcom.com>>; Hemal Shah <hemal.shah at broadcom.com <mailto:hemal.shah at broadcom.com>>; Ivan Malov
>> >> <Ivan.Malov at oktetlabs.ru <mailto:Ivan.Malov at oktetlabs.ru>>; Ferriter, Cian <cian.ferriter at intel.com <mailto:cian.ferriter at intel.com>>; Eli Britstein <elibr at nvidia.com <mailto:elibr at nvidia.com>>;
>> >> Finn, Emma <emma.finn at intel.com <mailto:emma.finn at intel.com>>; Kovacevic, Marko <marko.kovacevic at intel.com <mailto:marko.kovacevic at intel.com>>
>> >> Subject: [PATCH V7 06/13] dpif-netdev: Add HW miss packet state recover logic.
>> >>
>> >> Recover the packet if it was partially processed by the HW. Fallback to
>> >> lookup flow by mark association.
>> >>
>> >> Signed-off-by: Eli Britstein <elibr at nvidia.com <mailto:elibr at nvidia.com>>
>> >> Reviewed-by: Gaetan Rivet <gaetanr at nvidia.com <mailto:gaetanr at nvidia.com>>
>> >> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com <mailto:sriharsha.basavapatna at broadcom.com>>
>> >> Tested-by: Emma Finn <emma.finn at intel.com <mailto:emma.finn at intel.com>>
>> >> Tested-by: Marko Kovacevic <marko.kovacevic at intel.com <mailto:marko.kovacevic at intel.com>>
>> >> Signed-off-by: Ilya Maximets <i.maximets at ovn.org <mailto:i.maximets at ovn.org>>
>> >> ---
>> >> lib/dpif-netdev.c | 45 +++++++++++++++++++++++++++++++++++++++++----
>> >> 1 file changed, 41 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> >> index 8fa7eb6d4..d5b7d5b73 100644
>> >> --- a/lib/dpif-netdev.c
>> >> +++ b/lib/dpif-netdev.c
>> >> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>> >> COVERAGE_DEFINE(datapath_drop_invalid_bond);
>> >> COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>> >> COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>> >> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>> >>
>> >> /* Protects against changes to 'dp_netdevs'. */
>> >> static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>> >> @@ -7062,6 +7063,39 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>> >> pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>> >> }
>> >>
>> >> +static struct tx_port * pmd_send_port_cache_lookup(
>> >> + const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>> >> +
>> >> +static inline int
>> >> +dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> >> + odp_port_t port_no,
>> >> + struct dp_packet *packet,
>> >> + struct dp_netdev_flow **flow)
>> >> +{
>> >> + struct tx_port *p;
>> >> + uint32_t mark;
>> >> +
>> >
>> >
>> > Start of full HWOL recovery block
>> >
>> >> + /* Restore the packet if HW processing was terminated before completion. */
>> >> + p = pmd_send_port_cache_lookup(pmd, port_no);
>> >> + if (OVS_LIKELY(p)) {
>> >> + int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>> >> +
>> >> + if (err && err != EOPNOTSUPP) {
>> >> + COVERAGE_INC(datapath_drop_hw_miss_recover);
>> >> + return -1;
>> >> + }
>> >> + }
>> >
>> > End of full HWOL recovery block
>> >
>> > IIUC, the above is only relevant to full HWOL where the packet is partially processed by the HW. In a partial HWOL testcase, we see a performance drop as a result of the above code. The performance after the patchset is applied is 0.94x what the performance was before.
>>
> General speaking, adding new code in the datapath is probable to have some degradation affect, that cannot be avoided completely.
>
> I think performance optimizations for the partial offloads (or to SW datapath in general, even w/o offloads), can be done on top, like the one suggested by Balazs above, on top of it.
+1
Anyway, I applied the v7 as-is. Performance optimizations could be done on
top of it.
>
>
>> While reviewing the patch set I noticed this part too, but
>> this code was tested twice by Intel engineers, so I figured
>> that it doesn't hurt performance of partial offloading.
>>
>> In general, it should be easy to re-order partial and full
>> offloading checks like this (didn't test):
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index c5ab35d2a..36a5976f2 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -7113,6 +7113,14 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> struct tx_port *p;
>> uint32_t mark;
>>
>> + /* If no mark, no flow to find. */
>> + if (dp_packet_has_flow_mark(packet, &mark)) {
>> + *flow = mark_to_flow_find(pmd, mark);
>> + return 0;
>> + }
>> +
>> + *flow = NULL;
>> +
>> /* Restore the packet if HW processing was terminated before completion. */
>> p = pmd_send_port_cache_lookup(pmd, port_no);
>> if (OVS_LIKELY(p)) {
>> @@ -7123,14 +7131,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>> return -1;
>> }
>> }
>> -
>> - /* If no mark, no flow to find. */
>> - if (!dp_packet_has_flow_mark(packet, &mark)) {
>> - *flow = NULL;
>> - return 0;
>> - }
>> -
>> - *flow = mark_to_flow_find(pmd, mark);
>> return 0;
>> }
>> ---
>>
>> If everyone agrees, I can do this change.
>>
> No, this is wrong.
>
> mlx5 PMD uses mark (internally set) for the recover. Doing it like this will discover a mark (the internal one by the PMD), but won't find any flow associated with it.
Oh, and netdev_offload_dpdk_hw_miss_packet_recover() resets offloads for
that reason, right?
Thanks for explanation. I think, it would be great if you can prepare
a separate patch with some comments in netdev_offload_dpdk_hw_miss_packet_recover()
regarding offloads reset and, maybe, a small comment to the dp_netdev_hw_flow()
so we will not re-order checks in the future (it's not obvious that there is a
dependency). Something like:
"Flow marks can be used for HW miss recovery and could be re-set in the
process, therefore flow mark check should always be done after the
netdev_hw_miss_packet_recover()."
What do you think?
>
>>
>> >
>> > We should branch over this code in the partial HWOL scenario, where we don't need to get the call to pmd_send_port_cache_lookup() and netdev_hw_miss_packet_recover(). We want this branch to be transparent to the user. Since both full and partial HWOL falls under the other_config:hw-offload=true switch, we might need a configure time check NIC capabilities solution or something similar to branch over full HWOL packet recovery code. Does this make sense?
>>
>> Compile time check of capabilities doesn't make sense as it's unknown
>> in vast majority of cases on which HW the package will run. Code should
>> be as generic as possible.
>>
>> >
>> >
>> > perf top showing cycles spent per function in my partial HWOL scenario. We can see netdev_hw_miss_packet_recover(), netdev_offload_dpdk_hw_miss_packet_recover() and netdev_is_flow_api_enabled() taking cycles:
>> > 28.79% pmd-c01/id:8 ovs-vswitchd [.] dp_netdev_input__
>> > 13.72% pmd-c01/id:8 ovs-vswitchd [.] parse_tcp_flags
>> > 11.07% pmd-c01/id:8 ovs-vswitchd [.] i40e_recv_pkts_vec_avx2
>> > 10.94% pmd-c01/id:8 ovs-vswitchd [.] i40e_xmit_fixed_burst_vec_avx2
>> > 7.48% pmd-c01/id:8 ovs-vswitchd [.] cmap_find
>> > 4.94% pmd-c01/id:8 ovs-vswitchd [.] netdev_hw_miss_packet_recover
>> > 4.77% pmd-c01/id:8 ovs-vswitchd [.] dp_execute_output_action
>> > 3.81% pmd-c01/id:8 ovs-vswitchd [.] dp_netdev_pmd_flush_output_on_port
>> > 3.40% pmd-c01/id:8 ovs-vswitchd [.] netdev_send
>> > 2.49% pmd-c01/id:8 ovs-vswitchd [.] netdev_dpdk_eth_send
>> > 1.16% pmd-c01/id:8 ovs-vswitchd [.] netdev_dpdk_rxq_recv
>> > 0.90% pmd-c01/id:8 ovs-vswitchd [.] pmd_perf_end_iteration
>> > 0.75% pmd-c01/id:8 ovs-vswitchd [.] dp_netdev_process_rxq_port
>> > 0.68% pmd-c01/id:8 ovs-vswitchd [.] netdev_is_flow_api_enabled
>> > 0.55% pmd-c01/id:8 ovs-vswitchd [.] netdev_offload_dpdk_hw_miss_packet_recover
>> >
>> >> +
>> >> + /* If no mark, no flow to find. */
>> >> + if (!dp_packet_has_flow_mark(packet, &mark)) {
>> >> + *flow = NULL;
>> >> + return 0;
>> >> + }
>> >> +
>> >> + *flow = mark_to_flow_find(pmd, mark);
>> >> + return 0;
>> >> +}
>> >> +
>> >> /* Try to process all ('cnt') the 'packets' using only the datapath flow cache
>> >> * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>> >> * miniflow is copied into 'keys' and the packet pointer is moved at the
>> >> @@ -7106,7 +7140,6 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>> >>
>> >> DP_PACKET_BATCH_REFILL_FOR_EACH (i, cnt, packet, packets_) {
>> >> struct dp_netdev_flow *flow;
>> >> - uint32_t mark;
>> >>
>> >> if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
>> >> dp_packet_delete(packet);
>> >> @@ -7125,9 +7158,13 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>> >> pkt_metadata_init(&packet->md, port_no);
>> >> }
>> >>
>> >> - if ((*recirc_depth_get() == 0) &&
>> >> - dp_packet_has_flow_mark(packet, &mark)) {
>> >> - flow = mark_to_flow_find(pmd, mark);
>> >> + if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
>> >
>> > Here we have a per packet call to netdev_is_flow_api_enabled(). I think that netdev_is_flow_api_enabled() should be inlined if it's going to be called per packet. We can see from the above "perf top" that it isn't inlined since it shows up as a separate function.
>>
>> I'd consider "inlining" and moving a lot of stuff to headers harmful
>> for the code base as it makes it less readable and it's really hard
>> to preserve this kind of things during code modification.
>> It's much better to fix the logic instead of hammering the code with
>> blind low level optimizations.
>>
>> For this particular issue we already have a solution here:
>> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnemeth@redhat.com/ <https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnemeth@redhat.com/>
>> In short, we only need to check once per batch. I think, Balazs
>> will be able to re-base his patch on top of this series including
>> the check for netdev_is_flow_api_enabled().
>>
>> Balazs, will you?
>>
>> Best regards, Ilya Maximets.
>>
More information about the dev
mailing list