[ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

Ilya Maximets i.maximets at ovn.org
Mon Jun 28 14:33:03 UTC 2021


On 6/25/21 7:28 PM, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Ilya Maximets
>> Sent: Friday, June 25, 2021 4:26 PM
>> To: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>; Ilya Maximets
>> <i.maximets at ovn.org>
>> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan Malov
>> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>
>> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> 
> <snip commit message detail>
> 
>>>> That looks good to me.  So, I guess, Harsha, we're waiting for
>>>> your review/tests here.
>>>
>>> Thanks Ilya and Eli, looks good to me; I've also tested it and it works fine.
>>> -Harsha
>>
>> Thanks, everyone.  Applied to master.
> 
> Hi Ilya and OVS Community,
> 
> There are open questions around this patchset, why has it been merged?
> 
> Earlier today, new concerns were raised by Cian around the negative performance impact of these code changes:
> - https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html
> 
> Both you (Ilya) and Eli responded, and I was following the conversation. Various code changes were suggested,
> and some may seem like they might work, Eli mentioned some solutions might not work due to the hardware:
> I was processing both your comments and input, and planning a technical reply later today.
> - suggestions: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384446.html
> - concerns around hw: https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384464.html

Concerns not really about the hardware, but the API itself
that should be clarified a little bit to avoid confusion and
avoid incorrect changes like the one I suggested.
But this is a small enhancement that could be done on top.

> 
> Keep in mind that there are open performance issues to be worked out, that have not been resolved at this point in the conversation.

Performance issue that can be worked out, will be worked out
in a separate patch , v1 for which we already have on a mailing
list for some time, so it didn't make sense to to re-validate
the whole series again due to this one pretty obvious change.

> There is no agreement on solutions, nor an agreement to ignore the performance degradation, or to try resolve this degradation later.

Particular part of the packet restoration call seems hard
to avoid in a long term (I don't see a good solution for that),
but the short term solution might be implemented on top.
The part with multiple reads of recirc_id and checking if
offloading is enabled has a fix already (that needs a v2, but
anyway).

> 
> That these patches have been merged is inappropriate:
> 1) Not enough time given for responses (11 am concerns raised, 5pm merged without resolution? (Irish timezone))

I responded with suggestions and arguments against solutions
suggested in the report, Eli responded with rejection of one
one of my suggestions.  And it seems clear (for me) that
there is no good solution for this part at the moment.
Part of the performance could be won back, but the rest
seems to be inevitable.  As a short-term solution we can
guard the netdev_hw_miss_packet_recover() with experimental
API ifdef, but it will strike back anyway in the future.

> 2) Open question not addressed/resolved, resulting in a 6% known negative performance impact being merged.

I don't think it wasn't addressed.

> 3) Suggestions provided were not reviewed technically in detail (no technical collaboration or code-changes/patches reviewed)

Patches was heavily reviewed/tested by at least 4 different
parties including 2 test rounds from Intel engineers that,
I believe, included testing of partial offloading.  And that
bothers me the most.  If I can not trust performance test
reports, I'm not sure performance can be a gating factor here.

> 
> I feel that the OVS process of allowing time for community review and collaboration was not adhered to in this instance.
> As a result, code was merged that is known to cause performance degradation.
> 
> Therefore, this email is a request to revert these patches as they are not currently fit for inclusion in my opinion.
> 
> As next steps, I can propose the following:
> 1) Revert the patches from master branch
> 2) Continue technical discussion on how to avoid negative performance impact
> 3) Review solutions, allowing time for responses and replies
> 4) Merge a future revision of this patchset at a later date

I don't think that there is a necessity to revert the patch-set.
All performance issues can be addressed by small fixes on top:
1. Patch from Balazs to read certain information per-batch.
2. Optionally, guard packet miss recovery with ifdef of an
   experimental API. -- I'm not sure about this one as it will
   strike back in the future.

Best regards, Ilya Maximets.


More information about the dev mailing list