[ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

Ilya Maximets i.maximets at ovn.org
Wed Jul 7 14:53:14 UTC 2021


On 7/6/21 3:34 PM, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets at ovn.org>
>> Sent: Thursday, July 1, 2021 11:32 AM
>> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Ilya Maximets
>> <i.maximets at ovn.org>
>> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan Malov
>> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>; Stokes, Ian
>> <ian.stokes at intel.com>; Ferriter, Cian <cian.ferriter at intel.com>; Ben Pfaff
>> <blp at ovn.org>; Balazs Nemeth <bnemeth at redhat.com>; Sriharsha Basavapatna
>> <sriharsha.basavapatna at broadcom.com>
>> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
>>
>> On 6/29/21 1:53 PM, Van Haaren, Harry wrote:
>>>> -----Original Message-----
>>>> From: Ilya Maximets <i.maximets at ovn.org>
>>>> Sent: Monday, June 28, 2021 3:33 PM
>>>> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Ilya Maximets
>>>> <i.maximets at ovn.org>; Sriharsha Basavapatna
>>>> <sriharsha.basavapatna at broadcom.com>
>>>> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan
>> Malov
>>>> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>; Stokes, Ian
>>>> <ian.stokes at intel.com>; Ferriter, Cian <cian.ferriter at intel.com>; Ben Pfaff
>>>> <blp at ovn.org>; Balazs Nemeth <bnemeth at redhat.com>
>>>> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
>>>>
>>>> On 6/25/21 7:28 PM, Van Haaren, Harry wrote:
>>>>>> -----Original Message-----
>>>>>> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Ilya Maximets
>>>>>> Sent: Friday, June 25, 2021 4:26 PM
>>>>>> To: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>; Ilya
>>>> Maximets
>>>>>> <i.maximets at ovn.org>
>>>>>> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan
>>>> Malov
>>>>>> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>
>>>>>> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
>>>>>
>>>>> <snip commit message detail>
>>>>>
>>>>>>>> That looks good to me.  So, I guess, Harsha, we're waiting for
>>>>>>>> your review/tests here.
>>>>>>>
>>>>>>> Thanks Ilya and Eli, looks good to me; I've also tested it and it works fine.
>>>>>>> -Harsha
>>>>>>
>>>>>> Thanks, everyone.  Applied to master.
>>>>>
>>>>> Hi Ilya and OVS Community,
>>>>>
>>>>> There are open questions around this patchset, why has it been merged?
>>>>>
>>>>> Earlier today, new concerns were raised by Cian around the negative
>> performance
>>>> impact of these code changes:
>>>>> - https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html
>>>>>
>>>>> Both you (Ilya) and Eli responded, and I was following the conversation. Various
>>>> code changes were suggested,
>>>>> and some may seem like they might work, Eli mentioned some solutions might
>> not
>>>> work due to the hardware:
>>>>> I was processing both your comments and input, and planning a technical reply
>>>> later today.
>>>>> - suggestions: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
>>>> June/384446.html
>>>>> - concerns around hw: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
>>>> June/384464.html
>>>>
>>>> Concerns not really about the hardware, but the API itself
>>>> that should be clarified a little bit to avoid confusion and
>>>> avoid incorrect changes like the one I suggested.
>>>> But this is a small enhancement that could be done on top.
>>>>
>>>>>
>>>>> Keep in mind that there are open performance issues to be worked out, that
>> have
>>>> not been resolved at this point in the conversation.
>>>>
>>>> Performance issue that can be worked out, will be worked out
>>>> in a separate patch , v1 for which we already have on a mailing
>>>> list for some time, so it didn't make sense to to re-validate
>>>> the whole series again due to this one pretty obvious change.
>>>>
>>>>> There is no agreement on solutions, nor an agreement to ignore the
>> performance
>>>> degradation, or to try resolve this degradation later.
>>>>
>>>> Particular part of the packet restoration call seems hard
>>>> to avoid in a long term (I don't see a good solution for that),
>>>> but the short term solution might be implemented on top.
>>>> The part with multiple reads of recirc_id and checking if
>>>> offloading is enabled has a fix already (that needs a v2, but
>>>> anyway).
>>>>
>>>>>
>>>>> That these patches have been merged is inappropriate:
>>>>> 1) Not enough time given for responses (11 am concerns raised, 5pm merged
>>>> without resolution? (Irish timezone))
>>>>
>>>> I responded with suggestions and arguments against solutions
>>>> suggested in the report, Eli responded with rejection of one
>>>> one of my suggestions.  And it seems clear (for me) that
>>>> there is no good solution for this part at the moment.
>>>> Part of the performance could be won back, but the rest
>>>> seems to be inevitable.  As a short-term solution we can
>>>> guard the netdev_hw_miss_packet_recover() with experimental
>>>> API ifdef, but it will strike back anyway in the future.
>>>>
>>>>> 2) Open question not addressed/resolved, resulting in a 6% known negative
>>>> performance impact being merged.
>>>>
>>>> I don't think it wasn't addressed.
>>>
>>> Was code merged that resulted in a known regression of 6%?  Yes. Facts are facts.
>>> I don't care for arguing over exactly what "addressed" means in this context.
>>>
>>>
>>>>> 3) Suggestions provided were not reviewed technically in detail (no technical
>>>> collaboration or code-changes/patches reviewed)
>>>>
>>>> Patches was heavily reviewed/tested by at least 4 different
>>>> parties including 2 test rounds from Intel engineers that,
>>>> I believe, included testing of partial offloading.  And that
>>>> bothers me the most.  If I can not trust performance test
>>>> reports, I'm not sure performance can be a gating factor here.
>>>>
>>>>>
>>>>> I feel that the OVS process of allowing time for community review and
>>>> collaboration was not adhered to in this instance.
>>>>> As a result, code was merged that is known to cause performance degradation.
>>>>>
>>>>> Therefore, this email is a request to revert these patches as they are not
>> currently
>>>> fit for inclusion in my opinion.
>>>>>
>>>>> As next steps, I can propose the following:
>>>>> 1) Revert the patches from master branch
>>>>> 2) Continue technical discussion on how to avoid negative performance impact
>>>>> 3) Review solutions, allowing time for responses and replies
>>>>> 4) Merge a future revision of this patchset at a later date
>>>>
>>>> I don't think that there is a necessity to revert the patch-set.
>>>> All performance issues can be addressed by small fixes on top:
>>>> 1. Patch from Balazs to read certain information per-batch.
>>>> 2. Optionally, guard packet miss recovery with ifdef of an
>>>>    experimental API. -- I'm not sure about this one as it will
>>>>    strike back in the future.
>>>
>>> Instead of replying point-by-point, let me summarize my stance:
>>> 1) I hold the OVS community and its maintainers to the highest standard
>>> 2) I feel the OVS review process of reaching a consensus and then merging is a
>> cornerstone of the community
>>> 3) I feel that when patches are merged without consensus, ignoring community
>> input, and causing known
>>> performance degradations, that is not in line with the OVS review process.
>>>
>>> To move forward in a pragmatic way, instead of ad infinitum discussing details, I
>> propose the following:
>>> 1) We accept that this patchset was merged without community consensus, and
>> this must never happen again in future.
>>
>> I would respectfully disagree with this statement again.
> 
> There is a disconnect somewhere. Let my try to simplify the conversation to per topic
> statements, and try to understand where your disagreement originates.
> 
> 
>> The patch set was reviewed and tested by many people from
>> 4 different companies.  Tests that was performed earlier
>> by other Intel engineers, as far as I understand, was much
>> closer to real-wold workloads and showed no measurable
>> performance degradation to my best knowledge.  Therefore,
>> the patch set was merged.
> 
> Yes, tested-by tags were given to a patchset. This is not the root issue
> being discussed here. Presence of a tested-by tag does not mean it is
> OK to ignore open questions, and merge code without consensus.
> Do you agree?

I agree that merge without consensus on large questions should
not happen.  However, in this case, there was consensus between
multiple people and very small incremental changes needs in order
to fix performance issues reported.  So, to my judgement merge
was justified.

I'm re-iterating these arguments several times already, so I'll
just refer them as 'my arguments' form this point.

> 
> 
>> Quick performance benchmarks on simple setups are good for
>> performance analysis, but they doesn't reflect real setups
>> most of the time.  Therefore, results should be taken into
>> consideration, but should not be necessarily gating.
> 
> The root issue being discussed is code merged without consensus.
> The performance regression introduced is the effect of that code merge.
> Please keep focused on the real topic "merging code without consensus".

I just wanted to highlight one of 'my arguments' why I think the
merge was justified in this particular case.  You're trying
to move away from the case and start to discuss abstract things.

> 
> 
>> I made a decision to merge taking into account that fairly
>> simple solutions for all exposed by the benchmark issues
>> could be implemented, and one of them already existed as a
>> patch and there was no much sense including that separate
>> patch into the patch-set and ask everyone to re-spin the
>> whole series again.
> 
> None of these solutions were agreed on. There were multiple proposals,
> and some discussion on technical merit, but there was no consensus or
> agreement from those who raised the concerns.
> 
> Do you agree the above statement is true?

There was no direct agreement on particular solution on one of
two highlighted performance problems in the code, I agree.
However, existence of very simple solutions for them (and even
existence of a patch for the second one) made it clear that
the issue could be fixed incrementally very easily, hence
no need to block the patch set.

> 
> 
>> Another point for merging the series is allowing everyone
>> to work on top of this code in order to avoid massive
>> re-bases at even later stages of our release cycle.
> 
> Rebasing patchsets is part of the normal development workflow. Suggesting that
> "avoid massive re-bases" is a valid justification for merging code without consensus makes no sense.

It's not a justification, it's just a handy bonus.  It wasn't
merged because of this.

> 
> 
>> Sadly,
>> as usual, there is a lot of activity few weeks before the
>> soft freeze, while no-one cares about patches/bug fixes
>> during the previous 4-5 months, and also only cares about
>> their own features, piling all the review work on
>> maintainers.
> 
> Reviews in the community and workload of maintainers is not the topic here.
> Even so, reviews/workload must never influence merging code without community consensus.
> Agreed?

We're all humans, so load inevitably influences the way we're working.
We should try to reduce the influence, but it's not possible to avoid
completely.

> 
> 
>> To be fair, unlike most of other big patch-sets, offloading
>> of the VXLAN decap is one of the very small number of
>> features that was consistently worked on (including reviews
>> and design discussions) throughout the whole release time
>> frame, multiple release time frames, actually.
> 
> Other patchsets were also in constant development, versions, and
> got little/no reviews from the community. And also for multiple release
> time frames. It is not a valid justification for merging the code without
> achieving consensus first.

I think, you've lost my point here as you're replying to the opposite
of what I tried to say.  Sorry, if I failed to formulate the sentence
in a clear way, but I don't know how to formulate it differently.

> 
> 
>>> 2) We focus our attentions (Ilya, Eli, Balazs, Cian, Harry, and others who want to
>> help) on fixing the performance degradation.
>>>      (To be very clear, I'm suggesting to _not_ revert the commits in question, but
>> build on them and solve the performance issue).
>>
>> For sure.  That was my original intention.  Balazs already
>> updated his patch a few days ago, but I didn't see any
>> interest in it from the Intel side.
> 
> I have no desire to be in the same position next week or next release, fixing performance issues that
> were knowingly introduced, while community feedback about this regression was ignored.
> 
> There has to be an agreement to ensure code is not merged without consensus again in future *first*.
> Once that agreement is in place that this type of "merge with open issues" will not re-occur, I will engage
> in reviewing/helping fix the performance issues.
> 
> Let me re-iterate my previous point, in order to move forward, we must agree to the following:
> "We accept that this patchset was merged without community consensus, and this must never happen again in future."

I don't think that demanding anything is a good way to lead
a conversation.  In this way of saying I will respectfully
disagree again with the first part of this quoted sentence with
exactly the same arguments that I presented in a previous email.

Best regards, Ilya Maximets.

> 
>> And, from what I heard on yesterday's public meeting, Eli
>> is working on patches for the missed packet lookup fix.
>>
>> Best regards, Ilya Maximets.
> 
> Regards, -Harry
> 



More information about the dev mailing list