[ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload

Van Haaren, Harry harry.van.haaren at intel.com
Thu Jul 1 16:44:30 UTC 2021


> -----Original Message-----
> From: Ilya Maximets <i.maximets at ovn.org>
> Sent: Thursday, July 1, 2021 11:32 AM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Ilya Maximets
> <i.maximets at ovn.org>
> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan Malov
> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>; Stokes, Ian
> <ian.stokes at intel.com>; Ferriter, Cian <cian.ferriter at intel.com>; Ben Pfaff
> <blp at ovn.org>; Balazs Nemeth <bnemeth at redhat.com>; Sriharsha Basavapatna
> <sriharsha.basavapatna at broadcom.com>
> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> 
> On 6/29/21 1:53 PM, Van Haaren, Harry wrote:
> >> -----Original Message-----
> >> From: Ilya Maximets <i.maximets at ovn.org>
> >> Sent: Monday, June 28, 2021 3:33 PM
> >> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Ilya Maximets
> >> <i.maximets at ovn.org>; Sriharsha Basavapatna
> >> <sriharsha.basavapatna at broadcom.com>
> >> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan
> Malov
> >> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>; Stokes, Ian
> >> <ian.stokes at intel.com>; Ferriter, Cian <cian.ferriter at intel.com>; Ben Pfaff
> >> <blp at ovn.org>; Balazs Nemeth <bnemeth at redhat.com>
> >> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> >>
> >> On 6/25/21 7:28 PM, Van Haaren, Harry wrote:
> >>>> -----Original Message-----
> >>>> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Ilya Maximets
> >>>> Sent: Friday, June 25, 2021 4:26 PM
> >>>> To: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>; Ilya
> >> Maximets
> >>>> <i.maximets at ovn.org>
> >>>> Cc: Eli Britstein <elibr at nvidia.com>; ovs dev <dev at openvswitch.org>; Ivan
> >> Malov
> >>>> <Ivan.Malov at oktetlabs.ru>; Majd Dibbiny <majd at nvidia.com>
> >>>> Subject: Re: [ovs-dev] [PATCH V7 00/13] Netdev vxlan-decap offload
> >>>
> >>> <snip commit message detail>
> >>>
> >>>>>> That looks good to me.  So, I guess, Harsha, we're waiting for
> >>>>>> your review/tests here.
> >>>>>
> >>>>> Thanks Ilya and Eli, looks good to me; I've also tested it and it works fine.
> >>>>> -Harsha
> >>>>
> >>>> Thanks, everyone.  Applied to master.
> >>>
> >>> Hi Ilya and OVS Community,
> >>>
> >>> There are open questions around this patchset, why has it been merged?
> >>>
> >>> Earlier today, new concerns were raised by Cian around the negative
> performance
> >> impact of these code changes:
> >>> - https://mail.openvswitch.org/pipermail/ovs-dev/2021-June/384445.html
> >>>
> >>> Both you (Ilya) and Eli responded, and I was following the conversation. Various
> >> code changes were suggested,
> >>> and some may seem like they might work, Eli mentioned some solutions might
> not
> >> work due to the hardware:
> >>> I was processing both your comments and input, and planning a technical reply
> >> later today.
> >>> - suggestions: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
> >> June/384446.html
> >>> - concerns around hw: https://mail.openvswitch.org/pipermail/ovs-dev/2021-
> >> June/384464.html
> >>
> >> Concerns not really about the hardware, but the API itself
> >> that should be clarified a little bit to avoid confusion and
> >> avoid incorrect changes like the one I suggested.
> >> But this is a small enhancement that could be done on top.
> >>
> >>>
> >>> Keep in mind that there are open performance issues to be worked out, that
> have
> >> not been resolved at this point in the conversation.
> >>
> >> Performance issue that can be worked out, will be worked out
> >> in a separate patch , v1 for which we already have on a mailing
> >> list for some time, so it didn't make sense to to re-validate
> >> the whole series again due to this one pretty obvious change.
> >>
> >>> There is no agreement on solutions, nor an agreement to ignore the
> performance
> >> degradation, or to try resolve this degradation later.
> >>
> >> Particular part of the packet restoration call seems hard
> >> to avoid in a long term (I don't see a good solution for that),
> >> but the short term solution might be implemented on top.
> >> The part with multiple reads of recirc_id and checking if
> >> offloading is enabled has a fix already (that needs a v2, but
> >> anyway).
> >>
> >>>
> >>> That these patches have been merged is inappropriate:
> >>> 1) Not enough time given for responses (11 am concerns raised, 5pm merged
> >> without resolution? (Irish timezone))
> >>
> >> I responded with suggestions and arguments against solutions
> >> suggested in the report, Eli responded with rejection of one
> >> one of my suggestions.  And it seems clear (for me) that
> >> there is no good solution for this part at the moment.
> >> Part of the performance could be won back, but the rest
> >> seems to be inevitable.  As a short-term solution we can
> >> guard the netdev_hw_miss_packet_recover() with experimental
> >> API ifdef, but it will strike back anyway in the future.
> >>
> >>> 2) Open question not addressed/resolved, resulting in a 6% known negative
> >> performance impact being merged.
> >>
> >> I don't think it wasn't addressed.
> >
> > Was code merged that resulted in a known regression of 6%?  Yes. Facts are facts.
> > I don't care for arguing over exactly what "addressed" means in this context.
> >
> >
> >>> 3) Suggestions provided were not reviewed technically in detail (no technical
> >> collaboration or code-changes/patches reviewed)
> >>
> >> Patches was heavily reviewed/tested by at least 4 different
> >> parties including 2 test rounds from Intel engineers that,
> >> I believe, included testing of partial offloading.  And that
> >> bothers me the most.  If I can not trust performance test
> >> reports, I'm not sure performance can be a gating factor here.
> >>
> >>>
> >>> I feel that the OVS process of allowing time for community review and
> >> collaboration was not adhered to in this instance.
> >>> As a result, code was merged that is known to cause performance degradation.
> >>>
> >>> Therefore, this email is a request to revert these patches as they are not
> currently
> >> fit for inclusion in my opinion.
> >>>
> >>> As next steps, I can propose the following:
> >>> 1) Revert the patches from master branch
> >>> 2) Continue technical discussion on how to avoid negative performance impact
> >>> 3) Review solutions, allowing time for responses and replies
> >>> 4) Merge a future revision of this patchset at a later date
> >>
> >> I don't think that there is a necessity to revert the patch-set.
> >> All performance issues can be addressed by small fixes on top:
> >> 1. Patch from Balazs to read certain information per-batch.
> >> 2. Optionally, guard packet miss recovery with ifdef of an
> >>    experimental API. -- I'm not sure about this one as it will
> >>    strike back in the future.
> >
> > Instead of replying point-by-point, let me summarize my stance:
> > 1) I hold the OVS community and its maintainers to the highest standard
> > 2) I feel the OVS review process of reaching a consensus and then merging is a
> cornerstone of the community
> > 3) I feel that when patches are merged without consensus, ignoring community
> input, and causing known
> > performance degradations, that is not in line with the OVS review process.
> >
> > To move forward in a pragmatic way, instead of ad infinitum discussing details, I
> propose the following:
> > 1) We accept that this patchset was merged without community consensus, and
> this must never happen again in future.
> 
> I would respectfully disagree with this statement again.
> The patch set was reviewed and tested by many people from
> 4 different companies.  Tests that was performed earlier
> by other Intel engineers, as far as I understand, was much
> closer to real-wold workloads and showed no measurable
> performance degradation to my best knowledge.  Therefore,
> the patch set was merged.
> 
> Quick performance benchmarks on simple setups are good for
> performance analysis, but they doesn't reflect real setups
> most of the time.  Therefore, results should be taken into
> consideration, but should not be necessarily gating.
> I made a decision to merge taking into account that fairly
> simple solutions for all exposed by the benchmark issues
> could be implemented, and one of them already existed as a
> patch and there was no much sense including that separate
> patch into the patch-set and ask everyone to re-spin the
> whole series again.
> 
> Another point for merging the series is allowing everyone
> to work on top of this code in order to avoid massive
> re-bases at even later stages of our release cycle.  Sadly,
> as usual, there is a lot of activity few weeks before the
> soft freeze, while no-one cares about patches/bug fixes
> during the previous 4-5 months, and also only cares about
> their own features, piling all the review work on
> maintainers.
> 
> To be fair, unlike most of other big patch-sets, offloading
> of the VXLAN decap is one of the very small number of
> features that was consistently worked on (including reviews
> and design discussions) throughout the whole release time
> frame, multiple release time frames, actually.
> 
> > 2) We focus our attentions (Ilya, Eli, Balazs, Cian, Harry, and others who want to
> help) on fixing the performance degradation.
> >      (To be very clear, I'm suggesting to _not_ revert the commits in question, but
> build on them and solve the performance issue).
> 
> For sure.  That was my original intention.  Balazs already
> updated his patch a few days ago, but I didn't see any
> interest in it from the Intel side.
> 
> And, from what I heard on yesterday's public meeting, Eli
> is working on patches for the missed packet lookup fix.
> 
> Best regards, Ilya Maximets.

I'm Out Of Office tomorrow Friday 2nd July, but will reply to this thread early
next week when back in the office.

Regards, -Harry


More information about the dev mailing list