[ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

Ilya Maximets i.maximets at samsung.com
Mon Jul 2 16:55:04 UTC 2018


Sure, you need to collect perf records for the same binary, i.e. built with
the same compiler options (and on the same machine), to make them useful.

Unfortunately, I have no setup to test your case right now.
Data for 2.8 could help bisect the issue.

On 02.07.2018 18:04, Shahaji Bhosle wrote:
> Hi Ilya,
> Thanks for the reply.
> For performance traffic testing we are running with -O2. You are right about the perf report: when we were running with perf record, we had set "-g -O0". Do you need us to run with just "-g -O2" and give you the profile, or with some other optimization setting?
> Do you have a test setup where you can run 64B packets and see the difference between 2.7 and 2.9? On our side we are trying to get 2.8 to work so we can give you an intermediate data point. Please let us know what we can do to help you debug this.
> Thanks, Shahaji
> 
> 
> On Mon, Jul 2, 2018 at 10:55 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
> 
>     Hi.
>     Sorry for the late response.
> 
>     Looking at your perf data, I see functions like "dp_packet_batch_size"
>     consuming ~0.5-0.7% of the time. Are you building with all compiler
>     optimizations disabled? Otherwise there should be no such symbols in the
>     perf report; they should be completely inlined.
> 
>     Best regards, Ilya Maximets.
> 
>     On 27.06.2018 04:48, Shahaji Bhosle wrote:
>     > Hi Ilya,
>     > Just wanted to check whether you found anything interesting, or anything we can try. Thanks, Shahaji
>     > 
>     > On Wed, Jun 20, 2018 at 9:01 AM, Shahaji Bhosle <shahaji.bhosle at broadcom.com> wrote:
>     > 
>     >     Thanks Ilya,
>     >     Sorry for the confusion with the numbers. We used to get different numbers on the two ports, so we were recording them per port. You have to compare them with the sum of the two port numbers.
>     > 
>     >     Version              Queues   CPU mask   Mpps
>     >     17.11 testpmd        6        0xfe       21.5 + 21.5
>     >     OvS 2.9+DPDK17.11    6        0xfe       15.5 + 15.5
>     >     16.11 testpmd        6        0xfe       21.5 + 21.5
>     >     OvS 2.7+DPDK16.11    6        0xfe       17.4 + 17.4
>     > 
>     > 
>     >     Thanks, Shahaji
>     > 
>     >     On Wed, Jun 20, 2018 at 8:34 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>     > 
>     >         Ok, I'll look at the data later.
>     > 
>     >         But your testpmd results are much lower than the OVS results: 21.5 Mpps for
>     >         testpmd versus 33.8 Mpps for OVS. OVS should be slower than testpmd, because it
>     >         performs a lot of parsing and processing while testpmd does not.
>     >         You probably tested testpmd in a different environment or allocated fewer
>     >         resources to the PMD threads. Could you please recheck?
>     > 
>     >         What is your OVS configuration (pmd-cpu-mask, n_rxqs etc.)?
>     >         And what is your testpmd command-line?
>     > 
>     >         On 20.06.2018 14:54, Shahaji Bhosle wrote:
>     >         > Thanks Ilya,
>     >         > Attaching the two perf reports. We did run testpmd on its own, and there were no red flags there. In some cases, like flowgen, 17.11 performs much better than 16.11; the numbers for the macswap case are below. Let me know if you cannot see the attached perf reports; I can cut and paste them into the email if the attachment does not work. Sorry, I am not sure I can post these on any outside servers.
>     >         > Thanks, Shahaji
>     >         > 
>     >         > *DPDK on Maia (macswap)*   *Rings*   *Mpps*         *Cycles/Packet*
>     >         > 17.11 testpmd              6 queue   21.5 + 21.5    60
>     >         >                            1 queue   10.4 + 10.4    14
>     >         > 16.11 testpmd              6 queue   21.5 + 21.5    60
>     >         >                            1 queue   10.4 + 10.4    14
>     >         > 
>     >         > 
>     >         > On Wed, Jun 20, 2018 at 4:52 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>     >         >
>     >         >     Looking at your perf stats, I see the following:
>     >         >
>     >         >     OVS 2.7:
>     >         >
>     >         >       ??.??% - dp_netdev_process_rxq_port
>     >         >       |-- 93.36% - dp_netdev_input
>     >         >       |-- ??.??% - netdev_rxq_recv
>     >         >
>     >         >     OVS 2.9:
>     >         >
>     >         >       99.69% - dp_netdev_process_rxq_port
>     >         >       |-- 79.45% - dp_netdev_input
>     >         >       |-- 11.26% - dp_netdev_pmd_flush_output_packets
>     >         >       |-- ??.??% - netdev_rxq_recv
>     >         >
>     >         >     Could you please fill in the missing (??.??) values?
>     >         >     I got this data from the picture attached to the previous mail, but pictures
>     >         >     are still not allowed on the mailing list (i.e., they are stripped). It would be
>     >         >     good if you could upload your raw data to some external resource and post the link here.
>     >         >
>     >         >     Anyway, from the data I have, I can see that the total time spent in
>     >         >     "dp_netdev_input" and "dp_netdev_pmd_flush_output_packets" for 2.9 is 90.71%,
>     >         >     which is less than the 93.36% spent in "dp_netdev_input" for 2.7. This means that
>     >         >     processing + sending either became faster or stayed at approximately the same
>     >         >     performance. We definitely need all the missing values to be sure, but it seems
>     >         >     that "netdev_rxq_recv()" could be the issue.
>     >         >
>     >         >     To check whether DPDK itself causes the performance regression, I'd ask you
>     >         >     to run a pure PHY-PHY test with the testpmd app from DPDK 16.11 and DPDK 17.11.
>     >         >     Maybe it's a performance issue with the bnxt driver that you're using.
>     >         >     There were a lot of changes in that driver:
>     >         >
>     >         >       30 files changed, 17189 insertions(+), 3358 deletions(-)
>     >         >
>     >         >     Best regards, Ilya Maximets.
>     >         >
>     >         >     On 20.06.2018 01:18, Shahaji Bhosle wrote:
>     >         >     > Hi Ilya,
>     >         >     > This issue is a release blocker for us; I just wanted to check whether you need more details from us. We can help with anything that would expedite or root-cause the problem.
>     >         >     > Please let us know.
>     >         >     > Thanks, Shahaji
>     >         >     >
>     >         >     > On Mon, Jun 18, 2018 at 10:20 AM Shahaji Bhosle <shahaji.bhosle at broadcom.com> wrote:
>     >         >     > 
>     >         >     >     Thanks Ilya, I will look at the commit, but now I am not sure how to tell how much real work is being done. I would have liked polling cycles to be treated as before, and not counted towards packet processing. That does explain it: as long as there are packets on the wire, we are always at 100%, so we basically cannot tell how efficiently the CPUs are being used.
>     >         >     >     Thanks, Shahaji
>     >         >     > 
>     >         >     >     On Mon, Jun 18, 2018 at 10:07 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>     >         >     > 
>     >         >     >         Thanks for the data.
>     >         >     > 
>     >         >     >         I have to note additionally that the meaning of "processing cycles"
>     >         >     >         significantly changed since the following commit:
>     >         >     > 
>     >         >     >             commit a2ac666d5265c01661e189caac321d962f54649f
>     >         >     >             Author: Ciara Loftus <ciara.loftus at intel.com>
>     >         >     >             Date:   Mon Feb 20 12:53:00 2017 +0000
>     >         >     > 
>     >         >     >                 dpif-netdev: Change definitions of 'idle' & 'processing' cycles
>     >         >     > 
>     >         >     >                 Instead of counting all polling cycles as processing cycles, only count
>     >         >     >                 the cycles where packets were received from the polling.
>     >         >     > 
>     >         >     >         This could explain the difference in the "PMD Processing Cycles" column,
>     >         >     >         because successful "POLLING" cycles are now included in "PROCESSING".
>     >         >     > 
>     >         >     >         Best regards, Ilya Maximets.
>     >         >     > 
>     >         >     >         On 18.06.2018 16:31, Shahaji Bhosle wrote:
>     >         >     >         > Hi Ilya,
>     >         >     >         > Thanks for the quick reply, 
>     >         >     >         > Please find the numbers for our PHY-PHY test below. Note that with OVS 2.9.1 + DPDK 17.11, even 10% of the load below makes the processing cycles hit 100%, whereas on our setup 2.7 never goes above 75% processing cycles. I am also attaching the perf reports for the two code bases; I think the "11.26%--dp_netdev_pmd_flush_output_packets" is causing the performance hit. Our testing also uses SR-IOV, and the CPUs are ARM A72 cores. We are happy to run more tests. It is not easy for us to move back to OVS 2.8, but we would be happy to try more experiments if they help narrow things down further. Please note that we have also tried increasing tx-flush-interval; it helps a little, but not significantly. Let us know.
>     >         >     >         > 
>     >         >     >         > Thanks, Shahaji 
>     >         >     >         > 
>     >         >     >         > 
>     >         >     >         > *Setup:*
>     >         >     >         > IXIA<----SFP28--->Port 0 {(PF0)==[OVS+DPDK]==(PF1)} Port 1<-----SFP28---->IXIA
>     >         >     >         > 
>     >         >     >         > Release/version       Config                  Test     Direction     Mpps          Ixia line rate (%)   PMD processing cycles (%)
>     >         >     >         > OVS 2.9 + DPDK 17.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2   31.3          85                   99.9
>     >         >     >         >                                                        port 2 to 1   31.3          85                   99.9
>     >         >     >         >                                                        bi            15.5 + 15.5   42                   99.9
>     >         >     >         >
>     >         >     >         > OVS 2.7 + DPDK 16.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2   33.8          90                   71
>     >         >     >         >                                                        port 2 to 1   32.7          88                   70
>     >         >     >         >                                                        bi            17.4 + 17.4   47                   74
>     >         >     >         > 
>     >         >     >         > On Mon, Jun 18, 2018 at 4:25 AM, Nitin Katiyar <nitin.katiyar at ericsson.com> wrote:
>     >         >     >         > 
>     >         >     >         >     Hi,
>     >         >     >         >     We also experienced a degradation from OVS 2.6/2.7 to OVS 2.8.2 (with DPDK 17.05.02). The drop is larger for the 64-byte packet size (~8-10%), even with a higher number of flows. I tried OVS 2.8 with DPDK 17.11 and it improved for larger packet sizes, but the 64-byte size is still the concern.
>     >         >     >         > 
>     >         >     >         >     Regards,
>     >         >     >         >     Nitin
>     >         >     >         > 
>     >         >     >         >     -----Original Message-----
>     >         >     >         >     From: Ilya Maximets [mailto:i.maximets at samsung.com]
>     >         >     >         >     Sent: Monday, June 18, 2018 1:32 PM
>     >         >     >         >     To: ovs-dev at openvswitch.org; shahaji.bhosle at broadcom.com
>     >         >     >         >     Subject: Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7
>     >         >     >         >
>     >         >     >         >     CC: Shahaji Bhosle
>     >         >     >         >
>     >         >     >         >     Sorry, missed you in CC list.
>     >         >     >         >
>     >         >     >         >     Best regards, Ilya Maximets.
>     >         >     >         >
>     >         >     >         >     On 15.06.2018 10:44, Ilya Maximets wrote:
>     >         >     >         >     >> Hi,
>     >         >     >         >     >> I just upgraded from OvS 2.7 + DPDK 16.11 to OvS 2.9 + DPDK 17.11 and am
>     >         >     >         >     >> running into a performance issue with the 64-byte packet rate. One
>     >         >     >         >     >> interesting thing I notice is that even at a very light load from
>     >         >     >         >     >> IXIA, the processing cycles on all the PMD threads run close to 100%
>     >         >     >         >     >> of the CPU cycles on OvS 2.9, but on OvS 2.7, even under full load, the
>     >         >     >         >     >> processing cycles remain at 75% of the CPU cycles.
>     >         >     >         >     >>
>     >         >     >         >     >> Attaching the FlameGraphs of both versions. The only thing that pops
>     >         >     >         >     >> out to me is the new way netdev_send() is invoked on 2.9: via
>     >         >     >         >     >> dp_netdev_pmd_flush_output_packets(), which seems to add another ~11%
>     >         >     >         >     >> to the whole rx-to-tx path.
>     >         >     >         >     >>
>     >         >     >         >     >> I also tried setting tx-flush-interval to 50 and higher; it does seem
>     >         >     >         >     >> to help, but not enough to match the 2.7 performance.
>     >         >     >         >     >>
>     >         >     >         >     >>
>     >         >     >         >     >> Any help or ideas would be really great. Thanks, Shahaji
>     >         >     >         >     >
>     >         >     >         >     > Hello, Shahaji.
>     >         >     >         >     > Could you please describe your testing scenario in more detail?
>     >         >     >         >     > Also, the mailing list filters attachments, so they are not available. You
>     >         >     >         >     > need to publish them somewhere else or include them as text inside the message.
>     >         >     >         >     >
>     >         >     >         >     > About the performance itself: some performance degradation because of
>     >         >     >         >     > output batching is expected for tests with a low number of flows or
>     >         >     >         >     > simple PHY-PHY tests. Output batching was mainly targeted at cases with a
>     >         >     >         >     > relatively large number of flows, to amortize vhost-user penalties
>     >         >     >         >     > (PHY-VM-PHY, VM-VM cases), and at OVS bonding cases.
>     >         >     >         >     >
>     >         >     >         >     > If your test involves vhost-user ports, then you should also consider
>     >         >     >         >     > the vhost-user performance regression in stable DPDK 17.11 caused by the
>     >         >     >         >     > fixes for CVE-2018-1059. Related bug:
>     >         >     >         >     >       https://dpdk.org/tracker/show_bug.cgi?id=48
>     >         >     >         >     >
>     >         >     >         >     > It would be good if you could test OVS 2.8 + DPDK 17.05. There have
>     >         >     >         >     > been too many changes since 2.7, so it will be hard to track down the root cause otherwise.
>     >         >     >         >     >
>     >         >     >         >     > Best regards, Ilya Maximets.
>     >         >     >         >     >
>     >         >     >         >
>     >         >     >         >
>     >         >     >
>     >         >     >
>     >         >
>     >         >
>     >
>     >
>     >
> 
> 

