[ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

Shahaji Bhosle shahaji.bhosle at broadcom.com
Wed Jun 27 01:48:25 UTC 2018


Hi Ilya,
Just wanted to check if you found anything interesting, or if there is
anything we can try. Thanks, Shahaji

On Wed, Jun 20, 2018 at 9:01 AM, Shahaji Bhosle <shahaji.bhosle at broadcom.com> wrote:

> Thanks Ilya,
>  Sorry for the confusion with the numbers; we used to get somewhat different
> numbers on the two ports, so we were recording them per port. You have to
> compare against the two-port numbers...
>
> Release               Queues    CPU mask   Mpps
> 17.11 testpmd         6 queue   0xfe       21.5 + 21.5
> OvS 2.9 + DPDK 17.11  6 queue   0xfe       15.5 + 15.5
> 16.11 testpmd         6 queue   0xfe       21.5 + 21.5
> OvS 2.7 + DPDK 16.11  6 queue   0xfe       17.4 + 17.4
> Thanks, Shahaji
>
> On Wed, Jun 20, 2018 at 8:34 AM, Ilya Maximets <i.maximets at samsung.com>
> wrote:
>
>> Ok, I'll look at the data later.
>>
>> But your testpmd results are much lower than the OVS results: 21.5 Mpps for
>> testpmd versus 33.8 Mpps for OVS. OVS should be slower than testpmd, because
>> it performs a lot of parsing and processing while testpmd does not.
>> You probably tested testpmd in a different environment or allocated fewer
>> resources for the PMD threads. Could you please recheck?
>>
>> What is your OVS configuration (pmd-cpu-mask, n_rxqs etc.)?
>> And what is your testpmd command-line?
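>> (For reference, I'd expect settings like those to be applied with something
>> along the lines of the commands below; the mask, queue count and port names
>> here are only placeholders, not necessarily what you actually used:
>>
>>   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xfe
>>   ovs-vsctl set Interface dpdk0 options:n_rxq=6
>>   ovs-vsctl set Interface dpdk1 options:n_rxq=6)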
>>
>> On 20.06.2018 14:54, Shahaji Bhosle wrote:
>> > Thanks Ilya,
>> > Attaching the two perf reports... We did run testpmd on its own; there
>> > were no red flags there. In some cases, like flowgen, 17.11 performs much
>> > better than 16.11, but for the macswap case the numbers are below. Let me
>> > know if you cannot see the attached perf reports; I can just cut and paste
>> > them into the email if the attachments do not work. Sorry, I am not sure I
>> > can post these on any outside servers. Let me know.
>> > Thanks, Shahaji
>> >
>> > *DPDK on Maia (macswap)*      *Rings*         *Mpps*          *Cycles/Packet*
>> > 17.11 testpmd                 6 queue         21.5 + 21.5     60
>> >                               1 queue         10.4 + 10.4     14
>> > 16.11 testpmd                 6 queue         21.5 + 21.5     60
>> >                               1 queue         10.4 + 10.4     14
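>> > (For reference, a 6-queue macswap run like the one above would typically
>> > be launched with something like the command below; the core list and
>> > memory channel count are placeholders for this sketch:
>> >
>> >   testpmd -l 1-7 -n 4 -- -i --nb-cores=6 --rxq=6 --txq=6 --forward-mode=macswap
>> >   testpmd> start)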
>> >
>> >
>> > On Wed, Jun 20, 2018 at 4:52 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>> >
>> >     Looking at your perf stats I see the following:
>> >
>> >     OVS 2.7:
>> >
>> >       ??.??% - dp_netdev_process_rxq_port
>> >       |-- 93.36% - dp_netdev_input
>> >       |-- ??.??% - netdev_rxq_recv
>> >
>> >     OVS 2.9:
>> >
>> >       99.69% - dp_netdev_process_rxq_port
>> >       |-- 79.45% - dp_netdev_input
>> >       |-- 11.26% - dp_netdev_pmd_flush_output_packets
>> >       |-- ??.??% - netdev_rxq_recv
>> >
>> >     Could you please fill in the missing (??.??) values?
>> >     I got this data from the picture attached to the previous mail, but
>> >     pictures are still not allowed on the mailing list (i.e. they get
>> >     stripped). It'll be good if you can upload your raw data to some
>> >     external resource and post the link here.
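>> >     (If uploading is difficult, a plain-text call graph that can be pasted
>> >     inline can be produced with something like the following, assuming the
>> >     PMD threads are pinned to known cores, e.g. cores 1-7:
>> >
>> >       perf record -g -C 1-7 -- sleep 30
>> >       perf report --stdio -g > perf-report.txt)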
>> >
>> >     Anyway, from the data I have, I can see that the total time spent in
>> >     "dp_netdev_input" and "dp_netdev_pmd_flush_output_packets" for 2.9 is
>> >     90.71%, which is less than the 93.36% spent for 2.7. This means that
>> >     processing + sending became even faster or stayed at approximately the
>> >     same performance. We definitely need all the missing values to be
>> >     sure, but it seems that "netdev_rxq_recv()" could be the issue.
>> >
>> >     To check if DPDK itself causes the performance regression, I'd ask you
>> >     to run a pure PHY-PHY test with the testpmd app from DPDK 16.11 and
>> >     DPDK 17.11. Maybe it's a performance issue with the bnxt driver that
>> >     you're using. There were too many changes in that driver:
>> >
>> >       30 files changed, 17189 insertions(+), 3358 deletions(-)
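>> >     (That figure is roughly what a diffstat of the driver between the two
>> >     releases gives, e.g. in a DPDK source tree:
>> >
>> >       git diff --stat v16.11 v17.11 -- drivers/net/bnxt)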
>> >
>> >     Best regards, Ilya Maximets.
>> >
>> >     On 20.06.2018 01:18, Shahaji Bhosle wrote:
>> >     > Hi Ilya,
>> >     > This issue is a release blocker for us; I just wanted to check if
>> >     > you need more details from us. We are glad to help with anything
>> >     > that would expedite or root-cause the problem.
>> >     > Please let us know.
>> >     > Thanks, Shahaji
>> >     >
>> >     > On Mon, Jun 18, 2018 at 10:20 AM Shahaji Bhosle
>> >     > <shahaji.bhosle at broadcom.com> wrote:
>> >     >
>> >     >     Thanks Ilya, I will look at the commit, but now I am not sure
>> >     >     how to tell how much real work is being done. I would have liked
>> >     >     polling cycles to be treated as before and not counted towards
>> >     >     packet processing. That does explain it: as long as there are
>> >     >     packets on the wire we are always at 100%, so we basically
>> >     >     cannot tell how efficiently the CPUs are being used.
>> >     >     Thanks, Shahaji
>> >     >
>> >     >     On Mon, Jun 18, 2018 at 10:07 AM, Ilya Maximets
>> >     >     <i.maximets at samsung.com> wrote:
>> >     >
>> >     >         Thanks for the data.
>> >     >
>> >     >         I have to note additionally that the meaning of
>> "processing cycles"
>> >     >         significantly changed since the following commit:
>> >     >
>> >     >             commit a2ac666d5265c01661e189caac321d962f54649f
>> >     >             Author: Ciara Loftus <ciara.loftus at intel.com>
>> >     >             Date:   Mon Feb 20 12:53:00 2017 +0000
>> >     >
>> >     >                 dpif-netdev: Change definitions of 'idle' &
>> >     >                 'processing' cycles
>> >     >
>> >     >                 Instead of counting all polling cycles as processing
>> >     >                 cycles, only count the cycles where packets were
>> >     >                 received from the polling.
>> >     >
>> >     >         This could explain the difference in the "PMD Processing
>> >     >         Cycles" column, because successful "POLLING" cycles are now
>> >     >         included in "PROCESSING".
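>> >     >         (The current idle/processing split can be checked directly
>> >     >         on a running switch with something like:
>> >     >
>> >     >           ovs-appctl dpif-netdev/pmd-stats-clear
>> >     >           ovs-appctl dpif-netdev/pmd-stats-show
>> >     >
>> >     >         which reports idle vs. processing cycles per PMD thread.)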
>> >     >
>> >     >         Best regards, Ilya Maximets.
>> >     >
>> >     >         On 18.06.2018 16:31, Shahaji Bhosle wrote:
>> >     >         > Hi Ilya,
>> >     >         > Thanks for the quick reply,
>> >     >         > Please find the numbers for our PHY-PHY test below. Please
>> >     >         > note that with OVS 2.9.1 + DPDK 17.11 even 10% of the below
>> >     >         > rates makes the processing cycles hit 100%, while on our
>> >     >         > setup 2.7 never goes above 75% processing cycles. I am also
>> >     >         > attaching the perf reports for the two code bases, and I
>> >     >         > think the "11.26%--dp_netdev_pmd_flush_output_packets" is
>> >     >         > causing us to take the performance hit. Our testing also
>> >     >         > uses SR-IOV, and the CPUs are ARM A72 cores. We are happy
>> >     >         > to run more tests; it is not easy for us to move back to
>> >     >         > OVS 2.8, but we would be happy to try more experiments if
>> >     >         > it helps us narrow this down further. Please note we have
>> >     >         > also tried increasing the tx-flush-interval; it helps a
>> >     >         > little but is still not significant enough. Let us know.
>> >     >         >
>> >     >         > Thanks, Shahaji
>> >     >         >
>> >     >         >
>> >     >         > *Setup:*
>> >     >         > IXIA <----SFP28----> Port 0 {(PF0)==[OVS+DPDK]==(PF1)} Port 1 <----SFP28----> IXIA
>> >     >         >
>> >     >         > Release/version       Config                  Test     Direction    Mpps         Ixia line rate (%)   PMD processing cycles (%)
>> >     >         > OVS 2.9 + DPDK 17.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2  31.3         85                   99.9
>> >     >         >                                                        port 2 to 1  31.3         85                   99.9
>> >     >         >                                                        bi           15.5 + 15.5  42                   99.9
>> >     >         >
>> >     >         > OVS 2.7 + DPDK 16.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2  33.8         90                   71
>> >     >         >                                                        port 2 to 1  32.7         88                   70
>> >     >         >                                                        bi           17.4 + 17.4  47                   74
>> >     >         >
>> >     >         > On Mon, Jun 18, 2018 at 4:25 AM, Nitin Katiyar
>> >     >         > <nitin.katiyar at ericsson.com> wrote:
>> >     >         >
>> >     >         >     Hi,
>> >     >         >     We also experienced degradation from OVS 2.6/2.7 to
>> >     >         >     OVS 2.8.2 (with DPDK 17.05.02). The drop is larger for
>> >     >         >     the 64-byte packet size (~8-10%), even with a higher
>> >     >         >     number of flows. I tried OVS 2.8 with DPDK 17.11 and
>> >     >         >     it improved for larger packet sizes, but the 64-byte
>> >     >         >     size is still a concern.
>> >     >         >
>> >     >         >     Regards,
>> >     >         >     Nitin
>> >     >         >
>> >     >         >     -----Original Message-----
>> >     >         >     From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> >     >         >     Sent: Monday, June 18, 2018 1:32 PM
>> >     >         >     To: ovs-dev at openvswitch.org; shahaji.bhosle at broadcom.com
>> >     >         >     Subject: Re: [ovs-dev] 64Byte packet performance
>> regression on 2.9 from 2.7
>> >     >         >
>> >     >         >     CC: Shahaji Bhosle
>> >     >         >
>> >     >         >     Sorry, missed you in CC list.
>> >     >         >
>> >     >         >     Best regards, Ilya Maximets.
>> >     >         >
>> >     >         >     On 15.06.2018 10:44, Ilya Maximets wrote:
>> >     >         >     >> Hi,
>> >     >         >     >> I just upgraded from OvS 2.7 + DPDK 16.11 to
>> >     >         >     >> OvS 2.9 + DPDK 17.11 and am running into a
>> >     >         >     >> performance issue with the 64-byte packet rate. One
>> >     >         >     >> interesting thing I notice is that even at very
>> >     >         >     >> light load from IXIA the processing cycles on all
>> >     >         >     >> the PMD threads run close to 100% of the CPU cycles
>> >     >         >     >> on OvS 2.9, but on OvS 2.7, even under full load,
>> >     >         >     >> the processing cycles remain at 75% of the CPU
>> >     >         >     >> cycles.
>> >     >         >     >>
>> >     >         >     >> Attaching the FlameGraphs of both versions. The
>> >     >         >     >> only thing that pops out to me is that on 2.9
>> >     >         >     >> netdev_send() is now invoked via
>> >     >         >     >> dp_netdev_pmd_flush_output_packets(), which seems
>> >     >         >     >> to be adding another ~11% to the whole rx-to-tx
>> >     >         >     >> path.
>> >     >         >     >>
>> >     >         >     >> I also tried setting tx-flush-interval to 50 and
>> >     >         >     >> higher; it does seem to help, but not significantly
>> >     >         >     >> enough to match the 2.7 performance.
>> >     >         >     >>
>> >     >         >     >>
>> >     >         >     >> Any help or ideas would be really great. Thanks,
>> Shahaji
>> >     >         >     >
>> >     >         >     > Hello, Shahaji.
>> >     >         >     > Could you please describe your testing scenario in
>> >     >         >     > more detail?
>> >     >         >     > Also, the mailing list filters attachments, so they
>> >     >         >     > are not available. You need to publish them somewhere
>> >     >         >     > else or include them as text inside the letter.
>> >     >         >     >
>> >     >         >     > About the performance itself: some performance
>> >     >         >     > degradation because of output batching is expected
>> >     >         >     > for tests with a low number of flows or simple
>> >     >         >     > PHY-PHY tests. It was mainly targeted at cases with
>> >     >         >     > a relatively large number of flows, to amortize
>> >     >         >     > vhost-user penalties (PHY-VM-PHY, VM-VM cases) and
>> >     >         >     > OVS bonding cases.
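>> >     >         >     > (The batching window is controlled by the
>> >     >         >     > tx-flush-interval setting, in microseconds, e.g.:
>> >     >         >     >
>> >     >         >     >   ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50
>> >     >         >     >
>> >     >         >     > With the default of 0, output is flushed as soon as
>> >     >         >     > the current rx batch has been processed.)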
>> >     >         >     >
>> >     >         >     > If your test involves vhost-user ports, then you
>> >     >         >     > should also consider the vhost-user performance
>> >     >         >     > regression in stable DPDK 17.11 caused by the fixes
>> >     >         >     > for CVE-2018-1059. Related bug:
>> >     >         >     >       https://dpdk.org/tracker/show_bug.cgi?id=48
>> >     >         >     >
>> >     >         >     > It'll be good if you're able to test OVS 2.8 +
>> >     >         >     > DPDK 17.05. There were too many changes since 2.7;
>> >     >         >     > it'll be hard to track down the root cause.
>> >     >         >     >
>> >     >         >     > Best regards, Ilya Maximets.
>> >     >         >     >
>> >     >         >
>> >     >         >
>> >     >
>> >     >
>> >
>> >
>>
>
>

