[ovs-dev] 64Byte packet performance regression on 2.9 from 2.7

Shahaji Bhosle shahaji.bhosle at broadcom.com
Wed Jun 20 11:54:17 UTC 2018


Thanks Ilya,
Attaching the two perf reports. We did run testpmd on its own, and there were
no red flags there. In some cases, such as flowgen, 17.11 performs much
better than 16.11, but for the macswap case the numbers are below. Let me
know if you cannot see the attached perf reports; I can cut and paste
them into the email if the attachment does not work. Sorry, I am not sure I
can post these on any outside servers. Let me know.
Thanks, Shahaji

DPDK on Maia (macswap)  Rings    Mpps         Cycles/Packet
17.11 testpmd           6 queue  21.5 + 21.5  60
                        1 queue  10.4 + 10.4  14
16.11 testpmd           6 queue  21.5 + 21.5  60
                        1 queue  10.4 + 10.4  14
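
For comparison, the size of the OVS regression can be worked out from the PHY-PHY table quoted further down in this thread. The following is only a small arithmetic sketch on those Mpps figures; the dictionary layout and the helper name drop_percent are made up for illustration and are not part of OVS or DPDK.

```python
# Mpps figures copied from the PHY-PHY table quoted later in this thread.
# The bidirectional rows are summed over both directions.
MPPS = {
    "port 1 to 2": {"ovs27": 33.8, "ovs29": 31.3},
    "port 2 to 1": {"ovs27": 32.7, "ovs29": 31.3},
    "bidirectional": {"ovs27": 17.4 + 17.4, "ovs29": 15.5 + 15.5},
}


def drop_percent(old, new):
    """Relative throughput loss going from old to new, in percent."""
    return (old - new) / old * 100.0


drops = {name: drop_percent(v["ovs27"], v["ovs29"]) for name, v in MPPS.items()}

for name, pct in drops.items():
    print(f"{name:14s} {pct:5.1f}% slower on OVS 2.9")
```

With these inputs the loss comes out at roughly 7% and 4% for the two unidirectional cases and roughly 11% for the bidirectional case, while the testpmd numbers above are identical between DPDK 16.11 and 17.11.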

On Wed, Jun 20, 2018 at 4:52 AM, Ilya Maximets <i.maximets at samsung.com>
wrote:

> Looking at your perf stats I see the following:
>
> OVS 2.7:
>
>   ??.??% - dp_netdev_process_rxq_port
>   |-- 93.36% - dp_netdev_input
>   |-- ??.??% - netdev_rxq_recv
>
> OVS 2.9:
>
>   99.69% - dp_netdev_process_rxq_port
>   |-- 79.45% - dp_netdev_input
>   |-- 11.26% - dp_netdev_pmd_flush_output_packets
>   |-- ??.??% - netdev_rxq_recv
>
> Could you please fill in the missing (??.??) values?
> I got this data from the picture attached to the previous mail, but pictures
> are still not allowed on the mailing list (i.e. they are stripped). It would
> be good if you could upload your raw data to some external resource and post
> the link here.
>
> Anyway, from the data I have, I can see that the total time spent in
> "dp_netdev_input" and "dp_netdev_pmd_flush_output_packets" for 2.9 is 90.71%,
> which is less than the 93.36% spent for 2.7. This means that processing +
> sending became even faster or stayed at approximately the same performance.
> We definitely need all the missing values to be sure, but it seems that
> "netdev_rxq_recv()" could be the issue.
>
> To check whether DPDK itself causes the performance regression, I'd ask you
> to run a pure PHY-PHY test with the testpmd app from DPDK 16.11 and DPDK 17.11.
> Maybe it's a performance issue with the bnxt driver that you're using.
> There were a lot of changes in that driver:
>
>   30 files changed, 17189 insertions(+), 3358 deletions(-)
>
> Best regards, Ilya Maximets.
>
> On 20.06.2018 01:18, Shahaji Bhosle wrote:
> > Hi Ilya,
> > This issue is a release blocker for us. I just wanted to check whether you
> > need more details from us. Is there anything we can help with to expedite
> > or root-cause the problem?
> > Please let us know.
> > Thanks, Shahaji
> >
> > On Mon, Jun 18, 2018 at 10:20 AM Shahaji Bhosle <shahaji.bhosle at broadcom.com> wrote:
> >
> >     Thanks Ilya, I will look at the commit, but I am not sure now how to tell
> >     how much real work is being done. I would have liked polling cycles to be
> >     treated as before and not counted towards packet processing. That does
> >     explain it: as long as there are packets on the wire we are always at
> >     100%, so we basically cannot tell how efficiently the CPUs are being used.
> >     Thanks, Shahaji
> >
> >     On Mon, Jun 18, 2018 at 10:07 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
> >
> >         Thanks for the data.
> >
> >         I have to note additionally that the meaning of "processing cycles"
> >         significantly changed since the following commit:
> >
> >             commit a2ac666d5265c01661e189caac321d962f54649f
> >             Author: Ciara Loftus <ciara.loftus at intel.com>
> >             Date:   Mon Feb 20 12:53:00 2017 +0000
> >
> >                 dpif-netdev: Change definitions of 'idle' & 'processing' cycles
> >
> >                 Instead of counting all polling cycles as processing cycles, only count
> >                 the cycles where packets were received from the polling.
> >
> >         This could explain the difference in "PMD Processing Cycles" column,
> >         because successful "POLLING" cycles are now included into "PROCESSING".
> >         Best regards, Ilya Maximets.
> >
> >         On 18.06.2018 16:31, Shahaji Bhosle wrote:
> >         > Hi Ilya,
> >         > Thanks for the quick reply.
> >         > Please find the numbers for our PHY-PHY test below. Please note that
> >         > with OVS 2.9.1 + DPDK 17.11 even 10% of the rates below will make the
> >         > processing cycles hit 100%, whereas on our setup 2.7 never goes above
> >         > 75% for processing cycles. I am also attaching the perf reports for
> >         > the two code bases, and I think the
> >         > "11.26%--dp_netdev_pmd_flush_output_packets" is causing us to take the
> >         > performance hit. Our testing is also SRIOV and the CPUs are ARM A72
> >         > cores. We are happy to run more tests. It is not easy for us to move
> >         > back to OVS 2.8, but we would be happy to try more experiments if it
> >         > helps us narrow this down further. Please note we have also tried
> >         > increasing the tx-flush-interval; it helps a little, but still not
> >         > significantly enough. Let us know.
> >         >
> >         > Thanks, Shahaji
> >         >
> >         >
> >         > *Setup:*
> >         > IXIA<----SFP28--->Port 0 {(PF0)==[OVS+DPDK]==(PF1)} Port 1<-----SFP28---->IXIA
> >         >
> >         > release/version       config                  Test     direction    MPPS         Ixia Line rate (%)  PMD Processing Cycles (%)
> >         > OVS 2.9 + DPDK 17.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2  31.3         85                  99.9
> >         >                                                        port 2 to 1  31.3         85                  99.9
> >         >                                                        bi           15.5 + 15.5  42                  99.9
> >         >
> >         > OVS 2.7 + DPDK 16.11  OVS on Maia (PF0--PF1)  No drop  port 1 to 2  33.8         90                  71
> >         >                                                        port 2 to 1  32.7         88                  70
> >         >                                                        bi           17.4 + 17.4  47                  74
> >         >
> >         > On Mon, Jun 18, 2018 at 4:25 AM, Nitin Katiyar <nitin.katiyar at ericsson.com> wrote:
> >         >
> >         >     Hi,
> >         >     We also experienced degradation from OVS 2.6/2.7 to OVS 2.8.2 (with
> >         >     DPDK 17.05.02). The drop is larger for the 64-byte packet size
> >         >     (~8-10%), even with a higher number of flows. I tried OVS 2.8 with
> >         >     DPDK 17.11 and it improved for larger packet sizes, but the 64-byte
> >         >     size is still a concern.
> >         >
> >         >     Regards,
> >         >     Nitin
> >         >
> >         >     -----Original Message-----
> >         >     From: Ilya Maximets [mailto:i.maximets at samsung.com]
> >         >     Sent: Monday, June 18, 2018 1:32 PM
> >         >     To: ovs-dev at openvswitch.org; shahaji.bhosle at broadcom.com
> >         >     Subject: Re: [ovs-dev] 64Byte packet performance regression on 2.9 from 2.7
> >         >
> >         >     CC: Shahaji Bhosle
> >         >
> >         >     Sorry, missed you in CC list.
> >         >
> >         >     Best regards, Ilya Maximets.
> >         >
> >         >     On 15.06.2018 10:44, Ilya Maximets wrote:
> >         >     >> Hi,
> >         >     >> I just upgraded from OvS 2.7 + DPDK 16.11 to OvS 2.9 + DPDK 17.11 and
> >         >     >> am running into a performance issue with the 64-byte packet rate. One
> >         >     >> interesting thing I notice is that even at very light load from
> >         >     >> IXIA the processing cycles on all the PMD threads run close to 100%
> >         >     >> of the CPU cycles on OvS 2.9, but on OvS 2.7 even under full load the
> >         >     >> processing cycles remain at 75% of the CPU cycles.
> >         >     >>
> >         >     >> Attaching the FlameGraphs of both versions. The only thing that
> >         >     >> pops out to me is that on 2.9 netdev_send() is now invoked via
> >         >     >> dp_netdev_pmd_flush_output_packets(), which seems to be adding
> >         >     >> another ~11% to the whole rx to tx path.
> >         >     >>
> >         >     >> I also tried setting tx-flush-interval to 50 and more. It does seem
> >         >     >> to help, but not significantly enough to match the 2.7 performance.
> >         >     >>
> >         >     >>
> >         >     >> Any help or ideas would be really great. Thanks, Shahaji
> >         >     >
> >         >     > Hello, Shahaji.
> >         >     > Could you please describe your testing scenario in more detail?
> >         >     > Also, the mailing list filters attachments, so they are not available.
> >         >     > You need to publish them somewhere else or include them as text
> >         >     > inside the letter.
> >         >     >
> >         >     > About the performance itself: some performance degradation because of
> >         >     > output batching is expected for tests with a low number of flows or
> >         >     > simple PHY-PHY tests. It was mainly targeted at cases with a relatively
> >         >     > large number of flows, to amortize vhost-user penalties
> >         >     > (PHY-VM-PHY, VM-VM cases), and at OVS bonding cases.
> >         >     >
> >         >     > If your test involves vhost-user ports, then you should also consider
> >         >     > the vhost-user performance regression in stable DPDK 17.11 caused by
> >         >     > the fixes for CVE-2018-1059. Related bug:
> >         >     >       https://dpdk.org/tracker/show_bug.cgi?id=48
> >         >     >
> >         >     > It would be good if you could test OVS 2.8 + DPDK 17.05. There
> >         >     > have been too many changes since 2.7, so it will be hard to track
> >         >     > down the root cause.
> >         >     >
> >         >     > Best regards, Ilya Maximets.
> >         >     >
> >         >
> >         >
> >
> >
>
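
To illustrate the change in cycle accounting described in the quoted thread above, here is a minimal sketch of the two ways of counting "processing" cycles, following Ilya's reading that successful polling cycles are now included in processing. This is not OVS code: the Iteration type, both helper functions and all cycle counts are hypothetical and chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    poll_cycles: int  # cycles spent in the rx poll itself
    work_cycles: int  # cycles spent processing and sending the batch
    packets: int      # packets returned by the poll


def processing_share_old(iterations):
    """Assumed older scheme: polling is tracked separately from processing."""
    total = sum(i.poll_cycles + i.work_cycles for i in iterations)
    processing = sum(i.work_cycles for i in iterations)
    return 100.0 * processing / total


def processing_share_new(iterations):
    """Assumed newer scheme: a poll that returns packets counts as
    processing, and an empty poll counts as idle."""
    total = sum(i.poll_cycles + i.work_cycles for i in iterations)
    processing = sum(i.poll_cycles + i.work_cycles
                     for i in iterations if i.packets > 0)
    return 100.0 * processing / total


# Hypothetical load where every poll returns a batch of packets.
busy = [Iteration(poll_cycles=30, work_cycles=70, packets=32)] * 4
# Hypothetical load where half of the polls come back empty.
light = busy[:2] + [Iteration(poll_cycles=30, work_cycles=0, packets=0)] * 2

for name, load in (("busy", busy), ("light", light)):
    print(f"{name}: old {processing_share_old(load):.1f}%, "
          f"new {processing_share_new(load):.1f}%")
```

In the "busy" case the older scheme reports 70% while the newer one reports 100%, even though the same work is done. That matches the observation in the thread that, as long as every poll finds packets, the processing figure no longer says how much headroom is left.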
-------------- next part --------------
# Samples: 978  of event 'cycles:ppp'
# Event count (approx.): 29696620082
#
# Children      Self  Command  Shared Object         Symbol                                      
# ........  ........  .......  ....................  ............................................
#
   100.00%     0.00%  pmd17    libc-2.27.so          [.] 0xffff0000728734ec
            |
            ---0xd04ec
               0x7048
               ovsthread_wrapper
               pmd_thread_main
            --99.69%--dp_netdev_process_rxq_port
                |          
                |--79.45%--dp_netdev_input
                |          dp_netdev_input__
                |          |          
                |          |--73.64%--emc_processing
                |          |          |          
                |          |          |--38.31%--miniflow_extract
                |          |          |          |          
                |          |          |          |--3.10%--flow_tnl_dst_is_set
                |          |          |          |          |          
                |          |          |          |           --0.82%--ipv6_addr_equals
                |          |          |          |          
                |          |          |          |--1.64%--dp_packet_set_l2_pad_size
                |          |          |          |          
                |          |          |          |--1.12%--eth_type_mpls
                |          |          |          |          
                |          |          |          |--0.61%--__packet_data
                |          |          |          |          
                |          |          |           --0.51%--eth_type_vlan
                |          |          |          
                |          |          |--8.88%--dp_netdev_queue_batches
                |          |          |          |          
                |          |          |           --6.84%--packet_batch_per_flow_update
                |          |          |                     |          
                |          |          |                     |--5.20%--miniflow_get_tcp_flags
                |          |          |                     |          |          
                |          |          |                     |          |--2.35%--count_1bits
                |          |          |                     |          |          
                |          |          |                     |           --1.33%--miniflow_get__
                |          |          |                     |                     |          
                |          |          |                     |                      --0.92%--count_1bits
                |          |          |                     |          
                |          |          |                      --0.61%--flowmap_is_set
                |          |          |          
                |          |          |--7.04%--flowmap_set
                |          |          |          
                |          |          |--4.18%--emc_lookup
                |          |          |          |          
                |          |          |           --1.33%--memcmp
                |          |          |          
                |          |          |--1.97%--bytes_to_be32
                |          |          |          
                |          |          |--1.74%--data_pull
                |          |          |          
                |          |          |--1.53%--dpif_netdev_packet_get_rss_hash_orig_pkt
                |          |          |          
                |          |          |--1.12%--dp_packet_size
                |          |          |          
                |          |          |--1.02%--dp_packet_reset_offsets
                |          |          |          
                |          |           --0.82%--__packet_data
                |          |          
                |           --3.57%--packet_batch_per_flow_execute
                |                     |          
                |                      --3.26%--dp_netdev_execute_actions
                |                                odp_execute_actions
                |                                |          
                |                                |--2.24%--dp_execute_cb
                |                                |          |          
                |                                |          |--0.92%--pmd_send_port_cache_lookup
                |                                |          |          |          
                |                                |          |           --0.82%--tx_port_lookup
                |                                |          |                     |          
                |                                |          |                      --0.51%--hash_port_no
                |                                |          |          
                |                                |           --0.61%--dp_packet_batch_add
                |                                |                     |          
                |                                |                      --0.51%--dp_packet_batch_add__
                |                                |          
                |                                 --0.71%--dp_packet_batch_size
                |          
                |--11.26%--dp_netdev_pmd_flush_output_packets
                |          |          
                |           --9.73%--dp_netdev_pmd_flush_output_on_port
                |                     |          
                |                     |--7.65%--netdev_send
                |                     |          |          
                |                     |           --7.24%--netdev_dpdk_eth_send
                |                     |                     netdev_dpdk_send__
                |                     |                     |          
                |                     |                     |--5.00%--netdev_dpdk_eth_tx_burst
                |                     |                     |          rte_eth_tx_burst
                |                     |                     |          |          
                |                     |                     |          |--3.16%--bnxt_xmit_representor_vf_pkts
                |                     |                     |          |          |          
                |                     |                     |          |           --3.06%--bnxt_xmit_pkts
                |                     |                     |          |                     |          
                |                     |                     |          |                      --1.84%--bnxt_handle_tx_cp
                |                     |                     |          |          
                |                     |                     |           --1.53%--bnxt_xmit_pkts
                |                     |                     |                     |          
                |                     |                     |                      --0.71%--bnxt_handle_tx_cp
                |                     |                     |          
                |                     |                      --1.63%--netdev_dpdk_filter_packet_len
                |                     |          
                |                      --1.16%--non_atomic_ullong_add
                |          
                |--8.27%--netdev_rxq_recv
                |          |          
                |           --7.96%--netdev_dpdk_rxq_recv
                |                     |          
                |                     |--6.74%--rte_eth_rx_burst
                |                     |          |          
                |                     |           --5.31%--bnxt_recv_pkts
                |                     |          
                |                      --0.51%--dp_packet_batch_size
                |          
                 --0.51%--pmd_thread_ctx_time_update
                           time_usec
                           time_usec__
                           time_timespec__
-------------- next part --------------
# Samples: 980  of event 'cycles:ppp'
# Event count (approx.): 29737489531
#
# Children      Self  Command  Shared Object       Symbol                                   
# ........  ........  .......  ..................  .........................................
#
   100.00%     0.00%  pmd26    libc-2.27.so        [.] 0xffff000050e1c4ec
            |
            ---0xd04ec
               0x7048
               ovsthread_wrapper
               pmd_thread_main
               dp_netdev_process_rxq_port
               |          
               |--93.36%--dp_netdev_input
               |          dp_netdev_input__
               |          |          
               |          |--83.38%--emc_processing
               |          |          |          
               |          |          |--42.51%--miniflow_extract
               |          |          |          |          
               |          |          |          |--3.27%--flow_tnl_dst_is_set
               |          |          |          |          |          
               |          |          |          |          |--1.63%--ipv6_addr_equals
               |          |          |          |          |          
               |          |          |          |           --1.13%--ipv6_addr_is_set
               |          |          |          |          
               |          |          |          |--1.53%--dp_packet_set_l2_pad_size
               |          |          |          |          
               |          |          |          |--0.82%--__packet_data
               |          |          |          |          
               |          |          |           --0.61%--dp_packet_data
               |          |          |          
               |          |          |--11.45%--dp_netdev_queue_batches
               |          |          |          |          
               |          |          |           --8.40%--packet_batch_per_flow_update
               |          |          |                     |          
               |          |          |                      --7.48%--miniflow_get_tcp_flags
               |          |          |                                |          
               |          |          |                                |--2.80%--miniflow_get__
               |          |          |                                |          |          
               |          |          |                                |           --1.68%--count_1bits
               |          |          |                                |          
               |          |          |                                 --2.04%--count_1bits
               |          |          |          
               |          |          |--5.65%--flowmap_set
               |          |          |          
               |          |          |--5.53%--emc_lookup
               |          |          |          |          
               |          |          |           --1.02%--memcmp
               |          |          |          
               |          |          |--1.94%--dpif_netdev_packet_get_rss_hash
               |          |          |          
               |          |          |--1.32%--ovs_prefetch_range
               |          |          |          
               |          |          |--1.22%--dp_packet_size
               |          |          |          
               |          |          |--1.22%--dp_packet_data
               |          |          |          
               |          |          |--1.02%--data_pull
               |          |          |          
               |          |          |--0.92%--__packet_data
               |          |          |          
               |          |          |--0.82%--htons
               |          |          |          
               |          |          |--0.71%--dp_packet_reset_offsets
               |          |          |          
               |          |          |--0.61%--emc_entry_alive
               |          |          |          
               |          |          |--0.51%--dp_packet_rss_valid
               |          |          |          
               |          |          |--0.51%--pkt_metadata_prefetch_init
               |          |          |          
               |          |           --0.51%--pkt_metadata_init
               |          |          
               |          |--8.48%--packet_batch_per_flow_execute
               |          |          |          
               |          |           --7.76%--dp_netdev_execute_actions
               |          |                     odp_execute_actions
               |          |                     |          
               |          |                      --7.46%--dp_execute_cb
               |          |                                |          
               |          |                                |--6.85%--netdev_send
               |          |                                |          |          
               |          |                                |           --6.64%--netdev_dpdk_eth_send
               |          |                                |                     |          
               |          |                                |                      --6.44%--netdev_dpdk_send__
               |          |                                |                                |          
               |          |                                |                                |--5.01%--netdev_dpdk_eth_tx_burst
               |          |                                |                                |          |          
               |          |                                |                                |           --4.91%--rte_eth_tx_burst
               |          |                                |                                |                     |          
               |          |                                |                                |                     |--3.46%--bnxt_xmit_representor_vf_pkts
               |          |                                |                                |                     |          |          
               |          |                                |                                |                     |           --3.36%--bnxt_xmit_pkts
               |          |                                |                                |                     |                     |          
               |          |                                |                                |                     |                      --1.73%--bnxt_handle_tx_cp
               |          |                                |                                |                     |          
               |          |                                |                                |                      --1.04%--bnxt_xmit_pkts
               |          |                                |                                |                                |          
               |          |                                |                                |                                 --0.92%--bnxt_handle_tx_cp
               |          |                                |                                |          
               |          |                                |                                 --1.22%--netdev_dpdk_filter_packet_len
               |          |                                |          
               |          |                                 --0.51%--pmd_send_port_cache_lookup
               |          |                                           tx_port_lookup
               |          |          
               |           --0.51%--time_msec
               |                     time_msec__
               |                     time_timespec__
               |          
                --6.23%--netdev_rxq_recv
                          |          
                           --5.82%--netdev_dpdk_rxq_recv
                                     rte_eth_rx_burst
                                     |          
                                      --5.41%--bnxt_recv_pkts
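
The top-level split under dp_netdev_process_rxq_port in the two reports above can be compared directly. The sketch below assumes that the first report (pmd17, which contains dp_netdev_pmd_flush_output_packets) is OVS 2.9 and the second (pmd26) is OVS 2.7, which matches the figures quoted in the thread. The dictionary and helper names are made up for illustration.

```python
# Children percentages directly under dp_netdev_process_rxq_port,
# copied from the two perf reports above.
OVS_29 = {
    "dp_netdev_input": 79.45,
    "dp_netdev_pmd_flush_output_packets": 11.26,
    "netdev_rxq_recv": 8.27,
}
OVS_27 = {
    # On 2.7 netdev_send is reached from inside dp_netdev_input,
    # so there is no separate flush entry.
    "dp_netdev_input": 93.36,
    "netdev_rxq_recv": 6.23,
}


def process_and_send(report):
    """Share of samples spent on packet processing plus transmit."""
    return (report["dp_netdev_input"]
            + report.get("dp_netdev_pmd_flush_output_packets", 0.0))


def rx_share(report):
    """Share of samples spent receiving packets."""
    return report["netdev_rxq_recv"]


for name, report in (("OVS 2.7", OVS_27), ("OVS 2.9", OVS_29)):
    print(f"{name}: process+send {process_and_send(report):.2f}%, "
          f"rx {rx_share(report):.2f}%")
```

With these inputs the process+send share is 93.36% on 2.7 and 90.71% on 2.9, while the rx share is 6.23% on 2.7 and 8.27% on 2.9. These are shares of sampled cycles within each run, not absolute cycle counts, so they only hint at where to look next.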