[ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD

Zoltan Kiss zoltan.kiss at linaro.org
Wed Aug 26 17:07:54 UTC 2015


Hi,

On 24/08/15 12:43, Traynor, Kevin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Zoltan Kiss
>> Sent: Friday, August 21, 2015 7:05 PM
>> To: dev at dpdk.org; dev at openvswitch.org
>> Cc: Richardson, Bruce; Ananyev, Konstantin
>> Subject: [ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD
>>
>> Hi,
>>
>> I've set up a simple packet forwarding perf test on a dual-port 10G
>> 82599ES: one port receives 64 byte UDP packets, the other sends them out,
>> one core used. I've used latest OVS with DPDK 2.1, and the first result
>> was only 13.2 Mpps, which was a bit far from the 13.9 I've seen last
>> year with the same test. The first thing I've changed was to revert
>> to the old behaviour about this issue:
>>
>> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
>>
>> So instead of the new default I've passed 2048 + RTE_PKTMBUF_HEADROOM.
>> That increased the performance to 13.5, but to figure out what's wrong
>> I started to play with the receive functions. First I've disabled vector
>> PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only 12.5 Mpps. So
>> then I've enabled scattered RX, and with
>> ixgbe_recv_pkts_lro_bulk_alloc() I could manage to get 13.98 Mpps, which
>> is I guess as close as possible to the 14.2 line rate (on my HW at
>> least, with one core).
>> Does anyone have a good explanation about why the vector PMD performs so
>> significantly worse? I would expect that on a 3.2 GHz i5-4570 one core
>> should be able to reach ~14 Mpps, SG and vector PMD shouldn't make a
>> difference.
>
> I've previously turned on/off vectorisation and found that for tx it makes
> a significant difference. For Rx it didn't make much of a difference but
> rx bulk allocation which gets enabled with it did improve performance.
>
> Is there something else also running on the current pmd core? Did you
> try moving it to another?
I've tied the pmd to the second core; as far as I can see from top and
profiling outputs, hardly anything else runs there.

> Also, did you compile OVS with -O3/-Ofast? They
> tend to give a performance boost.
Yes

>
> Are you hitting 3.2 GHz for the core with the pmd? I think that is only
> with turbo boost, so it may not be achievable all the time.
The turbo boost freq is 3.6 GHz.
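
For reference, a small sketch of the arithmetic behind the numbers in this thread: the theoretical packet rate for 64 byte frames on a 10G link, and the per-packet cycle budget implied by each measured rate. The 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap) and the reading of "64 byte" as the full frame size including FCS are assumptions, not something stated above. The theoretical figure comes out at about 14.88 Mpps; the 14.2 quoted above is the ceiling observed on this particular hardware. Which clock the core actually ran at is not known, so both 3.2 and 3.6 GHz are shown.

```python
# Back-of-the-envelope packet-rate arithmetic for the figures in this thread.
# Assumptions (not from the thread): 20 bytes of wire overhead per frame
# (7 preamble + 1 SFD + 12 inter-frame gap), and "64 byte" meaning the
# whole Ethernet frame including the 4-byte FCS.

LINK_BPS = 10e9           # 10 Gbit/s link
FRAME_BYTES = 64          # frame size used in the test
WIRE_OVERHEAD_BYTES = 20  # assumed per-frame overhead on the wire


def line_rate_mpps(frame_bytes=FRAME_BYTES, link_bps=LINK_BPS):
    """Theoretical maximum packet rate in Mpps for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire / 1e6


def cycles_per_packet(core_hz, mpps):
    """Average CPU cycles available per packet on one core."""
    return core_hz / (mpps * 1e6)


# Rates reported in the thread, in Mpps.
measured = {
    "vector PMD, new default mbuf size": 13.2,
    "vector PMD, 2048 + RTE_PKTMBUF_HEADROOM": 13.5,
    "ixgbe_recv_pkts_bulk_alloc": 12.5,
    "ixgbe_recv_pkts_lro_bulk_alloc": 13.98,
}

if __name__ == "__main__":
    print(f"theoretical line rate: {line_rate_mpps():.2f} Mpps")
    for name, mpps in measured.items():
        at_base = cycles_per_packet(3.2e9, mpps)
        at_turbo = cycles_per_packet(3.6e9, mpps)
        print(f"{name}: {at_base:.0f} cycles/pkt at 3.2 GHz, "
              f"{at_turbo:.0f} at 3.6 GHz")
```

The gap between the slowest and fastest receive function is only a few tens of cycles per packet, which is why small per-packet costs show up so clearly in this test.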

>
>> I've tried to look into it with oprofile, but the results were quite
>> strange: 35% of the samples were from miniflow_extract, the part where
>> parse_vlan calls data_pull to jump after the MAC addresses. The oprofile
>> snippet (1M samples):
>>
>>     511454 19        0.0037  flow.c:511
>>     511458 149       0.0292  dp-packet.h:266
>>     51145f 4264      0.8357  dp-packet.h:267
>>     511466 18        0.0035  dp-packet.h:268
>>     51146d 43        0.0084  dp-packet.h:269
>>     511474 172       0.0337  flow.c:511
>>     51147a 4320      0.8467  string3.h:51
>>     51147e 358763   70.3176  flow.c:99
>>     511482 2        3.9e-04  string3.h:51
>>     511485 3060      0.5998  string3.h:51
>>     511488 1693      0.3318  string3.h:51
>>     51148c 2933      0.5749  flow.c:326
>>     511491 47        0.0092  flow.c:326
>>
>> And the corresponding disassembled code:
>>
>>     511454:       49 83 f9 0d             cmp    r9,0xd
>>     511458:       c6 83 81 00 00 00 00    mov    BYTE PTR [rbx+0x81],0x0
>>     51145f:       66 89 83 82 00 00 00    mov    WORD PTR [rbx+0x82],ax
>>     511466:       66 89 93 84 00 00 00    mov    WORD PTR [rbx+0x84],dx
>>     51146d:       66 89 8b 86 00 00 00    mov    WORD PTR [rbx+0x86],cx
>>     511474:       0f 86 af 01 00 00       jbe    511629 <miniflow_extract+0x279>
>>     51147a:       48 8b 45 00             mov    rax,QWORD PTR [rbp+0x0]
>>     51147e:       4c 8d 5d 0c             lea    r11,[rbp+0xc]
>>     511482:       49 89 00                mov    QWORD PTR [r8],rax
>>     511485:       8b 45 08                mov    eax,DWORD PTR [rbp+0x8]
>>     511488:       41 89 40 08             mov    DWORD PTR [r8+0x8],eax
>>     51148c:       44 0f b7 55 0c          movzx  r10d,WORD PTR [rbp+0xc]
>>     511491:       66 41 81 fa 81 00       cmp    r10w,0x81
>>
>> My only explanation for this so far is that I misunderstand something
>> about the oprofile results.
>>
>> Regards,
>>
>> Zoltan
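
A minimal sketch for cross-checking the oprofile snippet quoted above: it parses the rows as they appear in the mail, picks the hottest address, and compares its sample count with the "1M samples" total. Treating that total as exactly 1,000,000 is an assumption. The result agrees with the "35% of the samples" statement. One possibility, not confirmed anywhere in this thread, is that a sampling profiler attributes samples to an instruction next to the one that actually stalls, so the address flagged here may not be the real culprit.

```python
# Rows copied from the oprofile snippet in the quoted mail:
# address, sample count, percentage, source location.
SNIPPET = """\
511454 19        0.0037  flow.c:511
511458 149       0.0292  dp-packet.h:266
51145f 4264      0.8357  dp-packet.h:267
511466 18        0.0035  dp-packet.h:268
51146d 43        0.0084  dp-packet.h:269
511474 172       0.0337  flow.c:511
51147a 4320      0.8467  string3.h:51
51147e 358763   70.3176  flow.c:99
511482 2        3.9e-04  string3.h:51
511485 3060      0.5998  string3.h:51
511488 1693      0.3318  string3.h:51
51148c 2933      0.5749  flow.c:326
511491 47        0.0092  flow.c:326
"""

# "1M samples" per the mail; assumed to be exactly one million here.
TOTAL_SAMPLES = 1_000_000


def parse(snippet):
    """Turn each row into (address, samples, percent, location)."""
    rows = []
    for line in snippet.splitlines():
        addr, count, pct, loc = line.split()
        rows.append((addr, int(count), float(pct), loc))
    return rows


rows = parse(SNIPPET)
snippet_samples = sum(r[1] for r in rows)
hottest = max(rows, key=lambda r: r[1])
share_of_run = hottest[1] / TOTAL_SAMPLES
share_of_snippet = hottest[1] / snippet_samples

if __name__ == "__main__":
    print(f"hottest address: {hottest[0]} ({hottest[3]})")
    print(f"share of all samples: {share_of_run:.1%}")
    print(f"share of the samples in this snippet: {share_of_snippet:.1%}")
```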


