[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Fri Aug 23 16:59:35 UTC 2019


On 23.08.2019 19:08, William Tu wrote:
> On Wed, Aug 21, 2019 at 2:31 AM Eelco Chaudron <echaudro at redhat.com> wrote:
>>
>>
>>
>>>>> William, Eelco, which HW NIC you're using? Which kernel driver?
>>>>
>>>> I’m using the below on the latest bpf-next driver:
>>>>
>>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
>>>> SFI/SFP+ Network Connection (rev 01)
>>>> 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
>>>> SFI/SFP+ Network Connection (rev 01)
>>>
>>> Thanks for information.
>>> I found one suspicious place inside the ixgbe driver that could break
>>> the completion queue ring and prepared a patch:
>>>     https://patchwork.ozlabs.org/patch/1150244/
>>>
>>> It'll be good if you can test it.
>>
>> Hi Ilya, I was doing some testing of my own, and also concluded the problem
>> was in the driver's completion ring. I noticed that after sending 512 packets
>> the driver's TX counters kept increasing, which looks related to your fix.
>>
>> Will try it out and send the results to the upstream mailing list…
>>
>> Thanks,
>>
>> Eelco
> 
> Hi,
> 
> I'm comparing the performance of netdev-afxdp.c on current master and
> the DPDK's AF_XDP implementation in OVS dpdk-latest branch.
> I'm using ixgbe and doing physical port to physical port forwarding, sending
> 64 byte packets, with OpenFlow rule:
>   ovs-ofctl  add-flow br0  "in_port=eth2, actions=output:eth3"
> 
> In short:
> A. OVS's netdev-afxdp: 6.1 Mpps
> B. OVS-DPDK AF_XDP pmd: 8 Mpps
> So I started to think about how to optimize lib/netdev-afxdp.c. Any comments
> are welcome! Below is the analysis:

One major difference is that the DPDK implementation supports XDP_USE_NEED_WAKEUP,
and it will be in use if you're building the kernel from the latest bpf-next tree.
This allows significantly decreasing the number of syscalls.
According to the perf stats below, the OVS implementation, unlike the DPDK one,
wastes ~11% of its time inside the kernel (do_syscall_64 + fput_many +
entry_SYSCALL_64 + syscall_return_via_sysret), and this could be fixed by the
need_wakeup feature.
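
Roughly, the pattern with need_wakeup looks like the sketch below.  This is
only an illustration (not the actual OVS or DPDK code) and assumes libbpf's
xsk.h API with XDP_USE_NEED_WAKEUP set in the socket's bind_flags; 'xsk',
'tx_ring' and 'fill_ring' are placeholder variables:

  #include <bpf/xsk.h>
  #include <poll.h>
  #include <sys/socket.h>

  static void
  kick_tx_if_needed(struct xsk_socket *xsk, struct xsk_ring_prod *tx_ring)
  {
      /* Without need_wakeup the application has to issue this sendto()
       * after every TX batch.  With the flag, the kernel marks the ring
       * only when a kick is actually required. */
      if (xsk_ring_prod__needs_wakeup(tx_ring)) {
          sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
      }
  }

  static void
  kick_fill_if_needed(struct xsk_socket *xsk, struct xsk_ring_prod *fill_ring)
  {
      /* Same idea on the RX side: only enter the kernel when the driver
       * asks for a wakeup on the fill ring. */
      if (xsk_ring_prod__needs_wakeup(fill_ring)) {
          struct pollfd pfd = { .fd = xsk_socket__fd(xsk), .events = POLLIN };

          poll(&pfd, 1, 0);
      }
  }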

BTW, there are a lot of pmd threads in case A, but only one in case B.
Was the test setup really equal?
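
If you want to rule that out, it may be worth pinning both setups to the same
single core for the comparison, e.g. something like (the mask value below is
only an example):
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x40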

Best regards, Ilya Maximets.

> 
> A. OVS netdev-afxdp, physical to physical: 6.1 Mpps
> # pstree -p 702
> ovs-vswitchd(702)-+-{ct_clean1}(706)
>                   |-{handler4}(712)
>                   |-{ipf_clean2}(707)
>                   |-{pmd6}(790)
>                   |-{pmd7}(791)
>                   |-{pmd8}(792)
>                   |-{pmd9}(793)
>                   |-{revalidator5}(713)
>                   `-{urcu3}(708)
> 
> # ovs-appctl  dpif-netdev/pmd-stats-show
> pmd thread numa_id 0 core_id 6:
>   packets received: 92290351
>   packet recirculations: 0
>   avg. datapath passes per packet: 1.00
>   emc hits: 92290319
>   smc hits: 0
>   megaflow hits: 31
>   avg. subtable lookups per megaflow hit: 1.00
>   miss with success upcall: 1
>   miss with failed upcall: 0
>   avg. packets per output batch: 31.88
>   idle cycles: 20835727677 (34.86%)           --> pretty high!?
>   processing cycles: 38932097052 (65.14%)
>   avg cycles per packet: 647.61 (59767824729/92290351)
>   avg processing cycles per packet: 421.84 (38932097052/92290351)
> 
> # ./perf record -t 790 sleep 10
>   13.80%  pmd6  ovs-vswitchd        [.] miniflow_extract
>   13.58%  pmd6  ovs-vswitchd        [.] __netdev_afxdp_batch_send
>    9.64%  pmd6  ovs-vswitchd        [.] dp_netdev_input__
>    9.07%  pmd6  ovs-vswitchd        [.] dp_packet_init__
>    8.91%  pmd6  ovs-vswitchd        [.] netdev_afxdp_rxq_recv
>    7.40%  pmd6  ovs-vswitchd        [.] miniflow_hash_5tuple
>    5.32%  pmd6  libc-2.23.so        [.] __memcpy_avx_unaligned
>    4.60%  pmd6  [kernel.vmlinux]    [k] do_syscall_64
>    3.72%  pmd6  ovs-vswitchd        [.] dp_packet_use_afxdp    --> maybe optimize?
>    2.74%  pmd6  libpthread-2.23.so  [.] __pthread_enable_asynccancel
>    2.43%  pmd6  [kernel.vmlinux]    [k] fput_many
>    2.18%  pmd6  libc-2.23.so        [.] __memcmp_sse4_1
>    2.06%  pmd6  [kernel.vmlinux]    [k] entry_SYSCALL_64
>    1.79%  pmd6  [kernel.vmlinux]    [k] syscall_return_via_sysret
>    1.71%  pmd6  ovs-vswitchd        [.] dp_execute_cb
>    1.03%  pmd6  ovs-vswitchd        [.] non_atomic_ullong_add
>    0.86%  pmd6  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
> 
> B. OVS-DPDK AF_XDP using dpdk-latest: 8 Mpps
> ovs-vswitchd(19462)-+-{ct_clean3}(19470)
>                     |-{dpdk_watchdog1}(19468)
>                     |-{eal-intr-thread}(19466)
>                     |-{handler16}(19501)
>                     |-{handler17}(19505)
>                     |-{handler18}(19506)
>                     |-{handler19}(19507)
>                     |-{handler20}(19508)
>                     |-{handler22}(19502)
>                     |-{handler24}(19504)
>                     |-{handler26}(19503)
>                     |-{ipf_clean4}(19471)
>                     |-{pmd27}(19536)
>                     |-{revalidator21}(19509)
>                     |-{revalidator23}(19511)
>                     |-{revalidator25}(19510)
>                     |-{rte_mp_handle}(19467)
>                     `-{urcu2}(19469)
> 
> # ovs-appctl  dpif-netdev/pmd-stats-show
> pmd thread numa_id 0 core_id 11:
>   packets received: 1813689117
>   packet recirculations: 0
>   avg. datapath passes per packet: 1.00
>   emc hits: 1813689053
>   smc hits: 0
>   megaflow hits: 63
>   avg. subtable lookups per megaflow hit: 1.00
>   miss with success upcall: 1
>   miss with failed upcall: 0
>   avg. packets per output batch: 31.85
>   idle cycles: 13848892341 (2.50%)
>   processing cycles: 541064826249 (97.50%)
>   avg cycles per packet: 305.96 (554913718590/1813689117)
>   avg processing cycles per packet: 298.32 (541064826249/1813689117)
> 
> # ./perf record -t 19536 sleep 10
>   24.84%  pmd27  ovs-vswitchd        [.] eth_af_xdp_rx
>   16.27%  pmd27  ovs-vswitchd        [.] eth_af_xdp_tx
>   13.20%  pmd27  ovs-vswitchd        [.] dp_netdev_input__
>   12.54%  pmd27  ovs-vswitchd        [.] pull_umem_cq
>   10.85%  pmd27  ovs-vswitchd        [.] miniflow_extract
>    5.67%  pmd27  ovs-vswitchd        [.] miniflow_hash_5tuple
>    3.41%  pmd27  libc-2.23.so        [.] __memcmp_sse4_1
>    2.14%  pmd27  ovs-vswitchd        [.] netdev_dpdk_rxq_recv
>    2.13%  pmd27  ovs-vswitchd        [.] dp_execute_cb
>    1.50%  pmd27  ovs-vswitchd        [.] non_atomic_ullong_add
>    1.49%  pmd27  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
>    1.05%  pmd27  ovs-vswitchd        [.] netdev_dpdk_filter_packet_len
>    0.79%  pmd27  ovs-vswitchd        [.] pmd_perf_end_iteration
>    0.74%  pmd27  ovs-vswitchd        [.] dp_netdev_process_rxq_port
>    0.47%  pmd27  ovs-vswitchd        [.] memcmp@plt
>    0.42%  pmd27  ovs-vswitchd        [.] netdev_dpdk_eth_send
> 
> 

