[ovs-dev] Re: Why is ovs DPDK much worse than ovs in my test case?

Ilya Maximets i.maximets at samsung.com
Thu Jul 11 07:34:57 UTC 2019


On 11.07.2019 3:27, Yi Yang (杨燚)-云服务集团 wrote:
> BTW, offload features are on in my test client1 and server1 (iperf server)
> 
> vagrant@client1:~$ ethtool -k enp0s8
> Features for enp0s8:
> rx-checksumming: on [fixed]
> tx-checksumming: on
>         tx-checksum-ipv4: off [fixed]
>         tx-checksum-ip-generic: on
>         tx-checksum-ipv6: off [fixed]
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: off [fixed]
> tx-vlan-offload: off [fixed]
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: on [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: on [fixed]
> hw-tc-offload: off [fixed]
> vagrant@client1:~$
> 
> vagrant@server1:~$ ifconfig enp0s8
> enp0s8    Link encap:Ethernet  HWaddr 08:00:27:c0:a6:0b
>           inet addr:192.168.230.101  Bcast:192.168.230.255  Mask:255.255.255.0
>           inet6 addr: fe80::a00:27ff:fec0:a60b/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>           RX packets:4228443 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:2484988 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:34527894301 (34.5 GB)  TX bytes:528944799 (528.9 MB)
> 
> vagrant@server1:~$ ethtool -k enp0s8
> Features for enp0s8:
> rx-checksumming: on [fixed]
> tx-checksumming: on
>         tx-checksum-ipv4: off [fixed]
>         tx-checksum-ip-generic: on
>         tx-checksum-ipv6: off [fixed]
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: on
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: off [fixed]
> tx-vlan-offload: off [fixed]
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: on [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-ipip-segmentation: off [fixed]
> tx-sit-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: on [fixed]
> hw-tc-offload: off [fixed]
> vagrant@server1:~$
> 
> -----Original Message-----
> From: Yi Yang (杨燚)-云服务集团
> Sent: July 11, 2019 8:22
> To: i.maximets at samsung.com; ovs-dev at openvswitch.org
> Cc: Yi Yang (杨燚)-云服务集团 <yangyi01 at inspur.com>
> Subject: Re: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?
> Importance: High
> 
> Ilya, thank you so much. Using a 9K MTU for all the virtio interfaces in the transport path (including the DPDK port) does help. The data is below.

8K usually works a bit better for me than 9K, probably because of
the page size.

Have you also configured the MTU for the tap interfaces on the host side,
just in case the host kernel doesn't negotiate the MTU with the guest?
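
A minimal sketch of that MTU setup, in case it helps. The port names dpdk0/dpdk1 and the guest interface enp0s8 come from this thread; "tap0" is only a placeholder for whatever the host-side device is called, and 8000 is one reading of "8K". Each command belongs on a different machine, as the comments note.

```shell
# Assumption: "8K" is taken as 8000 bytes.
MTU=8000

# On the physical host: the tap device backing the VM.
# "tap0" is a hypothetical name; substitute the real device.
ip link set dev tap0 mtu "$MTU"

# In the VM running OVS with DPDK: DPDK ports take the MTU via mtu_request.
ovs-vsctl set Interface dpdk0 mtu_request="$MTU"
ovs-vsctl set Interface dpdk1 mtu_request="$MTU"

# In the client and server guests: the kernel virtio interface.
ip link set dev enp0s8 mtu "$MTU"

# Check what was actually applied.
ip link show dev enp0s8
ovs-vsctl get Interface dpdk0 mtu
```

If the value read back differs from the requested one, the request was probably rejected somewhere along the path, and that hop still limits the frame size.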

> 
> vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
> ------------------------------------------------------------
> Client connecting to 192.168.230.101, TCP port 5001
> TCP window size:  325 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.200.101 port 53956 connected with 192.168.230.101 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   315 MBytes   264 Mbits/sec
> [  3] 10.0-20.0 sec   333 MBytes   280 Mbits/sec
> [  3] 20.0-30.0 sec   300 MBytes   252 Mbits/sec
> [  3] 30.0-40.0 sec   307 MBytes   258 Mbits/sec
> [  3] 40.0-50.0 sec   322 MBytes   270 Mbits/sec
> [  3] 50.0-60.0 sec   316 MBytes   265 Mbits/sec
> [  3]  0.0-60.0 sec  1.85 GBytes   265 Mbits/sec
> vagrant@client1:~$
> 
> But it is still much worse than ovs kernel. In my test case I used a VirtualBox network, and the whole transport path traverses several different VMs. Every VM has offload features turned on except the ovs DPDK VM. As I understand it, TSO should be done on the send side, so when a packet is sent out from the send side or the receive side, it has already been segmented by TSO to fit the path MTU. So in the ovs kernel VM or the ovs DPDK VM, the packet size is already the MTU of the ovs port or DPDK port, and no TSO work is needed there, right?

Not sure if I understand the question correctly, but I'll try to
clarify. I assume that all your VMs are located on the same physical host.
The Linux kernel is smart and will not segment packets until it
is unavoidable. If all the interfaces on the packet path support TSO,
the kernel will never segment packets and will always pass 64K packets
all the way from the iperf client to the iperf server.
In the case of OVS with DPDK, its VM doesn't support TSO, so packets
will be split into segments that fit the MTU before being sent to that VM.
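
One way to check which interfaces on the path advertise the relevant offloads is to filter the `ethtool -k` output down to the features that matter for large TCP packets. This is only a sketch: the helper name `offload_summary` is made up for illustration, and the feature list is an assumption about what is worth looking at.

```shell
#!/bin/sh
# offload_summary (hypothetical helper): read `ethtool -k <dev>` output on
# stdin and keep only the top-level features relevant to large TCP packets.
offload_summary() {
    grep -E '^(tx-checksumming|scatter-gather|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Usage on a live system (enp0s8 is the interface name from this thread):
#   ethtool -k enp0s8 | offload_summary
```

Running it against each interface on the path (guest interfaces and the host-side devices) should show where tcp-segmentation-offload stops being "on".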

The key point here is the virtio interfaces you're using for VMs.
virtio-net is a para-virtual network interface. This means that the
guest knows that the interface is virtual and that the host is able
to receive packets larger than the MTU if offloading was negotiated.
At the same time, the host knows that the guest is able to receive packets
larger than the MTU too. So, nothing will be segmented.

In the case of OVS with DPDK, the host knows that the guest is not able to
receive packets larger than the MTU and splits them before sending.

You can't send packets larger than the MTU to a physical network, but you
are able to do that with a virtual network if it was negotiated.
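
To put rough numbers on the splitting, here is a small sketch that counts how many MTU-sized segments one 64K TCP packet turns into. It assumes 65536 bytes of TCP payload and 40 bytes of IPv4 + TCP headers per segment with no TCP options, so the real counts may differ slightly.

```shell
#!/bin/sh
# segments PAYLOAD MTU: number of MTU-sized TCP segments needed for
# PAYLOAD bytes. Assumes 40 bytes of IPv4 + TCP headers, no TCP options.
segments() {
    payload=$1
    mss=$(( $2 - 40 ))
    # Integer ceiling division.
    echo $(( (payload + mss - 1) / mss ))
}

for mtu in 1500 8000 9000; do
    echo "MTU $mtu: $(segments 65536 "$mtu") segments per 64K packet"
done
```

Under these assumptions the count drops from 45 segments at MTU 1500 to 9 at MTU 8000 and 8 at MTU 9000, which is why a larger MTU reduces the per-packet segmentation and checksum work.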


Best regards, Ilya Maximets.

> 
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets at samsung.com]
> Sent: July 10, 2019 18:11
> To: ovs-dev at openvswitch.org; Yi Yang (杨燚)-云服务集团 <yangyi01 at inspur.com>
> Subject: Re: [ovs-dev] Why is ovs DPDK much worse than ovs in my test case?
> 
>> Hi, all
>>
>> I just use ovs as a static router in my test case. ovs is run in a
>> vagrant VM, the ethernet interfaces use the virtio driver, and I created
>> two ovs bridges, each with one ethernet interface added. The two bridges
>> are connected by a patch port, and only the default openflow rule is there.
>>
>> table=0, priority=0 actions=NORMAL
>>     Bridge br-int
>>         Port patch-br-ex
>>             Interface patch-br-ex
>>                 type: patch
>>                 options: {peer=patch-br-int}
>>         Port br-int
>>             Interface br-int
>>                 type: internal
>>         Port "dpdk0"
>>             Interface "dpdk0"
>>                 type: dpdk
>>                 options: {dpdk-devargs="0000:00:08.0"}
>>     Bridge br-ex
>>         Port "dpdk1"
>>             Interface "dpdk1"
>>                 type: dpdk
>>                 options: {dpdk-devargs="0000:00:09.0"}
>>         Port patch-br-int
>>             Interface patch-br-int
>>                 type: patch
>>                 options: {peer=patch-br-ex}
>>         Port br-ex
>>             Interface br-ex
>>                 type: internal
>>
>> But when I run iperf to do performance benchmark, the result shocked me.
>>
>> For ovs nondpdk, the result is
>>
>> vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
>>
>> ------------------------------------------------------------
>> Client connecting to 192.168.230.101, TCP port 5001
>> TCP window size: 85.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.200.101 port 53900 connected with 192.168.230.101 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  1.05 GBytes   905 Mbits/sec
>> [  3] 10.0-20.0 sec  1.02 GBytes   877 Mbits/sec
>> [  3] 20.0-30.0 sec  1.07 GBytes   922 Mbits/sec
>> [  3] 30.0-40.0 sec  1.08 GBytes   927 Mbits/sec
>> [  3] 40.0-50.0 sec  1.06 GBytes   914 Mbits/sec
>> [  3] 50.0-60.0 sec  1.07 GBytes   922 Mbits/sec
>> [  3]  0.0-60.0 sec  6.37 GBytes   911 Mbits/sec
>>
>> vagrant@client1:~$
>>
>> For ovs dpdk, the bandwidth is just about 45Mbits/sec, why? I really 
>> don’t understand what happened.
>>
>> vagrant@client1:~$ iperf -t 60 -i 10 -c 192.168.230.101
>>
>> ------------------------------------------------------------
>> Client connecting to 192.168.230.101, TCP port 5001
>> TCP window size: 85.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.200.101 port 53908 connected with 192.168.230.101 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  54.6 MBytes  45.8 Mbits/sec
>> [  3] 10.0-20.0 sec  55.5 MBytes  46.6 Mbits/sec
>> [  3] 20.0-30.0 sec  52.5 MBytes  44.0 Mbits/sec
>> [  3] 30.0-40.0 sec  53.6 MBytes  45.0 Mbits/sec
>> [  3] 40.0-50.0 sec  54.0 MBytes  45.3 Mbits/sec
>> [  3] 50.0-60.0 sec  53.9 MBytes  45.2 Mbits/sec
>> [  3]  0.0-60.0 sec   324 MBytes  45.3 Mbits/sec
>>
>> vagrant@client1:~$
>>
>> By the way, I tried pinning the qemu processes that correspond to the
>> ovs pmd threads to physical cores, but it hardly affects performance.
>>
>>   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND  P
>> 16303 yangyi    20   0 9207120 209700 107500 R 99.9  0.1  63:02.37 EMT-1    1
>> 16304 yangyi    20   0 9207120 209700 107500 R 99.9  0.1  69:16.16 EMT-2    2
> 
> 
> Hi.
> There might be a lot of reasons for bad performance, but most likely this is just because of disabled offloading capabilities on the VM interface (mostly TSO).
> 
> Try using a UDP flow for testing. You should get almost the same results for kernel and DPDK in the UDP case. Try:
>     # iperf -t 60 -i 10 -c 192.168.230.101 -u -b 10G -l 1460
> 
> The bottleneck in your setup is the tap interface that connects the VM with the host network; that is why I expect similar results in both cases.
> 
> In the case of kernel OVS, the tap interface on the host will have TSO and checksum offloading enabled, so iperf will use huge 64K packets that will never be segmented, since everything happens on the same host and all kernels (guest and host) have TSO and checksum offloading support by default.
> When using OVS with DPDK in the guest, the tap interface on the host will have no TSO/checksum offloading, and the host kernel will have to split each 64K TCP packet into MTU-sized frames and recalculate checksums for all the chunks. This slows everything down significantly.
> 
> To partially mitigate the issue with TCP, you could increase the MTU to 8K on all your interfaces (all the host and guest interfaces). Use 'mtu_request' to set the MTU for DPDK ports. This should give a good performance boost in the DPDK case.
> 
> Best regards, Ilya Maximets.
> 

