[ovs-discuss] Fwd: [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

Lam, Tiago tiago.lam at intel.com
Fri Nov 30 09:36:27 UTC 2018


On 30/11/2018 02:07, LIU Yulong wrote:
> Hi,
> 
> Thanks for the reply, please see my inline comments below.
> 
> 
> On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago <tiago.lam at intel.com> wrote:
> 
>     On 29/11/2018 08:24, LIU Yulong wrote:
>     > Hi,
>     >
>     > We recently tested ovs-dpdk, but we ran into a bandwidth issue. The
>     > bandwidth from VM to VM was nowhere near the physical NIC's capacity:
>     > about 4.3Gbps on a 10Gbps NIC. For non-DPDK (virtio-net) VMs, the
>     > iperf3 test easily reaches 9.3Gbps. We enabled virtio multiqueue for
>     > all guest VMs. In the dpdk vhostuser guest, we noticed that the
>     > interrupts are concentrated on only one queue, whereas for the
>     > non-DPDK VM the interrupts are hashed across all queues. For the dpdk
>     > vhostuser VMs, we also noticed that the PMD usage was concentrated on
>     > one queue, no matter whether on the server (tx) or the client (rx)
>     > side, and no matter whether one PMD or multiple PMDs were used.
>     >
>     > Furthermore, my colleague added some systemtap hooks to the
>     > openvswitch functions and found something interesting: the function
>     > __netdev_dpdk_vhost_send sends all the packets to a single
>     > virtio-net queue. It seems that some algorithm/hash table/logic is
>     > not spreading the packets across the queues very well.
>     >
> 
>     Hi,
> 
>     When you say "no dpdk VMs", you mean that within your VM you're relying
>     on the Kernel to get the packets, using virtio-net. And when you say
>     "dpdk vhostuser guest", you mean you're using DPDK inside the VM to get
>     the packets. Is this correct?
> 
> 
> Sorry for the inaccurate description. I'm really new to DPDK.
> There is no DPDK inside the VM; all of these settings are for the host only.
> (`host` means the hypervisor's physical machine, in virtualization terms,
> while `guest` means the virtual machine.)
> "no dpdk VMs" means the host does not set up DPDK (OvS is working in the
> traditional way) and the VMs were booted on top of that. Maybe a better
> name would be `VMs-on-no-DPDK-host`?

Got it. Your "no dpdk VMs" setup is what is usually referred to as
OvS-Kernel, while your "dpdk vhostuser guest" setup is referred to as
OvS-DPDK.

> 
>     If so, could you also tell us which DPDK app you're using inside of
>     those VMs? Is it testpmd? If so, how are you setting the `--rxq` and
>     `--txq` args? Otherwise, how are you setting those in your app when
>     initializing DPDK?
> 
> 
> Inside the VM there is no DPDK app, and the VM kernel does not set any
> DPDK-related config. `iperf3` is the tool used for the bandwidth
> testing.
> 
>     The information below is useful in telling us how you're setting your
>     configurations in OvS, but we are still missing the configurations
>     inside the VM.
> 
>     This should help us in getting more information,
> 
> 
> Maybe you have noticed that we only set up one PMD in the pasted
> configurations, but the VM has 8 queues. Should the number of PMDs match
> the number of queues?

It doesn't need to match the number of queues inside the VM, per se. But
in this case, since you have also configured 8 rx queues on your physical
NICs, and since you're looking for higher throughput, you should increase
the number of PMDs and pin those rxqs to them - take a look at [1] on how
to do that. Later on, increasing the size of your queues could also help.
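
As a rough, untested sketch of the kind of commands involved (the core ids
20, 22, 24 and 26 below are only an example picked from the NUMA node 0
list in your `numactl` output, and the 4096 descriptor sizes are only
illustrative):

  # Run PMD threads on four cores of NUMA node 0 instead of only core 20
  # (bits 20, 22, 24 and 26 set => mask 0x5500000).
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x5500000

  # Pin the physical port's rx queues to those PMDs (queue:core pairs).
  ovs-vsctl set Interface nic-10G-1 \
      other_config:pmd-rxq-affinity="0:20,1:22,2:24,3:26,4:20,5:22,6:24,7:26"

  # Later on, bigger rx/tx rings can be requested per DPDK port.
  ovs-vsctl set Interface nic-10G-1 options:n_rxq_desc=4096 options:n_txq_desc=4096

The same pmd-rxq-affinity option can also be set on vhu76f9a623-9f if you
want its queues spread over those PMDs as well; [1] below describes the
exact semantics, including how pinning isolates a PMD from the automatic
rxq assignment.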

Just as a curiosity, I see you have a configured MTU of 1500B on the
physical interfaces. Is that the same MTU you're using inside the VM?
And are you using the same configurations (including that 1500B MTU)
when running your OvS-Kernel setup?
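
And if you do end up experimenting with a larger MTU on the OvS-DPDK side,
`mtu_request` is the knob for that on both the dpdk and the vhost-user
ports (the 9000B value below is just an example; the interface inside the
VM would need to be set to match):

  # Request a 9000B MTU on the physical DPDK ports and on the vhost-user port.
  ovs-vsctl set Interface nic-10G-1 mtu_request=9000
  ovs-vsctl set Interface nic-10G-2 mtu_request=9000
  ovs-vsctl set Interface vhu76f9a623-9f mtu_request=9000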

Hope this helps,

Tiago.

[1]
http://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#port-rx-queue-assigment-to-pmd-threads

> 
>     Tiago.
> 
>     > So I'd like to get some help from the community. Maybe I'm missing
>     > some configuration.
>     >
>     > Thanks.
>     >
>     >
>     > Here is a list of the environment details and some configurations:
>     > # uname -r
>     > 3.10.0-862.11.6.el7.x86_64
>     > # rpm -qa|grep dpdk
>     > dpdk-17.11-11.el7.x86_64
>     > # rpm -qa|grep openvswitch
>     > openvswitch-2.9.0-3.el7.x86_64
>     > # ovs-vsctl list open_vswitch
>     > _uuid               : a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>     > bridges             : [531e4bea-ce12-402a-8a07-7074c31b978e,
>     > 5c1675e2-5408-4c1f-88bc-6d9c9b932d47]
>     > cur_cfg             : 1305
>     > datapath_types      : [netdev, system]
>     > db_version          : "7.15.1"
>     > external_ids        : {hostname="cq01-compute-10e112e5e140",
>     > rundir="/var/run/openvswitch",
>     > system-id="e2cc84fe-a3c8-455f-8c64-260741c141ee"}
>     > iface_types         : [dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient,
>     > geneve, gre, internal, lisp, patch, stt, system, tap, vxlan]
>     > manager_options     : [43803994-272b-49cb-accc-ab672d1eefc8]
>     > next_cfg            : 1305
>     > other_config        : {dpdk-init="true", dpdk-lcore-mask="0x1",
>     > dpdk-socket-mem="1024,1024", pmd-cpu-mask="0x100000",
>     > vhost-iommu-support="true"}
>     > ovs_version         : "2.9.0"
>     > ssl                 : []
>     > statistics          : {}
>     > system_type         : centos
>     > system_version      : "7"
>     > # lsmod |grep vfio
>     > vfio_pci               41312  2 
>     > vfio_iommu_type1       22300  1 
>     > vfio                   32695  7 vfio_iommu_type1,vfio_pci
>     > irqbypass              13503  23 kvm,vfio_pci
>     >
>     > # ovs-appctl dpif/show
>     > netdev@ovs-netdev: hit:759366335 missed:754283
>     > br-ex:
>     > bond1108 4/6: (tap)
>     > br-ex 65534/3: (tap)
>     > nic-10G-1 5/4: (dpdk: configured_rx_queues=8,
>     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>     > requested_txq_descriptors=2048, rx_csum_offload=true)
>     > nic-10G-2 6/5: (dpdk: configured_rx_queues=8,
>     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>     > requested_txq_descriptors=2048, rx_csum_offload=true)
>     > phy-br-ex 3/none: (patch: peer=int-br-ex)
>     > br-int:
>     > br-int 65534/2: (tap)
>     > int-br-ex 1/none: (patch: peer=phy-br-ex)
>     > vhu76f9a623-9f 2/1: (dpdkvhostuserclient: configured_rx_queues=8,
>     > configured_tx_queues=8, mtu=1500, requested_rx_queues=8,
>     > requested_tx_queues=8)
>     >
>     > # ovs-appctl dpctl/show -s
>     > netdev@ovs-netdev:
>     > lookups: hit:759366335 missed:754283 lost:72
>     > flows: 186
>     > port 0: ovs-netdev (tap)
>     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>     > TX packets:0 errors:0 dropped:0 aborted:0 carrier:0
>     > collisions:0
>     > RX bytes:0  TX bytes:0
>     > port 1: vhu76f9a623-9f (dpdkvhostuserclient: configured_rx_queues=8,
>     > configured_tx_queues=8, mtu=1500, requested_rx_queues=8,
>     > requested_tx_queues=8)
>     > RX packets:718391758 errors:0 dropped:0 overruns:? frame:?
>     > TX packets:30372410 errors:? dropped:719200 aborted:? carrier:?
>     > collisions:?
>     > RX bytes:1086995317051 (1012.3 GiB)  TX bytes:2024893540 (1.9 GiB)
>     > port 2: br-int (tap)
>     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>     > TX packets:1393992 errors:0 dropped:4 aborted:0 carrier:0
>     > collisions:0
>     > RX bytes:0  TX bytes:2113616736 (2.0 GiB)
>     > port 3: br-ex (tap)
>     > RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>     > TX packets:6660091 errors:0 dropped:967 aborted:0 carrier:0
>     > collisions:0
>     > RX bytes:0  TX bytes:2451440870 (2.3 GiB)
>     > port 4: nic-10G-1 (dpdk: configured_rx_queues=8,
>     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>     > requested_txq_descriptors=2048, rx_csum_offload=true)
>     > RX packets:36409466 errors:0 dropped:0 overruns:? frame:?
>     > TX packets:718371472 errors:0 dropped:20276 aborted:? carrier:?
>     > collisions:?
>     > RX bytes:2541593983 (2.4 GiB)  TX bytes:1089838136919 (1015.0 GiB)
>     > port 5: nic-10G-2 (dpdk: configured_rx_queues=8,
>     > configured_rxq_descriptors=2048, configured_tx_queues=2,
>     > configured_txq_descriptors=2048, mtu=1500, requested_rx_queues=8,
>     > requested_rxq_descriptors=2048, requested_tx_queues=2,
>     > requested_txq_descriptors=2048, rx_csum_offload=true)
>     > RX packets:5319466 errors:0 dropped:0 overruns:? frame:?
>     > TX packets:0 errors:0 dropped:0 aborted:? carrier:?
>     > collisions:?
>     > RX bytes:344903551 (328.9 MiB)  TX bytes:0
>     > port 6: bond1108 (tap)
>     > RX packets:228 errors:0 dropped:0 overruns:0 frame:0
>     > TX packets:5460 errors:0 dropped:18 aborted:0 carrier:0
>     > collisions:0
>     > RX bytes:21459 (21.0 KiB)  TX bytes:341087 (333.1 KiB)
>     >
>     > # ovs-appctl dpif-netdev/pmd-stats-show
>     > pmd thread numa_id 0 core_id 20:
>     > packets received: 760120690
>     > packet recirculations: 0
>     > avg. datapath passes per packet: 1.00
>     > emc hits: 750787577
>     > megaflow hits: 8578758
>     > avg. subtable lookups per megaflow hit: 1.05
>     > miss with success upcall: 754283
>     > miss with failed upcall: 72
>     > avg. packets per output batch: 2.21
>     > idle cycles: 210648140144730 (99.13%)
>     > processing cycles: 1846745927216 (0.87%)
>     > avg cycles per packet: 279554.14 (212494886071946/760120690)
>     > avg processing cycles per packet: 2429.54 (1846745927216/760120690)
>     > main thread:
>     > packets received: 0
>     > packet recirculations: 0
>     > avg. datapath passes per packet: 0.00
>     > emc hits: 0
>     > megaflow hits: 0
>     > avg. subtable lookups per megaflow hit: 0.00
>     > miss with success upcall: 0
>     > miss with failed upcall: 0
>     > avg. packets per output batch: 0.00
>     >
>     > # ovs-appctl dpif-netdev/pmd-rxq-show
>     > pmd thread numa_id 0 core_id 20:
>     > isolated : false
>     > port: nic-10G-1       queue-id:  0  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  1  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  2  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  3  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  4  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  5  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  6  pmd usage:  0 %
>     > port: nic-10G-1       queue-id:  7  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  0  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  1  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  2  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  3  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  4  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  5  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  6  pmd usage:  0 %
>     > port: nic-10G-2       queue-id:  7  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  0  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  1  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  2  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  3  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  4  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  5  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  6  pmd usage:  0 %
>     > port: vhu76f9a623-9f  queue-id:  7  pmd usage:  0 %
>     >
>     >
>     > # virsh dumpxml instance-5c5191ff-c1a2-4429-9a8b-93ddd939583d
>     > ...
>     >     <interface type='vhostuser'>
>     >       <mac address='fa:16:3e:77:ab:fb'/>
>     >       <source type='unix' path='/var/lib/vhost_sockets/vhu76f9a623-9f'
>     > mode='server'/>
>     >       <target dev='vhu76f9a623-9f'/>
>     >       <model type='virtio'/>
>     >       <driver name='vhost' queues='8'/>
>     >       <alias name='net0'/>
>     >       <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
>     > function='0x0'/>
>     >     </interface>
>     > ...
>     >
>     > # ovs-vsctl show
>     > a6a3d9eb-28a8-4bf0-a8b4-94577b5ffe5e
>     >     Manager "ptcp:6640:127.0.0.1"
>     >         is_connected: true
>     >     Bridge br-int
>     >         Controller "tcp:127.0.0.1:6633"
>     >             is_connected: true
>     >         fail_mode: secure
>     >         Port int-br-ex
>     >             Interface int-br-ex
>     >                 type: patch
>     >                 options: {peer=phy-br-ex}
>     >         Port br-int
>     >             Interface br-int
>     >                 type: internal
>     >         Port "vhu76f9a623-9f"
>     >             tag: 1
>     >             Interface "vhu76f9a623-9f"
>     >                 type: dpdkvhostuserclient
>     >                 options: {n_rxq="8",
>     > vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}
>     >     Bridge br-ex
>     >         Controller "tcp:127.0.0.1:6633"
>     >             is_connected: true
>     >         fail_mode: secure
>     >         Port dpdkbond
>     >             Interface "nic-10G-1"
>     >                 type: dpdk
>     >                 options: {dpdk-devargs="0000:01:00.0", n_rxq="8", n_txq="8"}
>     >             Interface "nic-10G-2"
>     >                 type: dpdk
>     >                 options: {dpdk-devargs="0000:05:00.1", n_rxq="8", n_txq="8"}
>     >         Port phy-br-ex
>     >             Interface phy-br-ex
>     >                 type: patch
>     >                 options: {peer=int-br-ex}
>     >         Port br-ex
>     >             Interface br-ex
>     >                 type: internal
>     >
>     > # numactl --hardware
>     > available: 2 nodes (0-1)
>     > node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
>     > node 0 size: 130978 MB
>     > node 0 free: 7539 MB
>     > node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
>     > node 1 size: 131072 MB
>     > node 1 free: 6886 MB
>     > node distances:
>     > node   0   1 
>     >   0:  10  21 
>     >   1:  21  10
>     >
>     > # grep HugePages_ /proc/meminfo
>     > HugePages_Total:     232
>     > HugePages_Free:       10
>     > HugePages_Rsvd:        0
>     > HugePages_Surp:        0
>     >
>     >
>     > # cat /proc/cmdline 
>     > BOOT_IMAGE=/boot/vmlinuz-3.10.0-862.11.6.el7.x86_64
>     > root=UUID=220ee106-5e00-4809-91a0-641e045a4c21 ro
>     > intel_idle.max_cstate=0 crashkernel=auto rhgb quiet
>     > default_hugepagesz=1G hugepagesz=1G hugepages=232 iommu=pt intel_iommu=on
>     >
>     >
>     > Best regards,
>     > LIU Yulong
>     >
>     > _______________________________________________
>     > discuss mailing list
>     > discuss at openvswitch.org
>     > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>     >
> 
> 
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 

