[ovs-discuss] Fwd: [ovs-dpdk] bandwidth issue of vhostuserclient virtio ovs-dpdk

Lam, Tiago tiago.lam at intel.com
Thu Dec 6 16:52:45 UTC 2018


On 03/12/2018 10:18, LIU Yulong wrote:
> 
> 
> On Sat, Dec 1, 2018 at 1:17 AM LIU Yulong <liuyulong.xa at gmail.com> wrote:
> 
> 
> 
>     On Fri, Nov 30, 2018 at 5:36 PM Lam, Tiago <tiago.lam at intel.com> wrote:
> 
>         On 30/11/2018 02:07, LIU Yulong wrote:
>         > Hi,
>         >
>         > Thanks for the reply, please see my inline comments below.
>         >
>         >
>         > On Thu, Nov 29, 2018 at 6:00 PM Lam, Tiago <tiago.lam at intel.com> wrote:
>         >
>         >     On 29/11/2018 08:24, LIU Yulong wrote:
>         >     > Hi,
>         >     >
>         >     > We recently tested ovs-dpdk, but we ran into a bandwidth
>         >     > issue. The bandwidth from VM to VM was not close to the
>         >     > physical NIC's line rate; it was about 4.3Gbps on a 10Gbps
>         >     > NIC. For no-dpdk (virtio-net) VMs, the iperf3 test can
>         >     > easily reach 9.3Gbps. We enabled virtio multiqueue for all
>         >     > guest VMs. In the dpdk vhostuser guest, we noticed that
>         >     > the interrupts are concentrated on only one queue, but for
>         >     > the no-dpdk VM, interrupts hash across all queues. For
>         >     > those dpdk vhostuser VMs, we also noticed that the PMD
>         >     > usage was concentrated on one core, no matter whether on
>         >     > the server (tx) or client (rx) side, and this behavior
>         >     > exists no matter whether one PMD or multiple PMDs are used.
>         >     >
>         >     > Furthermore, my colleague added some systemtap hooks to
>         >     > the openvswitch functions, and he found something
>         >     > interesting. The function __netdev_dpdk_vhost_send sends
>         >     > all the packets to one virtio-net queue. It seems that
>         >     > some algorithm/hash table/logic does not do the hashing
>         >     > very well.
>         >     >
>         >
>         >     Hi,
>         >
>         >     When you say "no dpdk VMs", you mean that within your VM
>         >     you're relying on the Kernel to get the packets, using
>         >     virtio-net. And when you say "dpdk vhostuser guest", you
>         >     mean you're using DPDK inside the VM to get the packets.
>         >     Is this correct?
>         >
>         >
>         > Sorry for the inaccurate description. I'm really new to DPDK.
>         > There is no DPDK inside the VM; all these settings are for the
>         > host only. (`host` means the hypervisor physical machine from
>         > the perspective of virtualization; `guest` means the virtual
>         > machine.) "no dpdk VMs" means the host does not set up DPDK
>         > (OVS is working in the traditional way) and the VMs were booted
>         > on that. Maybe a better name would be `VMs-on-NO-DPDK-host`?
> 
>         Got it. Your "no dpdk VMs" setup is what is usually referred to
>         as OvS-Kernel, while your "dpdk vhostuser guest" setup is
>         referred to as OvS-DPDK.
> 
>         >
>         >     If so, could you also tell us which DPDK app you're using
>         >     inside of those VMs? Is it testpmd? If so, how are you
>         >     setting the `--rxq` and `--txq` args? Otherwise, how are
>         >     you setting those in your app when initializing DPDK?
>         >
>         >
>         > Inside the VM there is no DPDK app, and the VM kernel does not
>         > set any config related to DPDK either. `iperf3` is the tool
>         > used for bandwidth testing.
>         >
>         >     The information below is useful in telling us how you're
>         >     setting your configurations in OvS, but we are still
>         >     missing the configurations inside the VM.
>         >
>         >     This should help us in getting more information,
>         >
>         >
>         > Maybe you have noticed that we only set up one PMD in the
>         > pasted configurations, but the VM has 8 queues. Should the PMD
>         > quantity match the number of queues?
> 
>         It shouldn't match the queues inside the VM per se. But in this
>         case, since you have configured 8 rx queues on your physical
>         NICs as well, and since you're looking for higher throughputs,
>         you should increase the number of PMDs and pin those rxqs - take
>         a look at [1] on how to do that. Later on, increasing the size
>         of your queues could also help.
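
For reference, that kind of configuration is typically done with
ovs-vsctl along the following lines. This is only a sketch: the CPU
mask, core IDs and descriptor counts are illustrative and have to be
adapted to the host, and the interface name is the one that appears
later in this thread:

    # Run PMD threads on cores 2, 4, 8 and 20 (bit mask 0x100114)
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x100114
    # Pin each rx queue of a port to a specific PMD core
    ovs-vsctl set Interface nic-10G-1 \
        other_config:pmd-rxq-affinity="0:2,1:4,2:8,3:20"
    # Optionally enlarge the DPDK port's descriptor rings
    ovs-vsctl set Interface nic-10G-1 options:n_rxq_desc=2048 options:n_txq_desc=2048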
> 
> 
>     I'll test it.
>     Yes, as you noticed, the vhostuserclient port has n_rxq="8":
>     options:
>     {n_rxq="8",vhost-server-path="/var/lib/vhost_sockets/vhu76f9a623-9f"}.
>     And the physical NICs have both n_rxq="8" and n_txq="8":
>     options: {dpdk-devargs="0000:01:00.0", n_rxq="8", n_txq="8"}
>     options: {dpdk-devargs="0000:05:00.1", n_rxq="8", n_txq="8"}
>     But, furthermore, when I remove such configuration from the
>     vhostuserclient port and the physical NIC, the bandwidth stays at
>     4.3Gbps no matter whether one PMD or multiple PMDs are used.
> 
> 
> Bad news: the bandwidth does not increase that much, it's about
> 4.9Gbps - 5.3Gbps.
> The following are the new configurations. The VM still has 8 queues,
> but now I have 4 PMDs.
> 
> # ovs-vsctl get interface nic-10G-1 other_config
> {pmd-rxq-affinity="0:2,1:4,3:20"}
> # ovs-vsctl get interface nic-10G-2 other_config
> {pmd-rxq-affinity="0:2,1:4,3:20"}
> # ovs-vsctl get interface vhuc8febeff-56 other_config
> {pmd-rxq-affinity="0:2,1:4,3:20"}
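
(For reference, pmd-rxq-affinity takes a comma-separated list of
<queue-id>:<core-id> pairs, so the value above pins queue 0 to core 2,
queue 1 to core 4 and queue 3 to core 20, leaving queue 2 unpinned. It
would have been set with something like:

    ovs-vsctl set Interface nic-10G-1 other_config:pmd-rxq-affinity="0:2,1:4,3:20"

which matches the pmd-rxq-show output below, where the unpinned queues
end up on the non-isolated PMD on core 8.)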
> 
> # ovs-appctl dpif-netdev/pmd-rxq-show
> pmd thread numa_id 0 core_id 2:
>         isolated : true
>         port: nic-10G-1         queue-id:  0    pmd usage:  0 %
>         port: nic-10G-2         queue-id:  0    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  0    pmd usage:  0 %
> pmd thread numa_id 0 core_id 4:
>         isolated : true
>         port: nic-10G-1         queue-id:  1    pmd usage:  0 %
>         port: nic-10G-2         queue-id:  1    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  1    pmd usage:  0 %
> pmd thread numa_id 0 core_id 8:
>         isolated : false
>         port: nic-10G-1         queue-id:  2    pmd usage:  0 %
>         port: nic-10G-2         queue-id:  2    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  2    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  4    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  5    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  6    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  7    pmd usage:  0 %
> pmd thread numa_id 0 core_id 20:
>         isolated : true
>         port: nic-10G-1         queue-id:  3    pmd usage:  0 %
>         port: nic-10G-2         queue-id:  3    pmd usage:  0 %
>         port: vhuc8febeff-56    queue-id:  3    pmd usage:  0 %
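
(The "pmd usage: 0 %" figures above suggest the counters were sampled
while there was little or no traffic. To see where the cycles go during
a run, the counters can be cleared before starting iperf3 and read
again afterwards, e.g.:

    ovs-appctl dpif-netdev/pmd-stats-clear
    # ... run the iperf3 test ...
    ovs-appctl dpif-netdev/pmd-rxq-show
    ovs-appctl dpif-netdev/pmd-stats-show

dpif-netdev/pmd-stats-show reports the per-PMD cycle and packet
counters.)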
> 
> 
> # ovs-vsctl show
> ...
>         Port dpdkbond
>             Interface "nic-10G-2"
>                 type: dpdk
>                 options: {dpdk-devargs="0000:05:00.1",
> mtu_request="9000", n_rxq="4", n_txq="4"}
>             Interface "nic-10G-1"
>                 type: dpdk
>                 options: {dpdk-devargs="0000:01:00.0",
> mtu_request="9000", n_rxq="4", n_txq="4"}
>         Port br-ex
>             Interface br-ex
>                 type: internal
>     Bridge br-int
>         Controller "tcp:127.0.0.1:6633"
>             is_connected: true
>         fail_mode: secure
>         Port int-br-ex
>             Interface int-br-ex
>                 type: patch
>                 options: {peer=phy-br-ex}
>         Port br-int
>             Interface br-int
>                 type: internal
>         Port "vhuc8febeff-56"
>             tag: 1
>             Interface "vhuc8febeff-56"
>                 type: dpdkvhostuserclient
>                 options: {n_rxq="4", n_txq="4",
> vhost-server-path="/var/lib/vhost_sockets/vhuc8febeff-56"}
> 
>  
> 
>      
> 
>         Just as a curiosity, I see you have a configured MTU of 1500B on the
>         physical interfaces. Is that the same MTU you're using inside
>         the VM?
>         And are you using the same configurations (including that 1500B MTU)
>         when running your OvS-Kernel setup?
> 
> 
>     The MTU inside the VM is 1450. Is that OK for high throughput?
> 
> 
> Inside the VM the MTU is 1500, and the dpdk physical NIC (OvS-Kernel)
> is set to 9000 now. Bandwidth is ~5.1Gbps now.

Thanks for the update.

This means you will still be segmenting the packets inside the VM at
1500B, meaning you'll transmit smaller segments rather than larger
frames at 9000B. Could you try setting the MTU inside the VM to 9000B as
well?
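
A minimal way of doing that, assuming a Linux guest and that the
guest's interface is named eth0 (the interface name is only an
assumption here), would be:

    # inside the guest
    ip link set dev eth0 mtu 9000

together with the mtu_request="9000" already set on the DPDK physical
ports and, if needed, on the vhostuserclient port:

    # on the host
    ovs-vsctl set Interface vhuc8febeff-56 mtu_request=9000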

That way you would transmit ~9000B sized packets all the way through and
get higher throughput. How much higher, though, I'm not sure. I'll set up
a similar setup and see what I can get as well, but it would be good to
know what you'd be able to get with that.

But after this you might be approaching the limits, and beyond that
features such as TSO [1] would help considerably.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-August/350832.html

Tiago.

