[ovs-discuss] bug in ovs-vswitchd?!
Mechthild Buescher
mechthild.buescher at ericsson.com
Mon Jul 11 19:24:52 UTC 2016
Hi Aaron,
Sorry for being unclear regarding the VM: I meant the DPDK usage inside the VM, so the fault happens when the VM is in use. Inside the VM, I can bind the interfaces either to DPDK or to the Linux kernel - the fault occurs in both cases.
And I haven't applied any patches; I used the latest available version from the master branch, so I don't know whether any such patch has been upstreamed.
Thanks in advance for your help,
BR/Mechthild
-----Original Message-----
From: Aaron Conole [mailto:aconole at redhat.com]
Sent: Monday, July 11, 2016 7:22 PM
To: Mechthild Buescher
Cc: Stokes, Ian; bugs at openvswitch.org
Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
Mechthild Buescher <mechthild.buescher at ericsson.com> writes:
> Hi Ian,
>
> Thanks for the fast reply! I also did some further investigations
> where I could see that ovs-vswitchd usually keeps alive for receiving
> packets but crashes for sending packets.
>
> Regarding your question:
> 1. We are running 1 VM with 2 vhost ports (in the simplified setup; in
> the complete setup we use 1 VM & 5 vhost ports)
>
> 2. We are using libvirt to start the VM which is configured to use qemu:
> /usr/bin/qemu-system-x86_64 -name guest=ubuntu11_try,debug-threads=on \
>   -S -machine pc-i440fx-wily,accel=kvm,usb=off -cpu host -m 8192 \
>   -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 \
>   -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge_1G/libvirt/qemu,share=yes,size=8589934592 \
>   -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
>   -uuid 8a2ad7a3-9da1-4c69-a2ff-c7a680d9bc4a -no-user-config -nodefaults \
>   -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-300-ubuntu11_try/monitor.sock,server,nowait \
>   -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown \
>   -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
>   -drive file=/root/perf/vms/ubuntu11.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
>   -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
>   -netdev tap,fd=21,id=hostnet0 \
>   -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:92:47,bus=pci.0,addr=0x3 \
>   -netdev tap,fd=23,id=hostnet1 \
>   -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:3c:a3:47,bus=pci.0,addr=0x4 \
>   -chardev socket,id=charnet2,path=/var/run/openvswitch/vhost111 \
>   -netdev type=vhost-user,id=hostnet2,chardev=charnet2 \
>   -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:a0:11:02,bus=pci.0,addr=0x5 \
>   -chardev socket,id=charnet3,path=/var/run/openvswitch/vhost112 \
>   -netdev type=vhost-user,id=hostnet3,chardev=charnet3 \
>   -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:a0:11:03,bus=pci.0,addr=0x6 \
>   -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 \
>   -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
>   -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
>
> 3. In the VM we have both kinds of interface binding, virtio-pci and
> igb_uio. For both types of interface, the crash of ovs-vswitchd can be
> observed (the VM itself stays alive).
>
> 4. ovs-vswitchd is started as follows and is configured to use a vxlan tunnel:
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=6
> ovs-vsctl --no-wait set Interface dpdk0 options:n_rxq=2
> ovs-vsctl --no-wait set Interface vhost111 options:n_rxq=1
> ovs-vsctl --no-wait set Interface vhost112 options:n_rxq=1
> ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-sock-permissions=766
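A side note on the masks above: pmd-cpu-mask is a hex bitmap of CPU core IDs, so the value 6 (binary 110) pins the PMD threads to cores 1 and 2, which matches the pmd-rxq-show output further down the thread. A small shell sketch (the loop is my own illustration, not an OVS tool) to decode such a mask:

```shell
# Decode an OVS pmd-cpu-mask (hex core bitmap) into core IDs.
mask=0x6
for core in $(seq 0 31); do
    if [ $(( (mask >> core) & 1 )) -eq 1 ]; then
        echo "core $core"
    fi
done
# prints: core 1, core 2
```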
Have you, perchance, applied some extra patches? This was proposed, but not accepted, as a possible workaround for a permissions issue with ovs dpdk.
> ovs-vswitchd --pidfile=$DB_PID --detach --monitor --log-file=$LOG_FILE -vfile:dbg --no-chdir -vconsole:emer --mlockall unix:$DB_SOCK
>
> ovs-vsctl add-port br-int vhost111 -- set Interface vhost111 type=dpdkvhostuser ofport_request=11
> ovs-vsctl add-port br-int vhost112 -- set Interface vhost112 type=dpdkvhostuser ofport_request=12
> ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
> ovs-vsctl set Bridge br-int other_config:datapath-id=0000f2b811144f41
> ovs-vsctl set Bridge br-int protocols=OpenFlow13
> ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.1.2.2 options:key=flow ofport_request=100
>
> 5. The ovs log is attached - it contains the log from start to crash
> (with debug information). The crash was provoked by setting up a
> virtio-pci interface in the VM, so no DPDK is used in the VM for this scenario.
>
> 6. The DPDK versions are:
> Host: dpdk 16.04 latest commit
> b3b9719f18ee83773c6ed7adda300c5ac63c37e9
> VM: (not used in this scenario) dpdk 2.2.0
For confirmation, this happens whether or not you use a VM? I just want to make sure. It's usually best to pair dpdk versions whenever possible.
> BR/Mechthild
>
> -----Original Message-----
> From: Stokes, Ian [mailto:ian.stokes at intel.com]
> Sent: Thursday, July 07, 2016 1:57 PM
> To: Mechthild Buescher; bugs at openvswitch.org
> Subject: RE: bug in ovs-vswitchd?!
>
> Hi Mechthild,
>
> I've tried to reproduce this issue on my setup (Fedora 22, kernel
> 4.1.8) but have not been able to.
>
> A few questions to help the investigation
>
> 1. Are you running 1 or 2 VMs in the setup (i.e. 1 VM with 2 vhost-user
> ports, or 2 VMs with 1 vhost-user port each)?
> 2. What parameters are being used to launch the VM(s) attached to the
> vhost-user ports?
> 3. Inside the VM, are the interfaces bound to igb_uio (i.e. using a
> DPDK app inside the guest), or are they used as kernel devices?
> 4. What parameters are you launching OVS with?
> 5. Can you provide an OVS log?
> 6. Can you confirm the DPDK version you are using in the host/VM (if
> used in the VM)?
>
> Thanks
> Ian
>
>> From: discuss [mailto:discuss-bounces at openvswitch.org] On Behalf Of
>> Mechthild Buescher
>> Sent: Wednesday, July 06, 2016 1:54 PM
>> To: bugs at openvswitch.org
>> Subject: [ovs-discuss] bug in ovs-vswitchd?!
>>
>> Hi all,
>>
>> we are using OVS with DPDK interfaces and vhost-user interfaces and
>> want to try the VMs with different multi-queue settings. When we
>> specify 2 cores and 2 queues for a dpdk interface but only one queue
>> for the vhost interfaces, ovs-vswitchd crashes at the start of the VM
>> (or at the latest when traffic is sent).
>>
>> The version of OVS is 2.5.90 (master branch, latest commit
>> 7a15be69b00fe8f66a3f3929434b39676f325a7a).
>> It has been built and is running on: Linux version 3.13.0-87-generic
>> (buildd at lgw01-25) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3))
>> #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016
>>
>> The configuration is:
>> ovs-vsctl show
>> 0e191ed4-040b-458c-bad8-feb6f7c90e3a
>> Bridge br-prv
>> Port br-prv
>> Interface br-prv
>> type: internal
>> Port "dpdk0"
>> Interface "dpdk0"
>> type: dpdk
>> options: {n_rxq="2"}
>> Bridge br-int
>> Port br-int
>> Interface br-int
>> type: internal
>> Port "vhost112"
>> Interface "vhost112"
>> type: dpdkvhostuser
>> options: {n_rxq="1"}
>> Port "vhost111"
>> Interface "vhost111"
>> type: dpdkvhostuser
>> options: {n_rxq="1"}
>> Port "vxlan0"
>> Interface "vxlan0"
>> type: vxlan
>> options: {key=flow, remote_ip="10.1.2.2"}
>>
>> ovs-appctl dpif-netdev/pmd-rxq-show
>> pmd thread numa_id 0 core_id 1:
>>         port: vhost112  queue-id: 0
>>         port: dpdk0     queue-id: 1
>> pmd thread numa_id 0 core_id 2:
>>         port: dpdk0     queue-id: 0
>>         port: vhost111  queue-id: 0
>>
>> ovs-appctl dpif/show
>> br-int:
>>         br-int 65534/6: (tap)
>>         vhost111 11/3: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>         vhost112 12/5: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>         vxlan0 100/4: (vxlan: key=flow, remote_ip=10.1.2.2)
>> br-prv:
>>         br-prv 65534/1: (tap)
>>         dpdk0 1/2: (dpdk: configured_rx_queues=2, configured_tx_queues=21, requested_rx_queues=2, requested_tx_queues=21)
I'm a little concerned about the numbers reported here. 21 tx queues is a bit much, I think. I haven't tried reproducing this yet, but can you confirm this is desired?
>>
>> (gdb) bt
>> #0 0x00000000005356e4 in ixgbe_xmit_pkts_vec ()
>> #1 0x00000000006df384 in rte_eth_tx_burst (nb_pkts=<optimized out>,
>> tx_pkts=<optimized out>, queue_id=1, port_id=<optimized out>)
>> at
>> /opt/dpdk-16.04/x86_64-native-linuxapp-gcc//include/rte_ethdev.h:2791
>> #2 dpdk_queue_flush__ (qid=<optimized out>, dev=<optimized out>) at
>> lib/netdev-dpdk.c:1099
>> #3 dpdk_queue_flush (qid=<optimized out>, dev=<optimized out>) at
>> lib/netdev-dpdk.c:1133
>> #4 netdev_dpdk_rxq_recv (rxq=0x7fbe127ad4c0, packets=0x7fc26761e408,
>> c=0x7fc26761e400) at lib/netdev-dpdk.c:1312
>> #5 0x000000000061be98 in netdev_rxq_recv (rx=<optimized out>,
>> batch=batch at entry=0x7fc26761e400) at lib/netdev.c:628
>> #6 0x00000000005f17bb in dp_netdev_process_rxq_port
>> (pmd=pmd at entry=0x29ea810, rxq=<optimized out>, port=<optimized out>,
>> port=<optimized out>)
>> at lib/dpif-netdev.c:2619
>> #7 0x00000000005f1b27 in pmd_thread_main (f_=0x29ea810) at
>> lib/dpif-netdev.c:2864
>> #8 0x000000000067dde4 in ovsthread_wrapper (aux_=<optimized out>) at
>> lib/ovs-thread.c:342
>> #9 0x00007fc26b90e184 in start_thread (arg=0x7fc26761f700) at
>> pthread_create.c:312
>> #10 0x00007fc26af2237d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
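For anyone trying to reproduce the analysis: a backtrace like the one above can be pulled from the core dump roughly as follows (the binary and core file paths here are assumptions - adjust them to your installation):

```shell
# Assumed paths; print the backtrace from an ovs-vswitchd core dump.
gdb -batch -ex "bt" /usr/sbin/ovs-vswitchd /var/crash/core.ovs-vswitchd
```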
>>
>> This is the minimal configuration which leads to the fault; our
>> complete configuration contains more vhost-user interfaces than shown
>> above. We observed that only the combination of 2 cores/queues for the
>> dpdk interface and 1 queue for the vhost-user interfaces results in an
>> ovs-vswitchd crash. In detail:
>> dpdk0: 1 core/queue   & all vhost ports: 1 queue  => successful
>> dpdk0: 2 cores/queues & all vhost ports: 1 queue  => crash
>> dpdk0: 2 cores/queues & all vhost ports: 2 queues => successful
>> dpdk0: 4 cores/queues & all vhost ports: 1 queue  => successful
>> dpdk0: 4 cores/queues & all vhost ports: 2 queues => successful
>> dpdk0: 4 cores/queues & all vhost ports: 4 queues => successful
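Given the matrix above, a hypothetical workaround while the crash is investigated would be to keep the vhost-user queue count equal to the dpdk queue count (the 2/2 combination is reported successful). A sketch only, reusing the interface names from this thread:

```shell
# Workaround sketch (untested here): match vhost-user n_rxq to dpdk0's.
ovs-vsctl --no-wait set Interface dpdk0 options:n_rxq=2
ovs-vsctl --no-wait set Interface vhost111 options:n_rxq=2
ovs-vsctl --no-wait set Interface vhost112 options:n_rxq=2
```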
>>
>> Do you have any suggestions?
Can you please also supply the cpu (model number) that you're using?
Thanks,
Aaron
>> Best regards,
>>
>> Mechthild Buescher