[ovs-discuss] bug in ovs-vswitchd?!

Mechthild Buescher mechthild.buescher at ericsson.com
Tue Jul 12 12:29:57 UTC 2016


Hi Aaron,

I think that the vhost-sock-permissions is not needed - I will check whether it makes a difference. It's a leftover from an earlier configuration where we had the problem that qemu (started by libvirt) wasn't able to access the socket. That problem has been solved.
 
We don't configure the tx queues ourselves, so I don't know where this value comes from. Maybe it's the default?! So no, it's not intended.
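
For what it's worth, 21 looks like "number of logical cores on the host plus one". My understanding (a rough sketch below, not the actual OVS code) is that dpif-netdev requests one tx queue per core that could run a PMD thread, plus one extra queue for non-PMD threads; with 20 logical cores visible to OVS, that would give 21:

/* Hedged sketch, not actual OVS code: the assumption here is that the
 * default requested tx queue count is "number of host cores + 1" - one
 * queue per possible PMD thread plus one shared by non-PMD threads. */
#include <stdio.h>

int main(void)
{
    int n_host_cores = 20;                   /* cf. "siblings: 20" in the cpuinfo below */
    int n_txq_requested = n_host_cores + 1;  /* one per core + one for non-PMD threads */

    printf("requested_tx_queues=%d\n", n_txq_requested);  /* prints 21 */
    return 0;
}

If that assumption is right, the value simply reflects the host's core count rather than anything we configured.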

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2658 v2 @ 2.40GHz
stepping	: 4
microcode	: 0x40c
cpu MHz		: 1200.000
cache size	: 25600 KB
physical id	: 0
siblings	: 20
core id		: 0
cpu cores	: 10
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 4799.93
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Thanks again,

BR/Mechthild

-----Original Message-----
From: Aaron Conole [mailto:aconole at redhat.com] 
Sent: Monday, July 11, 2016 11:06 PM
To: Mechthild Buescher
Cc: Stokes, Ian; bugs at openvswitch.org
Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!

Mechthild Buescher <mechthild.buescher at ericsson.com> writes:

> Hi Aaron,
>
> Sorry for being unclear regarding the VM: I meant the DPDK usage 
> inside the VM. So, the fault happens when using the VM. Inside the VM 
> I can either bind the interfaces to DPDK or to linux - in both cases, 
> the fault occurs.
>
> And I haven't applied any patch. I used the latest available version 
> from the master branch - I don't know whether any patch is upstreamed 
> to the master branch.

Okay - I wonder what the vhost-sock-permissions command line is all about, then?

Can you confirm, then, that 21 tx queues is not intended (21 tx queues shows up in your configuration output)?

Also, please send the cpu information (cat /proc/cpuinfo on the host).

> Thanks in advance for your help,
>
> BR/Mechthild
>
> -----Original Message-----
> From: Aaron Conole [mailto:aconole at redhat.com]
> Sent: Monday, July 11, 2016 7:22 PM
> To: Mechthild Buescher
> Cc: Stokes, Ian; bugs at openvswitch.org
> Subject: Re: [ovs-discuss] bug in ovs-vswitchd?!
>
> Mechthild Buescher <mechthild.buescher at ericsson.com> writes:
>
>> Hi Ian,
>>
>> Thanks for the fast reply! I also did some further investigations 
>> where I could see that ovs-vswitchd usually keeps alive for receiving 
>> packets but crashes for sending packets.
>>
>> Regarding your question:
>> 1. We are running 1 VM with 2 vhost ports (in the simplified setup; 
>> in the complete setup we use 1 VM & 5 vhost ports)
>>
>> 2. We are using libvirt to start the VM which is configured to use qemu:
>> /usr/bin/qemu-system-x86_64 -name guest=ubuntu11_try,debug-threads=on
>> -S -machine pc-i440fx-wily,accel=kvm,usb=off -cpu host -m 8192
>> -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1
>> -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/mnt/huge_1G/libvirt/qemu,share=yes,size=8589934592
>> -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
>> -uuid 8a2ad7a3-9da1-4c69-a2ff-c7a680d9bc4a -no-user-config -nodefaults
>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-300-ubuntu11_try/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
>> -no-shutdown -boot strict=on
>> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> -drive file=/root/perf/vms/ubuntu11.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> -netdev tap,fd=21,id=hostnet0
>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3c:92:47,bus=pci.0,addr=0x3
>> -netdev tap,fd=23,id=hostnet1
>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:3c:a3:47,bus=pci.0,addr=0x4
>> -chardev socket,id=charnet2,path=/var/run/openvswitch/vhost111
>> -netdev type=vhost-user,id=hostnet2,chardev=charnet2
>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:a0:11:02,bus=pci.0,addr=0x5
>> -chardev socket,id=charnet3,path=/var/run/openvswitch/vhost112
>> -netdev type=vhost-user,id=hostnet3,chardev=charnet3
>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:a0:11:03,bus=pci.0,addr=0x6
>> -chardev pty,id=charserial0
>> -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0
>> -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
>>
>> 3. In the VM we have both kind of interface bindings, virtio-pci and 
>> igb_uio. For both type of interfaces, the crash of ovs-vswitchd can 
>> be observed (The VM is still alive).
>>
>> 4. The ovs-vswitchd is started as follows and is configured to use a vxlan tunnel:
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=6
>> ovs-vsctl --no-wait set Interface dpdk0 options:n_rxq=2
>> ovs-vsctl --no-wait set Interface vhost111 options:n_rxq=1
>> ovs-vsctl --no-wait set Interface vhost112 options:n_rxq=1
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-sock-permissions=766
>
> Have you, perchance, applied some extra patches?  This was proposed,
> but not accepted, as a possible workaround for a permissions issue
> with ovs dpdk.
>
>> ovs-vswitchd --pidfile=$DB_PID --detach --monitor --log-file=$LOG_FILE -vfile:dbg --no-chdir -vconsole:emer --mlockall unix:$DB_SOCK
>>
>> ovs-vsctl add-port br-int vhost111 -- set Interface vhost111 type=dpdkvhostuser ofport_request=11
>> ovs-vsctl add-port br-int vhost112 -- set Interface vhost112 type=dpdkvhostuser ofport_request=12
>> ovs-vsctl add-br br-int -- set bridge br-int datapath_type=netdev
>> ovs-vsctl set Bridge br-int other_config:datapath-id=0000f2b811144f41
>> ovs-vsctl set Bridge br-int protocols=OpenFlow13
>> ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=10.1.2.2 options:key=flow ofport_request=100
>>
>> 5. The ovs-log is attached - it contains the log from start to crash
>> (with debug information). The crash has been provoked by setting up
>> a virtio-pci interface in the VM, so no DPDK is used in the VM for this scenario.
>>
>> 6. The DPDK versions are:
>> Host: dpdk 16.04 latest commit
>> b3b9719f18ee83773c6ed7adda300c5ac63c37e9
>> VM: (not used in this scenario) dpdk 2.2.0
>
> For confirmation, this happens whether or not you use a VM?  I just
> want to make sure.  It's usually best to pair dpdk versions whenever
> possible.
>
>> BR/Mechthild
>>
>> -----Original Message-----
>> From: Stokes, Ian [mailto:ian.stokes at intel.com]
>> Sent: Thursday, July 07, 2016 1:57 PM
>> To: Mechthild Buescher; bugs at openvswitch.org
>> Subject: RE: bug in ovs-vswitchd?!
>>
>> Hi Mechthild,
>>
>> I've tried to reproduce this issue on my setup (Fedora 22, kernel
>> 4.1.8) but have not been able to reproduce it.
>>
>> A few questions to help the investigation
>>
>> 1. Are you running 1 or 2 VMs in the setup (i.e. 1 VM with 2 vhost user ports or 2 VMs with 1 vhost user port each)?
>> 2. What are the parameters being used to launch the VM/s attached to the vhost user ports?
>> 3. Inside the VM, are the interfaces bound to igb_uio (i.e. using a dpdk app inside the guest) or are they being used as kernel devices inside the VM?
>> 4. What parameters are you launching OVS with?
>> 5. Can you provide an ovs log?
>> 6. Can you confirm the DPDK version you are using in the host/VM (if it
>> is being used in the VM)?
>>
>> Thanks
>> Ian
>>
>>> From: discuss [mailto:discuss-bounces at openvswitch.org] On Behalf Of 
>>> Mechthild Buescher
>>> Sent: Wednesday, July 06, 2016 1:54 PM
>>> To: bugs at openvswitch.org
>>> Subject: [ovs-discuss] bug in ovs-vswitchd?!
>>> 
>>> Hi all,
>>> 
>>> We are using ovs with dpdk-interfaces and vhostuser-interfaces and
>>> want to try the VMs with different multi-queue settings. When we specify 2 cores and 2 multi-queues for a dpdk-interface but only one queue for the vhost-interfaces, ovs-vswitchd crashes when the VM starts (or at the latest when traffic is sent).
>>>
>>> The version of ovs is: 2.5.90, master branch, latest commit
>>> 7a15be69b00fe8f66a3f3929434b39676f325a7a
>>> It has been built and is running on: Linux version 3.13.0-87-generic
>>> (buildd at lgw01-25) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3))
>>> #133-Ubuntu SMP Tue May 24 18:32:09 UTC 2016
>>>
>>> The configuration is:
>>> ovs-vsctl show
>>> 0e191ed4-040b-458c-bad8-feb6f7c90e3a
>>>     Bridge br-prv
>>>         Port br-prv
>>>             Interface br-prv
>>>                 type: internal
>>>         Port "dpdk0"
>>>             Interface "dpdk0"
>>>                 type: dpdk
>>>                 options: {n_rxq="2"}
>>>     Bridge br-int
>>>         Port br-int
>>>             Interface br-int
>>>                 type: internal
>>>         Port "vhost112"
>>>             Interface "vhost112"
>>>                 type: dpdkvhostuser
>>>                 options: {n_rxq="1"}
>>>         Port "vhost111"
>>>             Interface "vhost111"
>>>                 type: dpdkvhostuser
>>>                options: {n_rxq="1"}
>>>         Port "vxlan0"
>>>             Interface "vxlan0"
>>>                 type: vxlan
>>>                 options: {key=flow, remote_ip="10.1.2.2"}
>>>
>>> ovs-appctl dpif-netdev/pmd-rxq-show
>>> pmd thread numa_id 0 core_id 1:
>>>                 port: vhost112   queue-id: 0
>>>                 port: dpdk0      queue-id: 1
>>> pmd thread numa_id 0 core_id 2:
>>>                 port: dpdk0      queue-id: 0
>>>                 port: vhost111   queue-id: 0
>>>
>>> ovs-appctl -m dpif/show
>>>                 br-int:
>>>                                 br-int 65534/6: (tap)
>>>                                 vhost111 11/3: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>                                 vhost112 12/5: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, requested_rx_queues=1, requested_tx_queues=21)
>>>                                 vxlan0 100/4: (vxlan: key=flow, remote_ip=10.1.2.2)
>>>                 br-prv:
>>>                                 br-prv 65534/1: (tap)
>>>                                 dpdk0 1/2: (dpdk: configured_rx_queues=2, configured_tx_queues=21, requested_rx_queues=2, requested_tx_queues=21)
>
> I'm a little concerned about the numbers reported here.  21 tx queues is a bit much, I think.  I haven't tried reproducing this yet, but can you confirm this is desired?
>
>>> 
>>>  (gdb) bt
>>> #0  0x00000000005356e4 in ixgbe_xmit_pkts_vec ()
>>> #1  0x00000000006df384 in rte_eth_tx_burst (nb_pkts=<optimized out>, tx_pkts=<optimized out>, queue_id=1, port_id=<optimized out>)
>>>     at /opt/dpdk-16.04/x86_64-native-linuxapp-gcc//include/rte_ethdev.h:2791
>>> #2  dpdk_queue_flush__ (qid=<optimized out>, dev=<optimized out>) at lib/netdev-dpdk.c:1099
>>> #3  dpdk_queue_flush (qid=<optimized out>, dev=<optimized out>) at lib/netdev-dpdk.c:1133
>>> #4  netdev_dpdk_rxq_recv (rxq=0x7fbe127ad4c0, packets=0x7fc26761e408, c=0x7fc26761e400) at lib/netdev-dpdk.c:1312
>>> #5  0x000000000061be98 in netdev_rxq_recv (rx=<optimized out>, batch=batch@entry=0x7fc26761e400) at lib/netdev.c:628
>>> #6  0x00000000005f17bb in dp_netdev_process_rxq_port (pmd=pmd@entry=0x29ea810, rxq=<optimized out>, port=<optimized out>, port=<optimized out>)
>>>     at lib/dpif-netdev.c:2619
>>> #7  0x00000000005f1b27 in pmd_thread_main (f_=0x29ea810) at lib/dpif-netdev.c:2864
>>> #8  0x000000000067dde4 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
>>> #9  0x00007fc26b90e184 in start_thread (arg=0x7fc26761f700) at pthread_create.c:312
>>> #10 0x00007fc26af2237d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>> 
>>> This is the minimal configuration which leads to the fault. Our 
>>> complete configuration contains more vhostuser interfaces than above. We observed that only the combination of 2 cores/queues for the dpdk-interface and 1 queue for vhostuser interfaces results in an ovs-vswitchd crash, in detail:
>>> Dpdk0: 1 core/queue & all vhost-ports: 1 queue => successful
>>> Dpdk0: 2 cores/queues & all vhost-ports: 1 queue => crash
>>> Dpdk0: 2 cores/queues & all vhost-ports: 2 queues => successful
>>> Dpdk0: 4 cores/queues & all vhost-ports: 1 queue => successful
>>> Dpdk0: 4 cores/queues & all vhost-ports: 2 queues => successful
>>> Dpdk0: 4 cores/queues & all vhost-ports: 4 queues => successful
>>> 
>>> Do you have any suggestions? 
>
> Can you please also supply the cpu (model number) that you're using?
>
> Thanks,
> Aaron
>
>>> Best regards,
>>>
>>> Mechthild Buescher

