[ovs-discuss] ovs get stuck when running traffic from VM to VM on same compute

Yi Ba yby.developer at yahoo.com
Thu Apr 28 22:20:09 UTC 2016


Hi,

The problem doesn't seem to happen with the latest ovs 2.5.0 git branch (commit ac93328273238b5dc86353222264fa4f30ad95e8, dpdk 16.04).
These are the stack traces we got with the 2.5.0 release.

Before running traffic:

(gdb) info threads
  Id   Target Id         Frame
  29   Thread 0x7fb44f004700 (LWP 4498) "dpdk_watchdog1" 0x00007fb44f0bef2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  28   Thread 0x7fb44e803700 (LWP 4499) "vhost_thread2" 0x00007fb44f0e6ae3 in select () at ../sysdeps/unix/syscall-template.S:81
  27   Thread 0x7fb44e002700 (LWP 4500) "urcu3" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  26   Thread 0x7fb3fbfff700 (LWP 4601) "handler82" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  25   Thread 0x7fb418ff9700 (LWP 4602) "handler79" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  24   Thread 0x7fb4197fa700 (LWP 4603) "handler78" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  23   Thread 0x7fb419ffb700 (LWP 4604) "handler77" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  22   Thread 0x7fb44d400700 (LWP 4605) "handler80" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  21   Thread 0x7fb44cbff700 (LWP 4606) "handler81" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  20   Thread 0x7fb43ffff700 (LWP 4607) "handler83" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  19   Thread 0x7fb43f7fe700 (LWP 4608) "handler84" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  18   Thread 0x7fb43effd700 (LWP 4609) "handler85" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  17   Thread 0x7fb43e7fc700 (LWP 4610) "handler86" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  16   Thread 0x7fb43dffb700 (LWP 4611) "handler87" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  15   Thread 0x7fb43d7fa700 (LWP 4612) "handler89" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  14   Thread 0x7fb43cff9700 (LWP 4613) "handler88" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  13   Thread 0x7fb423fff700 (LWP 4614) "handler91" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  12   Thread 0x7fb4237fe700 (LWP 4615) "handler90" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  11   Thread 0x7fb422ffd700 (LWP 4616) "handler92" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7fb4227fc700 (LWP 4617) "handler93" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  9    Thread 0x7fb421ffb700 (LWP 4618) "revalidator94" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  8    Thread 0x7fb4217fa700 (LWP 4619) "revalidator95" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x7fb420ff9700 (LWP 4620) "revalidator96" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x7fb41bfff700 (LWP 4621) "revalidator97" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x7fb41b7fe700 (LWP 4622) "revalidator98" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7fb41affd700 (LWP 4623) "revalidator99" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7fb41a7fc700 (LWP 4624) "revalidator100" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
* 1    Thread 0x7fb45074eb00 (LWP 4497) "ovs-vswitchd" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
(gdb) bt
#0  0x00000000005c8088 in dp_netdev_process_rxq_port.isra ()
#1  0x00000000005c84aa in pmd_thread_main ()
#2  0x0000000000648c54 in ovsthread_wrapper ()
#3  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#4  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

After running traffic and ovs stuck:

(gdb) info threads
  Id   Target Id         Frame
  29   Thread 0x7fb44f004700 (LWP 4498) "dpdk_watchdog1" 0x00007fb44f0bef2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
  28   Thread 0x7fb44e803700 (LWP 4499) "vhost_thread2" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  27   Thread 0x7fb44e002700 (LWP 4500) "urcu3" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  26   Thread 0x7fb3fbfff700 (LWP 4601) "handler82" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  25   Thread 0x7fb418ff9700 (LWP 4602) "handler79" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  24   Thread 0x7fb4197fa700 (LWP 4603) "handler78" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  23   Thread 0x7fb419ffb700 (LWP 4604) "handler77" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  22   Thread 0x7fb44d400700 (LWP 4605) "handler80" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  21   Thread 0x7fb44cbff700 (LWP 4606) "handler81" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  20   Thread 0x7fb43ffff700 (LWP 4607) "handler83" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  19   Thread 0x7fb43f7fe700 (LWP 4608) "handler84" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  18   Thread 0x7fb43effd700 (LWP 4609) "handler85" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  17   Thread 0x7fb43e7fc700 (LWP 4610) "handler86" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  16   Thread 0x7fb43dffb700 (LWP 4611) "handler87" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  15   Thread 0x7fb43d7fa700 (LWP 4612) "handler89" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  14   Thread 0x7fb43cff9700 (LWP 4613) "handler88" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  13   Thread 0x7fb423fff700 (LWP 4614) "handler91" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  12   Thread 0x7fb4237fe700 (LWP 4615) "handler90" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  11   Thread 0x7fb422ffd700 (LWP 4616) "handler92" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  10   Thread 0x7fb4227fc700 (LWP 4617) "handler93" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  9    Thread 0x7fb421ffb700 (LWP 4618) "revalidator94" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  8    Thread 0x7fb4217fa700 (LWP 4619) "revalidator95" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  7    Thread 0x7fb420ff9700 (LWP 4620) "revalidator96" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  6    Thread 0x7fb41bfff700 (LWP 4621) "revalidator97" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  5    Thread 0x7fb41b7fe700 (LWP 4622) "revalidator98" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7fb41affd700 (LWP 4623) "revalidator99" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7fb41a7fc700 (LWP 4624) "revalidator100" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x00000000004664c9 in rte_vhost_dequeue_burst ()
* 1    Thread 0x7fb45074eb00 (LWP 4497) "ovs-vswitchd" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x00000000004664c9 in rte_vhost_dequeue_burst ()
(gdb) bt
#0  0x00000000004664c9 in rte_vhost_dequeue_burst ()
#1  0x000000000069fb85 in netdev_dpdk_vhost_rxq_recv ()
#2  0x00000000005f25d1 in netdev_rxq_recv ()
#3  0x00000000005c8076 in dp_netdev_process_rxq_port.isra ()
#4  0x00000000005c84aa in pmd_thread_main ()
#5  0x0000000000648c54 in ovsthread_wrapper ()
#6  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#7  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Detach and re-attach:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb450719b00 (LWP 5315))]
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
238    ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fb44f6b3ce3 in handle_fildes_io (arg=<optimized out>) at ../sysdeps/pthread/aio_misc.c:645
#2  0x00007fb44f8c10a4 in start_thread (arg=0x7fb450719b00) at pthread_create.c:309
#3  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Detach and re-attach:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb3fb7fe700 (LWP 4625))]
#0  0x0000000000463844 in rte_pktmbuf_free ()
(gdb) bt
#0  0x0000000000463844 in rte_pktmbuf_free ()
#1  0x00000000004668f4 in rte_vhost_dequeue_burst ()
#2  0x000000000069fb85 in netdev_dpdk_vhost_rxq_recv ()
#3  0x00000000005f25d1 in netdev_rxq_recv ()
#4  0x00000000005c8076 in dp_netdev_process_rxq_port.isra ()
#5  0x00000000005c84aa in pmd_thread_main ()
#6  0x0000000000648c54 in ovsthread_wrapper ()
#7  0x00007fb44f8c10a4 in start_thread (arg=0x7fb3fb7fe700) at pthread_create.c:309
#8  0x00007fb44f0ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thanks
 

On Tuesday, 26 April 2016 11:29 AM, Daniele Di Proietto <diproiettod at ovn.org> wrote:

2016-04-26 9:08 GMT-07:00 Traynor, Kevin <kevin.traynor at intel.com>:

> -----Original Message-----
> From: discuss [mailto:discuss-bounces at openvswitch.org] On Behalf Of
> Kochba, Alon
> Sent: Tuesday, April 26, 2016 4:38 PM
> To: Ben Pfaff <blp at ovn.org>; Yi Ba <yby.developer at yahoo.com>
> Cc: bugs at openvswitch.org
> Subject: Re: [ovs-discuss] ovs get stuck when running traffic from VM
> to VM on same compute
>
> Hi Ben,
>
> Could you point us to the commit that fixed this issue?
> We already tried patching with this commit which seemed relevant, but
> the issue still recreated -
> https://github.com/openvswitch/ovs/commit/f519a72d9a3708fbc5f796f176e7c8bd3dcfb738
>
> We will retry with your suggestion of using the 2.5 branch code, but
> we might want to backport the specific fix unless there is a 2.5.1
> release including it.
> If the commit linked above is the one you were thinking of, please
> note a small difference - in the commit the rcu is blocked waiting for
> vhost_thread to quiesce, while in our case rcu is blocked waiting for
> pmd to quiesce.

It sounds similar to the problem that this commit fixed. If so the fix
is applied to master and 2.5 branches.

https://github.com/openvswitch/ovs/commit/61c4e39460a7db3be7262a3b2af767a84167a9d8


Could you try applying the above commit and see if it fixes the problem?

If you manage to reproduce the problem, could you get a backtrace of the blocked thread (pmd101 in this case)?
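
As a rough sketch only (an illustration, not an existing OVS or gdb tool): gdb thread ids can change between attaches, so it can help to look the thread up by name and note its LWP. The Python below parses saved `info threads` output, for example captured with something like `gdb -p <pid> -batch -ex "info threads"`. The helper names `parse_info_threads` and `current_function` are hypothetical, and the row format is assumed to be one thread per line with the thread name in double quotes.

```python
import re

# One row of gdb's "info threads" table, assumed to look like:
#   2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x... in rte_vhost_dequeue_burst ()
# A leading "*" marks the currently selected thread.
ROW = re.compile(
    r'^\*?\s*(?P<id>\d+)\s+Thread\s+0x[0-9a-f]+\s+'
    r'\(LWP\s+(?P<lwp>\d+)\)\s*"(?P<name>[^"]+)"\s*(?P<frame>.*)$'
)


def parse_info_threads(text):
    """Map thread name -> gdb id, LWP and top-frame text (hypothetical helper)."""
    threads = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            threads[match.group("name")] = {
                "gdb_id": int(match.group("id")),
                "lwp": int(match.group("lwp")),
                "frame": match.group("frame").strip(),
            }
    return threads


def current_function(frame):
    """Return the function name from a frame like '0x... in poll () at ...'."""
    match = re.search(r'\bin\s+(\S+)\s*\(', frame)
    return match.group(1) if match else None


# Two rows in the assumed format, taken from a listing of a stuck process.
sample = '''
  2    Thread 0x7fb3fb7fe700 (LWP 4625) "pmd101" 0x00000000004664c9 in rte_vhost_dequeue_burst ()
* 1    Thread 0x7fb45074eb00 (LWP 4497) "ovs-vswitchd" 0x00007fb44f0e4d3d in poll () at ../sysdeps/unix/syscall-template.S:81
'''

pmd = parse_info_threads(sample)["pmd101"]
print(pmd["gdb_id"], pmd["lwp"], current_function(pmd["frame"]))
```

The LWP is the kernel thread id, so it should identify the same thread across detach and re-attach even if the gdb thread number differs; `thread <id>` followed by `bt` in the same session then gives the backtrace.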

Thanks,

Daniele 


  