[ovs-dev] [PATCH] netlink-socket: Do not make flow_dump block on netlink socket.

Alex Wang alexw at nicira.com
Fri Jul 18 22:10:27 UTC 2014


*Sure.  When I tried to delete my br-int, OVS hung.*

*Basically, the main thread is joining a revalidator thread, while the
revalidator threads are blocked either in recvmsg() or on the mutex taken
in nl_dump_next().*

*Following is the trace:*

  Id   Target Id         Frame
  47   Thread 0x7f18bed6a700 (LWP 338) "revalidator57" 0x00007f18bf8528ad in recvmsg ()
    at ../sysdeps/unix/syscall-template.S:81
  46   Thread 0x7f18be569700 (LWP 337) "revalidator56" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  6    Thread 0x7f18bcd66700 (LWP 32584) "urcu5" 0x00007f18bf05cfbd in poll ()
    at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x7f18c02ab980 (LWP 32553) "ovs-vswitchd" 0x00007f18bf84c66b in pthread_join (threadid=139744249288448, thread_return=thread_return@entry=0x0)
    at pthread_join.c:92



*The main thread calls udpif_flush(), which stops all revalidator threads
and joins them:*

#0  0x00007f18bf84c66b in pthread_join (threadid=139744249288448,
    thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x0000000000495d39 in xpthread_join (arg1=<optimized out>, arg2=arg2@entry=0x0)
    at lib/ovs-thread.c:173
#2  0x000000000042ca8a in udpif_stop_threads (udpif=0x2294bb0)
    at ofproto/ofproto-dpif-upcall.c:317
#3  0x000000000042e48c in udpif_flush (udpif=0x2294bb0)
    at ofproto/ofproto-dpif-upcall.c:482
#4  0x0000000000419101 in ofproto_flush__ (ofproto=ofproto@entry=0x227dc80)
    at ofproto/ofproto.c:1281
#5  0x0000000000419285 in ofproto_destroy (p=0x227dc80) at ofproto/ofproto.c:1363
#6  0x0000000000408a90 in bridge_destroy (br=br@entry=0x21fe2c0)
    at vswitchd/bridge.c:2716
#7  0x0000000000408e81 in add_del_bridges (cfg=0x21b8a90, cfg=0x21b8a90)
    at vswitchd/bridge.c:1428
#8  0x000000000040a977 in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x21b8a90)
    at vswitchd/bridge.c:538
#9  0x000000000040e214 in bridge_run () at vswitchd/bridge.c:2387
#10 0x0000000000405e15 in main (argc=6, argv=0x7fff70c4a268)
    at vswitchd/ovs-vswitchd.c:116


*At the same time, one revalidator is dumping flows and is blocked in
recvmsg(), while the other waits for the mutex in nl_dump_next():*


*revalidator57*

#0  0x00007f18bf8528ad in recvmsg () at ../sysdeps/unix/syscall-template.S:81
#1  0x00000000004d7bfb in nl_sock_recv__ (buf=buf@entry=0x7f18b0000a10,
    wait=wait@entry=true, sock=<optimized out>, sock=<optimized out>)
    at lib/netlink-socket.c:337
#2  0x00000000004d8d1d in nl_dump_refill (dump=<optimized out>, dump=<optimized out>,
    buffer=<optimized out>) at lib/netlink-socket.c:727
#3  nl_dump_next (dump=dump@entry=0x7f18b4004ac8, reply=reply@entry=0x7f18bed67170,
    buffer=buffer@entry=0x7f18b0000a10) at lib/netlink-socket.c:804
#4  0x00000000004ce628 in dpif_linux_flow_dump_next (thread_=0x7f18b0000980,
    flows=0x7f18bed67370, max_flows=50) at lib/dpif-linux.c:1279
#5  0x0000000000450152 in dpif_flow_dump_next (thread=thread@entry=0x7f18b0000980,
    flows=flows@entry=0x7f18bed67370, max_flows=max_flows@entry=50) at lib/dpif.c:1048
#6  0x000000000042d97f in revalidate (revalidator=0x227cb50)
    at ofproto/ofproto-dpif-upcall.c:1375
#7  0x000000000042ddcb in udpif_revalidator (arg=0x227cb50)
    at ofproto/ofproto-dpif-upcall.c:599
#8  0x0000000000495531 in ovsthread_wrapper (aux_=<optimized out>)
    at lib/ovs-thread.c:329
#9  0x00007f18bf84b182 in start_thread (arg=0x7f18bed6a700) at pthread_create.c:312
#10 0x00007f18bf06a30d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111





*revalidator56*

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f18bf84d657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f18bf84d480 in __GI___pthread_mutex_lock (mutex=0x7f18b4004ad8)
    at ../nptl/pthread_mutex_lock.c:79
#3  0x0000000000495568 in ovs_mutex_lock_at (l_=l_@entry=0x7f18b4004ad8,
    where=where@entry=0x51bbf6 "lib/netlink-socket.c:816") at lib/ovs-thread.c:73
#4  0x00000000004d8dc1 in nl_dump_next (dump=dump@entry=0x7f18b4004ac8,
    reply=reply@entry=0x7f18be566170, buffer=buffer@entry=0x7f18b4004f80)
    at lib/netlink-socket.c:816
#5  0x00000000004ce628 in dpif_linux_flow_dump_next (thread_=0x7f18b4004ef0,
    flows=0x7f18be566370, max_flows=50) at lib/dpif-linux.c:1279
#6  0x0000000000450152 in dpif_flow_dump_next (thread=thread@entry=0x7f18b4004ef0,
    flows=flows@entry=0x7f18be566370, max_flows=max_flows@entry=50) at lib/dpif.c:1048
#7  0x000000000042d97f in revalidate (revalidator=0x227cb30)
    at ofproto/ofproto-dpif-upcall.c:1375
#8  0x000000000042ddcb in udpif_revalidator (arg=0x227cb30)
    at ofproto/ofproto-dpif-upcall.c:599
#9  0x0000000000495531 in ovsthread_wrapper (aux_=<optimized out>)
    at lib/ovs-thread.c:329
#10 0x00007f18bf84b182 in start_thread (arg=0x7f18be569700) at pthread_create.c:312
#11 0x00007f18bf06a30d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
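
To make the pattern in the two revalidator traces easier to see, here is a
minimal sketch of it outside of OVS.  Everything in it (struct shared_dump,
blocking_holder(), mutex_held_across_blocking_recv()) is a hypothetical
stand-in, not OVS code, and it uses a Unix socketpair rather than a netlink
socket.  One thread takes a mutex and then blocks in recv(); a second caller
finds the mutex busy.  The sketch uses pthread_mutex_trylock() where the real
revalidator56 uses a blocking lock, so that the demo itself cannot hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a shared dump: one mutex guarding one socket. */
struct shared_dump {
    pthread_mutex_t mutex;
    int fd;                     /* The holder thread's end of a socketpair. */
};

/* Plays the role of "revalidator57": takes the mutex, then blocks in
 * recv() while still holding it. */
static void *
blocking_holder(void *aux)
{
    struct shared_dump *dump = aux;
    char byte;

    assert(dump != NULL);
    pthread_mutex_lock(&dump->mutex);
    send(dump->fd, "r", 1, 0);      /* Tell the caller the lock is held. */
    recv(dump->fd, &byte, 1, 0);    /* Blocks until the caller sends a byte. */
    pthread_mutex_unlock(&dump->mutex);
    return NULL;
}

/* Returns true if the mutex was found busy while the holder was waiting for
 * a message, i.e. a thread doing a blocking lock would have been stuck. */
static bool
mutex_held_across_blocking_recv(void)
{
    struct shared_dump dump;
    pthread_t holder;
    int fds[2];
    char byte;
    int rc;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds)) {
        return false;
    }
    pthread_mutex_init(&dump.mutex, NULL);
    dump.fd = fds[0];
    if (pthread_create(&holder, NULL, blocking_holder, &dump)) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    recv(fds[1], &byte, 1, 0);      /* Wait until the holder owns the mutex. */

    /* Plays the role of "revalidator56", but without blocking. */
    rc = pthread_mutex_trylock(&dump.mutex);
    if (!rc) {
        pthread_mutex_unlock(&dump.mutex);
    }

    send(fds[1], "x", 1, 0);        /* Unblock the holder so join returns. */
    pthread_join(holder, NULL);

    pthread_mutex_destroy(&dump.mutex);
    close(fds[0]);
    close(fds[1]);
    return rc == EBUSY;
}
```

In the real hang nothing plays the part of the final send(), so the holder
never returns from recvmsg(), the second thread never gets the mutex, and the
pthread_join() in the main thread never completes.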



On Fri, Jul 18, 2014 at 2:51 PM, Ben Pfaff <blp at nicira.com> wrote:

> On Fri, Jul 18, 2014 at 02:45:47PM -0700, Alex Wang wrote:
> > Commit 93295354 (netlink-socket: Simplify multithreaded dumping
> > to match Linux reality.) makes the call to recvmsg() block if no
> > messages are available.  This can cause revalidator threads to hang
> > for a long time, or even deadlock, when the main thread tries to stop
> > the revalidator threads.
> >
> > This commit fixes the issue by enabling the MSG_DONTWAIT flag in
> > the call to recvmsg().
> >
> > Signed-off-by: Alex Wang <alexw at nicira.com>
>
> It's a reasonable fix but I'd like to learn more about the situation
> where the problem arises.  It seems like there might be more to it.
> Can you explain more, or maybe give a backtrace?
>
> Thanks,
>
> Ben.
>
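
For reference, the behaviour the quoted patch description relies on can be
sketched roughly as below.  This is not the actual change to
lib/netlink-socket.c; recv_one() is a hypothetical helper written only for
illustration, and it is shown against an arbitrary datagram socket.  The
point is just that recvmsg() with MSG_DONTWAIT fails with EAGAIN (or
EWOULDBLOCK) instead of sleeping when no message is queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper, not OVS's nl_sock_recv__().  Receives one message
 * from 'fd' into 'buf'.  Returns 0 and stores the message length in '*n' on
 * success, otherwise a positive errno value.  If 'wait' is false and no
 * message is queued, returns EAGAIN (or EWOULDBLOCK) instead of blocking. */
static int
recv_one(int fd, void *buf, size_t size, bool wait, size_t *n)
{
    struct iovec iov = { .iov_base = buf, .iov_len = size };
    struct msghdr msg;
    ssize_t retval;

    assert(fd >= 0);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    do {
        /* MSG_DONTWAIT makes this single call nonblocking without changing
         * the socket's own O_NONBLOCK setting. */
        retval = recvmsg(fd, &msg, wait ? 0 : MSG_DONTWAIT);
    } while (retval < 0 && errno == EINTR);

    if (retval < 0) {
        return errno;
    }
    *n = retval;
    return 0;
}
```

A caller that holds a lock around such a call can treat EAGAIN as "nothing to
read right now", release the lock, and retry later, rather than sleeping in
the kernel with the lock held.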


