[ovs-dev] OVS+DPDK: deadlock and race condtion, which leads OVS deadlock or crash

王志克 wangzhike at jd.com
Tue Jan 23 10:26:07 UTC 2018


Hi Yuanha and all,

Thanks for review. I change the title and let more people involved.

The issues can be reproduced easily with OVS. 
The steps are below:
1) virsh start vm
2) ovs-vsctl add port
3) delete the port via ovs-vsctl del-port.
4) shutdown VM via vrish destroy.

I run a script to repeatly do such job for 2 VM, and it needs about 1 hour to reproduce it. 

However, I found there are more issues (I can reproduce it):

1) see below code
static void
vhost_user_read_cb(int connfd, void *dat, int *remove)
{
	struct vhost_user_connection *conn = dat;
	struct vhost_user_socket *vsocket = conn->vsocket;
	int ret;

	ret = vhost_user_msg_handler(conn->vid, connfd);
	if (ret < 0) {
		close(connfd);
		*remove = 1;
		vhost_destroy_device(conn->vid);

		pthread_mutex_lock(&vsocket->conn_mutex);
		TAILQ_REMOVE(&vsocket->conn_list, conn, next);
		pthread_mutex_unlock(&vsocket->conn_mutex);

		free(conn);
                 -------------------------> [Zhike Wang] we can see that the vsocket does not exist in conn_list or reconn_list at this short time slot. So in this time, if rte_vhost_driver_unregister() is called, vsocket is freed. So below vhost_user_create_client() may use raw pointer, and lead to crash.

		if (vsocket->reconnect)
			vhost_user_create_client(vsocket);
	}
}

2) regarding the above issue, I think we need one mutex to lock conn_list and reconn_list, so the vsocket can in either conn_list or reconn_list, not both or none. So I use vhost_user.mutex() 
diff --git a/dpdk-16.11.3/lib/librte_vhost/fd_man.c b/dpdk-16.11.3/lib/librte_vhost/fd_man.c
@@ -233,6 +234,7 @@
                poll(pfdset->rwfds, numfds, 1000 /* millisecs */);

                need_shrink = 0;
+                pthread_mutex_lock(&vhost_user.mutex);
                for (i = 0; i < numfds; i++) {
                        pthread_mutex_lock(&pfdset->fd_mutex);

@@ -264,7 +266,10 @@
                                rcb(fd, dat, &remove1);
                        if (wcb && pfd->revents & (POLLOUT | FDPOLLERR))
                                wcb(fd, dat, &remove2);
+
+                       pthread_mutex_lock(&pfdset->fd_mutex);
                        pfdentry->busy = 0;
+                       pthread_mutex_unlock(&pfdset->fd_mutex);
                        /*
                         * fdset_del needs to check busy flag.
                         * We don't allow fdset_del to be called in callback
@@ -285,5 +290,6 @@

                if (need_shrink)
                        fdset_shrink(pfdset);
+                pthread_mutex_unlock(&vhost_user.mutex);


@@ -390,6 +395,8 @@ struct vhost_user_reconnect_list {
        struct vhost_user_reconnect *reconn, *next;

        while (1) {
+
+               pthread_mutex_lock(&vhost_user.mutex);
                pthread_mutex_lock(&reconn_list.mutex);

                /*
@@ -417,11 +424,18 @@ struct vhost_user_reconnect_list {
                                "%s: connected\n", reconn->vsocket->path);
                        vhost_user_add_connection(reconn->fd, reconn->vsocket);
 remove_fd:

                pthread_mutex_unlock(&reconn_list.mutex);
+               pthread_mutex_unlock(&vhost_user.mutex);
                sleep(1);
        }

Then I met a deadlock (another thread enters rte_vhost_driver_unregister, and waiting for vhost_user.mutex.) 
(gdb) bt
#0  0x00007febfaf8ebcd in poll () from /lib64/libc.so.6
#1  0x00007fec029b6836 in poll (__timeout=<optimized out>, __nfds=2, __fds=0x7febe80008c0) at /usr/include/bits/poll2.h:46
#2  time_poll (pollfds=pollfds at entry=0x7febe80008c0, n_pollfds=2, handles=handles at entry=0x0, timeout_when=4098990819, elapsed=elapsed at entry=0x7febf7450290) at lib/timeval.c:305
#3  0x00007fec0299afea in poll_block () at lib/poll-loop.c:364
#4  0x00007fec0297da25 in ovsrcu_synchronize () at lib/ovs-rcu.c:231
#5  0x00007fec029e9477 in destroy_device (vid=<optimized out>) at lib/netdev-dpdk.c:2651
#6  0x00007fec0195db04 in vhost_destroy_device (vid=<optimized out>) at /lib/librte_vhost/vhost.c:277
#7  0x00007fec0195cbdc in vhost_user_read_cb (connfd=<optimized out>, dat=0x7febc80008c0, remove=0x7febf7451420) at /lib/librte_vhost/socket.c:269
#8  0x00007fec0195c200 in fdset_event_dispatch (pfdset=pfdset at entry=0x7fec01b70180 <vhost_user+8192>) at /lib/librte_vhost/fd_man.c:266
#9  0x00007fec0195d647 in rte_vhost_driver_session_start () at /lib/librte_vhost/socket.c:714
#10 0x00007fec029e661b in start_vhost_loop (dummy=<optimized out>) at lib/netdev-dpdk.c:2736
#11 0x00007fec0297f796 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
#12 0x00007febfb9b6dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007febfaf9921d in clone () from /lib64/libc.so.6
(gdb) n


So now I believe it is hard to fix the issues insides the modules, and I would like to present identified issues, and want to hear your proposal or fix.

Br,
Zhike Wang



-----Original Message-----
From: Yuanhan Liu [mailto:yliu at fridaylinux.org] 
Sent: Thursday, January 18, 2018 10:04 PM
To: 王志克
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex

Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzhike at jd.com>
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>    ->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>    changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>    such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang <wangzhike at jd.com>

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>  		struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>  		if (!strcmp(vsocket->path, path)) {
> +			int del_fds[MAX_FDS];
> +			int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>  			if (vsocket->is_server) {
>  				fdset_del(&vhost_user.fdset, vsocket->socket_fd);
>  				close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>  				vhost_user_remove_reconnect(vsocket);
>  			}
>  
> +			/* fdset_del() must be called without conn_mutex. */
> +			pthread_mutex_lock(&vsocket->conn_mutex);
> +			for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +			     conn != NULL;
> +			     conn = next) {
> +				next = TAILQ_NEXT(conn, next);
> +
> +				del_fds[num_of_fds++] = conn->connfd;
> +			}
> +			pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> +			for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> +				fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>  			pthread_mutex_lock(&vsocket->conn_mutex);
>  			for (conn = TAILQ_FIRST(&vsocket->conn_list);
>  			     conn != NULL;
>  			     conn = next) {
>  				next = TAILQ_NEXT(conn, next);
>  
> -				fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

	--yliu
>  				RTE_LOG(INFO, VHOST_CONFIG,
>  					"free connfd = %d for device '%s'\n",
>  					conn->connfd, path);
> -- 
> 1.8.3.1


More information about the dev mailing list