[ovs-dev] [PATCH] netlink-socket: Work around upstream kernel Netlink bug.

Ben Pfaff blp at nicira.com
Wed Jul 2 20:36:16 UTC 2014


On Mon, Jun 30, 2014 at 02:59:45PM -0700, Ben Pfaff wrote:
> The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
> following code to refill the Netlink socket buffer with more dump skbs
> while a dump is in progress:
> 
> 	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
> 		ret = netlink_dump(sk);
> 		if (ret) {
> 			sk->sk_err = ret;
> 			sk->sk_error_report(sk);
> 		}
> 	}
> 
> The netlink_dump() function that this calls returns a negative number on
> error, the convention used throughout the kernel, and thus sk->sk_err
> receives a negative value on error.
> 
> However, sk->sk_err is supposed to contain either 0 or a positive errno
> value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:
> 
>     ipv4/tcp.c:2067:		sk->sk_err = ECONNRESET;
>     ipv4/tcp.c:2069:		sk->sk_err = ECONNRESET;
>     ipv4/tcp_input.c:4106:		sk->sk_err = ECONNREFUSED;
>     ipv4/tcp_input.c:4109:		sk->sk_err = EPIPE;
>     ipv4/tcp_input.c:4114:		sk->sk_err = ECONNRESET;
>     netlink/af_netlink.c:741:			sk->sk_err = ENOBUFS;
>     netlink/af_netlink.c:1796:			sk->sk_err = ENOBUFS;
>     packet/af_packet.c:2476:		sk->sk_err = ENETDOWN;
>     unix/af_unix.c:341:			other->sk_err = ECONNRESET;
>     unix/af_unix.c:407:				skpair->sk_err = ECONNRESET;
> 
> The result is that the next attempt to receive from the socket returns the
> error to userspace with the wrong sign: the call appears to succeed and
> reports the errno value as if it were a byte count.
> 
> (The root of the error in this case is that multiple threads are attempting
> to read a single flow dump from a shared fd.  That should work, but the
> kernel has an internal race that can result in one or more of those threads
> hitting the EINVAL case at the start of netlink_dump().  The EINVAL is
> harmless in this case and userspace should be able to ignore it, but
> reporting the EINVAL as if it were a 22-byte message received in userspace
> throws a real wrench in the works.)
> 
> This bug makes me think that there are probably not many programs doing
> multithreaded Netlink dumps.  Maybe it is good that we are considering
> other approaches.
> 
> VMware-BZ: #1255704
> Reported-by: Mihir Gangar <gangarm at vmware.com>
> Signed-off-by: Ben Pfaff <blp at nicira.com>

Alex acked this off-list, so I applied it to master, branch-2.3, and
branch-2.2.
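
For readers without the patch in front of them, here is a rough sketch of
how userspace might tell a genuine message from a mis-signed error. It is
not taken from the patch itself. It assumes that the kernel leaves the
receive buffer untouched when it reports the error this way, so a sentinel
written into the header's length field before the receive call survives.
The helper name classify_recv() and the SENTINEL_LEN macro are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Sentinel stored in the Netlink header's length field before receiving.
 * Assumption: no genuine Netlink message carries this length. */
#define SENTINEL_LEN UINT32_MAX

/* Hypothetical helper: classify the outcome of one receive call.
 *
 * 'retval' is the value returned by the receive call, 'saved_errno' is errno
 * captured immediately afterward, and 'nlmsg_len_after' is the header length
 * field read back from the buffer once the call has returned.
 *
 * Returns 0 if a real message arrived, otherwise a positive errno value. */
static int
classify_recv(ssize_t retval, int saved_errno, uint32_t nlmsg_len_after)
{
    if (retval < 0) {
        return saved_errno;         /* Ordinary failure reported via errno. */
    } else if (retval == 0) {
        return ECONNRESET;          /* Nothing received at all. */
    } else if (nlmsg_len_after != SENTINEL_LEN) {
        return 0;                   /* Header was overwritten: real data. */
    } else {
        return (int) retval;        /* Buffer untouched: retval is an errno. */
    }
}
```

A caller would store SENTINEL_LEN in the header's nlmsg_len field, issue the
receive call, and pass the results to the helper. With this scheme a return
of 22 with the sentinel intact is treated as EINVAL rather than as a 22-byte
message.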


