[ovs-dev] [PATCH] netlink-socket: Work around upstream kernel Netlink bug.
Ben Pfaff
blp at nicira.com
Wed Jul 2 20:36:16 UTC 2014
On Mon, Jun 30, 2014 at 02:59:45PM -0700, Ben Pfaff wrote:
> The upstream kernel net/netlink/af_netlink.c netlink_recvmsg() contains the
> following code to refill the Netlink socket buffer with more dump skbs
> while a dump is in progress:
>
>     if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
>         ret = netlink_dump(sk);
>         if (ret) {
>             sk->sk_err = ret;
>             sk->sk_error_report(sk);
>         }
>     }
>
> The netlink_dump() function that this calls returns a negative number on
> error, the convention used throughout the kernel, and thus sk->sk_err
> receives a negative value on error.
>
> However, sk->sk_err is supposed to contain either 0 or a positive errno
> value, as one can see from a quick "grep" through net for 'sk_err =', e.g.:
>
> ipv4/tcp.c:2067: sk->sk_err = ECONNRESET;
> ipv4/tcp.c:2069: sk->sk_err = ECONNRESET;
> ipv4/tcp_input.c:4106: sk->sk_err = ECONNREFUSED;
> ipv4/tcp_input.c:4109: sk->sk_err = EPIPE;
> ipv4/tcp_input.c:4114: sk->sk_err = ECONNRESET;
> netlink/af_netlink.c:741: sk->sk_err = ENOBUFS;
> netlink/af_netlink.c:1796: sk->sk_err = ENOBUFS;
> packet/af_packet.c:2476: sk->sk_err = ENETDOWN;
> unix/af_unix.c:341: other->sk_err = ECONNRESET;
> unix/af_unix.c:407: skpair->sk_err = ECONNRESET;
>
> The result is that the next attempt to receive from the socket will return
> the error to userspace with the wrong sign.
>
> (The root of the error in this case is that multiple threads are attempting
> to read a single flow dump from a shared fd. That should work, but the
> kernel has an internal race that can result in one or more of those threads
> hitting the EINVAL case at the start of netlink_dump(). The EINVAL is
> harmless in this case and userspace should be able to ignore it, but
> reporting the EINVAL as if it were a 22-byte message received in userspace
> throws a real wrench in the works.)
>
> This bug makes me think that there are probably not many programs doing
> multithreaded Netlink dumps. Maybe it is good that we are considering
> other approaches.
>
> VMware-BZ: #1255704
> Reported-by: Mihir Gangar <gangarm at vmware.com>
> Signed-off-by: Ben Pfaff <blp at nicira.com>
Alex acked this off-list, so I applied it to master, branch-2.3, and
branch-2.2.
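
For readers following along, here is a minimal sketch of the sign problem
described above and of one way userspace might detect it. It is a simulation
under stated assumptions, not the code from the applied patch: sim_sock,
sim_recvmsg() and recv_checked() are hypothetical names, and sim_recvmsg()
only imitates the convention that a pending sk_err is negated on its way
back to the caller. The detection idea assumes that the kernel leaves the
receive buffer untouched when it reports a pending error, so a sentinel
written into the first four bytes (where nlmsg_len would land) survives a
bogus "success".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel-side socket state. */
struct sim_sock {
    int sk_err;             /* Pending error; should be 0 or a positive errno. */
    const void *queued;     /* One queued message, or NULL. */
    size_t queued_len;
};

/* Simulated receive path.  A pending error is cleared and returned negated,
 * which is only a negative (error) return if sk_err was stored positive.
 * Otherwise the queued message is copied out and its length returned. */
static long
sim_recvmsg(struct sim_sock *sk, void *buf, size_t size)
{
    if (sk->sk_err) {
        int err = sk->sk_err;
        sk->sk_err = 0;
        return -err;        /* sk_err == -EINVAL comes back as +EINVAL. */
    }

    size_t n = sk->queued_len < size ? sk->queued_len : size;
    if (n) {
        memcpy(buf, sk->queued, n);
    }
    return (long) n;
}

/* Userspace-side check (an assumption for illustration).  Returns 0 and sets
 * '*lenp' on success, otherwise a positive errno value. */
static int
recv_checked(struct sim_sock *sk, void *buf, size_t size, size_t *lenp)
{
    const uint32_t sentinel = UINT32_MAX;   /* Impossible nlmsg_len. */
    uint32_t first;

    if (size < sizeof sentinel) {
        return EINVAL;
    }
    memcpy(buf, &sentinel, sizeof sentinel);

    long ret = sim_recvmsg(sk, buf, size);
    if (ret < 0) {
        return (int) -ret;  /* Correctly signed error. */
    }

    memcpy(&first, buf, sizeof first);
    if (ret > 0 && first == sentinel) {
        /* Nothing was written to the buffer, so 'ret' is really an errno
         * value that was reported with the wrong sign. */
        return (int) ret;
    }

    *lenp = (size_t) ret;
    return 0;
}
```

With sk_err set to -EINVAL, sim_recvmsg() on its own returns a positive
EINVAL, which a caller would misread as a byte count, whereas recv_checked()
returns EINVAL as an error. A real implementation would call recvmsg() and
consult errno instead of the simulated return value.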