[ovs-git] Open vSwitch: netlink-socket: Work around kernel Netlink dump thread races. (master)

dev at openvswitch.org dev at openvswitch.org
Fri Jul 11 00:04:09 UTC 2014


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "Open vSwitch".

The branch, master has been updated
       via  0791315e4dbf82e293a759086ee17c84b12b1d6e (commit)
       via  7f8e264600ec442f7cccc357bf299172419d6da9 (commit)
      from  fe90efd9d69ad84c5086442b6400be9e92aa1e9e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 0791315e4dbf82e293a759086ee17c84b12b1d6e
Diffs: http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=0791315e4dbf82e293a759086ee17c84b12b1d6e
Author: Ben Pfaff <blp at nicira.com>
		
netlink-socket: Work around kernel Netlink dump thread races.
		
The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.

The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
	skb = skb_recv_datagram(sk, flags, noblock, &err);
	if (skb == NULL)
		goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
	if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
		ret = netlink_dump(sk);
		if (ret) {
			sk->sk_err = ret;
			sk->sk_error_report(sk);
		}
	}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN.  This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again.  nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).

The second race is more serious.  Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above.  netlink_dump() begins with:
	mutex_lock(nlk->cb_mutex);

	cb = nlk->cb;
	if (cb == NULL) {
		err = -EINVAL;
		goto errout_skb;
	}
Suppose that X gets cb_mutex first and finds that the dump is complete.  It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
	nlk->cb = NULL;
	mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above.  netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket.  Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere.  One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur.  For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice.  (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)

VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw at nicira.com>
Signed-off-by: Ben Pfaff <blp at nicira.com>
Acked-by: Alex Wang <alexw at nicira.com>


commit 7f8e264600ec442f7cccc357bf299172419d6da9
Diffs: http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=7f8e264600ec442f7cccc357bf299172419d6da9
Author: Ben Pfaff <blp at nicira.com>
		
netlink-socket: Fix sign of error code.
		
Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink
bug.) got the sign of the error code wrong, so that it reported e.g. -22
for EINVAL to nl_sock_recv__()'s caller, instead of 22.

Signed-off-by: Ben Pfaff <blp at nicira.com>
Acked-by: Alex Wang <alexw at nicira.com>


-----------------------------------------------------------------------

Summary of changes:
 lib/netlink-socket.c |   12 +++++++++++-
 lib/netlink-socket.h |    2 ++
 2 files changed, 13 insertions(+), 1 deletion(-)


hooks/post-receive
-- 
Open vSwitch



More information about the git mailing list