[ovs-dev] [RFC 1/4] netlink: Support for memory mapped Netlink sockets

Fri May 23 08:03:54 UTC 2014

On 05/22/14 at 05:20pm, Ben Pfaff wrote:
> On Fri, May 23, 2014 at 01:15:58AM +0200, Thomas Graf wrote:
> > Signed-off-by: Thomas Graf <tgraf at suug.ch>
> 
> Thanks for the patches.

Thanks for the prompt review.
> 
> Definitions in netlink-protocol.h have to be written from the
> perspective that we might be on a non-Linux host and compiling with a
> non-GCC compiler (in particular MSVC).  The definition of
> __ALIGN_KERNEL is problematic from the latter perspective, since MSVC
> doesn't have typeof.  I am also not sure whether definition of __u32
> is universal in practice; I would tend to use uint32_t.

I'll fix these up.

> The comment on nl_sock_create_mmap() says that it falls back to an
> unmapped socket if it cannot create a mapped one, but the
> implementation appears to fail hard in that case.  The return value
> convention also seems to be mixed up versus the comment.

I can see how the code gives that impression, I'll address that.
nl_sock_set_ring() does already return 0 if the setsockopt() for
NETLINK_RX_RING fails. So we'll fall back if mmap is unsupported
but we'll fail hard if the Netlink mmap feature is supported and
the setup thereof fails.

> We do not usually write code of this form:
>     if ((retval = nl_sock_create(protocol, sockp)) < 0) {
> instead:
>     retval = nl_sock_create(protocol, sockp);
>     if (retval < 0) {

OK

> I guess that the double cast in mmap_frame() is due to the issue that
> we usually document by using the ALIGNED_CAST macro.

I removed the double cast and found no warning from gcc or sparse.
I don't remember why I had added it in the first place ;-)

> One of the if statements in nl_sock_send_mmap() lacks {} around its
> conditional statement.

OK

> I don't think that it is a good idea to use nl_sock_wait() and
> poll_block() inside nl_sock_send_mmap(), because this could interact
> with any poll set being built up by the caller.  The caller ordinarily
> wouldn't be doing that (it's not the model we use) but it still
> doesn't seem quite wise.  I would be more inclined to use the poll()
> function directly here to wait for a slot to become available.

I was torn on this and had a pure call to poll() in the first
prototype for just that reason but then wanted to reuse the
existing infrastructure with the assumption that if the caller
passes wait he would not use the same fd non-blocking.

I'll be happy to convert that to a direct poll() call for both
nl_send_mmap() and nl_recv_mmap()

> Using nl_sock_send_linear() can cause messages to be reordered.  Do we
> need to wait for the tx ring to empty before calling it?

Are you referring to parallel usage of _send_linear() and
_send_mmap() on the same socket? The ring does support carrying
a do-a-linear-read event which allows parallel use while preserving
order but the caller may not decide to do so on his own. Going from
maped to unmapped on a live socket would be a nice feature but would
require kernel support first.

> Do we need a memory barrier before setting nm_status?  (Can the kernel
> running on a different CPU consume the TX buffer before we call
> sendto()?)

The sendto() is the trigger for the kernel to look at the ring so
we're good as long as we don't start writing to the ring from multiple
threads. The kernel does a wmb() after every status update though as
it writes to the same ring from multiple kthreads.
> 
> I need to continue reading the patch at nl_sock_recv_mmap().

Thanks. I'll wait for additional feedback.