[ovs-dev] [RFC 1/4] netlink: Support for memory mapped Netlink sockets

Fri May 23 15:51:25 UTC 2014

On Fri, May 23, 2014 at 09:03:54AM +0100, Thomas Graf wrote:
> On 05/22/14 at 05:20pm, Ben Pfaff wrote:
> > On Fri, May 23, 2014 at 01:15:58AM +0200, Thomas Graf wrote:
> > > Signed-off-by: Thomas Graf <tgraf at suug.ch>
> > I guess that the double cast in mmap_frame() is due to the issue that
> > we usually document by using the ALIGNED_CAST macro.
> 
> I removed the double cast and found no warning from gcc or sparse.
> I don't remember why I had added it in the first place ;-)

OK.  Clang is sometimes pickier on this account, but I'll mention that
if I notice it on v2.

> > I don't think that it is a good idea to use nl_sock_wait() and
> > poll_block() inside nl_sock_send_mmap(), because this could interact
> > with any poll set being built up by the caller.  The caller ordinarily
> > wouldn't be doing that (it's not the model we use) but it still
> > doesn't seem quite wise.  I would be more inclined to use the poll()
> > function directly here to wait for a slot to become available.
> 
> I was torn on this and had a pure call to poll() in the first
> prototype for just that reason but then wanted to reuse the
> existing infrastructure with the assumption that if the caller
> passes wait he would not use the same fd non-blocking.
> 
> I'll be happy to convert that to a direct poll() call for both
> nl_send_mmap() and nl_recv_mmap()

Thanks.  I think that is a good idea.  Using the infrastructure would
probably work here, but it goes against the intent of that
infrastructure, which is for the caller to build up its own poll set.
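
For illustration only, a direct wait might look roughly like the sketch
below.  wait_for_fd() is a hypothetical helper, not something from the
patch, and it assumes that the mmapped socket reports POLLOUT once a TX
slot becomes available.  A retry after EINTR restarts the full timeout,
which a real version might want to tighten.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper (not from the patch): blocks until 'fd' reports one
 * of the requested 'events' (e.g. POLLOUT) or 'timeout_ms' elapses.
 *
 * Returns 0 if a requested event is pending, ETIMEDOUT if the timeout
 * expired, EIO if poll() only reported an unrequested condition such as
 * POLLERR or POLLNVAL, otherwise the positive errno from poll(). */
static int
wait_for_fd(int fd, short events, int timeout_ms)
{
    struct pollfd pfd;
    int retval;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;

    /* This touches no shared poll set, so it cannot disturb whatever the
     * caller is building up for its own poll loop. */
    do {
        retval = poll(&pfd, 1, timeout_ms);
    } while (retval < 0 && errno == EINTR);

    if (retval < 0) {
        return errno;
    } else if (retval == 0) {
        return ETIMEDOUT;
    }
    return (pfd.revents & events) ? 0 : EIO;
}
```

A send path could call wait_for_fd(fd, POLLOUT, -1) when it finds no free
frame, and then retry.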

> > Using nl_sock_send_linear() can cause messages to be reordered.  Do we
> > need to wait for the tx ring to empty before calling it?
> 
> Are you referring to parallel usage of _send_linear() and
> _send_mmap() on the same socket? The ring does support carrying
> a do-a-linear-read event which allows parallel use while preserving
> order but the caller may not decide to do so on his own. Going from
> mapped to unmapped on a live socket would be a nice feature but would
> require kernel support first.

If we have a series of back-to-back nl_sock_send_mmap() calls, and one
or a few of them are too big for the frame size, then nothing currently
assures that the kernel will receive them in the order sent.  I am not
sure that it matters, but that is what I was trying to point out.
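
To make the concern concrete, here is a toy model, not the patch's code
and not the kernel's actual behavior.  It assumes that small messages sit
in the TX ring until something kicks the kernel, while oversized messages
take the linear path and are delivered immediately.  All of the names
(toy_sock, toy_send, toy_flush, FRAME_SIZE) are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAME_SIZE 16           /* Assumed payload capacity of one frame. */
#define MAX_MSGS 8              /* Capacity of the toy arrays. */

struct toy_sock {
    int ring[MAX_MSGS];         /* Message ids waiting in the TX ring. */
    size_t n_ring;
    int delivered[MAX_MSGS];    /* Order in which the "kernel" saw them. */
    size_t n_delivered;
};

/* Models the kick: the "kernel" consumes every queued frame in order. */
static void
toy_flush(struct toy_sock *s)
{
    size_t i;

    for (i = 0; i < s->n_ring && s->n_delivered < MAX_MSGS; i++) {
        s->delivered[s->n_delivered++] = s->ring[i];
    }
    s->n_ring = 0;
}

/* Queues message 'id' in the ring if it fits in a frame, otherwise sends
 * it "linearly".  If 'flush_first' is true, the ring is drained before the
 * linear send so that earlier messages cannot be overtaken. */
static void
toy_send(struct toy_sock *s, int id, size_t len, bool flush_first)
{
    if (len <= FRAME_SIZE) {
        if (s->n_ring < MAX_MSGS) {
            s->ring[s->n_ring++] = id;
        }
    } else {
        if (flush_first) {
            toy_flush(s);
        }
        if (s->n_delivered < MAX_MSGS) {
            s->delivered[s->n_delivered++] = id;
        }
    }
}
```

In this model, sending a small message 1, an oversized message 2, and a
small message 3 with flush_first false, followed by a final toy_flush(),
delivers them as 2, 1, 3.  With flush_first true the order is 1, 2, 3.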

> > Do we need a memory barrier before setting nm_status?  (Can the kernel
> > running on a different CPU consume the TX buffer before we call
> > sendto()?)
> 
> The sendto() is the trigger for the kernel to look at the ring so
> we're good as long as we don't start writing to the ring from multiple
> threads. The kernel does a wmb() after every status update though as
> it writes to the same ring from multiple kthreads.

OK.
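
For reference, if the ring ever were filled from multiple threads, or read
by a consumer that does not wait for the sendto() trigger, I would expect
the status update to need release ordering, roughly as in the sketch
below.  The frame layout and the names here (toy_frame, TOY_UNUSED,
TOY_VALID, toy_frame_publish) are invented for the example and are not the
kernel's frame header.  It uses C11 atomics rather than any OVS or kernel
barrier primitive.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

enum { TOY_UNUSED, TOY_VALID };     /* Hypothetical frame states. */

struct toy_frame {
    _Atomic unsigned int status;    /* TOY_UNUSED or TOY_VALID. */
    unsigned int len;               /* Number of valid bytes in 'data'. */
    unsigned char data[64];
};

/* Copies 'msg' into 'f' and then marks the frame valid.  Returns 0 on
 * success, -1 if the message does not fit or the frame is still in use. */
static int
toy_frame_publish(struct toy_frame *f, const void *msg, size_t len)
{
    if (len > sizeof f->data
        || atomic_load_explicit(&f->status,
                                memory_order_acquire) != TOY_UNUSED) {
        return -1;
    }

    memcpy(f->data, msg, len);
    f->len = (unsigned int) len;

    /* Release store: a consumer that acquire-loads TOY_VALID from 'status'
     * is guaranteed to also see the 'data' and 'len' written above. */
    atomic_store_explicit(&f->status, TOY_VALID, memory_order_release);
    return 0;
}
```

With a single writer and sendto() as the only trigger, as described above,
the plain store in the patch should be enough.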