[ovs-dev] Possible Regression in 'netdev: Add new "struct netdev_rx" for capturing packets from a netdev.'

Simon Horman horms at verge.net.au
Thu Jun 13 00:34:09 UTC 2013


On Wed, Jun 12, 2013 at 08:08:50AM -0700, Ben Pfaff wrote:
> On Wed, Jun 12, 2013 at 07:11:50AM -0700, Murphy McCauley wrote:
> > 
> > On Jun 12, 2013, at 6:28 AM, Ed Maste wrote:
> > 
> > > On 12 June 2013 07:04, Murphy McCauley <murphy.mccauley at gmail.com> wrote:
> > >> (Sorry this isn't an actual reply and is missing context -- I wasn't on the list when it was originally posted.)
> > >> 
> > >> Simon and I have been in touch about this, and I thought I'd share my findings for what they're worth.
> > >> 
> > >> The problem is from the commit Simon mentioned (796223f5bc3a4896e6398733c798390158479400).  Specifically, it's in netdev-linux.c in netdev_linux_send().
> > >> 
> > >> The new version always sends using the "sender" socket made by af_packet_sock() unless the interface is a tap, in which case it sends it using the tap fd.  This differs from the old version which sent using whatever was in the fd field of the netdev if it was available.  For tap interfaces, this was the tap fd, so the result was the same as it is now.  But for other interfaces, this held the socket opened for receiving if the interface was listening (which was maybe never "right" in some sense and isn't convenient anymore since this socket descriptor is no longer stored in the non-rx netdev).
> > >> 
> > >> The comments indicate that the exception is made for tap interfaces since writing to a tap interface with an AF_PACKET socket results in receiving the packet you just wrote.  However, I don't think this behavior is limited to taps.  Since the old version of the code sent and received with the same socket descriptor, I think the loop was fixed by the check in dev_queue_xmit_nit() in net/core/dev.c.  Since they're two different socket descriptors now, this no longer works and you get the loop.
> > > 
> > > Ahh, it turns out Ben explained this to me when I ran into a related
> > > issue with the FreeBSD userspace implementation.  Ben's message in the
> > > thread is at http://openvswitch.org/pipermail/dev/2012-July/018806.html
> > > .
> > > 
> > >> I fixed it (I think) by adding a BPF packet filter on the rx socket so that it only receives incoming packets.  There's probably a better fix, but you're welcome to the patch if you want it.
> > > 
> > > I think it's worth taking a look.
> > 
> > 
> > I think this is more or less against master.
> > 
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index d73115b..cc47a6b 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -744,6 +744,14 @@ netdev_linux_rx_open(struct netdev *netdev_, struct netdev_rx **rxp)
> >      } else {
> >          struct sockaddr_ll sll;
> >          int ifindex;
> > +        /* Result of tcpdump -dd inbound */
> > +        static struct sock_filter filt[] = {
> > +            { 0x28, 0, 0, 0xfffff004 },
> > +            { 0x15, 0, 1, 0x00000004 },
> > +            { 0x6, 0, 0, 0x00000000 },
> > +            { 0x6, 0, 0, 0x0000ffff }
> > +        };
> > +        static struct sock_fprog fprog = {ARRAY_SIZE(filt), filt};
> >  
> >          /* Create file descriptor. */
> >          fd = socket(PF_PACKET, SOCK_RAW, 0);
> > @@ -776,6 +784,16 @@ netdev_linux_rx_open(struct netdev *netdev_, struct netdev_rx **rxp)
> >                       netdev_get_name(netdev_), strerror(error));
> >              goto error;
> >          }
> > +
> > +        /* Filter for only incoming packets. */
> > +        error = setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &fprog,
> > +                           sizeof fprog);
> > +        if (error) {
> > +            error = errno;
> > +            VLOG_ERR("%s: failed attach filter (%s)",
> > +                     netdev_get_name(netdev_), strerror(error));
> > +            goto error;
> > +        }
> >      }
> >  
> >      rx = xmalloc(sizeof *rx);
> 
> Thanks, Murphy.
> 
> Simon, does this solve the problem you see?

Hi Ben, Hi all,

I have lightly tested this using the scenarios that I mentioned
in my initial post - multicast IPv6 and IPv6, broadcast ARP - and
yes, this seems to resolve the problem that I observed.

Tested-by: Simon Horman <horms at verge.net.au>




More information about the dev mailing list