[ovs-discuss] intermitting ARP problems on DP interface

Ben Pfaff blp at nicira.com
Wed Oct 19 16:32:35 UTC 2011


The error code is 0, which indicates "success".  It is the form of
acknowledgment sent when an operation completes successfully.
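
To make the ACK/NAK distinction concrete, here is a minimal Python sketch (not OVS code) that builds and decodes NLMSG_ERROR messages. The header layout and the NLMSG_ERROR and NLM_F_* values follow <linux/netlink.h>. The helper names build_error_reply, parse_error_reply and describe are hypothetical. It rebuilds the two replies from the logs quoted below: the EEXIST NAK, which echoes the whole 40-byte request (len 60), and the error-0 ACK, which echoes only the 16-byte request header (len 36). Both lengths are consistent with the logged values.

```python
import errno
import os
import struct

# struct nlmsghdr from <linux/netlink.h>: len u32, type u16, flags u16,
# seq u32, pid u32, all in host byte order (16 bytes).
NLMSGHDR = struct.Struct("=IHHII")
NLMSG_ERROR = 2      # type used for both ACKs (error 0) and NAKs (-errno)

NLM_F_REQUEST = 0x1
NLM_F_ACK = 0x4
NLM_F_ECHO = 0x8     # REQUEST|ACK|ECHO == 0xd, the "flags=d" in the logs


def build_error_reply(error, echoed):
    """Wrap `echoed` (request header or whole request) in an NLMSG_ERROR."""
    _len, _type, _flags, seq, pid = NLMSGHDR.unpack_from(echoed, 0)
    payload = struct.pack("=i", error) + echoed
    header = NLMSGHDR.pack(NLMSGHDR.size + len(payload), NLMSG_ERROR, 0, seq, pid)
    return header + payload


def parse_error_reply(msg):
    """Return (seq, error) from an NLMSG_ERROR message."""
    _len, msg_type, _flags, seq, _pid = NLMSGHDR.unpack_from(msg, 0)
    if msg_type != NLMSG_ERROR:
        raise ValueError("not an NLMSG_ERROR message")
    (error,) = struct.unpack_from("=i", msg, NLMSGHDR.size)
    return seq, error


def describe(error):
    """0 is an acknowledgment; a negative value is -errno, i.e. a NAK."""
    if error == 0:
        return "ACK (success)"
    return f"NAK {-error} ({os.strerror(-error)})"


# Rebuild the two replies from the logs (type 33 = ovs_datapath, pid 25535).
flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO
create_req = NLMSGHDR.pack(40, 33, flags, 0x4E9EF0DC, 25535) + bytes(24)
get_hdr = NLMSGHDR.pack(32, 33, flags, 0x4E9EF0DD, 25535)

nak = build_error_reply(-errno.EEXIST, create_req)  # echoes whole request
ack = build_error_reply(0, get_hdr)                 # echoes header only

for reply in (nak, ack):
    seq, error = parse_error_reply(reply)
    print(len(reply), hex(seq), describe(error))
```

Both messages have type 2 ("error"). Only the value of the error field separates a failure from a plain acknowledgment.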

I've diagnosed the first such error that you posted.  I can't diagnose
the ones that I have not seen.
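
That diagnosis comes down to matching replies by sequence number. The kernel's trailing ACK refers to a request that has already been answered. Below is a simplified Python model of a receiver that drops any message whose seq does not match the outstanding request. This is not OVS's nl_sock code, and Msg and wait_for_reply are hypothetical names. Replaying the logged sequence shows the leftover ACK for OVS_DP_CMD_GET (seq 0x4e9ef0dd) being discarded while the receiver waits for the flow flush (seq 0x4e9ef0de). This is what the "ignoring unexpected seq" log line reports.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Msg:
    seq: int
    kind: str  # "reply" (command output) or "ack" (NLMSG_ERROR with error 0)


def wait_for_reply(inbox, want_seq, log):
    """Return the first queued message for want_seq, dropping stale ones."""
    while inbox:
        msg = inbox.popleft()
        if msg.seq != want_seq:
            # A leftover ACK from an earlier, already-answered request.
            log.append(f"ignoring unexpected seq {msg.seq:#x}")
            continue
        return msg
    return None  # nothing queued for this request yet


inbox, log = deque(), []

# OVS_DP_CMD_GET with REQUEST|ACK|ECHO: the kernel queues the datapath
# reply and then the unconditional ACK, both with seq 0x4e9ef0dd.
inbox.extend([Msg(0x4E9EF0DD, "reply"), Msg(0x4E9EF0DD, "ack")])
get_reply = wait_for_reply(inbox, 0x4E9EF0DD, log)

# Flow flush with REQUEST|ACK only: a single ACK with seq 0x4e9ef0de.
inbox.append(Msg(0x4E9EF0DE, "ack"))
flush_reply = wait_for_reply(inbox, 0x4E9EF0DE, log)

print(get_reply, flush_reply, log)
```

The model returns on the first matching message and lets any later message with the same seq be discarded on the next call. Discarding it is harmless because the request has already been answered.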

We haven't seen anything like this internally.  I suggest using
ovs-vswitchd instead of a toy test program that is only intended for
unit tests.

On Wed, Oct 19, 2011 at 06:13:11PM +0200, Andreas Schultz wrote:
> Hi Ben,
> 
> It might be harmless, but it is followed by tons of similar errors, and why
> errors at all?
> Also, there is not a single such error on the first start of the controller.
> Only when I kill the first instance and start it again do these errors occur,
> and then the switch acts erratically. Removing the dp interface in between
> does not help either.
> 
> Any ideas?
> 
> Andreas
> 
> ----- Original Message -----
> > On Wed, Oct 19, 2011 at 01:23:53PM +0200, Andreas Schultz wrote:
> > > Hi all,
> > > 
> > > Upon further investigation, it turns out that the dpif_linux_open()
> > > sequence is broken somewhere. The flow of netlink messages simply
> > > makes no sense at all.
> > > 
> > > First we get a create attempt, which is expected to fail since the
> > > DP already exists:
> > > Oct 19 11:07:52|00014|netlink_socket|DBG|nl_sock_recv__ (Success):
> > > nl(len:60, type=2(error), flags=0, seq=4e9ef0dc,
> > > pid=25535(25535:0)) error(-17(File exists), in-reply-to(nl(len:40,
> > > type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dc,
> > > pid=25535(25535:0))))
> > > Oct 19 11:07:52|00015|netlink_socket|DBG|received NAK error=17
> > > (File exists)
> > > 
> > > Now it tries OVS_DP_CMD_GET:
> > > Oct 19 11:07:52|00016|netlink_socket|DBG|nl_sock_transact_multiple__
> > > (Success): nl(len:32, type=33(ovs_datapath),
> > > flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd,
> > > pid=25535(25535:0)),genl(cmd=3,version=1)
> > > Oct 19 11:07:52|00017|netlink_socket|DBG|nl_sock_recv__ (Success):
> > > nl(len:84, type=33(ovs_datapath), flags=0, seq=4e9ef0dd,
> > > pid=25535(25535:0)),genl(cmd=1,version=1)
> > > 
> > > and succeeds. So far so good...
> > > 
> > > The next step is to flush any old flows:
> > > Oct 19 11:07:52|00018|netlink_socket|DBG|nl_sock_transact_multiple__
> > > (Success): nl(len:24, type=35(ovs_flow), flags=5[REQUEST][ACK],
> > > seq=4e9ef0de, pid=25535(25535:0)),genl(cmd=2,version=1)
> > > 
> > > It sends that to the kernel...
> > > 
> > > and the kernel gives us a netlink error report for the
> > > OVS_DP_CMD_GET that was already ACKed OK:
> > > Oct 19 11:07:52|00019|netlink_socket|DBG|nl_sock_recv__ (Success):
> > > nl(len:36, type=2(error), flags=0, seq=4e9ef0dd,
> > > pid=25535(25535:0)) error(0, in-reply-to(nl(len:32,
> > > type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd,
> > > pid=25535(25535:0))))
> > > Oct 19 11:07:52|00020|netlink_socket|DBG|ignoring unexpected seq
> > > 0x4e9ef0dd
> > > 
> > > I have verified the above sequence with strace, and the decoded netlink
> > > messages were indeed sent to and received from the kernel. So it
> > > is not a buffering issue in the controller.
> > 
> > This is a red herring.  It's very common for a Netlink request to have
> > two replies when the ACK flag is set, because the kernel unconditionally
> > sends an "error" reply after the command implementation itself sends any
> > reply of its own.  We just ignore the second reply in userspace; it's
> > harmless.
> > 
> 
> -- 
> Dipl. Inform.
> Andreas Schultz


