[ovs-discuss] intermitting ARP problems on DP interface

Andreas Schultz aschultz at tpip.net
Wed Oct 19 16:13:11 UTC 2011


Hi Ben,

It might be harmless, but it is followed by tons of similar errors, and why
are they reported as errors at all?
Also, there is not a single such error on the first start of the controller.
Only when I kill the first instance and start it again do those errors occur,
and the switch then acts erratically. Removing the dp interface in between
also does not help.

Any ideas?

Andreas

----- Original Message -----
> On Wed, Oct 19, 2011 at 01:23:53PM +0200, Andreas Schultz wrote:
> > Hi all,
> > 
> > Upon further investigation it turns out that the dpif_linux_open()
> > sequence is broken somewhere. The flow of netlink messages simply
> > makes no sense at all.
> > 
> > First we get a create attempt, which is expected to fail since the
> > DP already exists:
> > Oct 19 11:07:52|00014|netlink_socket|DBG|nl_sock_recv__ (Success):
> > nl(len:60, type=2(error), flags=0, seq=4e9ef0dc,
> > pid=25535(25535:0)) error(-17(File exists), in-reply-to(nl(len:40,
> > type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dc,
> > pid=25535(25535:0))))
> > Oct 19 11:07:52|00015|netlink_socket|DBG|received NAK error=17
> > (File exists)
> > 
> > now it tries OVS_DP_CMD_GET:
> > Oct 19
> > 11:07:52|00016|netlink_socket|DBG|nl_sock_transact_multiple__
> > (Success): nl(len:32, type=33(ovs_datapath),
> > flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd,
> > pid=25535(25535:0)),genl(cmd=3,version=1)
> > Oct 19 11:07:52|00017|netlink_socket|DBG|nl_sock_recv__ (Success):
> > nl(len:84, type=33(ovs_datapath), flags=0, seq=4e9ef0dd,
> > pid=25535(25535:0)),genl(cmd=1,version=1)
> > 
> > and succeeds. So far so good...
> > 
> > Next step is to flush any old flows:
> > Oct 19
> > 11:07:52|00018|netlink_socket|DBG|nl_sock_transact_multiple__
> > (Success): nl(len:24, type=35(ovs_flow), flags=5[REQUEST][ACK],
> > seq=4e9ef0de, pid=25535(25535:0)),genl(cmd=2,version=1)
> > 
> > send that to the kernel...
> > 
> > and the kernel gives us a netlink error report for the
> > OVS_DP_CMD_GET that was already ACKed OK:
> > Oct 19 11:07:52|00019|netlink_socket|DBG|nl_sock_recv__ (Success):
> > nl(len:36, type=2(error), flags=0, seq=4e9ef0dd,
> > pid=25535(25535:0)) error(0, in-reply-to(nl(len:32,
> > type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd,
> > pid=25535(25535:0))))
> > Oct 19 11:07:52|00020|netlink_socket|DBG|ignoring unexpected seq
> > 0x4e9ef0dd
> > 
> > I have verified the above sequence with strace, and the decoded netlink
> > messages were indeed sent to and received from the kernel. So it
> > is not a buffering issue in the controller.
> 
> This is a red herring.  It's very common for a Netlink request to have
> two replies when the ACK flag is set, because the kernel unconditionally
> sends an "error" reply after the command implementation itself sends any
> reply of its own.  We just ignore the second reply in userspace; it's
> harmless.
> 
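
For reference, the double-reply behaviour described above can be reproduced
offline with a few lines of Python. This is only a sketch, not the OVS code:
the helper names (pack_header, pack_error, classify) are made up for
illustration, the message bodies are omitted, and the only things taken from
the log are the header fields (len, type, flags, seq, pid). The constants are
the standard ones from <linux/netlink.h>.

```python
import struct

# Values from <linux/netlink.h>
NLMSG_ERROR = 2        # message type used for both NAKs and ACKs
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NLM_F_ECHO = 0x08

# struct nlmsghdr: len (u32), type (u16), flags (u16), seq (u32), pid (u32)
NLMSGHDR = struct.Struct("=IHHII")


def pack_header(length, msg_type, flags, seq, pid):
    """Hypothetical helper: pack a bare netlink header (body omitted)."""
    return NLMSGHDR.pack(length, msg_type, flags, seq, pid)


def pack_error(seq, pid, error, request_header):
    """Hypothetical helper: NLMSG_ERROR = header + errno + echoed request header."""
    payload = struct.pack("=i", error) + request_header
    return pack_header(NLMSGHDR.size + len(payload), NLMSG_ERROR, 0, seq, pid) + payload


def classify(msg, expected_seq):
    """Hypothetical helper: decide what a received message means for the
    transaction that is currently waiting on expected_seq."""
    _length, msg_type, _flags, seq, _pid = NLMSGHDR.unpack_from(msg)
    if seq != expected_seq:
        return "ignoring unexpected seq 0x%x" % seq
    if msg_type == NLMSG_ERROR:
        (error,) = struct.unpack_from("=i", msg, NLMSGHDR.size)
        return "ACK" if error == 0 else "NAK error=%d" % -error
    return "reply type=%d" % msg_type


pid = 25535
flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO   # 0xd, as in the log

# OVS_DP_CMD_GET request (seq ...dd): the kernel sends the real reply first,
# then the unconditional "error(0)" ACK because NLM_F_ACK was set.
get_req = pack_header(32, 33, flags, 0x4E9EF0DD, pid)
get_reply = pack_header(84, 33, 0, 0x4E9EF0DD, pid)
late_ack = pack_error(0x4E9EF0DD, pid, 0, get_req)

# The GET transaction finishes on the first reply ...
first = classify(get_reply, expected_seq=0x4E9EF0DD)
# ... so the ACK is only read while the next request (seq ...de) is waiting.
second = classify(late_ack, expected_seq=0x4E9EF0DE)

# The failed create attempt (seq ...dc) is a genuine NAK with -EEXIST (-17).
create_req = pack_header(40, 33, flags, 0x4E9EF0DC, pid)
nak = classify(pack_error(0x4E9EF0DC, pid, -17, create_req), expected_seq=0x4E9EF0DC)

print(first)
print(second)
print(nak)
```

In this sketch the second message is the same kind of "error(0)" packet as
log entry 00019 (36 bytes: 16-byte header, 4-byte errno, 16-byte echoed
header). It is dropped only because its sequence number belongs to a
transaction that already completed.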

-- 
Dipl. Inform.
Andreas Schultz


