[ovs-discuss] intermitting ARP problems on DP interface

Ben Pfaff blp at nicira.com
Wed Oct 19 15:32:10 UTC 2011


On Wed, Oct 19, 2011 at 01:23:53PM +0200, Andreas Schultz wrote:
> Hi all,
> 
> Upon further investigation it turns out that the dpif_linux_open() sequence
> is broken somewhere. The flow of netlink messages simply makes no sense at
> all.
> 
> First we get a create attempt, which is expected to fail since the DP already exists:
> Oct 19 11:07:52|00014|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:60, type=2(error), flags=0, seq=4e9ef0dc, pid=25535(25535:0)) error(-17(File exists), in-reply-to(nl(len:40, type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dc, pid=25535(25535:0))))
> Oct 19 11:07:52|00015|netlink_socket|DBG|received NAK error=17 (File exists)
> 
> Now it tries OVS_DP_CMD_GET:
> Oct 19 11:07:52|00016|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:32, type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd, pid=25535(25535:0)),genl(cmd=3,version=1)
> Oct 19 11:07:52|00017|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:84, type=33(ovs_datapath), flags=0, seq=4e9ef0dd, pid=25535(25535:0)),genl(cmd=1,version=1)
> 
> and succeeds. So far so good...
> 
> Next step is to flush any old flows:
> Oct 19 11:07:52|00018|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:24, type=35(ovs_flow), flags=5[REQUEST][ACK], seq=4e9ef0de, pid=25535(25535:0)),genl(cmd=2,version=1)
> 
> send that to the kernel...
> 
> and the kernel gives us a netlink error report for the OVS_DP_CMD_GET that was already ACKed OK:
> Oct 19 11:07:52|00019|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:36, type=2(error), flags=0, seq=4e9ef0dd, pid=25535(25535:0)) error(0, in-reply-to(nl(len:32, type=33(ovs_datapath), flags=d[REQUEST][ACK][ECHO], seq=4e9ef0dd, pid=25535(25535:0))))
> Oct 19 11:07:52|00020|netlink_socket|DBG|ignoring unexpected seq 0x4e9ef0dd
> 
> I have verified the above sequence with strace, and the decoded netlink messages were indeed sent to and received from the kernel. So it is not a buffering issue in the controller.

This is a red herring.  It's very common for a Netlink request to have
two replies when the ACK flag is set, because the kernel unconditionally
sends an "error" reply after the command implementation itself sends any
reply of its own.  We just ignore the second reply in userspace; it's
harmless.
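
To make that concrete, here is a minimal sketch of the sequence-number
check that lets userspace skip such a leftover reply. It is not the
actual Open vSwitch code: struct reply and find_reply() are hypothetical
names invented for illustration, and the struct is a simplified stand-in
for a real Netlink header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a received Netlink reply. */
struct reply {
    uint32_t seq;   /* sequence number copied from the request */
    int error;      /* 0 for an ACK or data reply, otherwise an errno value */
};

/* Scans 'n' queued replies for the first one whose sequence number matches
 * 'expected_seq'.  Replies with any other sequence number (for example, the
 * trailing ACK of an earlier request) are skipped and counted in
 * '*n_ignored'.  Returns the index of the matching reply, or -1 if none. */
int
find_reply(const struct reply *queue, size_t n, uint32_t expected_seq,
           size_t *n_ignored)
{
    size_t i;

    assert(n_ignored != NULL);
    *n_ignored = 0;
    for (i = 0; i < n; i++) {
        if (queue[i].seq == expected_seq) {
            return (int) i;
        }
        (*n_ignored)++;     /* "ignoring unexpected seq" */
    }
    return -1;
}
```

In the trace above, the flow flush uses seq 0x4e9ef0de, so the late
error(0) reply carrying seq 0x4e9ef0dd fails the comparison and is
skipped, which is what the "ignoring unexpected seq" debug line reports.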


