[ovs-dev] [PATCH] connmgr: Fix vswitchd abort when a port is added and the controller is down
nusiddiq at redhat.com
Wed Oct 17 16:52:51 UTC 2018
On Wed, Oct 17, 2018 at 10:01 PM Ben Pfaff <blp at ovn.org> wrote:
> On Wed, Oct 17, 2018 at 05:33:05PM +0530, nusiddiq at redhat.com wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
> > We see the below trace when a port is added to a bridge and the
> > controller is down
> > 0x00007fb002f8b207 in raise () from /lib64/libc.so.6
> > 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
> > 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/
> > 0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/
> > 0x00007fb004ef1c5b in connmgr_send_port_status () from
> > 0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efc7f9 in ofproto_port_add () from
> > 0x0000556d540a3f95 in bridge_add_ports__ ()
> > 0x0000556d540a5a47 in bridge_reconfigure ()
> > 0x0000556d540a9199 in bridge_run ()
> > 0x0000556d540a02a5 in main ()
> > When connmgr detects that the connection to the controller is down, it
> > resets the ofconn's protocol to 'OFPUTIL_P_NONE' and that's why we
> > see the above abort. This patch fixes the issue by also checking the
> > connection status before sending the port status in the
> > connmgr_send_port_status().
> > The issue can be reproduced by running the test added in this patch
> > without the fix.
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> This is good work. Thank you.
> In general there's some bad factoring here, but a proper fix would be a
> lot more work.
> I think that we could generalize this by adjusting
> ofconn_receives_async_msg() to return false if the connection is down.
> At least one caller checks already, but it seems like all the callers
> should ignore connections that are down. What do you think of that?
Agree. I will work on it. This issue is seen after the commit - 
- rconn: Introduce new invariant to fix assertion failure in corner case.
So probably the caller should check the connection status by
More information about the dev