[ovs-dev] [PATCH] connmgr: Fix vswitchd abort when a port is added and the controller is down

Numan Siddique nusiddiq at redhat.com
Wed Oct 17 15:49:06 UTC 2018


On Wed, Oct 17, 2018 at 7:45 PM Eelco Chaudron <echaudro at redhat.com> wrote:

>
>
> On 17 Oct 2018, at 14:03, nusiddiq at redhat.com wrote:
>
> > From: Numan Siddique <nusiddiq at redhat.com>
> >
> > We see the trace below when a port is added to a bridge and the
> > configured controller is down:
> >
> > 0x00007fb002f8b207 in raise () from /lib64/libc.so.6
> > 0x00007fb002f8c8f8 in abort () from /lib64/libc.so.6
> > 0x00007fb004953026 in ofputil_protocol_to_ofp_version () from /lib64/libopenvswitch-2.10.so.0
> > 0x00007fb00494e38e in ofputil_encode_port_status () from /lib64/libopenvswitch-2.10.so.0
> > 0x00007fb004ef1c5b in connmgr_send_port_status () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efa9f4 in ofport_install () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efbfb2 in update_port () from /lib64/libofproto-2.10.so.0
> > 0x00007fb004efc7f9 in ofproto_port_add () from /lib64/libofproto-2.10.so.0
> > 0x0000556d540a3f95 in bridge_add_ports__ ()
> > 0x0000556d540a5a47 in bridge_reconfigure ()
> > 0x0000556d540a9199 in bridge_run ()
> > 0x0000556d540a02a5 in main ()
> >
>
> I have a similar crash with the following backtrace:
>
> #0  0x00007f3c6524b207 in raise () from /lib64/libc.so.6
> #1  0x00007f3c6524c8f8 in abort () from /lib64/libc.so.6
> #2  0x00007f3c66c06cb7 in ofputil_encode_flow_removed (fr=fr@entry=0x7f3c59ff9b80, protocol=<optimized out>)
>      at lib/ofp-monitor.c:293
> #3  0x00007f3c671b1db3 in connmgr_send_flow_removed (mgr=mgr@entry=0x56197f5a4800, fr=fr@entry=0x7f3c59ff9b80)
>      at ofproto/connmgr.c:1702
> #4  0x00007f3c671b7464 in ofproto_rule_send_removed (rule=0x56197f69db80) at ofproto/ofproto.c:5729
> #5  0x00007f3c671bdc3d in rule_destroy_cb (rule=0x56197f69db80) at ofproto/ofproto.c:2839
> #6  0x00007f3c66c1e88e in ovsrcu_call_postponed () at lib/ovs-rcu.c:342
> #7  0x00007f3c66c1ea94 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:357
> #8  0x00007f3c66c20d2f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:354
> #9  0x00007f3c66000dd5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f3c65313b3d in clone () from /lib64/libc.so.6
>
> > When connmgr detects that the connection to the controller is down, it
> > resets the ofconn's protocol to 'OFPUTIL_P_NONE'. Encoding the port
> > status for that ofconn then passes OFPUTIL_P_NONE down to
> > ofputil_protocol_to_ofp_version(), which aborts as shown above. This
> > patch fixes the issue by also checking the connection status in
> > connmgr_send_port_status() before sending the port status.
>
> Same issue; in my case the connection is in the S_BACKOFF state.
> >
> > The issue can be reproduced by running the test added in this patch
> > without the fix.
> >
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> > ---
> >  ofproto/connmgr.c |  3 ++-
> >  tests/bridge.at   | 21 +++++++++++++++++++++
> >  2 files changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> > index f78b4c5ff..02ba75938 100644
> > --- a/ofproto/connmgr.c
> > +++ b/ofproto/connmgr.c
> > @@ -1624,7 +1624,8 @@ connmgr_send_port_status(struct connmgr *mgr, struct ofconn *source,
> >      ps.reason = reason;
> >      ps.desc = *pp;
> >      LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
> > -        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason)) {
> > +        if (ofconn_receives_async_msg(ofconn, OAM_PORT_STATUS, reason) &&
> > +            rconn_is_connected(ofconn->rconn)) {
> >              struct ofpbuf *msg;
>
> I could add a similar fix to connmgr_send_flow_removed(). However, I was
> wondering why this problem is surfacing now. Did anything change that
> would start to trigger this issue?
>

You are right. It is probably better to figure out why
ofconn_receives_async_msg() now returns true when the connection to the
controller is lost, whereas it used to return false, and to fix the
problem in the right place.

I can see the issue with master, branch 2.10, v2.10.0 and branch 2.9.
However, I don't see it with v2.9.2.

Thanks
Numan


> >              /* Before 1.5, OpenFlow specified that OFPT_PORT_MOD should not
> > diff --git a/tests/bridge.at b/tests/bridge.at
> > index 1c3618563..ee398bdb1 100644
> > --- a/tests/bridge.at
> > +++ b/tests/bridge.at
> > @@ -79,3 +79,24 @@ AT_CHECK([ovs-vsctl --columns=status list controller | dnl
> >  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >  AT_CLEANUP
> > +
> > +AT_SETUP([bridge - add port after stopping controller])
> > +OVS_VSWITCHD_START
> > +
> > +dnl Start ovs-testcontroller
> > +ovs-testcontroller --detach punix:controller --pidfile=ovs-testcontroller.pid
> > +OVS_WAIT_UNTIL([test -e controller])
> > +
> > +AT_CHECK([ovs-vsctl set-controller br0 unix:controller])
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set Interface p1 type=internal], [0], [ignore])
> > +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
> > +
> > +# Now kill the ovs-testcontroller
> > +kill `cat ovs-testcontroller.pid`
> > +OVS_WAIT_UNTIL([! test -e controller])
> > +AT_CHECK([ovs-vsctl --no-wait add-port br0 p2 -- set Interface p2 type=internal], [0], [ignore])
> > +AT_CHECK([ovs-appctl -t ovs-vswitchd version], [0], [ignore])
> > +
> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +AT_CLEANUP
> > --
> > 2.17.2
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

