[ovs-dev] [PATCH] netdev-dummy: fix crash with more than one passive connection

Ryan Moats rmoats at us.ibm.com
Fri Jul 22 22:25:06 UTC 2016


Ben Pfaff <blp at ovn.org> wrote on 07/22/2016 05:18:44 PM:

> From: Ben Pfaff <blp at ovn.org>
> To: Ryan Moats/Omaha/IBM at IBMUS
> Cc: Lance Richardson <lrichard at redhat.com>, dev at openvswitch.org
> Date: 07/22/2016 05:18 PM
> Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
> than one passive connection
>
> On Tue, Jul 19, 2016 at 09:20:37AM -0500, Ryan Moats wrote:
> > Lance Richardson <lrichard at redhat.com> wrote on 07/19/2016 09:07:41 AM:
> >
> > > From: Lance Richardson <lrichard at redhat.com>
> > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > Cc: dev at openvswitch.org
> > > Date: 07/19/2016 09:07 AM
> > > Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
> > > than one passive connection
> > >
> > > ----- Original Message -----
> > > > From: "Ryan Moats" <rmoats at us.ibm.com>
> > > > To: "Lance Richardson" <lrichard at redhat.com>
> > > > Cc: dev at openvswitch.org
> > > > Sent: Monday, July 18, 2016 11:36:27 PM
> > > > Subject: Re: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
> > > than   one   passive connection
> > > >
> > > > "dev" <dev-bounces at openvswitch.org> wrote on 07/18/2016 11:29:28
AM:
> > > >
> > > > >
> > > > > > From: "Lance Richardson" <lrichard at redhat.com>
> > > > > > To: dev at openvswitch.org
> > > > > > Sent: Wednesday, July 6, 2016 7:39:52 PM
> > > > > > Subject: [ovs-dev] [PATCH] netdev-dummy: fix crash with more
than
> > > > > one   passive connection
> > > > > >
> > > > > > Investigation found that Some of the occasional failures in the
> > > > > > "ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case are
caused
> > > > > > by ovs-vswitchd crashing with SIGSEGV. It turns out that the
> > > > > > crash occurrs when the number of netdev-dummy passive
connections
> > > > > > transitions from 1 to 2.  When xrealloc() copies the array of
> > > > > > dummy_packet_stream structures from the original buffer to a
> > > > > > newly allocated one, the struct ovs_list txq member of the
> > structure
> > > > > > becomes corrupt (e.g. if ovs_list_is_empty() would have
returned
> > > > > > false before the copy, it will return true after the copy,
which
> > > > > > will lead to a crash when the bogus packet buffer on the list
is
> > > > > > dereferenced).
> > > > > >
> > > > > > Fix by taking a hint from David Wheeler and adding a level of
> > > > > > indirection.
> > > > > >
> > > > > > Signed-off-by: Lance Richardson <lrichard at redhat.com>
> > > >
> > > > [snip]
> > > >
> > > > >
> > > > > Here is a small script that reliably reproduces the crash in
> > > > ovs-vswitchd.
> > > > > I don't have an explanation for why we have two connections to
the
> > same
> > > > > port in the "ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS" test case
> > (this
> > > > > happens infrequently), perhaps it's something like the active
connect
> > > > from
> > > > > a peer ovs-vswitchd being interrupted and re-tried.
> > > > >
> > > > > #!/bin/bash
> > > > >
> > > > > set -ex
> > > > >
> > > > > PWD=$(pwd)
> > > > > export PATH=${PWD}/vswitchd:${PWD}/utilities:${PWD}/ovsdb:$PATH
> > > > >
> > > > > ovs_setenv() {
> > > > >     export OVS_RUNDIR=${PWD}/$1
> > > > >     export OVS_LOGDIR=${PWD}/$1
> > > > >     export OVS_DBDIR=${PWD}/$1
> > > > >     export OVS_SYSCONFDIR=${PWD}/$1
> > > > >     export OVS_PKGDATADIR=${PWD}/$1
> > > > > }
> > > > >
> > > > > for x in repro_main repro_hv1 repro_hv2; do
> > > > >     mkdir -p $x
> > > > >     rm -f $x/*
> > > > >     ovs_setenv $x
> > > > >     : > $OVS_RUNDIR/.conf.db.~lock~
> > > > >     ovsdb-tool create $OVS_RUNDIR/conf.db
vswitchd/vswitch.ovsschema
> > > > >     ovsdb-server --remote=punix:$OVS_RUNDIR/db.sock
-vconsole:off
> > > > > --detach --no-chdir --pidfile --log-file
> > > > >     ovs-vsctl --no-wait -- init
> > > > >     ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif -
> > > > > vunixctl -vconsole:off --detach --no-chdir --pidfile --log-file
> > > > > done
> > > > >
> > > > > ovs_setenv repro_main
> > > > > ovs-vsctl add-br foo  \
> > > > >           -- add-port foo p1 \
> > > > >           -- set Interface p1
> > > > options:pstream="punix:$PWD/repro_main/p1.sock"
> > > > >
> > > > > ovs_setenv repro_hv1
> > > > > ovs-vsctl add-br br1 \
> > > > >           -- add-port br1 p1 \
> > > > >           -- set Interface p1
> > > > options:stream="unix:$PWD/repro_main/p1.sock"
> > > > >
> > > > > ovs_setenv repro_hv2
> > > > > ovs-vsctl add-br br1 \
> > > > >           -- add-port br1 p1 \
> > > > >           -- set Interface p1
> > > > options:stream="unix:$PWD/repro_main/p1.sock"
> > > >
> > > > I like this, but I think I'm going to be consistent and ask if you
> > > > can spin a new version with the above script as a unit test so that
> > > > we can see the crash before applying the rest of the patch and then
> > > > verify that it doesn't happen with the patch.
> > > >
> > > > Ryan
> > > >
> > >
> > > Hi Ryan,
> > >
> > > Thanks for the review.  I'm a little unclear on whether it's a good
idea
> > > to add a test case that doesn't verify any functionality and is no
longer
> > > useful as soon as it is merged.  I was rather hoping that the script
> > provided
> > > above would serve the "see the crash before applying" function.
> > >
> > > I'm fine with writing and posting such a test case if that's the
policy,
> > > but I do wonder how many new test cases we'll have if one is written
for
> > > every crash that has been fixed, and how effective this will be at
> > detecting
> > > future crash causes given the enormous number of ways code can be
changed
> > > to produce a crash.
> > >
> > > On the other hand it probably would be a good idea to add test cases
to
> > > verify netdev-dummy functionality (especially for things like
multiple
> > > passive connection support that aren't already used in other test
cases).
> > > This is probably a larger effort though.
> >
> > My reasoning (and it is purely personal, I'm not a committer or
anything :)
> > was that since we *know* there is a crash lurking there, a canary test
> > would be useful to make sure we don't regress and allow it to happen
> > again...
> >
> > If others think that I'm being overly-paranoid, then I'll happily drop
the
> > request.
>
> I usually agree.
>
> In this case, though, this is a feature that is intended only for
> developer testing, primarily within OVS's own tests, and which in fact
> cannot be used by end users without making a special effort to enable it
> (by adding --enable-dummy to the OVS command line, which distribution
> packaging doesn't do).  It's OK if the OVS tests break, we'll fix them.

Ok, then I withdraw the request - I'd put an Ack on it, but since it
has already landed ....

Ryan



More information about the dev mailing list