[ovs-discuss] SIGSEGV in nl_attr_get_size() - OVS 2.0.2

Duncan Idaho ghola at rebelbase.com
Tue Nov 4 00:40:49 UTC 2014


Strange indeed.

Ubuntu built this binary with this GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2

Let me know if there is any other information I can provide that would be
helpful here.


On Fri, Oct 31, 2014 at 3:42 PM, Ben Pfaff <blp at nicira.com> wrote:

> On Mon, Oct 27, 2014 at 04:21:54PM -0700, Duncan Idaho wrote:
> > We're currently seeing this crash several times a day in our Ubuntu
> > Icehouse OpenStack environment of about 60 nodes.
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
> > #1  0x0000000000460473 in format_generic_odp_key (a=a@entry=0x0,
> >     ds=ds@entry=0x7fffc803d290) at ../lib/odp-util.c:767
> > #2  0x0000000000460cd2 in format_odp_key_attr (a=a@entry=0x1a63c98,
> >     ma=ma@entry=0x0, ds=ds@entry=0x7fffc803d290, verbose=verbose@entry=true)
> >     at ../lib/odp-util.c:1332
> > #3  0x00000000004609d7 in odp_flow_format (key=key@entry=0x1a63c50,
> >     key_len=key_len@entry=80, mask=mask@entry=0x0, mask_len=mask_len@entry=0,
> >     ds=ds@entry=0x7fffc803d290, verbose=verbose@entry=true)
> >     at ../lib/odp-util.c:1402
> > #4  0x00000000004450f3 in log_flow_message (error=error@entry=2,
> >     operation=operation@entry=0x4d0e73 "flow_del", key=0x1a63c50, key_len=80,
> >     mask=mask@entry=0x0, mask_len=mask_len@entry=0, stats=0x0,
> >     actions=actions@entry=0x0, actions_len=actions_len@entry=0,
> >     dpif=<optimized out>) at ../lib/dpif.c:1354
> > #5  0x00000000004453c9 in log_flow_del_message (dpif=dpif@entry=0x1a06c70,
> >     del=del@entry=0x7fffc803d340, error=error@entry=2) at ../lib/dpif.c:1397
> > #6  0x0000000000445433 in log_flow_del_message (error=2,
> >     del=0x7fffc803d340, dpif=0x1a06c70) at ../lib/dpif.c:1396
> > #7  dpif_flow_del__ (dpif=0x1a06c70, del=del@entry=0x7fffc803d340)
> >     at ../lib/dpif.c:945
> > #8  0x00000000004455ca in dpif_flow_del (dpif=<optimized out>,
> >     key=<optimized out>, key_len=<optimized out>,
> >     stats=stats@entry=0x7fffc803d370) at ../lib/dpif.c:965
> > #9  0x000000000041b423 in subfacet_uninstall (subfacet=0x1be9a80)
> >     at ../ofproto/ofproto-dpif.c:4686
> > #10 0x0000000000420f18 in facet_remove (facet=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4014
> > #11 0x0000000000422f52 in facet_revalidate (facet=facet@entry=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4321
> > #12 0x0000000000423b96 in type_run (type=<optimized out>)
> >     at ../ofproto/ofproto-dpif.c:836
> > #13 0x000000000041224f in ofproto_type_run (datapath_type=<optimized out>,
> >     datapath_type@entry=0x1ab88a0 "system") at ../ofproto/ofproto.c:1309
> > #14 0x000000000040d755 in bridge_run () at ../vswitchd/bridge.c:2384
> > #15 0x00000000004059bb in main (argc=<optimized out>, argv=<optimized out>)
> >     at ../vswitchd/ovs-vswitchd.c:118
>
> This backtrace doesn't quite add up.
>
> We can see from frames 4 and 3 that we've got a nonnull 'key', which
> becomes a nonnull nlattr 'a' in frame 2.  Along the same chain, we
> have a null 'mask' that becomes a null 'ma'.  I often don't trust GDB
> to give me correct arguments in backtraces but all of that adds up
> nicely so I tend to believe it.
>
> Take a look at the code for format_odp_key_attr().  It always
> dereferences 'a' to get its type 'attr':
>
>     enum ovs_key_attr attr = nl_attr_type(a);
>
> A few lines later we can see 'is_exact' getting set to true (since
> 'ma' is NULL):
>
>     bool is_exact;
>
>     is_exact = ma ? odp_mask_attr_is_exact(ma) : true;
>
> We're evidently hitting the default case in the switch statement given
> the line number cited in the backtrace, which runs this code:
>
>     case OVS_KEY_ATTR_UNSPEC:
>     case __OVS_KEY_ATTR_MAX:
>     default:
>         format_generic_odp_key(a, ds);
>         if (!is_exact) {
>             ds_put_char(ds, '/');
>             format_generic_odp_key(ma, ds);      <---- line 1332
>         }
>         break;
>
> but that doesn't make sense--we should never get there, because
> is_exact is true.  So--WTF?
>
> > This is probably related to the following "fixed" Ubuntu bug:
> > https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1352570
> >
> > The fix referenced was:
> > https://github.com/openvswitch/ovs/commit/dd2e44f835fac8c2df99f84c54250c3ca981f2f5
> >
> > Not sure if it's relevant but part of this patch was reverted prior to the
> > 2.0.2 release:
> > https://github.com/openvswitch/ovs/commit/e8ac8c3940535fb439eba980afa6c61bdd428003
>
> commit dd2e44f835 is about a race between two threads when a bridge is
> being deleted.  I don't see any evidence that there's a bridge being
> deleted here.
>
> > Any help will be appreciated!  Let me know if I can provide any more
> > relevant information.
>
> What GCC version was used for this build?  I've seen an unusual number
> of code generation bugs with GCC 4.9.x.
>
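
The contradiction described in the quoted analysis can be reproduced in
isolation. What follows is only a minimal sketch of the default-case logic,
not the OVS source: struct attr_stub, format_generic_stub() and
format_default_case() are hypothetical stand-ins, and the 'exact' field is an
assumed substitute for whatever odp_mask_attr_is_exact() would report. It
shows that, as written, the second formatter call (the one attributed to
line 1332) should be unreachable when 'ma' is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nlattr; not the OVS definition. */
struct attr_stub {
    int type;
    bool exact;  /* assumed result of an odp_mask_attr_is_exact()-like check */
};

static int generic_calls;  /* number of calls made to the stand-in formatter */

/* Stand-in for format_generic_odp_key().  The real function reads through
 * its attribute pointer, so a NULL argument is modeled here as a failed
 * assertion rather than a segfault. */
static void format_generic_stub(const struct attr_stub *a)
{
    assert(a != NULL);
    generic_calls++;
}

/* Models only the default case quoted in the message.  Returns how many
 * times the stand-in formatter was called. */
static int format_default_case(const struct attr_stub *a,
                               const struct attr_stub *ma)
{
    /* Same expression shape as in the quoted code: no mask means exact. */
    bool is_exact = ma ? ma->exact : true;

    generic_calls = 0;
    format_generic_stub(a);
    if (!is_exact) {
        format_generic_stub(ma);  /* corresponds to the call at line 1332 */
    }
    return generic_calls;
}
```

Under these assumptions, format_default_case(&key, NULL) makes exactly one
formatter call and never passes NULL on, so a NULL argument arriving in
format_generic_odp_key() from line 1332 is not explained by the source logic
alone.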