[ovs-discuss] SIGSEGV in nl_attr_get_size() - OVS 2.0.2
Duncan Idaho
ghola at rebelbase.com
Tue Nov 4 00:40:49 UTC 2014
Strange indeed.
Ubuntu built this binary with this GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Let me know if there is any other information I can provide that would be
helpful here.
On Fri, Oct 31, 2014 at 3:42 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Mon, Oct 27, 2014 at 04:21:54PM -0700, Duncan Idaho wrote:
> > We're currently seeing this crash several times a day in our Ubuntu
> > Icehouse OpenStack environment of about 60 nodes.
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
> > #1  0x0000000000460473 in format_generic_odp_key (a=a@entry=0x0,
> >     ds=ds@entry=0x7fffc803d290) at ../lib/odp-util.c:767
> > #2  0x0000000000460cd2 in format_odp_key_attr (a=a@entry=0x1a63c98,
> >     ma=ma@entry=0x0, ds=ds@entry=0x7fffc803d290,
> >     verbose=verbose@entry=true) at ../lib/odp-util.c:1332
> > #3  0x00000000004609d7 in odp_flow_format (key=key@entry=0x1a63c50,
> >     key_len=key_len@entry=80, mask=mask@entry=0x0,
> >     mask_len=mask_len@entry=0, ds=ds@entry=0x7fffc803d290,
> >     verbose=verbose@entry=true) at ../lib/odp-util.c:1402
> > #4  0x00000000004450f3 in log_flow_message (error=error@entry=2,
> >     operation=operation@entry=0x4d0e73 "flow_del", key=0x1a63c50,
> >     key_len=80, mask=mask@entry=0x0, mask_len=mask_len@entry=0,
> >     stats=0x0, actions=actions@entry=0x0,
> >     actions_len=actions_len@entry=0, dpif=<optimized out>)
> >     at ../lib/dpif.c:1354
> > #5  0x00000000004453c9 in log_flow_del_message (dpif=dpif@entry=0x1a06c70,
> >     del=del@entry=0x7fffc803d340, error=error@entry=2)
> >     at ../lib/dpif.c:1397
> > #6  0x0000000000445433 in log_flow_del_message (error=2,
> >     del=0x7fffc803d340, dpif=0x1a06c70) at ../lib/dpif.c:1396
> > #7  dpif_flow_del__ (dpif=0x1a06c70, del=del@entry=0x7fffc803d340)
> >     at ../lib/dpif.c:945
> > #8  0x00000000004455ca in dpif_flow_del (dpif=<optimized out>,
> >     key=<optimized out>, key_len=<optimized out>,
> >     stats=stats@entry=0x7fffc803d370) at ../lib/dpif.c:965
> > #9  0x000000000041b423 in subfacet_uninstall (subfacet=0x1be9a80)
> >     at ../ofproto/ofproto-dpif.c:4686
> > #10 0x0000000000420f18 in facet_remove (facet=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4014
> > #11 0x0000000000422f52 in facet_revalidate (facet=facet@entry=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4321
> > #12 0x0000000000423b96 in type_run (type=<optimized out>)
> >     at ../ofproto/ofproto-dpif.c:836
> > #13 0x000000000041224f in ofproto_type_run (datapath_type=<optimized out>,
> >     datapath_type@entry=0x1ab88a0 "system") at ../ofproto/ofproto.c:1309
> > #14 0x000000000040d755 in bridge_run () at ../vswitchd/bridge.c:2384
> > #15 0x00000000004059bb in main (argc=<optimized out>,
> >     argv=<optimized out>) at ../vswitchd/ovs-vswitchd.c:118
>
> This backtrace doesn't quite add up.
>
> We can see from frames 4 and 3 that we've got a nonnull 'key', which
> becomes a nonnull nlattr 'a' in frame 2. Along the same chain, we
> have a null 'mask' that becomes a null 'ma'. I often don't trust GDB
> to give me correct arguments in backtraces but all of that adds up
> nicely so I tend to believe it.
>
> Take a look at the code for format_odp_key_attr(). It always
> dereferences 'a' to get its type 'attr':
>
>     enum ovs_key_attr attr = nl_attr_type(a);
>
> A few lines later we can see 'is_exact' getting set to true (since
> 'ma' is NULL):
>
>     bool is_exact;
>
>     is_exact = ma ? odp_mask_attr_is_exact(ma) : true;
>
> We're evidently hitting the default case in the switch statement given
> the line number cited in the backtrace, which runs this code:
>
>     case OVS_KEY_ATTR_UNSPEC:
>     case __OVS_KEY_ATTR_MAX:
>     default:
>         format_generic_odp_key(a, ds);
>         if (!is_exact) {
>             ds_put_char(ds, '/');
>             format_generic_odp_key(ma, ds);    <---- line 1332
>         }
>         break;
>
> but that doesn't make sense--we should never get there, because
> is_exact is true. So--WTF?
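
To make the contradiction above concrete, here is a minimal sketch of just
the default-case control flow. It is not OVS code: struct fake_attr,
format_generic() and format_default_case() are hypothetical stand-ins, and
the sketch assumes that any non-NULL mask counts as "not exact" (the real
code asks odp_mask_attr_is_exact()). The stand-in formatter counts NULL
arguments instead of dereferencing them, so it reports where the real code
would have crashed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct nlattr; not the OVS definition. */
struct fake_attr {
    unsigned short len;
    unsigned short type;
};

/* Stand-in for format_generic_odp_key(). Returns 1 if handed a NULL
 * attribute (where the real function would segfault), otherwise appends
 * a description of the attribute to 'out' and returns 0. */
static int format_generic(const struct fake_attr *a, char *out, size_t size)
{
    size_t used;

    if (a == NULL) {
        return 1;
    }
    used = strlen(out);
    snprintf(out + used, size - used, "key(type=%u,len=%u)",
             (unsigned) a->type, (unsigned) a->len);
    return 0;
}

/* Mirrors only the default case quoted above. Returns how many times the
 * generic formatter was called with a NULL attribute. */
static int format_default_case(const struct fake_attr *a,
                               const struct fake_attr *ma,
                               char *out, size_t size)
{
    /* Assumption: a non-NULL mask is treated as inexact here. */
    bool is_exact = ma ? false : true;
    int null_calls = 0;

    if (size == 0) {
        return 0;
    }
    out[0] = '\0';

    null_calls += format_generic(a, out, size);
    if (!is_exact) {
        size_t used = strlen(out);

        if (used + 1 < size) {
            out[used] = '/';
            out[used + 1] = '\0';
        }
        null_calls += format_generic(ma, out, size);
    }
    return null_calls;
}
```

Calling format_default_case(&key, NULL, buf, sizeof buf) with a non-NULL
key never reaches the second format_generic() call, so in this model a
NULL mask cannot be the argument that crashes. That is consistent with the
suspicion that the frame arguments or the generated code are not what the
source suggests.
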
>
> > This is probably related to the following "fixed" Ubuntu bug:
> > https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1352570
> >
> > The fix referenced was:
> > https://github.com/openvswitch/ovs/commit/dd2e44f835fac8c2df99f84c54250c3ca981f2f5
> >
> > Not sure if it's relevant but part of this patch was reverted prior to the
> > 2.0.2 release:
> > https://github.com/openvswitch/ovs/commit/e8ac8c3940535fb439eba980afa6c61bdd428003
>
> commit dd2e44f835 is about a race between two threads when a bridge is
> being deleted. I don't see any evidence that there's a bridge being
> deleted here.
>
> > Any help will be appreciated! Let me know if I can provide any more
> > relevant information.
>
> What GCC version was used for this build? I've seen an unusual number
> of code generation bugs with GCC 4.9.x.
>