[ovs-dev] use-after-free in ofproto destruct path

Ethan Jackson ethan at nicira.com
Tue Oct 8 20:27:20 UTC 2013


I think the real problem here is that rule_dpif_lookup() doesn't sit
inside ofproto-dpif-xlate like everything else.  There it's easy to
handle because if the ofproto is destroyed, we'll fail to find an
xbridge and give up.  It wouldn't be too hard to move it I think, but
I'm not sure we have time for it.  We'd have to give xbridges a ref
counted pointer to the tables array, I think it has everything else it
needs.

As an expedient alternative, what if we just killed and restarted all
of the threads using udpif_recv_set()?  It's kind of a ugly hack, but
we could get it in quickly and leave the correct solution for after
the 2.0 release.

Thoughts?


On Tue, Oct 8, 2013 at 12:37 PM, Ben Pfaff <blp at nicira.com> wrote:
> Trying to track down a memory leak, I set up 100 tunnels, configured
> CFM on them, and ran under valgrind.  After a while I ran "ovs-appctl
> exit".  I didn't find any leaks, but I do see this valgrind warning:
>
>     Thread 2:
>     Invalid read of size 4
>        at 0x805C495: ofproto_rule_ref (ofproto.c:2454)
>        by 0x806EEA8: rule_dpif_ref (ofproto-dpif.c:4683)
>        by 0x806EE8F: choose_miss_rule (ofproto-dpif.c:4676)
>        by 0x806EC9D: rule_dpif_lookup (ofproto-dpif.c:4620)
>        by 0x8078F3D: handle_upcalls (ofproto-dpif-upcall.c:767)
>        by 0x8078305: udpif_upcall_handler (ofproto-dpif-upcall.c:423)
>        by 0x80ED062: ovsthread_wrapper (ovs-thread.c:214)
>        by 0x43E3C38: start_thread (pthread_create.c:304)
>        by 0x434F78D: clone (clone.S:130)
>      Address 0x583292c is 140 bytes inside a block of size 296 free'd
>        at 0x402750C: free (vg_replace_malloc.c:427)
>        by 0x806EF45: rule_dealloc (ofproto-dpif.c:4721)
>        by 0x805C5F7: ofproto_rule_destroy__ (ofproto.c:2503)
>        by 0x805C51C: ofproto_rule_unref (ofproto.c:2468)
>        by 0x80634B0: ofopgroup_complete (ofproto.c:6010)
>        by 0x8063050: ofopgroup_submit (ofproto.c:5873)
>        by 0x805996D: ofproto_rule_delete (ofproto.c:1175)
>        by 0x8067A14: destruct (ofproto-dpif.c:1395)
>        by 0x8059DC4: ofproto_destroy (ofproto.c:1283)
>        by 0x80522EE: bridge_destroy (bridge.c:2694)
>        by 0x804CF5A: bridge_exit (bridge.c:429)
>        by 0x805603C: main (ovs-vswitchd.c:133)
>
> followed by an assertion failure:
>
>     ovs-vswitchd(upcall_handler): ../ofproto/ofproto.c:2455: assertion
>     orig != 0 failed in ofproto_rule_ref()
>
> The sequence of events appears to be:
>
>         1. In its first loop, handle_upcalls() receives a bunch of
>            upcalls and finds their ofprotos.
>
>         2. The ofproto gets destroyed.
>
>         3. In its second loop, handle_upcalls() looks up rules in the
>            now-destroyed ofproto.  Boom!
>
> I can think of a few different fixes.  One would be to hold the
> xlate_rwlock across all of handle_upcalls().  Another would be to
> refcount ofprotos somehow.  A third would be to make destruct() in
> ofproto-dpif.c pause the upcall thread one way or another.
>
> Thoughts?



More information about the dev mailing list