[ovs-dev] [PATCH] dpdk: Fix abort on double free.

Ilya Maximets i.maximets at samsung.com
Tue Nov 29 07:07:46 UTC 2016


On 28.11.2016 21:55, Aaron Conole wrote:
> Ilya Maximets <i.maximets at samsung.com> writes:
> 
>> According to DPDK API (lib/librte_eal/common/include/rte_eal.h):
>>
>> 	"After the call to rte_eal_init(), all arguments argv[x]
>> 	 with x < ret may be modified and should not be accessed
>> 	 by the application."
>>
>> This means, that OVS must not free the arguments passed to DPDK.
>> In real world, 'rte_eal_init()' replaces the last argument in
>> 'dpdk_argv' with the first one by doing this:
> 
> Thanks for spotting this error, Ilya.
> 
>> 	# eal_parse_args() from lib/librte_eal/linuxapp/eal/eal.c
>>
>> 	char *prgname = argv[0];
>> 	...
>> 	if (optind >= 0)
>> 		argv[optind-1] = prgname;
>>
>> This leads to double free inside 'deferred_argv_release()' and
>> possible ABORT at exit:
> 
> I haven't seen this, which is both shocking and scary - the commit which
> does this copy is almost 4 years old;  did you have to do anything
> specific for this behavior to occur?  Did something change in DPDK
> recently that exposed this behavior?  Just wondering how you reproduced
> it.

Abort was caught up accidentally. I'm able to reproduce it on my a
little unusual testing system (ARMv8 + Fedora 21 + clang 3.5) without
any specific manipulations. The bug exists always but it's hard
for libc to detect double free here because there are many other
frees/allocations at exit time. I've used following patch to confirm
the issue if it wasn't detected by libc:

diff --git a/lib/dpdk.c b/lib/dpdk.c
index 49a589a..65d2d28 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -258,6 +258,8 @@ deferred_argv_release(void)
 {
     int result;
     for (result = 0; result < dpdk_argc; ++result) {
+        VLOG_INFO("DPDK ARGV release: %2d: 0x%" PRIx64 " (%s)",
+                  result, (intptr_t)dpdk_argv[result], dpdk_argv[result]);
         free(dpdk_argv[result]);
     }
 


> 
>> *** Error in `ovs-vswitchd': double free or corruption (fasttop) <...> ***
>> 	Program received signal SIGABRT, Aborted.
>>
>> 	#0  raise () from /lib64/libc.so.6
>> 	#1  abort () from /lib64/libc.so.6
>> 	#2  __libc_message () from /lib64/libc.so.6
>> 	#3  free () from /lib64/libc.so.6
>> 	#4  deferred_argv_release () at lib/dpdk.c:261
>> 	#5  __run_exit_handlers () from /lib64/libc.so.6
>> 	#6  exit () from /lib64/libc.so.6
>> 	#7  __libc_start_main () from /lib64/libc.so.6
>> 	#8  _start ()
>>
>> Fix that by not calling free for the memory passed to DPDK.
>>
>> CC: Aaron Conole <aconole at redhat.com>
>> Fixes: bab694097133 ("netdev-dpdk: Convert initialization from cmdline to db")
>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>> ---
> 
> We need to free the memory - I think that is not a question;

Actually, it is. According to DPDK API (see above) 'rte_eal_init()'
takes the ownership of 'argv'. This means that we must not free
or use this memory.

Some thoughts:
DPDK internally doesn't free this memory, but it's not the reason to
touch it from the outside. Actually, DPDK API change required here to
support freeing of this resources if needed. But until there is no
'rte_eal_uninit()' such API change isn't actually useful.

Also, I forget to remove the variables. So, the following incremental
to my original patch required:

------------------------------------
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 2014946..4201149 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -250,9 +250,6 @@ get_dpdk_args(const struct smap *ovs_other_config, char ***argv,
     return i + extra_argc;
 }
 
-static char **dpdk_argv;
-static int dpdk_argc;
-
 static void
 dpdk_init__(const struct smap *ovs_other_config)
 {
@@ -370,9 +367,6 @@ dpdk_init__(const struct smap *ovs_other_config)
         }
     }
 
-    dpdk_argv = argv;
-    dpdk_argc = argc;
-
     rte_memzone_dump(stdout);
 
     /* We are called from the main thread here */
------------------------------------

Best regards, Ilya Maximets.


More information about the dev mailing list