[ovs-discuss] DPDK initialization

John Phillips john.phillips5 at hpe.com
Fri Jul 15 17:38:01 UTC 2016


Hi. I sent an email the other day about difficulties initializing DPDK
with a certain NIC. Basically, I got bizarre errors when I added a
dpdk port to a bridge using this card (Mellanox CX3Pro) with OVS
2.5+DPDK-16.04 (ovs 2.5 + some commits on branch-2.5, and with a patch
for DPDK 16.04 constants) after an apparently normal EAL initialization.
I found that not daemonizing the vswitchd process fixed the issue, so I
created a patch to initialize the EAL after daemonization instead of
before in vswitchd/ovs-vswitchd.c, and this also fixed it. Someone
replied to my email and asked me to try 2.5.90 as well. I did, and the
issue went away. I found that the commit that fixed it was bab6940,
which, among other things, changed how DPDK is initialized: it is now
initialized during bridge_run, which runs after the vswitchd process
daemonizes. This can't be backported to 2.5 because it's also the
commit that changes DPDK to initialize itself from the ovs database.

I then attempted to find out exactly what was causing this problem. An
obvious explanation was that rte_eal_init created threads which were
killed when, after they were created, the vswitchd process was
daemonized. So I set breakpoints on pthread_create and fork, ran
ovs-vswitchd linked against DPDK 16.04, and found that there are
actually several calls to pthread_create:

     RTE_LCORE_FOREACH_SLAVE(i) {

         /*
          * create communication pipes between master thread
          * and children
          */
         if (pipe(lcore_config[i].pipe_master2slave) < 0)
             rte_panic("Cannot create pipe\n");
         if (pipe(lcore_config[i].pipe_slave2master) < 0)
             rte_panic("Cannot create pipe\n");

         lcore_config[i].state = WAIT;

         /* create a thread for each lcore */
         ret = pthread_create(&lcore_config[i].thread_id, NULL,
                      eal_thread_loop, NULL);
         if (ret != 0)
             rte_panic("Cannot create thread\n");

         /* Set thread_name for aid in debugging. */
         snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
             "lcore-slave-%d", i);
         ret = rte_thread_setname(lcore_config[i].thread_id,
                         thread_name);
         if (ret != 0)
             RTE_LOG(ERR, EAL,
                 "Cannot set name for lcore thread\n");
     }

This is during rte_eal_init, which is called before daemonization (if
--daemonize is passed). This is true for both DPDK 2.2 and DPDK 16.04.
To the best of my (incomplete) knowledge of how pthreads/unix processes
work, all of these threads will die when the parent process exits as
part of daemonization. Even so, this same software, without any
changes, did work with the Niantic NIC. Could someone explain to me
whether and how this is correct, or whether it needs fixing? If it does
need fixing, is there a chance we can get a patch into branch-2.5 that
changes the way DPDK initializes? bab6940 can't be used because it also
changes the way DPDK gets its parameters.
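
The thread/fork behaviour described above can be checked outside of OVS
with a small standalone program. This is only a sketch and is not OVS or
DPDK code: worker, ticks, stop_worker and worker_runs_in_forked_child
are made-up names for illustration. It starts one thread that keeps
bumping a counter, calls fork() the way a daemonizing process would, and
has the child check whether its copy of the counter still advances.
POSIX says the child of fork() contains only the calling thread, so the
counter is expected to stay frozen in the child.

```c
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_ulong ticks;      /* bumped by the worker thread */
static atomic_int stop_worker;  /* set to 1 to make the worker exit */

/* Hypothetical stand-in for an lcore thread: tick roughly every 1 ms. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker)) {
        atomic_fetch_add(&ticks, 1);
        poll(NULL, 0, 1);       /* sleep for 1 ms */
    }
    return NULL;
}

/*
 * Returns 1 if the counter keeps advancing in the forked child,
 * 0 if it is frozen there, and -1 on error.
 */
int worker_runs_in_forked_child(void)
{
    pthread_t tid;
    pid_t pid, waited;
    int status = 0;

    atomic_store(&ticks, 0);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* Make sure the worker is really running before forking. */
    while (atomic_load(&ticks) == 0)
        poll(NULL, 0, 1);

    pid = fork();
    if (pid == 0) {
        /* Child: sample its private copy of the counter twice. */
        unsigned long before = atomic_load(&ticks);
        poll(NULL, 0, 100);     /* wait 100 ms */
        unsigned long after = atomic_load(&ticks);
        _exit(after != before ? 1 : 0);
    }

    /* Parent (or fork failure): stop and reap the worker thread. */
    waited = (pid > 0) ? waitpid(pid, &status, 0) : -1;
    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);

    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling worker_runs_in_forked_child() from a main() and building with
something like "cc -pthread" should be enough to try it. A result of 0
means the thread created before fork() is not running in the child,
which is the situation the lcore threads would be in if rte_eal_init
runs before daemonization.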


Thanks,

   John



