[ovs-discuss] problem about cls pvector may cause ovs-vswitchd crash

Yinpeijun yinpeijun at huawei.com
Tue Sep 15 11:22:28 UTC 2020


Hi All,

Recently we found a problem, as follows:


1.  Problem description:

PVECTOR_FOR_EACH uses ovsrcu_get() to fetch the pvector's current impl pointer, and the memory_order_consume
in ovsrcu_get() ensures that *impl is read after that load, so we get the correct ptr value from
impl->vector[0]. However, it seems that we cannot make sure that the value at *ptr is also correct.
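
To make the question concrete, below is a simplified sketch of the two loads that
PVECTOR_FOR_EACH effectively performs; it paraphrases lib/pvector.h rather than
quoting it, and sketch_first_entry() is our own illustrative name:

static void *sketch_first_entry(struct pvector *pv)
{
    /* The load of the current impl pointer uses memory_order_consume. */
    struct pvector_impl *impl = ovsrcu_get(struct pvector_impl *, &pv->impl);

    /* This load carries a data dependency on "impl", so it is ordered
     * after the load above and returns the ptr the producer published.
     * What is NOT guaranteed is the content at *ptr: only writes made
     * before the matching pvector_publish() are synchronized with this
     * reader; any later write to that memory races with us. */
    return impl->vector[0].ptr;
}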


2.  Verification through testing:

Copy the test code below into lib/dpif-netdev.c, then modify the function pmd_thread_main()
by inserting the following call in its for (;;) loop:

do_atomic_test(pmd);
  If do_atomic_consumer() is implemented without the mutex lock, we can easily get log lines like the following in ovs-vswitchd.log:
      ----my_iterator:143, my_value:144  (we may get other values, but my_iterator == my_value - 1 stays true.)

   The consumer thread can read dirty memory under memory_order_consume, and if we change ovsrcu_get to load
   the pointer with memory_order_acquire instead, we still get the same error message (my_iterator == my_value - 1).
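
One plausible interleaving that would produce exactly this pattern, assuming the
consumer is still iterating the impl published at the end of round N while the
producer starts round N + 1, is sketched below (whether this is the real
mechanism is part of our question):

/*
 *  producer                                consumer
 *  --------                                --------
 *  round N ends: g_my_data[0..65535] == N
 *                                          impl = ovsrcu_get(...)   (round N impl)
 *  removes all entries; g_value = N + 1
 *  g_my_data[0] = N + 1; publishes
 *                                          *vector[0].ptr reads N + 1  (overwritten)
 *                                          *vector[1].ptr reads N      (stale)
 *                                          => "my_iterator:N, my_value:N+1"
 *
 * The consume (or acquire) load only synchronizes with writes made before the
 * publish of the impl that was loaded; it promises nothing about writes the
 * producer makes to g_my_data[] afterwards.
 */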

   We can fix this problem by having the consumer thread access the data under the protection of g_pvector_lock
(see the commented-out ovs_mutex_lock() calls in do_atomic_consumer() below), but pvector is designed
to be used without locks. Where did it go wrong? Can anyone here give some comments?


--------------------------------------------- An error case that actually happened while using pvector --------------------------
Suppose a scenario as follows:

1) A handler thread inserts a dpcls_subtable, whose address is p, into the
   datapath classifier's (struct dpcls) pvector;

2) a pmd thread calls fast_path_processing to find the subtable and accesses
   address p;

3) the subtable above is destroyed by someone;

4) a handler thread inserts another subtable, whose address is also p, into
   the dpcls's pvector;

5) the pmd thread calls fast_path_processing -> dp_netdev_pmd_lookup_flow
   -> dpcls_lookup to access the subtable from the pvector.

We use PVECTOR_FOR_EACH to iterate the pvector in step 5; no locks are used,
and the dirty memory access may cause the ovs-vswitchd process to core dump.
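
For reference, a minimal sketch of the reuse window in steps 3) and 4); a plain
free() stands in for whatever destroys the subtable, and the names
cls->subtables and p are illustrative, not quotes from dpif-netdev.c:

/* Unsafe: a pmd thread still iterating an older impl may hold p while
 * the allocator hands out the same address again. */
pvector_remove(&cls->subtables, p);
pvector_publish(&cls->subtables);
free(p);                       /* Address p can be recycled immediately. */

/* Deferred variant: postpone the free until all current RCU readers have
 * quiesced, so p cannot be reused under an active PVECTOR_FOR_EACH. */
pvector_remove(&cls->subtables, p);
pvector_publish(&cls->subtables);
ovsrcu_postpone(free, p);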

---------------------------------------------- Test code is as follows ---------------------------------------------
static struct pvector g_atomic_test;          /* Shared lockless vector. */
static unsigned char g_my_data[65536];        /* Payload bytes pointed to. */
static unsigned char g_value = 0;             /* Value for the current round. */
static int g_index = 0;                       /* Next slot to fill. */
static struct ovs_mutex g_pvector_lock = OVS_MUTEX_INITIALIZER;

/* Producer: publishes one more pointer into g_my_data[] per call, always
 * storing the current g_value, so every live entry should compare equal;
 * when the array is full it removes everything and starts the next round
 * with g_value + 1. */
static void do_atomic_producer(void)
{
    ovs_mutex_lock(&g_pvector_lock);
    if (g_index < 65536) {
        g_my_data[g_index] = g_value;
        pvector_insert(&g_atomic_test, &g_my_data[g_index], 0);
        g_index++;
    } else {
        for (int i = 0; i < g_index; i++) {
            pvector_remove(&g_atomic_test, &g_my_data[i]);
        }
        g_index = 0;
        g_value++; /* g_value may wrap around to zero. */
    }
    pvector_publish(&g_atomic_test);
    ovs_mutex_unlock(&g_pvector_lock);
}

/* Consumer: iterates without taking g_pvector_lock; since every live entry
 * was written with the same g_value, any mismatch means we observed stale
 * data. */
static void do_atomic_consumer(void)
{
    unsigned char *my_itr;
    unsigned char my_value = 0;
    bool first = true;

    /* ovs_mutex_lock(&g_pvector_lock); */
    PVECTOR_FOR_EACH (my_itr, &g_atomic_test) {
        if (first) {
            my_value = *my_itr;
            first = false;
        } else if (*my_itr != my_value) {
            VLOG_ERR("----my_iterator:%u, my_value:%u", *my_itr, my_value);
            break;
        }
    }
    /* ovs_mutex_unlock(&g_pvector_lock); */
}

static void do_atomic_test(struct dp_netdev_pmd_thread *pmd)
{
    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;

    /* The pvector must be initialized once before any thread touches it. */
    if (ovsthread_once_start(&once)) {
        pvector_init(&g_atomic_test);
        ovsthread_once_done(&once);
    }

    if (pmd->core_id == 1) {
        do_atomic_producer();
    } else {
        do_atomic_consumer();
    }
}