[ovs-discuss] OVS-DPDK: Packets stuck in dpdk_tx_queue

Mattias Johansson G mattias.g.johansson at ericsson.com
Wed May 4 13:15:55 UTC 2016


Hi!

I'm doing a throughput test with OVS 2.4.1 and I'm seeing abnormally high latency
variation when sending traffic via a physical port connected to numa node 1 while the
OVS control plane is running on numa node 0.

The problem is that when a PMD thread running on numa node 0 reads a packet from
the vhost queue and transmits it to the physical port's Tx queue located on numa node 1,
the packet gets stuck there until the next packet arrives.

In the code, a PMD thread running on numa node 0 reads a packet from the vhost queue
and calls dpdk_queue_pkts to queue it in struct dpdk_tx_queue for the physical port
located on numa node 1.

In this test dev->tx_q[i].flush_tx is false, which means that dpdk_queue_flush__ is not
called (except if the Tx queue gets full, i.e. reaches MAX_TX_QUEUE_LEN packets, but
that is not of interest here).

For the queues on numa node 1 no other trigger mechanism for a flush is present, so the
packet gets stuck in struct dpdk_tx_queue until the next packet arrives.

What I think happens in my test is that the last packets I send don't fill up the Tx queue,
so no flush of it happens. The flush only happens when I send an additional packet; by
then the DRAIN_TSC interval has passed, which forces a call to dpdk_queue_flush__.
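
To make the above easier to follow, here is a simplified paraphrase of the enqueue path
I'm referring to. This is not a verbatim copy; I've trimmed it to the parts relevant for
the flush behavior, so names and details may differ slightly from the actual source:

static inline void
dpdk_queue_pkts(struct netdev_dpdk *dev, int qid,
                struct rte_mbuf **pkts, int cnt)
{
    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
    uint64_t diff_tsc;
    int i = 0;

    while (i < cnt) {
        int freeslots = MAX_TX_QUEUE_LEN - txq->count;
        int tocopy = MIN(freeslots, cnt - i);

        /* Packets are only copied into the intermediate per-queue buffer
         * here; nothing is handed to the NIC yet. */
        memcpy(&txq->burst_pkts[txq->count], &pkts[i],
               tocopy * sizeof (struct rte_mbuf *));
        txq->count += tocopy;
        i += tocopy;

        /* The buffer is only flushed to the NIC if the queue is full or
         * flush_tx is set for this queue... */
        if (txq->count == MAX_TX_QUEUE_LEN || txq->flush_tx) {
            dpdk_queue_flush__(dev, qid);
        }

        /* ...or if DRAIN_TSC cycles have passed since the last flush, but
         * this check only runs on the next call into this function.  With
         * flush_tx == false and low traffic, a packet therefore sits in
         * burst_pkts until another packet arrives. */
        diff_tsc = rte_get_timer_cycles() - txq->tsc;
        if (diff_tsc >= DRAIN_TSC) {
            dpdk_queue_flush__(dev, qid);
        }
    }
}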

This behavior is not seen when sending packets to a physical port connected to numa
node 0, where the OVS control plane is running, because dev->tx_q[i].flush_tx is true in
that situation.

The behavior seems to be controlled by how the Tx queue is configured in relation to the
CPU id the PMD thread is executing on. Below is code from a function in
netdev-dpdk.c (it looks similar in the latest version on both the master and 2.4.1 branches).

Why isn't dev->tx_q[i].flush_tx always enabled (set to true)? If I change the code and
always enable it (see the one-line change after the snippet below), the problem seems to
disappear.

I've attached some logs describing the cpu layout, where my different PMD threads
execute and how the queues for the two physical ports I have get configured.

I would very much appreciate help in understanding how this is intended to work.

Thanks in advance!

BR,
Mattias

Unmodified code snippet from:
https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c (362ca39)
static void
netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs)
{
    unsigned i;

    dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
    for (i = 0; i < n_txqs; i++) {
        int numa_id = ovs_numa_get_numa_id(i);

        if (!dev->txq_needs_locking) {
            /* Each index is considered as a cpu core id, since there should
             * be one tx queue for each cpu core.  If the corresponding core
             * is not on the same numa node as 'dev', flags the
             * 'flush_tx'. */
            dev->tx_q[i].flush_tx = dev->socket_id == numa_id;
        } else {
            /* Queues are shared among CPUs. Always flush */
            dev->tx_q[i].flush_tx = true;
        }

        /* Initialize map for vhost devices. */
        dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
        rte_spinlock_init(&dev->tx_q[i].tx_lock);
    }
}
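
The change I experimented with is just the obvious one-liner against the snippet above
(a test hack, not a proposed patch):

        if (!dev->txq_needs_locking) {
            /* Test hack: always flush, even if the queue's core id is on a
             * different numa node than 'dev', so that a packet queued by a
             * remote PMD thread does not have to wait for the next packet
             * to trigger the DRAIN_TSC check. */
            dev->tx_q[i].flush_tx = true;
        }

As noted above, with this change the stuck-packet problem seems to disappear.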

Traffic tool output:
burst size:                           2
frequency:                       100000 burst/s
payload size:                        30 B
frame size:                          64 B
length of test:                       2 s

tx:                              399730
rx:                              399730
bits received:                        0 Gb
elapsed:                              4 s

tx lost:                              0
rx lost:                              0
out of order:                         0
corrupt:                              0
noise:                                0

min:                                 14 us
max:                            1994061 us
mean latency:                    95.254 us
standard deviation:            8344.242 us

Latencies (us):
       0 <= x < 50               158747
      50 <= x < 100              200266
     100 <= x < 150               40552
     150 <= x < 200                  32
     200 <= x < 250                  30
     250 <= x < 300                  20
     300 <= x < 350                  20
     350 <= x < 400                  22
     400 <= x < 450                  16
     450 <= x < 500                  10
     500 <= x < 550                   8
    50000 <= x                         7



OVS threads:
Name           CPU
pmd128           2
pmd129          39
pmd127          22
pmd126          19
ovs-vswitchd     0

cpu_layout.py:
============================================================
Core and Socket Information (as reported by '/proc/cpuinfo')
============================================================

cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 12] sockets =  [0, 1]

        Socket 0        Socket 1
        --------        --------
Core 0  [0, 20]         [1, 21]
Core 1  [2, 22]         [3, 23]
Core 2  [4, 24]         [5, 25]
Core 3  [6, 26]         [7, 27]
Core 4  [8, 28]         [9, 29]
Core 8  [10, 30]        [11, 31]
Core 9  [12, 32]        [13, 33]
Core 10 [14, 34]        [15, 35]
Core 11 [16, 36]        [17, 37]
Core 12 [18, 38]        [19, 39]


How Tx flush is configured for dpdk0 on numa node 0:
2016-04-19T08:34:36.389Z|00024|dpdk|INFO|Enable flush for tx queue 0
2016-04-19T08:34:36.389Z|00025|dpdk|INFO|Disable flush for tx queue 1
2016-04-19T08:34:36.389Z|00026|dpdk|INFO|Enable flush for tx queue 2
2016-04-19T08:34:36.389Z|00027|dpdk|INFO|Disable flush for tx queue 3
2016-04-19T08:34:36.389Z|00028|dpdk|INFO|Enable flush for tx queue 4
2016-04-19T08:34:36.389Z|00029|dpdk|INFO|Disable flush for tx queue 5
2016-04-19T08:34:36.389Z|00030|dpdk|INFO|Enable flush for tx queue 6
2016-04-19T08:34:36.389Z|00031|dpdk|INFO|Disable flush for tx queue 7
2016-04-19T08:34:36.389Z|00032|dpdk|INFO|Enable flush for tx queue 8
2016-04-19T08:34:36.389Z|00033|dpdk|INFO|Disable flush for tx queue 9
2016-04-19T08:34:36.389Z|00034|dpdk|INFO|Enable flush for tx queue 10
2016-04-19T08:34:36.389Z|00035|dpdk|INFO|Disable flush for tx queue 11
2016-04-19T08:34:36.389Z|00036|dpdk|INFO|Enable flush for tx queue 12
2016-04-19T08:34:36.389Z|00037|dpdk|INFO|Disable flush for tx queue 13
2016-04-19T08:34:36.389Z|00038|dpdk|INFO|Enable flush for tx queue 14
2016-04-19T08:34:36.389Z|00039|dpdk|INFO|Disable flush for tx queue 15
2016-04-19T08:34:36.389Z|00040|dpdk|INFO|Enable flush for tx queue 16
2016-04-19T08:34:36.389Z|00041|dpdk|INFO|Disable flush for tx queue 17
2016-04-19T08:34:36.389Z|00042|dpdk|INFO|Enable flush for tx queue 18
2016-04-19T08:34:36.389Z|00043|dpdk|INFO|Disable flush for tx queue 19
2016-04-19T08:34:36.389Z|00044|dpdk|INFO|Enable flush for tx queue 20
2016-04-19T08:34:36.389Z|00045|dpdk|INFO|Disable flush for tx queue 21
2016-04-19T08:34:36.389Z|00046|dpdk|INFO|Enable flush for tx queue 22
2016-04-19T08:34:36.389Z|00047|dpdk|INFO|Disable flush for tx queue 23
2016-04-19T08:34:36.389Z|00048|dpdk|INFO|Enable flush for tx queue 24
2016-04-19T08:34:36.389Z|00049|dpdk|INFO|Disable flush for tx queue 25
2016-04-19T08:34:36.389Z|00050|dpdk|INFO|Enable flush for tx queue 26
2016-04-19T08:34:36.389Z|00051|dpdk|INFO|Disable flush for tx queue 27
2016-04-19T08:34:36.389Z|00052|dpdk|INFO|Enable flush for tx queue 28
2016-04-19T08:34:36.389Z|00053|dpdk|INFO|Disable flush for tx queue 29
2016-04-19T08:34:36.389Z|00054|dpdk|INFO|Enable flush for tx queue 30
2016-04-19T08:34:36.389Z|00055|dpdk|INFO|Disable flush for tx queue 31
2016-04-19T08:34:36.389Z|00056|dpdk|INFO|Enable flush for tx queue 32
2016-04-19T08:34:36.389Z|00057|dpdk|INFO|Disable flush for tx queue 33
2016-04-19T08:34:36.389Z|00058|dpdk|INFO|Enable flush for tx queue 34
2016-04-19T08:34:36.389Z|00059|dpdk|INFO|Disable flush for tx queue 35
2016-04-19T08:34:36.389Z|00060|dpdk|INFO|Enable flush for tx queue 36
2016-04-19T08:34:36.389Z|00061|dpdk|INFO|Disable flush for tx queue 37
2016-04-19T08:34:36.389Z|00062|dpdk|INFO|Enable flush for tx queue 38
2016-04-19T08:34:36.389Z|00063|dpdk|INFO|Disable flush for tx queue 39
2016-04-19T08:34:36.389Z|00064|dpdk|INFO|Disable flush for tx queue 40

How Tx flush is configured for dpdk1 on numa node 1:
2016-04-19T08:34:37.495Z|00069|dpdk|INFO|Disable flush for tx queue 0
2016-04-19T08:34:37.495Z|00070|dpdk|INFO|Enable flush for tx queue 1
2016-04-19T08:34:37.495Z|00071|dpdk|INFO|Disable flush for tx queue 2
2016-04-19T08:34:37.495Z|00072|dpdk|INFO|Enable flush for tx queue 3
2016-04-19T08:34:37.495Z|00073|dpdk|INFO|Disable flush for tx queue 4
2016-04-19T08:34:37.495Z|00074|dpdk|INFO|Enable flush for tx queue 5
2016-04-19T08:34:37.495Z|00075|dpdk|INFO|Disable flush for tx queue 6
2016-04-19T08:34:37.495Z|00076|dpdk|INFO|Enable flush for tx queue 7
2016-04-19T08:34:37.495Z|00077|dpdk|INFO|Disable flush for tx queue 8
2016-04-19T08:34:37.495Z|00078|dpdk|INFO|Enable flush for tx queue 9
2016-04-19T08:34:37.495Z|00079|dpdk|INFO|Disable flush for tx queue 10
2016-04-19T08:34:37.495Z|00080|dpdk|INFO|Enable flush for tx queue 11
2016-04-19T08:34:37.495Z|00081|dpdk|INFO|Disable flush for tx queue 12
2016-04-19T08:34:37.495Z|00082|dpdk|INFO|Enable flush for tx queue 13
2016-04-19T08:34:37.495Z|00083|dpdk|INFO|Disable flush for tx queue 14
2016-04-19T08:34:37.495Z|00084|dpdk|INFO|Enable flush for tx queue 15
2016-04-19T08:34:37.495Z|00085|dpdk|INFO|Disable flush for tx queue 16
2016-04-19T08:34:37.495Z|00086|dpdk|INFO|Enable flush for tx queue 17
2016-04-19T08:34:37.495Z|00087|dpdk|INFO|Disable flush for tx queue 18
2016-04-19T08:34:37.495Z|00088|dpdk|INFO|Enable flush for tx queue 19
2016-04-19T08:34:37.495Z|00089|dpdk|INFO|Disable flush for tx queue 20
2016-04-19T08:34:37.495Z|00090|dpdk|INFO|Enable flush for tx queue 21
2016-04-19T08:34:37.495Z|00091|dpdk|INFO|Disable flush for tx queue 22
2016-04-19T08:34:37.495Z|00092|dpdk|INFO|Enable flush for tx queue 23
2016-04-19T08:34:37.495Z|00093|dpdk|INFO|Disable flush for tx queue 24
2016-04-19T08:34:37.495Z|00094|dpdk|INFO|Enable flush for tx queue 25
2016-04-19T08:34:37.495Z|00095|dpdk|INFO|Disable flush for tx queue 26
2016-04-19T08:34:37.495Z|00096|dpdk|INFO|Enable flush for tx queue 27
2016-04-19T08:34:37.495Z|00097|dpdk|INFO|Disable flush for tx queue 28
2016-04-19T08:34:37.495Z|00098|dpdk|INFO|Enable flush for tx queue 29
2016-04-19T08:34:37.495Z|00099|dpdk|INFO|Disable flush for tx queue 30
2016-04-19T08:34:37.495Z|00100|dpdk|INFO|Enable flush for tx queue 31
2016-04-19T08:34:37.495Z|00101|dpdk|INFO|Disable flush for tx queue 32
2016-04-19T08:34:37.495Z|00102|dpdk|INFO|Enable flush for tx queue 33
2016-04-19T08:34:37.495Z|00103|dpdk|INFO|Disable flush for tx queue 34
2016-04-19T08:34:37.495Z|00104|dpdk|INFO|Enable flush for tx queue 35
2016-04-19T08:34:37.495Z|00105|dpdk|INFO|Disable flush for tx queue 36
2016-04-19T08:34:37.495Z|00106|dpdk|INFO|Enable flush for tx queue 37
2016-04-19T08:34:37.495Z|00107|dpdk|INFO|Disable flush for tx queue 38
2016-04-19T08:34:37.495Z|00108|dpdk|INFO|Enable flush for tx queue 39
2016-04-19T08:34:37.495Z|00109|dpdk|INFO|Disable flush for tx queue 40


