[ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Yanqin Wei Yanqin.Wei at arm.com
Tue Jul 7 05:25:54 UTC 2020


The 2nd define is another periodic task, for dpcls ranking.

From: Yanqin Wei
Sent: Tuesday, July 7, 2020 1:19 PM
To: Shahaji Bhosle <shahaji.bhosle at broadcom.com>
Cc: Flavio Leitner <fbl at sysclose.org>; ovs-dev at openvswitch.org; nd <nd at arm.com>; Ilya Maximets <i.maximets at samsung.com>; Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>; Alex Barba <alex.barba at broadcom.com>
Subject: RE: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Shahaji,

Yes, it updates some counters every 10 seconds for PMD load balancing and PMD info collection.  I have no idea how to disable them from outside.
You could try modifying the following numbers and observing whether the packet loss changes.

/* Time in microseconds of the interval in which rxq processing cycles used
 * in rxq to pmd assignments is measured and stored. */
#define PMD_RXQ_INTERVAL_LEN 10000000LL

/* Time in microseconds between successive optimizations of the dpcls
 * subtable vector */
#define DPCLS_OPTIMIZATION_INTERVAL 1000000LL
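
(For context, here is a simplified sketch of how such an interval gates the periodic work inside the PMD polling loop. This is illustrative only, not the exact dpif-netdev.c code.)

/* Checked on every iteration of the PMD polling loop; the periodic
 * bookkeeping runs only once the interval has elapsed. */
static void
maybe_optimize_dpcls(long long now_us, long long *next_optimization_us)
{
    if (now_us >= *next_optimization_us) {
        /* re-rank the dpcls subtable vector here */
        *next_optimization_us = now_us + DPCLS_OPTIMIZATION_INTERVAL;
    }
}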

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bhosle at broadcom.com>
Sent: Tuesday, July 7, 2020 12:23 PM
To: Yanqin Wei <Yanqin.Wei at arm.com>
Cc: Flavio Leitner <fbl at sysclose.org>; ovs-dev at openvswitch.org; nd <nd at arm.com>; Ilya Maximets <i.maximets at samsung.com>; Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>; Alex Barba <alex.barba at broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Thanks Yanqin,
What does this define mean? Some kind of bookkeeping of the packet processing cycles every 10 seconds? Are you saying to make the interval even bigger, 1000 seconds or something? And if I want to disable it, what do I do?
Thanks, Shahaji

On Mon, Jul 6, 2020 at 10:30 PM Yanqin Wei <Yanqin.Wei at arm.com> wrote:
Hi Shahaji,

It seems to be caused by some periodic task.  In the PMD thread, PMD auto load balance is done periodically.
/* Time in microseconds of the interval in which rxq processing cycles used
 * in rxq to pmd assignments is measured and stored. */
#define PMD_RXQ_INTERVAL_LEN 10000000LL

Would you like to try disabling it if it is not necessary?
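
(If it helps to correlate the drops with this rebalancing, the standard appctl commands show the current rxq-to-PMD assignment and the per-PMD cycle counters; the exact output format varies by OVS version.)

ovs-appctl dpif-netdev/pmd-rxq-show
ovs-appctl dpif-netdev/pmd-stats-clear
ovs-appctl dpif-netdev/pmd-stats-show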

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bhosle at broadcom.com>
Sent: Monday, July 6, 2020 8:24 PM
To: Yanqin Wei <Yanqin.Wei at arm.com>
Cc: Flavio Leitner <fbl at sysclose.org>; ovs-dev at openvswitch.org; nd <nd at arm.com>; Ilya Maximets <i.maximets at samsung.com>; Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>; Alex Barba <alex.barba at broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
The drops happen at random intervals; sometimes I can run for minutes without drops. The case is very borderline: the CPUs are close to 99% with around 1000 flows. We see the drops once every 10-15 seconds, and they are random in nature. If I use one ring per core the drops go away; if I enable the EMC the drops go away, etc.
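
(For reference, the two knobs mentioned here are set roughly like this; the interface name and queue:core pairs are examples, not the exact setup.)

ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=1
ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"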
Thanks, Shahaji

On Mon, Jul 6, 2020 at 5:27 AM Yanqin Wei <Yanqin.Wei at arm.com> wrote:
Hi Shahaji,

I have not measured context switch overhead, but I feel it should be acceptable, because 10 Mpps throughput with zero packet drops (over 20 s) can be achieved on some Arm servers.  Maybe you could do performance profiling on your test bench to find the root cause of the performance degradation with multiple rings.
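
(For example, assuming perf is available on the target, recording scheduler switch events on the isolated PMD cores would show whether, and by what, the PMD threads get preempted.)

perf record -e sched:sched_switch -C 1-4 -g -- sleep 10
perf report --stdio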

Best Regards,
Wei Yanqin

From: Shahaji Bhosle <shahaji.bhosle at broadcom.com>
Sent: Thursday, July 2, 2020 9:27 PM
To: Yanqin Wei <Yanqin.Wei at arm.com>
Cc: Flavio Leitner <fbl at sysclose.org>; ovs-dev at openvswitch.org; nd <nd at arm.com>; Ilya Maximets <i.maximets at samsung.com>; Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>; Alex Barba <alex.barba at broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Thanks Yanqin,
I am not seeing any context switches beyond 40 usec in our do-nothing loop test. But when OvS polls multiple rings (queues) on the same CPU and starts batching more packets (MAX_BURST_SIZE), the loops take more time, and I can see the rings getting filled up. And then it's a feedback loop: the CPUs are running close to 100%, and any disturbance at that point I think is too much.
Do you have any data that you use to monitor OvS? I am doing all the above experiments without OvS.
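
(A toy model of that feedback loop; the names are illustrative stand-ins rather than the actual OVS/DPDK symbols, apart from the MAX_BURST_SIZE idea.)

#define MAX_BURST_SIZE 32

struct ring;     /* opaque; stands in for a DPDK ring */
struct packet;

/* Stubs standing in for the real dequeue/processing routines. */
int ring_dequeue_burst(struct ring *, struct packet **, int max);
void process_packet(struct packet *);

void
pmd_poll_iteration(struct ring *rings[], int n_rings)
{
    struct packet *batch[MAX_BURST_SIZE];
    for (int i = 0; i < n_rings; i++) {
        /* Under load each ring returns a full burst, so an iteration is
         * slowest exactly when the rings are filling fastest. */
        int n = ring_dequeue_burst(rings[i], batch, MAX_BURST_SIZE);
        for (int j = 0; j < n; j++) {
            process_packet(batch[j]);   /* costlier with the EMC disabled */
        }
    }
}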
Thanks, Shahaji

On Thu, Jul 2, 2020 at 4:43 AM Yanqin Wei <Yanqin.Wei at arm.com> wrote:
Hi Shahaji,

IIUC, the 1 Hz timer tick cannot be disabled even with full dynticks, right? But I have no idea why it causes packet loss, because it should be only a small overhead when rcu_nocbs is enabled.
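
(One way to check, assuming the usual /proc/interrupts naming, is to watch whether the architected timer still fires on the isolated cores; on a fully dyntick core the count should barely move.)

watch -n 1 'grep arch_timer /proc/interrupts'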

Best Regards,
Wei Yanqin

===========

From: Shahaji Bhosle <shahaji.bhosle at broadcom.com>
Sent: Thursday, July 2, 2020 6:11 AM
To: Yanqin Wei <Yanqin.Wei at arm.com>
Cc: Flavio Leitner <fbl at sysclose.org>; ovs-dev at openvswitch.org; nd <nd at arm.com>; Ilya Maximets <i.maximets at samsung.com>; Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>; Alex Barba <alex.barba at broadcom.com>
Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Hi Yanqin,
I added the patch you gave me to my script, which runs a do-nothing for loop. You can see the spikes in the output below: 976/1000 times we are perfect, but around every 1 second you can see something going wrong. I don't see anything wrong in the trace-cmd world.
Thanks, Shahaji

root at bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
+ TARGET=2
+ MASK=4
+ NUM_ITER=1000
+ NUM_MS=100
+ N=37500000
+ LOGFILE=loop_1000iter_100ms.log
+ tee loop_1000iter_100ms.log
+ trace-cmd record -p function_graph -e all -M 4 -o trace_1000iter_100ms.dat taskset -c 2 /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
  plugin 'function_graph'
Cycles/Second (Hz) = 3000000000
Nano-seconds per cycle = 0.3333

Using ISB() before rte_rdtsc()
num_iter: 1000
do_nothing_loop for (N)=37500000
Running 1000 iterations of do_nothing_loop for (N)=37500000

Average =          100282.193430333 u-secs
Max     =          124777.488666667 u-secs
Min     =          100000.017666667 u-secs
σ       =            1931.352376508 u-secs

Average =              300846580.29 cycles
Max     =              374332466.00 cycles
Min     =              300000053.00 cycles
σ       =                5794057.13 cycles

#σ = events
 0 = 976
 1 = 3
 2 = 4
 3 = 3
 4 = 3
 5 = 2
 6 = 2
 7 = 2
 8 = 1
 9 = 1
10 = 1
12 = 2




On Wed, Jul 1, 2020 at 3:57 AM Yanqin Wei <Yanqin.Wei at arm.com> wrote:
Hi Shahaji,

Adding an isb instruction helps make rdtsc precise; it serializes the read of the system counter from cntvct_el0. There is a patch in DPDK: https://patchwork.dpdk.org/patch/66561/
So it may not be related to the intermittent drops you observed.
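
(The essence of that patch, as a sketch for aarch64: the ISB drains the pipeline so the counter read cannot be speculated early, which keeps back-to-back timestamps tight.)

static inline unsigned long long
rdtsc_precise(void)
{
    unsigned long long cnt;

    asm volatile("isb" : : : "memory");             /* serialize */
    asm volatile("mrs %0, cntvct_el0" : "=r"(cnt)); /* virtual counter */
    return cnt;
}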

Best Regards,
Wei Yanqin

> -----Original Message-----
> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Shahaji Bhosle via dev
> Sent: Wednesday, July 1, 2020 6:05 AM
> To: Flavio Leitner <fbl at sysclose.org>
> Cc: ovs-dev at openvswitch.org; Ilya Maximets <i.maximets at samsung.com>;
> Lee Reed <lee.reed at broadcom.com>; Vinay Gupta <vinay.gupta at broadcom.com>;
> Alex Barba <alex.barba at broadcom.com>
> Subject: Re: [ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)
>
> Hi Flavio,
> I still see intermittent drops with rcu_nocbs. So I wrote that do_nothing()
> loop to avoid all the other distractions and see if Linux is messing with
> the OVS loop, just to see what is going on. The interesting thing I see is
> the case in *BOLD* below where I use an ISB() instruction: my STD deviation
> is well within... Both of the results are basically DO NOTHING FOR 100 msec
> and see what happens to time :) Thanks, Shahaji
>
> static inline uint64_t
> rte_get_tsc_cycles(void)
> {
>     uint64_t tsc;
> #ifdef USE_ISB
>     asm volatile("isb; mrs %0, pmccntr_el0" : "=r"(tsc));
> #else
>     asm volatile("mrs %0, pmccntr_el0" : "=r"(tsc));
> #endif
>     return tsc;
> }
> #endif /* RTE_ARM_EAL_RDTSC_USE_PMU */
>
> ==================================
> usleep(100);
> for (volatile int i = 0; i < num_iter; i++) {
>     const uint64_t tsc_start = rte_get_tsc_cycles();
>     /* do nothing for the interval */
> #ifdef USE_ISB
>     for (volatile int j = 0; j < num_us; j++);  /* <<< THIS IS MESSED UP:
>         100 msec do nothing, I am getting 2033 usec STD DEVIATION */
> #else
>     for (volatile int j = 0; j < num_us; j++);  /* <<< THIS LOOP HAS
>         VERY LOW STD DEVIATION */
>     rte_isb();
> #endif
>     volatile uint64_t tsc_end = rte_get_tsc_cycles();
>     cycles[i] = tsc_end - tsc_start;
> }
> usleep(100);
> calc_avg_var_stddev(num_iter, &cycles[0]);
> ===================================
> #ifdef USE_ISB
> root at bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_rdtsc
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=37500000
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o
> trace_1000iter_100ms.dat taskset -c 2
> /home/root/arm_stb_user_loop_isb_rdtsc 1000 37500000
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> Using ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=37500000
> Running 1000 iterations of do_nothing_loop for (N)=37500000
>
> Average =          100328.158561667 u-secs
> Max     =          123024.795333333 u-secs
> Min     =          100000.017666667 u-secs
> σ       =            2033.118969489 u-secs
>
> Average =              300984475.69 cycles
> Max     =              369074386.00 cycles
> Min     =              300000053.00 cycles
> σ       =                6099356.91 cycles
>
> #σ = events
>  0 = 968
>  1 = 8
>  2 = 5
>  3 = 3
>  4 = 3
>  5 = 3
>  6 = 3
>  8 = 3
> 10 = 3
> 11 = 1
>
> #ELSE
> root at bcm958802a8046c:~/vinay_rx/dynticks-testing# ./run_isb_loop
> + TARGET=2
> + MASK=4
> + NUM_ITER=1000
> + NUM_MS=100
> + N=7316912
> + LOGFILE=loop_1000iter_100ms.log
> + tee loop_1000iter_100ms.log
> + trace-cmd record -p function_graph -e all -M 4 -o
> trace_1000iter_100ms.dat taskset -c 2
> /home/root/arm_stb_user_loop_isb_loop
> 1000 7316912
>   plugin 'function_graph'
> Cycles/Second (Hz) = 3000000000
> Nano-seconds per cycle = 0.3333
>
> NO ISB() before rte_rdtsc()
> num_iter: 1000
> do_nothing_loop for (N)=7316912
> Running 1000 iterations of do_nothing_loop for (N)=7316912
>
> Average =           99999.863256333 u-secs
> Max     =          100052.790333333 u-secs
> Min     =           99997.807333333 u-secs
> σ       =               6.497043982 u-secs
>
> Average =              299999589.77 cycles
> Max     =              300158371.00 cycles
> Min     =              299993422.00 cycles
> σ       =                  19491.13 cycles
>
> #σ = events
>  0 = 900
>  2 = 79
>  4 = 17
>  5 = 3
>  8 = 1
>
>
> On Tue, Jun 30, 2020 at 4:42 PM Flavio Leitner <fbl at sysclose.org> wrote:
>
> >
> >
> > Hi Shahaji,
> >
> > Did it help with the rcu_nocbs?
> >
> > fbl
> >
> > On Tue, Jun 30, 2020 at 12:56:27PM -0400, Shahaji Bhosle wrote:
> > > Thanks Flavio,
> > > Are there any special requirements for RCU on ARM vs x86?
> > >
> > > I am following what the above document is saying... Do you think I
> > > need to do something more than the below?
> > > Thanks again, and appreciate the help. Shahaji
> > >
> > > 1. Isolate the CPU cores:
> > >    isolcpus=1,2,3,4,5,6,7 nohz_full=1-7 rcu_nocbs=1-7
> > > 2. Set CONFIG_NO_HZ_FULL=y:
> > > root at bcm958802a8046c:~/vinay_rx/dynticks-testing# zcat /proc/config.gz | grep HZ
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > # CONFIG_NO_HZ_IDLE is not set
> > > CONFIG_NO_HZ_FULL=y
> > > # CONFIG_NO_HZ_FULL_ALL is not set
> > > # CONFIG_NO_HZ is not set
> > > # CONFIG_HZ_100 is not set
> > > CONFIG_HZ_250=y
> > > # CONFIG_HZ_300 is not set
> > > # CONFIG_HZ_1000 is not set
> > > CONFIG_HZ=250
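> > >
> > > (A quick sanity check that the isolation actually took effect, assuming
> > > a recent kernel that exposes these files:
> > >   cat /proc/cmdline
> > >   cat /sys/devices/system/cpu/isolated
> > >   cat /sys/devices/system/cpu/nohz_full )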
> > >
> > >
> > >
> > > On Tue, Jun 30, 2020 at 12:50 PM Flavio Leitner <fbl at sysclose.org> wrote:
> > >
> > > >
> > > > Right, you might want to review Documentation/timers/no_hz.rst
> > > > from the kernel sources and look for RCU implications section
> > > > where it explains how to move RCU callbacks.
> > > >
> > > > fbl
> > > >
> > > > On Tue, Jun 30, 2020 at 12:08:05PM -0400, Shahaji Bhosle wrote:
> > > > > Hi Flavio,
> > > > > I wrote a small program which has a do_nothing for loop, and I
> > > > > measure the timestamps across the do-nothing loop. About 3% of the
> > > > > time, around the 1 second mark when the arch_timer fires, I get
> > > > > timestamps that are off by 25% of the expected value. I ran
> > > > > trace-cmd to see what is going on and see the below. Looks like
> > > > > some issue with *gic_handle_irq*(); I am not seeing this behaviour
> > > > > on the x86 host, so something special with ARM v8.
> > > > > Thanks, Shahaji
> > > > >
> > > > >   %21.77  (14181) arm_stb_user_lo                    rcu_dyntick #922
> > > > >          |
> > > > >          --- *rcu_dyntick*
> > > > >             |
> > > > >             |--%46.85-- gic_handle_irq  # 432
> > > > >             |
> > > > >             |--%23.32-- context_tracking_user_exit  # 215
> > > > >             |
> > > > >             |--%22.34-- context_tracking_user_enter  # 206
> > > > >             |
> > > > >             |--%2.60-- SyS_execve  # 24
> > > > >             |
> > > > >             |--%1.30-- do_page_fault  # 12
> > > > >             |
> > > > >             |--%0.65-- SyS_write  # 6
> > > > >             |
> > > > >             |--%0.65-- schedule  # 6
> > > > >             |
> > > > >             |--%0.65-- SyS_nanosleep  # 6
> > > > >             |
> > > > >             |--%0.65-- syscall_trace_enter  # 6
> > > > >             |
> > > > >             |--%0.65-- SyS_faccessat  # 6
> > > > >
> > > > >   %5.01  (14181) arm_stb_user_lo                rcu_utilization #212
> > > > >          |
> > > > >          --- *rcu_utilization*
> > > > >             |
> > > > >             |--%96.23-- gic_handle_irq  # 204
> > > > >             |
> > > > >             |--%1.89-- SyS_nanosleep  # 4
> > > > >             |
> > > > >             |--%0.94-- SyS_exit_group  # 2
> > > > >             |
> > > > >             |--%0.94-- do_notify_resume  # 2
> > > > >
> > > > >   %4.86  (14181) arm_stb_user_lo                      user_exit #206
> > > > >          |
> > > > >          --- *user_exit*
> > > > >           context_tracking_user_exit
> > > > >
> > > > >   %4.86  (14181) arm_stb_user_lo     context_tracking_user_exit #206
> > > > >          |
> > > > >          --- context_tracking_user_exit
> > > > >
> > > > >   %4.86  (14181) arm_stb_user_lo    context_tracking_user_enter #206
> > > > >          |
> > > > >          --- context_tracking_user_enter
> > > > >
> > > > >   %4.86  (14181) arm_stb_user_lo                     user_enter #206
> > > > >          |
> > > > >          --- *user_enter*
> > > > >           context_tracking_user_enter
> > > > >
> > > > >   %2.95  (14181) arm_stb_user_lo                 gic_handle_irq #125
> > > > >          |
> > > > >          --- gic_handle_irq
> > > > >
> > > > >
> > > > > On Tue, Jun 30, 2020 at 9:45 AM Flavio Leitner <fbl at sysclose.org> wrote:
> > > > >
> > > > > > On Tue, Jun 02, 2020 at 12:56:51PM -0700, Vinay Gupta wrote:
> > > > > > > Hi Flavio,
> > > > > > >
> > > > > > > Thanks for your reply.
> > > > > > > I have captured the suggested information but do not see
> > > > > > > anything that could cause the packet drops.
> > > > > > > Can you please take a look at the below data and see if you can
> > > > > > > find something unusual?
> > > > > > > The PMDs are running on CPU 1,2,3,4, and CPU 1-7 are isolated cores.
> > > > > > >
> > > > > > > ----------------------------------------------------------------------
> > > > > > > root at bcm958802a8046c:~# cstats ; sleep 10; cycles
> > > > > > > pmd thread numa_id 0 core_id 1:
> > > > > > >   idle cycles: 99140849 (7.93%)
> > > > > > >   processing cycles: 1151423715 (92.07%)
> > > > > > >   avg cycles per packet: 116.94 (1250564564/10693918)
> > > > > > >   avg processing cycles per packet: 107.67 (1151423715/10693918)
> > > > > > > pmd thread numa_id 0 core_id 2:
> > > > > > >   idle cycles: 118373662 (9.47%)
> > > > > > >   processing cycles: 1132193442 (90.53%)
> > > > > > >   avg cycles per packet: 124.39 (1250567104/10053309)
> > > > > > >   avg processing cycles per packet: 112.62 (1132193442/10053309)
> > > > > > > pmd thread numa_id 0 core_id 3:
> > > > > > >   idle cycles: 53805933 (4.30%)
> > > > > > >   processing cycles: 1196762002 (95.70%)
> > > > > > >   avg cycles per packet: 107.35 (1250567935/11649948)
> > > > > > >   avg processing cycles per packet: 102.73 (1196762002/11649948)
> > > > > > > pmd thread numa_id 0 core_id 4:
> > > > > > >   idle cycles: 189102938 (15.12%)
> > > > > > >   processing cycles: 1061463293 (84.88%)
> > > > > > >   avg cycles per packet: 143.47 (1250566231/8716828)
> > > > > > >   avg processing cycles per packet: 121.77 (1061463293/8716828)
> > > > > > > pmd thread numa_id 0 core_id 5:
> > > > > > > pmd thread numa_id 0 core_id 6:
> > > > > > > pmd thread numa_id 0 core_id 7:
> > > > > >
> > > > > >
> > > > > > The core_id 3 is highly loaded, and so it is more likely to
> > > > > > show the drop issue when some other event happens.
> > > > > >
> > > > > > I think you need to run perf as I recommended before and see
> > > > > > if there are context switches happening and why they are happening.
> > > > > >
> > > > > > If a context switch happens, it's either because the core is
> > > > > > not well isolated or some other thing is going on. It will
> > > > > > help to understand why the queue wasn't serviced for a certain
> > > > > > amount of time.
> > > > > >
> > > > > > The issue is that running perf might introduce some load, so
> > > > > > you will need to adjust the traffic rate accordingly.
> > > > > >
> > > > > > HTH,
> > > > > > fbl
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > *Runtime summary*
> > > > > > >                     comm  parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
> > > > > > >                                    (count)    (msec)   (msec)   (msec)   (msec)       %
> > > > > > > ---------------------------------------------------------------------------------------------------
> > > > > > >           ksoftirqd/0[7]       2         1     0.079    0.079    0.079    0.079    0.00           0
> > > > > > >             rcu_sched[8]       2        14     0.067    0.002    0.004    0.009    9.96           0
> > > > > > >              rcuos/4[38]       2         6     0.027    0.002    0.004    0.008   20.97           0
> > > > > > >              rcuos/5[45]       2         4     0.018    0.004    0.004    0.005    6.63           0
> > > > > > >          kworker/0:1[71]       2        12     0.156    0.008    0.013    0.019    6.72           0
> > > > > > >            mmcqd/0[1230]       2         3     0.054    0.001    0.018    0.031   47.29           0
> > > > > > >       kworker/0:1H[1248]       2         1     0.006    0.006    0.006    0.006    0.00           0
> > > > > > >      kworker/u16:2[1547]       2        16     0.045    0.001    0.002    0.012   26.19           0
> > > > > > >               ntpd[5282]       1         1     0.063    0.063    0.063    0.063    0.00           0
> > > > > > >           watchdog[6988]       1         2     0.089    0.012    0.044    0.076   72.26           0
> > > > > > >       ovs-vswitchd[9239]       1         2     0.326    0.152    0.163    0.173    6.45           0
> > > > > > >  revalidator8[9309/9239]    9239         2     1.260    0.607    0.630    0.652    3.58           0
> > > > > > >             perf[27150]   27140         1     0.000    0.000    0.000    0.000    0.00           0
> > > > > > >
> > > > > > > Terminated tasks:
> > > > > > >            sleep[27151]   27150         4     1.002    0.015    0.250    0.677   58.22           0
> > > > > > >
> > > > > > > Idle stats:
> > > > > > >     CPU  0 idle for    999.814  msec  ( 99.84%)
> > > > > > >     *CPU  1 idle entire time window*
> > > > > > >     *CPU  2 idle entire time window*
> > > > > > >     *CPU  3 idle entire time window*
> > > > > > >     *CPU  4 idle entire time window*
> > > > > > >     CPU  5 idle for    500.326  msec  ( 49.96%)
> > > > > > >     CPU  6 idle entire time window
> > > > > > >     CPU  7 idle entire time window
> > > > > > >
> > > > > > >     Total number of unique tasks: 14
> > > > > > >     Total number of context switches: 115
> > > > > > >            Total run time (msec):  3.198
> > > > > > >     Total scheduling time (msec): 1001.425  (x 8)
> > > > > > > (END)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 02:16:22      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
> > > > > > > 02:16:23        0      9239         -  100.00    0.00    0.00    0.00  100.00     5  ovs-vswitchd
> > > > > > > 02:16:23        0         -      9239    2.00    0.00    0.00    0.00    2.00     5  |__ovs-vswitchd
> > > > > > > 02:16:23        0         -      9240    0.00    0.00    0.00    0.00    0.00     0  |__vfio-sync
> > > > > > > 02:16:23        0         -      9241    0.00    0.00    0.00    0.00    0.00     5  |__eal-intr-thread
> > > > > > > 02:16:23        0         -      9242    0.00    0.00    0.00    0.00    0.00     5  |__dpdk_watchdog1
> > > > > > > 02:16:23        0         -      9244    0.00    0.00    0.00    0.00    0.00     5  |__urcu2
> > > > > > > 02:16:23        0         -      9279    0.00    0.00    0.00    0.00    0.00     5  |__ct_clean3
> > > > > > > 02:16:23        0         -      9308    0.00    0.00    0.00    0.00    0.00     5  |__handler9
> > > > > > > 02:16:23        0         -      9309    0.00    0.00    0.00    0.00    0.00     5  |__revalidator8
> > > > > > > 02:16:23        0         -      9328    0.00    0.00    0.00    0.00    0.00     6  |__pmd13
> > > > > > > 02:16:23        0         -      9330  100.00    0.00    0.00    0.00  100.00     3  |__pmd12
> > > > > > > 02:16:23        0         -      9331  100.00    0.00    0.00    0.00  100.00     1  |__pmd11
> > > > > > > 02:16:23        0         -      9332    0.00    0.00    0.00    0.00    0.00     7  |__pmd10
> > > > > > > 02:16:23        0         -      9333    0.00    0.00    0.00    0.00    0.00     5  |__pmd16
> > > > > > > 02:16:23        0         -      9334  100.00    0.00    0.00    0.00  100.00     2  |__pmd15
> > > > > > > 02:16:23        0         -      9335  100.00    0.00    0.00    0.00  100.00     4  |__pmd14
> > > > > > >
> > > > > > > ----------------------------------------------------------------------
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vinay
> > > > > > >
> > > > > > > On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <fbl at sysclose.org> wrote:
> > > > > > >
> > > > > > > > On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > > > > > > > > Hi Ben/Ilya,
> > > > > > > > > Hope you guys are doing well and staying safe. I have been
> > > > > > > > > chasing a weird problem with small drops, and I think it is
> > > > > > > > > causing lots of TCP retransmissions.
> > > > > > > > >
> > > > > > > > > Setup details:
> > > > > > > > > iPerf3 (1k-5K Servers) <-- DPDK2:OvS+DPDK (VxLAN:BOND) [DPDK0+DPDK1] <==== 2x25G <==== [DPDK0+DPDK1] (VxLAN:BOND) OvS+DPDK:DPDK2 <--- iPerf3 (Clients)
> > > > > > > > >
> > > > > > > > > All the drops are ring drops on the BONDed functions on the
> > > > > > > > > server side. I have 4 CPUs, each with 3 PMD threads; DPDK0,
> > > > > > > > > DPDK1 and DPDK2 are all running with 4 Rx rings each.
> > > > > > > > >
> > > > > > > > > What is interesting is that when I give each Rx ring its
> > > > > > > > > own CPU, the drops go away. Or if I set
> > > > > > > > > other_config:emc-insert-inv-prob=1, the drops go away. But
> > > > > > > > > I need to scale up the number of flows, so I am trying to
> > > > > > > > > run this with the EMC disabled.
> > > > > > > > >
> > > > > > > > > I can tell that the rings are not getting serviced for
> > > > > > > > > 30-40 usec because of some kind of context switch or
> > > > > > > > > interrupt on these cores. I have tried the usual
> > > > > > > > > isolation: nohz_full, rcu_nocbs, etc., and moved all the
> > > > > > > > > interrupts away from these cores. But nothing helps. I
> > > > > > > > > mean it improves, but the drops still happen.
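> > > > > > > > >
> > > > > > > > > (Rough math: the cstats output earlier in this thread shows
> > > > > > > > > roughly 1 Mpps per PMD over the 10 s sample, so a 30-40 usec
> > > > > > > > > service gap is on the order of 30-40 packets arriving
> > > > > > > > > unserviced; that is the same order as the 10-25 drops, once
> > > > > > > > > the rings have little headroom left.)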
> > > > > > > >
> > > > > > > > When you disable the EMC (or reduce its efficiency) the per
> > > > > > > > packet cost increases, and then it becomes more sensitive to
> > > > > > > > variations. If you share a CPU with multiple queues, you
> > > > > > > > decrease the amount of time available to process each queue.
> > > > > > > > In either case, there will be less room to tolerate
> > > > > > > > variations.
> > > > > > > >
> > > > > > > > Well, you might want to use 'perf' to monitor for scheduling
> > > > > > > > events and then, based on the stack trace, see what is
> > > > > > > > causing them and try to prevent it.
> > > > > > > >
> > > > > > > > For example:
> > > > > > > > # perf record -e sched:sched_switch -a -g sleep 1
> > > > > > > >
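> > > > > > > > Then inspect the recorded switches and their call chains
> > > > > > > > with, for example:
> > > > > > > >
> > > > > > > > # perf report --stdio
> > > > > > > > # perf script
> > > > > > > >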
> > > > > > > > For instance, you might see that another NIC used for
> > > > > > > > management has IRQs assigned to one isolated CPU. You can
> > > > > > > > move it to another CPU to reduce the noise, etc...
> > > > > > > >
> > > > > > > > Another suggestion is to look at the PMD thread idle
> > > > > > > > statistics, because they will tell you how much "extra" room
> > > > > > > > you have left. As it approaches 0, the more fine-tuned your
> > > > > > > > setup needs to be to avoid drops.
> > > > > > > >
> > > > > > > > HTH,
> > > > > > > > --
> > > > > > > > fbl
> > > > > > > >
> > > > > >
> > > > > > --
> > > > > > fbl
> > > > > >
> > > >
> > > > --
> > > > fbl
> > > >
> >
> > --
> > fbl
> >
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev