[ovs-dev] 10-25 packet drops every few (10-50) seconds TCP (iperf3)

Vinay Gupta vinay.gupta at broadcom.com
Tue Jun 2 19:56:51 UTC 2020


Hi Flavio,

Thanks for your reply.
I have captured the suggested information but do not see anything that
could cause the packet drops.
Can you please take a look at the data below and see if you can find
anything unusual?
The PMD threads are running on CPUs 1-4, and CPUs 1-7 are isolated cores.
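For reference, pinning like the above is normally done through the PMD CPU mask. The mask value 0x1E (cores 1-4) below is an assumption based on the description, not taken from the actual setup; adjust it to your topology:

```shell
# Restrict PMD threads to isolated cores 1-4 (bitmask 0x1E = 0b0001_1110).
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1E

# Verify which core each PMD thread and Rx queue actually landed on.
ovs-appctl dpif-netdev/pmd-rxq-show
```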

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
root at bcm958802a8046c:~# cstats ; sleep 10; cycles
pmd thread numa_id 0 core_id 1:
  idle cycles: 99140849 (7.93%)
  processing cycles: 1151423715 (92.07%)
  avg cycles per packet: 116.94 (1250564564/10693918)
  avg processing cycles per packet: 107.67 (1151423715/10693918)
pmd thread numa_id 0 core_id 2:
  idle cycles: 118373662 (9.47%)
  processing cycles: 1132193442 (90.53%)
  avg cycles per packet: 124.39 (1250567104/10053309)
  avg processing cycles per packet: 112.62 (1132193442/10053309)
pmd thread numa_id 0 core_id 3:
  idle cycles: 53805933 (4.30%)
  processing cycles: 1196762002 (95.70%)
  avg cycles per packet: 107.35 (1250567935/11649948)
  avg processing cycles per packet: 102.73 (1196762002/11649948)
pmd thread numa_id 0 core_id 4:
  idle cycles: 189102938 (15.12%)
  processing cycles: 1061463293 (84.88%)
  avg cycles per packet: 143.47 (1250566231/8716828)
  avg processing cycles per packet: 121.77 (1061463293/8716828)
pmd thread numa_id 0 core_id 5:
pmd thread numa_id 0 core_id 6:
pmd thread numa_id 0 core_id 7:
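The `cstats` and `cycles` commands above are presumably local wrappers around `ovs-appctl dpif-netdev/pmd-stats-clear` and `dpif-netdev/pmd-stats-show`. A small sketch that parses that kind of output and flags how much idle headroom each PMD has left (the field layout is assumed to match the dump above):

```python
import re

def pmd_headroom(stats_text):
    """Parse `ovs-appctl dpif-netdev/pmd-stats-show`-style output and
    return {core_id: idle_percent}, so low-headroom PMDs stand out."""
    headroom = {}
    core = None
    for line in stats_text.splitlines():
        line = line.strip()
        m = re.match(r"pmd thread numa_id \d+ core_id (\d+):", line)
        if m:
            core = int(m.group(1))
            continue
        m = re.match(r"idle cycles: \d+ \(([\d.]+)%\)", line)
        if m and core is not None:
            headroom[core] = float(m.group(1))
    return headroom

sample = """\
pmd thread numa_id 0 core_id 3:
  idle cycles: 53805933 (4.30%)
  processing cycles: 1196762002 (95.70%)
"""
print(pmd_headroom(sample))  # -> {3: 4.3}
```

Core 3 above has only 4.3% idle, which is consistent with it being the first ring to overflow under any scheduling noise.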


Runtime summary                comm  parent  sched-in  run-time  min-run  avg-run  max-run  stddev  migrations
                                               (count)    (msec)   (msec)   (msec)   (msec)       %
---------------------------------------------------------------------------------------------------------------
                 ksoftirqd/0[7]       2         1     0.079    0.079    0.079    0.079    0.00           0
                   rcu_sched[8]       2        14     0.067    0.002    0.004    0.009    9.96           0
                    rcuos/4[38]       2         6     0.027    0.002    0.004    0.008   20.97           0
                    rcuos/5[45]       2         4     0.018    0.004    0.004    0.005    6.63           0
                kworker/0:1[71]       2        12     0.156    0.008    0.013    0.019    6.72           0
                  mmcqd/0[1230]       2         3     0.054    0.001    0.018    0.031   47.29           0
             kworker/0:1H[1248]       2         1     0.006    0.006    0.006    0.006    0.00           0
            kworker/u16:2[1547]       2        16     0.045    0.001    0.002    0.012   26.19           0
                     ntpd[5282]       1         1     0.063    0.063    0.063    0.063    0.00           0
                 watchdog[6988]       1         2     0.089    0.012    0.044    0.076   72.26           0
             ovs-vswitchd[9239]       1         2     0.326    0.152    0.163    0.173    6.45           0
        revalidator8[9309/9239]    9239         2     1.260    0.607    0.630    0.652    3.58           0
                    perf[27150]   27140         1     0.000    0.000    0.000    0.000    0.00           0

Terminated tasks:
                   sleep[27151]   27150         4     1.002    0.015    0.250    0.677   58.22           0

Idle stats:
    CPU  0 idle for    999.814  msec  ( 99.84%)
    CPU  1 idle entire time window
    CPU  2 idle entire time window
    CPU  3 idle entire time window
    CPU  4 idle entire time window
    CPU  5 idle for    500.326  msec  ( 49.96%)
    CPU  6 idle entire time window
    CPU  7 idle entire time window

    Total number of unique tasks: 14
Total number of context switches: 115
           Total run time (msec):  3.198
    Total scheduling time (msec): 1001.425  (x 8)



02:16:22      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
02:16:23        0      9239         -  100.00    0.00    0.00    0.00  100.00     5  ovs-vswitchd
02:16:23        0         -      9239    2.00    0.00    0.00    0.00    2.00     5  |__ovs-vswitchd
02:16:23        0         -      9240    0.00    0.00    0.00    0.00    0.00     0  |__vfio-sync
02:16:23        0         -      9241    0.00    0.00    0.00    0.00    0.00     5  |__eal-intr-thread
02:16:23        0         -      9242    0.00    0.00    0.00    0.00    0.00     5  |__dpdk_watchdog1
02:16:23        0         -      9244    0.00    0.00    0.00    0.00    0.00     5  |__urcu2
02:16:23        0         -      9279    0.00    0.00    0.00    0.00    0.00     5  |__ct_clean3
02:16:23        0         -      9308    0.00    0.00    0.00    0.00    0.00     5  |__handler9
02:16:23        0         -      9309    0.00    0.00    0.00    0.00    0.00     5  |__revalidator8
02:16:23        0         -      9328    0.00    0.00    0.00    0.00    0.00     6  |__pmd13
02:16:23        0         -      9330  100.00    0.00    0.00    0.00  100.00     3  |__pmd12
02:16:23        0         -      9331  100.00    0.00    0.00    0.00  100.00     1  |__pmd11
02:16:23        0         -      9332    0.00    0.00    0.00    0.00    0.00     7  |__pmd10
02:16:23        0         -      9333    0.00    0.00    0.00    0.00    0.00     5  |__pmd16
02:16:23        0         -      9334  100.00    0.00    0.00    0.00  100.00     2  |__pmd15
02:16:23        0         -      9335  100.00    0.00    0.00    0.00  100.00     4  |__pmd14
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
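The four busy PMDs above run at ~100% with single-digit idle percentages, so even a brief scheduling stall of the kind described in this thread can overflow an Rx ring. A rough back-of-the-envelope sketch; the line rate, frame size, and ring occupancy below are illustrative assumptions, not measurements from this setup:

```python
def packets_during_stall(stall_us, pps):
    """Packets that arrive while a PMD thread is off-CPU."""
    return round(stall_us * 1e-6 * pps)

# Illustrative: ~25 Gbps of 500-byte frames is roughly 6.25 Mpps.
pps = 25e9 / (500 * 8)
backlog = packets_during_stall(40, pps)
print(backlog)  # -> 250 packets queued during a 40 usec stall

# A 256-entry Rx ring that is already half full cannot absorb that burst.
free_slots = 256 - 128
print(backlog > free_slots)  # -> True: the ring overflows and drops
```

This is consistent with the observation that dedicating a CPU per Rx ring (more idle headroom, shorter stalls) makes the drops disappear.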

Thanks
Vinay

On Tue, Jun 2, 2020 at 12:06 PM Flavio Leitner <fbl at sysclose.org> wrote:

> On Mon, Jun 01, 2020 at 07:27:09PM -0400, Shahaji Bhosle via dev wrote:
> > Hi Ben/Ilya,
> > Hope you guys are doing well and staying safe. I have been chasing a
> weird
> > problem with small drops and I think that is causing lots of TCP
> > retransmission.
> >
> > Setup details
> > iPerf3(1k-5K Servers)<--DPDK2:OvS+DPDK(VxLAN:BOND)[DPDK0+DPDK1]<====2x25G====>
> > [DPDK0+DPDK1](VxLAN:BOND)OvS+DPDK:DPDK2<---iPerf3(Clients)
> >
> > All the drops are ring drops on the BONDed functions on the server side. I
> > have 4 CPUs, each with 3 PMD threads; DPDK0, DPDK1 and DPDK2 are all running
> > with 4 Rx rings each.
> >
> > What is interesting is that when I give each Rx ring its own CPU, the drops go
> > away. Or if I set other_config:emc-insert-inv-prob=1, the drops go away.
> > But I need to scale up the number of flows, so I am trying to run this with
> > the EMC disabled.
> >
> > I can tell that the rings are not getting serviced for 30-40 usec because of
> > some kind of context switch or interrupt on these cores. I have tried the
> > usual isolation (nohz_full, rcu_nocbs, etc.) and moved all the interrupts
> > away from these cores, but nothing helps. I mean, it improves, but the drops
> > still happen.
>
> When you disable the EMC (or reduce its efficiency) the per packet cost
> increases, then it becomes more sensitive to variations. If you share
> a CPU with multiple queues, you decrease the amount of time available
> to process the queue. In either case, there will be less room to tolerate
> variations.
>
> Well, you might want to use 'perf' and monitor for the scheduling events
> and then based on the stack trace see what is causing it and try to
> prevent it.
>
> For example:
> # perf record -e sched:sched_switch -a -g sleep 1
>
> For instance, you might see that another NIC used for management has
> IRQs assigned to one isolated CPU. You can move it to another CPU to
> reduce the noise, etc...
>
> Another suggestion is to look at the PMD thread idle statistics, because
> they tell you how much "extra" room you have left. As the idle time
> approaches 0, the more finely tuned your setup needs to be to avoid drops.
>
> HTH,
> --
> fbl
>

