[ovs-dev] Flaky ALB tests.

Ilya Maximets i.maximets at ovn.org
Fri Nov 19 12:25:53 UTC 2021


Hi, Kevin.

After the recent ALB commits, some of these tests are failing more
or less frequently in our CI and on my local setup too.  Could you,
please, take a look?

Below is one of the recent examples.  Another test that fails with
similar symptoms periodically is 'ALB - min num PMD/RxQ'.
It looks like it doesn't actually perform the dry run.  At least it
doesn't log that.

Bets regards, Ilya Maximets.


#                             -*- compilation -*-
1026. alb.at:98: testing ALB - PMD/RxQ assignment type ...
../../../tests/alb.at:99: ovsdb-tool create conf.db $abs_top_srcdir/vswitchd/vswitch.ovsschema
../../../tests/alb.at:99: ovsdb-server --detach --no-chdir --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock
stderr:
2021-11-19T02:22:41Z|00001|vlog|INFO|opened log file /home/runner/work/ovs/ovs/openvswitch-2.16.90/_build/sub/tests/testsuite.dir/1026/ovsdb-server.log
../../../tests/alb.at:99: sed < stderr '
/vlog|INFO|opened log file/d
/ovsdb_server|INFO|ovsdb-server (Open vSwitch)/d'
../../../tests/alb.at:99: ovs-vsctl --no-wait init 
../../../tests/alb.at:99: ovs-vswitchd --enable-dummy --disable-system --disable-system-route --dummy-numa="0,0" --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl
stderr:
2021-11-19T02:22:41Z|00001|vlog|INFO|opened log file /home/runner/work/ovs/ovs/openvswitch-2.16.90/_build/sub/tests/testsuite.dir/1026/ovs-vswitchd.log
2021-11-19T02:22:41Z|00002|ovs_numa|INFO|Discovered 2 CPU cores on NUMA node 0
2021-11-19T02:22:41Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 2 CPU cores
2021-11-19T02:22:41Z|00004|reconnect|INFO|unix:/home/runner/work/ovs/ovs/openvswitch-2.16.90/_build/sub/tests/testsuite.dir/1026/db.sock: connecting...
2021-11-19T02:22:41Z|00005|netlink_socket|INFO|netlink: could not enable listening to all nsid (Operation not permitted)
2021-11-19T02:22:41Z|00006|reconnect|INFO|unix:/home/runner/work/ovs/ovs/openvswitch-2.16.90/_build/sub/tests/testsuite.dir/1026/db.sock: connected
../../../tests/alb.at:99: sed < stderr '
/ovs_numa|INFO|Discovered /d
/vlog|INFO|opened log file/d
/vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d
/reconnect|INFO|/d
/dpif_netlink|INFO|Generic Netlink family .ovs_datapath. does not exist/d
/ofproto|INFO|using datapath ID/d
/netdev_linux|INFO|.*device has unknown hardware address family/d
/ofproto|INFO|datapath ID changed to fedcba9876543210/d
/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d
/netlink_socket|INFO|netlink: could not enable listening to all nsid/d
/probe tc:/d
/tc: Using policy/d'
../../../tests/alb.at:99: add_of_br 0 add-port br0 p0 \
                    -- set Interface p0 type=dummy-pmd options:n_rxq=3 \
                    -- set Open_vSwitch . other_config:pmd-cpu-mask=3 \
                    -- set open_vswitch . other_config:pmd-auto-lb="true" \
                    -- set open_vswitch . other_config:pmd-auto-lb-load-threshold=0 
alb.at:105: waiting until grep "PMD auto load balance is enabled" ovs-vswitchd.log...
2021-11-19T02:22:41.880Z|00039|dpif_netdev|INFO|PMD auto load balance is enabled, interval 1 mins, pmd load threshold 0%, improvement threshold 25%.
alb.at:105: wait succeeded immediately
../../../tests/alb.at:109: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin
alb.at:110: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "mode changed to: 'roundrobin'"...
2021-11-19T02:22:41.942Z|00051|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'roundrobin'.
alb.at:110: wait succeeded immediately
../../../tests/alb.at:113: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="false"
alb.at:114: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is disabled"...
2021-11-19T02:22:42.026Z|00056|dpif_netdev|INFO|PMD auto load balance is disabled.
alb.at:114: wait succeeded immediately
../../../tests/alb.at:116: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
alb.at:117: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is enabled"...
2021-11-19T02:22:42.060Z|00057|dpif_netdev|INFO|PMD auto load balance is enabled, interval 1 mins, pmd load threshold 0%, improvement threshold 25%.
alb.at:117: wait succeeded immediately
../../../tests/alb.at:121: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles
alb.at:122: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "mode changed to: 'cycles'"...
2021-11-19T02:22:42.122Z|00058|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'cycles'.
alb.at:122: wait succeeded immediately
../../../tests/alb.at:125: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="false"
alb.at:126: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is disabled"...
2021-11-19T02:22:42.175Z|00063|dpif_netdev|INFO|PMD auto load balance is disabled.
alb.at:126: wait succeeded immediately
../../../tests/alb.at:128: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
alb.at:129: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is enabled"...
2021-11-19T02:22:42.230Z|00064|dpif_netdev|INFO|PMD auto load balance is enabled, interval 1 mins, pmd load threshold 0%, improvement threshold 25%.
alb.at:129: wait succeeded immediately
../../../tests/alb.at:132: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group
alb.at:133: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "mode changed to: 'group'"...
2021-11-19T02:22:42.279Z|00065|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'group'.
alb.at:133: wait succeeded immediately
../../../tests/alb.at:136: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="false"
alb.at:137: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is disabled"...
2021-11-19T02:22:42.305Z|00070|dpif_netdev|INFO|PMD auto load balance is disabled.
alb.at:137: wait succeeded immediately
../../../tests/alb.at:139: ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
alb.at:140: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance is enabled"...
2021-11-19T02:22:42.330Z|00071|dpif_netdev|INFO|PMD auto load balance is enabled, interval 1 mins, pmd load threshold 0%, improvement threshold 25%.
alb.at:140: wait succeeded immediately
../../../tests/alb.at:143: ovs-appctl vlog/set dpif_netdev:dbg
../../../tests/alb.at:146: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group
warped
alb.at:149: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance performing dry run."...
2021-11-19T02:22:42.434Z|00078|dpif_netdev|DBG|PMD auto load balance performing dry run.
alb.at:149: wait succeeded quickly
../../../tests/alb.at:152: ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles
alb.at:153: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "mode changed to: 'cycles'"...
2021-11-19T02:22:42.570Z|00085|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'cycles'.
alb.at:153: wait succeeded immediately
warped
alb.at:155: waiting until tail -n +$LINENUM ovs-vswitchd.log | grep "PMD auto load balance performing dry run."...
alb.at:155: wait failed after 30 seconds
../../../tests/ovs-macros.at:242: hard failure
1026. alb.at:98: 1026. ALB - PMD/RxQ assignment type (alb.at:98): FAILED (ovs-macros.at:242)



Last bits of the ovs-vswitchd.log:

2021-11-19T02:22:42.434Z|00078|dpif_netdev|DBG|PMD auto load balance performing dry run.
2021-11-19T02:22:42.434Z|00079|dpif_netdev|DBG|Core  0 on numa node 0 assigned port 'p0' rx queue 0 (measured processing cycles 0).
2021-11-19T02:22:42.434Z|00080|dpif_netdev|DBG|Core  1 on numa node 0 assigned port 'p0' rx queue 1 (measured processing cycles 0).
2021-11-19T02:22:42.434Z|00081|dpif_netdev|DBG|Core  0 on numa node 0 assigned port 'p0' rx queue 2 (measured processing cycles 0).
2021-11-19T02:22:42.434Z|00082|dpif_netdev|DBG|Current variance 0 Estimated variance 0.
2021-11-19T02:22:42.434Z|00083|dpif_netdev|DBG|Variance improvement 0%.
2021-11-19T02:22:42.434Z|00084|dpif_netdev|DBG|PMD load variance improvement threshold 25% is not met.
2021-11-19T02:22:42.570Z|00085|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'cycles'.
2021-11-19T02:22:42.570Z|00086|dpif_netdev|INFO|Performing pmd to rx queue assignment using cycles algorithm.
2021-11-19T02:22:42.570Z|00087|dpif_netdev|INFO|Core  0 on numa node 0 assigned port 'p0' rx queue 0 (measured processing cycles 0).
2021-11-19T02:22:42.570Z|00088|dpif_netdev|INFO|Core  1 on numa node 0 assigned port 'p0' rx queue 1 (measured processing cycles 0).
2021-11-19T02:22:42.570Z|00089|dpif_netdev|INFO|Core  1 on numa node 0 assigned port 'p0' rx queue 2 (measured processing cycles 0).
2021-11-19T02:22:42.581Z|00001|dpif_netdev(pmd-c00/id:6)|DBG|Core 0 processing port 'p0' with queue-id 0
2021-11-19T02:22:42.584Z|00001|dpif_netdev(pmd-c01/id:7)|DBG|Core 1 processing port 'p0' with queue-id 2
2021-11-19T02:22:42.584Z|00002|dpif_netdev(pmd-c01/id:7)|DBG|Core 1 processing port 'p0' with queue-id 1
2021-11-19T02:22:42.598Z|00090|unixctl|DBG|received request time/warp["60000","10000"], id=0
2021-11-19T02:22:42.651Z|00091|unixctl|DBG|replying with success, id=0: "warped"
2021-11-19T02:23:13.232Z|00092|unixctl|DBG|received request exit["--cleanup"], id=0
2021-11-19T02:23:13.232Z|00093|unixctl|DBG|replying with success, id=0: ""
2021-11-19T02:23:13.233Z|00094|bridge|INFO|bridge br0: deleted interface br0 on port 65534
2021-11-19T02:23:13.236Z|00095|bridge|INFO|bridge br0: deleted interface p0 on port 1
2021-11-19T02:23:13.244Z|00003|dpif_netdev(pmd-c01/id:7)|DBG|Core 1 processing port 'p0' with queue-id 2
2021-11-19T02:23:13.244Z|00004|dpif_netdev(pmd-c01/id:7)|DBG|Core 1 processing port 'p0' with queue-id 1
2021-11-19T02:23:13.253Z|00002|dpif_netdev(pmd-c00/id:6)|DBG|Core 0 processing port 'p0' with queue-id 0
2021-11-19T02:23:13.256Z|00096|dpif_netdev|INFO|Performing pmd to rx queue assignment using cycles algorithm.
2021-11-19T02:23:13.256Z|00097|dpif_netdev|INFO|Core  0 on numa node 0 assigned port 'p0' rx queue 0 (measured processing cycles 0).
2021-11-19T02:23:13.256Z|00098|dpif_netdev|INFO|Core  1 on numa node 0 assigned port 'p0' rx queue 1 (measured processing cycles 0).
2021-11-19T02:23:13.256Z|00099|dpif_netdev|INFO|Core  1 on numa node 0 assigned port 'p0' rx queue 2 (measured processing cycles 0).
2021-11-19T02:23:13.257Z|00100|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  1 destroyed.
2021-11-19T02:23:13.258Z|00101|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  0 destroyed.


More information about the dev mailing list