[ovs-git] [openvswitch/ovs] 12d0ed: dpif-netdev: Avoid deadlock with offloading during...

Roi Dayan noreply at github.com
Fri Jul 17 01:54:27 UTC 2020


  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: 12d0edd75eba3e9262c44fa77938b27bda491e79
      https://github.com/openvswitch/ovs/commit/12d0edd75eba3e9262c44fa77938b27bda491e79
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-07-16 (Thu, 16 Jul 2020)

  Changed paths:
    M AUTHORS.rst
    M lib/dpif-netdev.c

  Log Message:
  -----------
  dpif-netdev: Avoid deadlock with offloading during PMD thread deletion.

Main thread will try to pause/stop all revalidators during datapath
reconfiguration via datapath purge callback (dp_purge_cb) while
holding 'dp->port_mutex'.  And deadlock happens in case any of
revalidator threads is already waiting on 'dp->port_mutex' while
dumping offloaded flows:

           main thread                           revalidator
 ---------------------------------  ----------------------------------

 ovs_mutex_lock(&dp->port_mutex)

                                    dpif_netdev_flow_dump_next()
                                    -> dp_netdev_flow_to_dpif_flow
                                    -> get_dpif_flow_status
                                    -> dpif_netdev_get_flow_offload_status()
                                    -> ovs_mutex_lock(&dp->port_mutex)
                                       <waiting for mutex here>

 reconfigure_datapath()
 -> reconfigure_pmd_threads()
 -> dp_netdev_del_pmd()
 -> dp_purge_cb()
 -> udpif_pause_revalidators()
 -> ovs_barrier_block(&udpif->pause_barrier)
    <waiting for revalidators to reach barrier>

                          <DEADLOCK>

We're not allowed to call offloading API without holding global
port mutex from the userspace datapath due to thread safety
restrictions on netdev-offload-dpdk module.  And it's also not easy
to rework datapath reconfiguration process in order to move actual
PMD removal and datapath purge out of the port mutex.

So, for now, not sleeping on a mutex if it's not immediately available
seem like an easiest workaround.  This will have impact on flow
statistics update rate and on ability to get the latest statistics
before removing the flow (latest stats will be lost in case we were
not able to take the mutex).  However, this will allow us to operate
normally avoiding the deadlock.

The last bit is that to avoid flapping of flow attributes and
statistics we're not failing the operation, but returning last
statistics and attributes returned by offload provider.  Since those
might be updated in different threads, stores and reads are atomic.

Reported-by: Frank Wang (王培辉) <wangpeihui at inspur.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/371753.html
Fixes: a309e4f52660 ("dpif-netdev: Update offloaded flows statistics.")
Acked-by: Kevin Traynor <ktraynor at redhat.com>
Acked-by: Ian Stokes <ian.stokes at intel.com>
Tested-by: Eli Britstein <elibr at mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: b4e50218a0f8da43ffe7c420826ddb19985b0b03
      https://github.com/openvswitch/ovs/commit/b4e50218a0f8da43ffe7c420826ddb19985b0b03
  Author: Jeff Squyres <jsquyres at cisco.com>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M NEWS
    M ofproto/bond.c
    M ofproto/bond.h
    M tests/lacp.at
    M tests/ofproto-dpif.at
    M vswitchd/bridge.c
    M vswitchd/vswitch.xml

  Log Message:
  -----------
  bond: Add 'primary' interface concept for active-backup mode.

In AB bonding, if the current active slave becomes disabled, a
replacement slave is arbitrarily picked from the remaining set of
enabled slaves.  This commit adds the concept of a "primary" slave: an
interface that will always be (or become) the current active slave if
it is enabled.

The rationale for this functionality is to allow the designation of a
preferred interface for a given bond.  For example:

1. Bond is created with interfaces p1 (primary) and p2, both enabled.
2. p1 becomes the current active slave (because it was designated as
   the primary).
3. Later, p1 fails/becomes disabled.
4. p2 is chosen to become the current active slave.
5. Later, p1 becomes re-enabled.
6. p1 is chosen to become the current active slave (because it was
   designated as the primary)

Note that p1 becomes the active slave once it becomes re-enabled, even
if nothing has happened to p2.

This "primary" concept exists in Linux kernel network interface
bonding, but did not previously exist in OVS bonding.

Only one primary slave interface is supported per bond, and is only
supported for active/backup bonding.

The primary slave interface is designated via
"other_config:bond-primary" when creating a bond.

Also, while adding tests for the "primary" concept, make a few small
improvements to the non-primary AB bonding test.

Signed-off-by: Jeff Squyres <jsquyres at cisco.com>
Reviewed-by: Aaron Conole <aconole at redhat.com>
Tested-by: Greg Rose <gvrose8192 at gmail.com>
Acked-by: Greg Rose <gvrose8192 at gmail.com>
Acked-by: Flavio Leitner <fbl at sysclose.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 55ec5df3b56700eec4b3e8312cff989f1fef1597
      https://github.com/openvswitch/ovs/commit/55ec5df3b56700eec4b3e8312cff989f1fef1597
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M AUTHORS.rst

  Log Message:
  -----------
  AUTHORS: Add Jeff Squyres.

Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 9ecaa5cb71b6e4eedfa3566fa2c5b0f537198ac9
      https://github.com/openvswitch/ovs/commit/9ecaa5cb71b6e4eedfa3566fa2c5b0f537198ac9
  Author: Roi Dayan <roid at mellanox.com>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M rhel/openvswitch-fedora.spec.in

  Log Message:
  -----------
  rhel: openvswitch-fedora.spec.in: Fix installed but not packaged.

With the cited commit, we get an error from rpmbuild about installed
but not packaged /usr/lib64/libopenvswitchavx512.a.
Fix it by treating it as the other la files.

Fixes: 352b6c7116cd ("dpif-lookup: add avx512 gather implementation.")
Signed-off-by: Roi Dayan <roid at mellanox.com>
Acked-by: Ian Stokes <ian.stokes at intel.com>
Tested-by: Greg Rose <gvrose8192 at gmail.com>
Acked-by: Greg Rose <gvrose8192 at gmail.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 9af9dbcec933401ae95b4f51c47cc9db21a7b351
      https://github.com/openvswitch/ovs/commit/9af9dbcec933401ae95b4f51c47cc9db21a7b351
  Author: David Marchand <david.marchand at redhat.com>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M NEWS
    M lib/automake.mk
    A lib/dpdk-unixctl.man
    M lib/dpdk.c
    M vswitchd/ovs-vswitchd.8.in

  Log Message:
  -----------
  dpdk: Add commands to configure log levels.

Enabling debug logs in dpdk can be a challenge to be sure of what is
actually enabled, add commands to list and change those log levels.
However, these commands do not help when tracking issues in dpdk init
itself: dump log levels right after init.

Example:
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is info
id 1: lib.malloc, level is info
id 2: lib.ring, level is info
id 3: lib.mempool, level is info
id 4: lib.timer, level is info
id 5: pmd, level is info
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is info
[...]

$ ovs-appctl dpdk/log-set debug pmd.*:notice
$ ovs-appctl dpdk/log-list
global log level is debug
id 0: lib.eal, level is debug
id 1: lib.malloc, level is debug
id 2: lib.ring, level is debug
id 3: lib.mempool, level is debug
id 4: lib.timer, level is debug
id 5: pmd, level is debug
[...]
id 37: pmd.net.bnxt.driver, level is notice
id 38: pmd.net.e1000.init, level is notice
id 39: pmd.net.e1000.driver, level is notice
id 40: pmd.net.enic, level is notice
[...]

Signed-off-by: David Marchand <david.marchand at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 8231c9f624b3e0c4f0b28f78767240292a65d5ef
      https://github.com/openvswitch/ovs/commit/8231c9f624b3e0c4f0b28f78767240292a65d5ef
  Author: Timothy Redaelli <tredaelli at redhat.com>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M acinclude.m4

  Log Message:
  -----------
  acinclude: Remove libmnl for MLX5 PMD.

libmnl is not used anymore for MLX5 PMD since DPDK 19.08.

Signed-off-by: Timothy Redaelli <tredaelli at redhat.com>
Acked-by: Numan Siddique <numans at ovn.org>
Reviewed-by: David Marchand <david.marchand at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 68a95c9ca7cea83f91c7fd152608e8ee6621013d
      https://github.com/openvswitch/ovs/commit/68a95c9ca7cea83f91c7fd152608e8ee6621013d
  Author: Roi Dayan <roid at mellanox.com>
  Date:   2020-07-17 (Fri, 17 Jul 2020)

  Changed paths:
    M utilities/checkpatch.py

  Log Message:
  -----------
  checkpatch: Add argument to skip gerrit change id check.

This arg can be used internally by groups using gerrit for code reviews.

Acked-by: Flavio Leitner <fbl at sysclose.org>
Signed-off-by: Roi Dayan <roid at mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/c57e02cfd797...68a95c9ca7ce


More information about the git mailing list