[ovs-git] [openvswitch/ovs] 68466e: tests/flowgen: Fix length field of 802.2 data link...

Ilya Maximets noreply at github.com
Tue Nov 30 14:56:16 UTC 2021


  Branch: refs/heads/branch-2.15
  Home:   https://github.com/openvswitch/ovs
  Commit: 68466efed9528036a9ba4cdaf7a979b6d8adf960
      https://github.com/openvswitch/ovs/commit/68466efed9528036a9ba4cdaf7a979b6d8adf960
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-11-30 (Tue, 30 Nov 2021)

  Changed paths:
    M tests/flowgen.py

  Log Message:
  -----------
  tests/flowgen: Fix length field of 802.2 data link header.

Length in Data Link Header for these packets should not include
source and destination MACs or the length field itself.

Therefore, it should be 14 bytes less, otherwise other network
tools like wireshark complains:

  Expert Info (Error/Malformed):
    Length field value goes past the end of the payload

Additionally fixing the printing of the packet/flow configuration,
as it currently prints '%s=%s' strings without any real data.

Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
Acked-by: Paolo Valerio <pvalerio at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 0a7e66e37fe7bf5e76c40f33a3b402ad389f450c
      https://github.com/openvswitch/ovs/commit/0a7e66e37fe7bf5e76c40f33a3b402ad389f450c
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-11-30 (Tue, 30 Nov 2021)

  Changed paths:
    M ofproto/ofproto-provider.h
    M ofproto/ofproto.c

  Log Message:
  -----------
  ofproto: Fix resource usage explosion while processing bundled FLOW_MOD.

While processing a bundle, OVS will add all new and modified rules
to classifiers.  Classifiers are using RCU-protected pvector to
store subtables.  Addition of a new subtable or removal of the old
one leads to re-allocation and memory copy of the pvector array.
Old version of that array is given to RCU thread to free it later.

The problem is that bundle is processed under the mutex without
entering the quiescent state.  Therefore, memory can not be freed
until the whole bundle is processed.  So, if a few thousands of
flows added to the same table in a bundle, pvector will be re-allocated
while adding each of them.  So, we'll have a few thousands of copies
of the same array waiting to be freed.

In case of OVN deployments, there could be hundreds of thousands of
flows in the same table leading to a fast consumption of a huge
amount of memory and wasting a lot of CPU cycles on allocations and
copies.  Below snippet of the 'valgrind --tool=massif' output shows
ovs-vswitchd consuming 3.5GB of RAM while processing a bundle with
65K FLOW_MODs in the OVN deployment.  3.4GB of that memory are
copies of the same pvector.

 -------------------------------------------------------------------
   n        time(i)         total(B)   useful-heap(B) extra-heap(B)
 -------------------------------------------------------------------
  64 109,907,465,404    3,559,987,568    3,546,879,748    13,107,820
 99.63% (3,546,879,748B) (heap allocation functions) malloc/new/new[]
 ->97.61% (3,474,750,333B) xmalloc__ (util.c:137)
 |->97.61% (3,474,750,333B) xmalloc (util.c:172)
 | ->96.38% (3,431,068,352B) pvector_impl_dup (pvector.c:48)
 || ->96.38% (3,431,067,840B) pvector_insert (pvector.c:138)
 || |->96.38% (3,431,067,840B) classifier_replace (classifier.c:664)
 || | ->96.38% (3,431,067,840B) classifier_insert (classifier.c:695)
 || |  ->96.38% (3,431,067,840B) replace_rule_start (ofproto.c:5563)
 || |   ->96.38% (3,431,067,840B) add_flow_start (ofproto.c:5179)
 || |    ->96.38% (3,431,067,840B) ofproto_flow_mod_start (ofproto.c:8017)
 || |     ->96.38% (3,431,067,744B) do_bundle_commit (ofproto.c:8168)
 || |     |->96.38% (3,431,067,744B) handle_bundle_control (ofproto.c:8309)
 || |     | ->96.38% (3,431,067,744B) handle_single_part_openflow (ofproto.c:8593)
 || |     |  ->96.38% (3,431,067,744B) handle_openflow (ofproto.c:8674)
 || |     |   ->96.38% (3,431,067,744B) ofconn_run (connmgr.c:1329)
 || |     |    ->96.38% (3,431,067,744B) connmgr_run (connmgr.c:356)
 || |     |     ->96.38% (3,431,067,744B) ofproto_run (ofproto.c:1879)
 || |     |      ->96.38% (3,431,067,744B) bridge_run__ (bridge.c:3251)
 || |     |       ->96.38% (3,431,067,744B) bridge_run (bridge.c:3310)
 || |     |        ->96.38% (3,431,067,744B) main (ovs-vswitchd.c:127)

Fixing that by postponing the publishing of classifier updates,
so each flow modification can work with the same version of pvector.

Unfortunately, bundled PACKET_OUT messages requires all previous
changes to be published before processing, otherwise the packet
will use wrong version of OF tables.  Publishing all changes before
processing PACKET_OUT messages to avoid this issue.  Hopefully,
mixup of a big number of FLOW_MOD and PACKET_OUT messages is not
a common usecase.

Reported-by: Vladislav Odintsov <odivlad at gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389503.html
Tested-by: Vladislav Odintsov <odivlad at gmail.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: d2e0632dbe03cd06a1cf70ca42f916dce1b9509f
      https://github.com/openvswitch/ovs/commit/d2e0632dbe03cd06a1cf70ca42f916dce1b9509f
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-11-30 (Tue, 30 Nov 2021)

  Changed paths:
    M ofproto/ofproto-provider.h
    M ofproto/ofproto.c

  Log Message:
  -----------
  ofproto: Fix resource usage explosion due to removal of large number of flows.

While removing flows, removal itself is deferred, so classifier changes
performed already from the RCU thread.  This way every deferred removal
triggers classifier change and reallocation of a pvector.  Freeing of
old version of a pvector is postponed.  Since all this is happening
from an RCU thread, all these copies of the same pvector will be freed
only after the next grace period.

Below is the example output of the 'valgrind --tool=massif' from an OVN
deployment, where copies of that pvector took 5 GB of memory while
processing a bundled flow removal:

 -------------------------------------------------------------------
   n        time(i)         total(B)   useful-heap(B) extra-heap(B)
 -------------------------------------------------------------------
  89 176,257,987,954    5,329,763,160    5,318,171,607    11,591,553
 99.78% (5,318,171,607B) (heap allocation functions) malloc/new/new[]
 ->98.45% (5,247,008,392B) xmalloc__ (util.c:137)
 |->98.17% (5,232,137,408B) pvector_impl_dup (pvector.c:48)
 ||->98.16% (5,231,472,896B) pvector_remove (pvector.c:159)
 |||->98.16% (5,231,472,800B) destroy_subtable (classifier.c:1558)
 ||||->98.16% (5,231,472,800B) classifier_remove (classifier.c:792)
 |||| ->98.16% (5,231,472,800B) classifier_remove_assert (classifier.c:832)
 ||||  ->98.16% (5,231,472,800B) remove_rule_rcu__ (ofproto.c:2978)
 ||||   ->98.16% (5,231,472,800B) remove_rule_rcu (ofproto.c:2990)
 ||||    ->98.16% (5,231,472,800B) ovsrcu_call_postponed (ovs-rcu.c:346)
 ||||     ->98.16% (5,231,472,800B) ovsrcu_postpone_thread (ovs-rcu.c:362)
 ||||      ->98.16% (5,231,472,800B) ovsthread_wrapper
 ||||       ->98.16% (5,231,472,800B) start_thread
 ||||        ->98.16% (5,231,472,800B) clone

Collecting all the flows to be removed and postponing removal for
all of them together to avoid the problem.  This way all removals
will trigger only a single pvector re-allocation greatly reducing
the CPU and memory usage.

Reported-by: Vladislav Odintsov <odivlad at gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389538.html
Tested-by: Vladislav Odintsov <odivlad at gmail.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/2a2185f9e674...d2e0632dbe03


More information about the git mailing list