[ovs-dev] [PATCH v8 00/13] Support multi-segment mbufs

Eelco Chaudron echaudro at redhat.com
Mon Jun 18 11:18:01 UTC 2018



On 11 Jun 2018, at 18:21, Tiago Lam wrote:

> Overview
> ========
> This patchset introduces support for multi-segment mbufs to OvS-DPDK.
> Multi-segment mbufs are typically used when the size of an mbuf is
> insufficient to contain the entirety of a packet's data. Instead, the
> data is split across numerous mbufs, each carrying a portion, or
> 'segment', of the packet data. mbufs are chained via their 'next'
> attribute (an mbuf pointer).
>
> Use Cases
> =========
> i.  Handling oversized (guest-originated) frames, which are marked
>     for hardware accelration/offload (TSO, for example).
>
>     Packets which originate from a non-DPDK source may be marked for
>     offload; as such, they may be larger than the permitted ingress
>     interface's MTU, and may be stored in an oversized dp-packet. In
>     order to transmit such packets over a DPDK port, their contents
>     must be copied to a DPDK mbuf (via dpdk_do_tx_copy). However, in
>     its current implementation, that function only copies data into
>     a single mbuf; if the space available in the mbuf is exhausted,
>     but not all packet data has been copied, then it is lost.
>     Similarly, when cloning a DPDK mbuf, it must be considered
>     whether that mbuf contains multiple segments. Both issues are
>     resolved within this patchset.
>
> ii. Handling jumbo frames.
>
>     While OvS already supports jumbo frames, it does so by increasing
>     mbuf size, such that the entirety of a jumbo frame may be handled
>     in a single mbuf. This is certainly the preferred, and most
>     performant approach (and remains the default). However, it places
>     high demands on system memory; multi-segment mbufs may be
>     prefereable for systems which are memory-constrained.
>
> Enabling multi-segment mbufs
> ============================
> Multi-segment and single-segment mbufs are mutually exclusive, and the
> user must decide on which approach to adopt on init. The introduction
> of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates this.
>
> This is a global boolean value, which determines how jumbo frames are
> represented across all DPDK ports. In the absence of a user-supplied
> value, 'dpdk-multi-seg-mbufs' defaults to false, i.e. multi-segment
> mbufs must be explicitly enabled / single-segment mbufs remain the
> default.
>
> Setting the field is identical to setting existing DPDK-specific OVSDB
> fields:
>
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
>     ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
> ==> ovs-vsctl set Open_vSwitch . 
> other_config:dpdk-multi-seg-mbufs=true
>
> Performance notes
> =================
> In order to test for regressions in performance, tests were run on top
> of master 88125d6 and v8 of this patchset, both with the multi-segment
> mbufs option enabled and disabled.
>
> VSperf was used to run the phy2phy_cont and pvp_cont tests with 
> varying
> packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.
>
> Test | Size | Master | Multi-seg disabled | Multi-seg enabled
> -------------------------------------------------------------
> p2p  |  64  | ~22.7  |      ~22.65        |       ~18.3
> p2p  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
> p2p  | 7000 | ~0.36  |       ~0.36        |       ~0.36
> pvp  |  64  |  ~6.7  |        ~6.7        |        ~6.3
> pvp  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
> pvp  | 7000 | ~0.36  |       ~0.36        |       ~0.36
>
> Packet size is in bytes, while all packet rates are reported in mpps
> (aggregated).
>
> No noticeable regression has been observed (certainly everything is
> within the ± 5% margin of existing performance), aside from the 64B
> packet size case when multi-segment mbuf is enabled. This is
> expected, however, because of how Tx vectoriszed functions are
> incompatible with multi-segment mbufs on some PMDs. The PMD under
> use during these tests was the i40e (on a Intel X710 NIC), which
> indeed doesn't support vectorized Tx functions with multi-segment
> mbufs.

I did some testing using my PVP framework, 
https://github.com/chaudron/ovs_perf, and I see the same as above. I 
also used an XL710 for these tests.

I reviewed the complete patch-set and will reply on the individual 
patches.

//Eelco


>
> ---
> v8 (non-RFC):
>     - Rebase on master 88125d6 ("rhel: remove ovs-sim man page from
>       temporary directory (also for RHEL)");
>     - Address Ciara's and Ilya's comments:
>         - Drop the dp_packet_mbuf_tail() function and use only the
>           already existing dp_packet_tail();
>         - Fix bug in dpdk_do_tx_copy() where the error return from
>           dpdk_prep_tx_buf() was being wrongly checked;
>         - Use dpdk_buf_alloc() and free_dpdk_buf() instead of
>           rte_pktmbuf_alloc() and rte_pktmbuf_free();
>         - Fix some other code style and duplication issues pointed 
> out.
>     - Refactored dp_packet_shift(), dp_packet_resize__() and
>       dp_packet_put*() functions to work within the bounds of existing
>       mbufs only;
>     - Fix dp_packet_clear() which wasn't correctly clearing / freeing
>       other mbufs in the chain for chains with more than a single 
> mbuf;
>     - dp_packet layer functions (such as dp_packet_l3()) now check if
>       the header is within the first mbuf, when using mbufs;
>     - Move patch 08/13 to before patch 04/13, since 
> dp_packet_set_size()
>       was refactored to use free_dpdk_buf();
>     - Fix wrong rte_memcpy() when performing dp_packet_clone() which 
> was
>       leading to memory corruption;
>     - Modified the added tests to account for some of the above 
> changes;
>     - Run performance tests, compiling results and adding them to the
>       cover letter;
>     - Add a multi-seg mbufs explanation to the jumbo-frames.rst doc,
>       together with a "Performance notes" sub-section reflecting the
>       findings mentioned above in the cover letter.
>
> v7:  - Rebase on master 5e720da ("erspan: fix invalid erspan 
> version.");
>      - Address Ilya comments;
>         - Fix non-DPDK build;
>         - Serialise the access of non pmds to allocation and free of 
> mbufs by
>           using a newly introduced mutex.
>      - Add a new set of tests that integrates with the recently added 
> DPDK
>        testsuite. These focus on allocating dp_packets, with a single 
> or
>        multiple mbufs, from an instantiated mempool and performing 
> several
>        operations on those, verifying if the data at the end matches 
> what's
>        expected;
>      - Fix bugs found by the new tests:
>         - dp_packet_shift() wasn't taking into account shift lefts;
>         - dp_packet_resize__() was misusing and miscalculating the 
> tailrooms
>           and headrooms, ending up calculating the wrong number of 
> segments
>           that needed allocation;
>         - An mbuf's end was being miscalculated both in 
> dp_packet_tail,
>           dp_packet_mbuf_tail() and dp_packet_end();
>         - dp_packet_set_size() was not updating the number of chained 
> segments
>           'nb_segs';
>      - Add support for multi-seg mbufs in dp_packet_clear().
>
> v6:  - Rebase on master 7c0cb29 ("conntrack-tcp: Handle tcp session
>        reuse.");
>      - Further improve dp_packet_put_uninit() and dp_packet_shift() to
>        support multi-seg mbufs;
>      - Add support for multi-seg mbufs in dp_packet_l4_size() and
>        improve other helper funcs, such as dp_packet_tail() and dp_
>        packet_tailroom().
>      - Add support for multi-seg mbufs in dp_packet_put(), dp_packet_
>        put_zeros(), as well as dp_packet_resize__() - allocating new
>        mbufs and linking them together;
>      Restructured patchset:
>      - Squash patch 5 into patch 6, since they were both related to
>        copying data while handling multi-seg mbufs;
>      - Split patch 4 into two separate patches - one that introduces 
> the
>        changes in helper functions to deal with multi-seg mbufs and
>        two others for the shift() and put_uninit() functionality;
>      - Move patch 4 to before patch 3, so that ihelper functions come
>        before functionality improvement that rely on those helpers.
>
> v5: - Rebased on master e5e22dc ("datapath-windows: Prevent 
> ct-counters
>       from getting redundantly incremented");
>     - Sugesh's comments have been addressed:
>       - Changed dp_packet_set_data() and dp_packet_set_size() logic to
>         make them independent of each other;
>       - Dropped patch 3 now that dp_packet_set_data() and 
> dp_packet_set_
>         size() are independent;
>       - dp_packet_clone_with_headroom() now has split functions for
>         handling DPDK sourced packets and non-DPDK packets;
>     - Modified various functions in dp-packet.h to account for 
> multi-seg
>       mbufs - dp_packet_put_uninit(), dp_packet_tail(), 
> dp_packet_tail()
>       and dp_packet_at();
>     - Added support for shifting packet data in multi-seg mbufs, using
>       dp_packet_shift();
>     - Fixed some minor inconsistencies.
>
>     Note that some of the changes in v5 have been contributed by Mark
>     Kavanagh as well.
>
> v4: - restructure patchset
>     - account for 128B ARM cacheline when sizing mbufs
>
> Mark Kavanagh (4):
>   netdev-dpdk: fix mbuf sizing
>   dp-packet: Init specific mbuf fields.
>   netdev-dpdk: copy large packet to multi-seg. mbufs
>   netdev-dpdk: support multi-segment jumbo frames
>
> Michael Qiu (1):
>   dp-packet: copy data from multi-seg. DPDK mbuf
>
> Tiago Lam (8):
>   dp-packet: Fix allocated size on DPDK init.
>   netdev-dpdk: Serialise non-pmds mbufs' alloc/free.
>   dp-packet: Fix data_len handling multi-seg mbufs.
>   dp-packet: Handle multi-seg mbufs in helper funcs.
>   dp-packet: Handle multi-seg mbufs in put_uninit().
>   dp-packet: Handle multi-seg mubfs in shift() func.
>   dp-packet: Handle multi-seg mbufs in resize__().
>   dpdk-tests: Add test coverage for multi-seg mbufs.
>
>  Documentation/topics/dpdk/jumbo-frames.rst |  41 +++
>  NEWS                                       |   1 +
>  lib/dp-packet.c                            | 243 +++++++++++++-
>  lib/dp-packet.h                            | 319 ++++++++++++++----
>  lib/dpdk.c                                 |   8 +
>  lib/netdev-dpdk.c                          | 213 ++++++++++--
>  lib/netdev-dpdk.h                          |   2 +
>  tests/automake.mk                          |  10 +-
>  tests/dpdk-packet-mbufs.at                 |   7 +
>  tests/system-dpdk-testsuite.at             |   1 +
>  tests/test-dpdk-mbufs.c                    | 518 
> +++++++++++++++++++++++++++++
>  vswitchd/vswitch.xml                       |  20 ++
>  12 files changed, 1273 insertions(+), 110 deletions(-)
>  create mode 100644 tests/dpdk-packet-mbufs.at
>  create mode 100644 tests/test-dpdk-mbufs.c
>
> -- 
> 2.7.4


More information about the dev mailing list