[ovs-dev] [PATCH v8 00/13] Support multi-segment mbufs

Lam, Tiago tiago.lam at intel.com
Fri Jun 22 19:02:09 UTC 2018

Hi Eelco,

On 18/06/2018 12:18, Eelco Chaudron wrote:
> On 11 Jun 2018, at 18:21, Tiago Lam wrote:


>> Performance notes
>> =================
>> In order to test for regressions in performance, tests were run on top
>> of master 88125d6 and v8 of this patchset, both with the multi-segment
>> mbufs option enabled and disabled.
>> VSperf was used to run the phy2phy_cont and pvp_cont tests with varying
>> packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.
>> Test | Size | Master | Multi-seg disabled | Multi-seg enabled
>> -------------------------------------------------------------
>> p2p  |  64  | ~22.7  |      ~22.65        |       ~18.3
>> p2p  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
>> p2p  | 7000 | ~0.36  |       ~0.36        |       ~0.36
>> pvp  |  64  |  ~6.7  |        ~6.7        |        ~6.3
>> pvp  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
>> pvp  | 7000 | ~0.36  |       ~0.36        |       ~0.36
>> Packet size is in bytes, while all packet rates are reported in mpps
>> (aggregated).
>> No noticeable regression has been observed (certainly everything is
>> within the ± 5% margin of existing performance), aside from the 64B
>> packet size case when multi-segment mbufs are enabled. This is
>> expected, however, because vectorized Tx functions are incompatible
>> with multi-segment mbufs on some PMDs. The PMD in use during these
>> tests was the i40e (on an Intel X710 NIC), which indeed doesn't
>> support vectorized Tx functions with multi-segment mbufs.
> I did some testing using my PVP framework, 
> https://github.com/chaudron/ovs_perf, and I see the same as above. I 
> also used an XL710 for these tests.

Thanks for the review! And for confirming the results, that gives some
extra confidence.

> I reviewed the complete patch-set and will reply on the individual 
> patches.
> //Eelco

I thought it would be worth giving a more generic explanation here, since
several of the comments touch on the same point.

This series takes the approach of not allocating new mbufs in
`DPBUF_DPDK` dp_packets. This avoids the case where, for example, a
dp_packet_put() call allocates a second mbuf to create enough space and
then returns a pointer to data that isn't contiguous in memory (because
it is spread across two mbufs). Most of the code in OvS that uses the
dp_packet API relies on the assumption that the memory returned is
contiguous.

To enforce this, dp_packet_resize__() doesn't allocate any new mbufs;
it only tries to make enough room from the space it already has
available (both in the tailroom and headroom). Similarly,
dp_packet_shift() doesn't allow the head to move past the first mbuf
(when shifting right), or the tail to move past the last mbuf (when
shifting left).

This is also why some assumptions can be made in the implementation,
such as that the tailroom of a dp_packet is the tailroom available in
its last mbuf, since it shouldn't be possible to have a wholly free
mbuf after the tail.

This is in contrast to a `DPBUF_MALLOC` packet, whose memory (still
laid out as an mbuf when `DPDK_NETDEV` is defined) is allocated from
the system. In many cases this is the type of packet OvS uses
internally when doing manipulations, since any clone of a received
`DPBUF_DPDK` dp_packet is allocated as a `DPBUF_MALLOC` one (done by
dpdk_do_tx_copy()).

I'll reply to the other emails individually and refer to this
explanation when useful.
