[ovs-dev] [PATCH v8 00/13] Support multi-segment mbufs

Eelco Chaudron echaudro at redhat.com
Tue Jun 26 13:23:22 UTC 2018



On 22 Jun 2018, at 21:02, Lam, Tiago wrote:

> Hi Eelco,
>
> On 18/06/2018 12:18, Eelco Chaudron wrote:
>>
>>
>> On 11 Jun 2018, at 18:21, Tiago Lam wrote:
>>
>
> [snip]
>
>>> Performance notes
>>> =================
>>> In order to test for regressions in performance, tests were run on
>>> top of master 88125d6 and v8 of this patchset, both with the
>>> multi-segment mbufs option enabled and disabled.
>>>
>>> VSperf was used to run the phy2phy_cont and pvp_cont tests with
>>> varying packet sizes of 64B, 1500B and 7000B, on a 10Gbps interface.
>>>
>>> Test | Size | Master | Multi-seg disabled | Multi-seg enabled
>>> -------------------------------------------------------------
>>> p2p  |  64  | ~22.7  |      ~22.65        |       ~18.3
>>> p2p  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
>>> p2p  | 7000 | ~0.36  |       ~0.36        |       ~0.36
>>> pvp  |  64  |  ~6.7  |        ~6.7        |        ~6.3
>>> pvp  | 1500 |  ~1.6  |        ~1.6        |        ~1.6
>>> pvp  | 7000 | ~0.36  |       ~0.36        |       ~0.36
>>>
>>> Packet size is in bytes, while all packet rates are reported in mpps
>>> (aggregated).
>>>
>>> No noticeable regression has been observed (everything is certainly
>>> within the ±5% margin of existing performance), aside from the 64B
>>> packet size case when multi-segment mbufs are enabled. This is
>>> expected, however, because vectorized Tx functions are incompatible
>>> with multi-segment mbufs on some PMDs. The PMD used during these
>>> tests was the i40e (on an Intel X710 NIC), which indeed doesn't
>>> support vectorized Tx functions with multi-segment mbufs.
>>
>> I did some testing using my PVP framework,
>> https://github.com/chaudron/ovs_perf, and I see the same as above. I
>> also used an XL710 for these tests.
>
> Thanks for the review! And for confirming the results, that gives some
> assurance.
>
>>
>> I reviewed the complete patch-set and will reply on the individual
>> patches.
>>
>> //Eelco
>>
>>
> I thought it would be worth giving a more generic explanation here,
> since some of the comments are around the same point.
>
> This series takes the approach of not allocating new mbufs in
> `DPBUF_DPDK` dp_packets. This is to avoid the case where a
> dp_packet_put() call, for example, allocates two mbufs to create
> enough space and returns a pointer to data which isn't contiguous
> (because it is spread across two mbufs). Most of the code in OvS that
> uses the dp_packet API relies on the assumption that the memory
> returned is contiguous.
>
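Just to make the contiguity assumption concrete, a minimal sketch of the
kind of caller this protects (not code from the series; push_eth_header()
is a hypothetical name, the other identifiers are the existing OvS ones):

    #include "dp-packet.h"
    #include "packets.h"

    static void
    push_eth_header(struct dp_packet *pkt, const struct eth_addr src,
                    const struct eth_addr dst)
    {
        /* The caller assumes these ETH_HEADER_LEN bytes form one linear
         * region; if they were split across two mbufs, the stores below
         * would run past the end of the first segment. */
        struct eth_header *eth = dp_packet_put_uninit(pkt, ETH_HEADER_LEN);

        eth->eth_dst = dst;
        eth->eth_src = src;
        eth->eth_type = htons(ETH_TYPE_IP);
    }
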
> To enforce this, dp_packet_resize__() doesn't allocate any new mbufs,
> and only tries to make enough room from the space it already has
> available (both in the tailroom and headroom). dp_packet_shift() also
> doesn't allow the possibility for the head to move past the first mbuf
> (when shifting right), or the tail to move past the last mbuf (if
> shifting left).
>
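In other words (my own illustrative helper, not code from the patch), a
resize can only succeed when the requested growth fits in the room the
mbuf chain already has:

    #include <stdbool.h>
    #include "dp-packet.h"

    /* Hypothetical predicate: true if 'pkt' can grow by 'extra_head'
     * and 'extra_tail' bytes without allocating another mbuf. */
    static bool
    can_grow_in_place(const struct dp_packet *pkt, size_t extra_head,
                      size_t extra_tail)
    {
        return extra_head <= dp_packet_headroom(pkt)
               && extra_tail <= dp_packet_tailroom(pkt);
    }
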
> This is also why some assumptions can be made in the implementation,
> such as that the tailroom of a dp_packet will be the tailroom
> available in the last mbuf, since it shouldn't be possible to have a
> whole free mbuf after the tail.

Thanks for explaining the details; maybe it would be good to document 
some of them in dp_packet_resize__().
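For example, the tailroom invariant you describe above, spelled out with
the plain DPDK API (just an illustration, chain_tailroom() is a made-up
name), is essentially:

    #include <rte_mbuf.h>

    /* With no fully free segment after the tail, the tailroom of the
     * whole packet is simply whatever the last segment has left. */
    static uint16_t
    chain_tailroom(struct rte_mbuf *head)
    {
        return rte_pktmbuf_tailroom(rte_pktmbuf_lastseg(head));
    }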

However, for the general function, I think you should not assume that 
the tailroom is only available in the last mbuf. I think you should 
stick as much as possible to the existing mbuf APIs to avoid problems 
in the future if people decide to add “full” mbuf support. Most of 
these shortcuts do not give any real performance advantage.
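As a concrete (hypothetical) example of what I mean, something along
these lines lets the mbuf library do the tailroom check on the last
segment instead of open-coding the shortcut:

    #include <rte_mbuf.h>

    /* rte_pktmbuf_append() extends the data area of the last segment
     * and returns NULL if that segment lacks 'len' bytes of tailroom,
     * so the caller can fail cleanly. */
    static char *
    grow_tail(struct rte_mbuf *head, uint16_t len)
    {
        return rte_pktmbuf_append(head, len);
    }

If “full” multi-segment support is ever added (e.g. chaining a fresh
mbuf when the last segment is full), only a spot like this would need
to change.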

> This is in contrast to a `DPBUF_MALLOC` packet, where, even when
> `DPDK_NETDEV` is defined, the mbuf's allocated memory comes from the
> system. In many cases, though, this is the type of packet used
> internally by OvS when doing manipulations, since any clone of a
> received `DPBUF_DPDK` dp_packet will be allocated as a `DPBUF_MALLOC`
> one (done by dpdk_do_tx_copy()).
>
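Restating that with a small sketch (clone_for_modification() is a
hypothetical name; dp_packet_clone() and the source field are the
existing OvS ones, and the assert merely spells out the claim):

    #include "dp-packet.h"
    #include "util.h"

    /* A clone of a DPBUF_DPDK packet is backed by system memory and is
     * linear, so later manipulation doesn't hit the multi-segment
     * restrictions. */
    static struct dp_packet *
    clone_for_modification(const struct dp_packet *pkt)
    {
        struct dp_packet *copy = dp_packet_clone(pkt);

        ovs_assert(copy->source == DPBUF_MALLOC);
        return copy;
    }
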
> I'll reply to the other emails individually and refer to this
> explanation when useful.

Thanks, I’ve replied to all the comments in the individual mails.

I’ll allocate some time to review your new revision once it’s 
available.

Cheers,

Eelco

