[ovs-dev] [PATCH 0/4] Add support for TSO with DPDK

Ilya Maximets i.maximets at ovn.org
Tue Dec 3 16:26:51 UTC 2019


On 03.12.2019 14:12, Flavio Leitner wrote:
> On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
>> On 02.12.2019 14:44, Flavio Leitner wrote:
>>> Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
>>> the network stack to delegate the TCP segmentation to the NIC reducing
>>> the per packet CPU overhead.
>>>
>>> A guest using vhost-user interface with TSO enabled can send TCP packets
>>> much bigger than the MTU, which saves CPU cycles normally used to break
>>> the packets down to MTU size and to calculate checksums.
>>>
>>> It also saves CPU cycles used to parse multiple packets/headers during
>>> the packet processing inside virtual switch.
>>>
>>> If the destination of the packet is another guest in the same host, then
>>> the same big packet can be sent through a vhost-user interface skipping
>>> the segmentation completely. However, if the destination is not local,
>>> the NIC hardware is instructed to do the TCP segmentation and checksum
>>> calculation.
>>>
>>> The first 3 patches are not really part of TSO support, but they are
>>> required to make sure everything works.
>>>
>>> The TSO is only enabled in the vhost-user ports in client mode which
>>> means DPDK is required. The other ports (dpdk or linux) can receive
>>> those packets.
>>>
>>> This patchset is based on branch dpdk-latest (v19.11 required).
>>>
>>> The numbers look good and I noticed no regression so far. However, my
>>> environment is tuned but not well fined tuned, so consider those as
>>> ball park numbers only.
>>>
>>> This is VM-to-external host (Gbits/sec)
>>>                | No patch  |  TSO off  |  TSO on
>>>               ---------------------------------------------
>>> TCPv4          |    10     |    10     |   38  (line rate)
>>> -----------------------------------------------------------
>>> TCPv6          |     9     |     9     |   38  (line rate)
>>> -----------------------------------------------------------
>>> V4 Conntrack   |     1     |     1     |   33
>>>
>>>
>>> This is VM-to-VM in the same host (Gbits/sec)
>>>                | No patch  |  TSO off  |  TSO on
>>>               ---------------------------------------------
>>> TCPv4          |     9.4   |    9.4    |   31
>>> -----------------------------------------------------------
>>> TCPv6          |     8     |    8      |   30
>>> -----------------------------------------------------------
>>>
>>> There are good improvements sending to veth pairs or tap devices too.
>>>
>>> Flavio Leitner (4):
>>>   dp-packet: preserve headroom when cloning a pkt batch
>>>   vhost: Disable multi-segmented buffers
>>>   dp-packet: handle new dpdk buffer flags
>>>   netdev-dpdk: Add TCP Segmentation Offload support
>>>
>>>  Documentation/automake.mk           |   1 +
>>>  Documentation/topics/dpdk/index.rst |   1 +
>>>  Documentation/topics/dpdk/tso.rst   |  83 +++++++++++++++++++++
>>>  NEWS                                |   1 +
>>>  lib/automake.mk                     |   2 +
>>>  lib/dp-packet.c                     |  50 ++++++++++++-
>>>  lib/dp-packet.h                     |  38 ++++++++--
>>>  lib/netdev-bsd.c                    |   7 ++
>>>  lib/netdev-dpdk.c                   | 112 +++++++++++++++++++++++-----
>>>  lib/netdev-dummy.c                  |   7 ++
>>>  lib/netdev-linux.c                  |  71 ++++++++++++++++--
>>>  lib/tso.c                           |  54 ++++++++++++++
>>>  lib/tso.h                           |  23 ++++++
>>>  vswitchd/bridge.c                   |   2 +
>>>  vswitchd/vswitch.xml                |  14 ++++
>>>  15 files changed, 430 insertions(+), 36 deletions(-)
>>>  create mode 100644 Documentation/topics/dpdk/tso.rst
>>>  create mode 100644 lib/tso.c
>>>  create mode 100644 lib/tso.h
>>>
>>
>>
>> Hi Flavio,
>>
>> Thanks for working on this.
>> I didn't read the code carefully,  just a couple of first look points:
>>
>> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
> 
> Ok.
> 
>> * First patch seems to be OK even without TSO, batch_clone usually
>>   happens on clone action that implies further packet modifications,
>>   so it might make sense to have a headroom for that.
> 
> It's in the cover letter that the first three patches are somewhat
> unrelated, but needed to make sure everything works ok.
> 
> Also that it preserves the original headroom and not the default
> headroom used in the new packets.
> 
>> * Maybe we need to enable LINEARBUF for usual vhost-user too just to
>>   avoid receiving of mutli-segment mbufs?
> 
> That's what the patch does, right? The flag is always passed
> regardless of TSO being enabled or disabled.

I meant dpdkvhostuser vs dpdkvhostuserclient.

> 
>> * Third patch is not needed.  Similar patch was already merged recently.
> 
> Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
> 
>> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>>   packets for TSO for Intel NICs that needs pseudo-header checksum
>>   precalculated.
> 
> Good catch, will fix in v2.
> 
>> * Guest is allowed to disable offloading.  In this case we're not allowed
>>   to send incomplete packets (no-checksum, oversized), they will be just
>>   dropped by the guest.
> 
> Good point, will fix in v2.
> 
>> * NICs could not support TSO.  I see that you're moving this issue to the
>>   user by forcing to be sure that TSO is supported.  Most of HW NICs
>>   nowadays should support TSO, so it might be not a big deal, but
>>   what about some software NIC that DPDK implements?  I'm not sure.
> 
> What kind of some interface you have in mind? Since I will need to
> handle VMs with TSO disabled, then most probably the same solution
> applies to those software devices.

Looking at the feature support table there are even few HW NICs that
doesn't support TSO: https://doc.dpdk.org/guides/nics/overview.html

For example, axgbe and new hns3.

>From virtual ports the most obvious is afxdp.

> 
>> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>>   builds.  In fact, construction of virtio-net header is only needed for
>>   netdev-linux, so it should be implemented there.
> 
> Isn't virtio available in non-linux envs? I had initially there, but
> it breaks because it has DPDK dependencies as well.

I believe that "linux/virtio_net.h" is not available outside of linux.
BTW, You could always try to check your patch-set on CirrusCI with FreeBSD
or on Windows with AppVeyor.

We'll have to move this code if someday we'll want to use DPDK on BSD
or Windows.  Right now netdev-dpdk depends on Linux, but mostly because
of the vhost-user support.

> 
>> * netdev-afxdp and netdev-windows are not handled by the patches.
> 
> That's correct.
> 
>> * I'm not sure if we need separate tso.{c,h} files.
> 
> The tso_init() is called from vswtchd/bridge.c, so I am open to
> ideas.

Need to think about this.

> 
>> * Docs and log messages are very confusing because they makes impression that
>>   system datapath doesn't support TSO.  Also, why we cant' have TSO without DPDK?
>>   You have a virtio header for netdev-linux, so we can use it.
> 
> OK, the doc is under the subdirectory dpdk/, but the vswitchd.xml
> and the other log messages could be more clear.
> 
>> * Is it safe to just drop virtio header on packet receive?
> 
> We don't support TSO from other devices yet, so we drop that and
> let the usual checking drop the packets if they miss csum or are
> oversized.
> 
>> * Not sure if it fully safe to say that checksum is valid while it's not
>>   calculated.  Have you checked this from the ipf and conntrack points of
>>   view.
> 
> Could you elaborate? I didn't get the question.

I meant that ipf and conntrack in userspace are using the checksum_valid()
functions and it might be bad if they think that checksum is correct while
it's not.  Just wanted to be sure that you've checked the callers.

One more point:

* I see that you're claiming that TSO is not supported with tunneling, so
  it might make sense to fail the encapsulation for packets with wrong checksums
  if we're not going to have software TSO implementation.


BTW, something strange happens to your mail account:
"""
An error occurred while sending mail. The mail server responded:
550 5.1.2 <fbl at sysclose.com>: Recipient address rejected: Domain not found.
Please check the message recipient "fbl at sysclose.com" and try again.
"""
So, I changed it back to .org one.

Best regards, Ilya Maximets.


More information about the dev mailing list