[ovs-dev] [PATCH 0/4] Add support for TSO with DPDK

Ilya Maximets i.maximets at ovn.org
Wed Dec 11 19:03:03 UTC 2019


On 11.12.2019 19:35, Stokes, Ian wrote:
> 
> 
> On 12/3/2019 2:19 PM, Flavio Leitner wrote:
>> On Tue, Dec 03, 2019 at 05:26:51PM +0100, Ilya Maximets wrote:
>>> On 03.12.2019 14:12, Flavio Leitner wrote:
>>>> On Tue, Dec 03, 2019 at 01:15:23PM +0100, Ilya Maximets wrote:
>>>>> On 02.12.2019 14:44, Flavio Leitner wrote:
>>>>>> Abbreviated as TSO, TCP Segmentation Offload is a feature which enables
>>>>>> the network stack to delegate the TCP segmentation to the NIC reducing
>>>>>> the per packet CPU overhead.
>>>>>>
>>>>>> A guest using vhost-user interface with TSO enabled can send TCP packets
>>>>>> much bigger than the MTU, which saves CPU cycles normally used to break
>>>>>> the packets down to MTU size and to calculate checksums.
>>>>>>
>>>>>> It also saves CPU cycles used to parse multiple packets/headers during
>>>>>> the packet processing inside virtual switch.
>>>>>>
>>>>>> If the destination of the packet is another guest in the same host, then
>>>>>> the same big packet can be sent through a vhost-user interface skipping
>>>>>> the segmentation completely. However, if the destination is not local,
>>>>>> the NIC hardware is instructed to do the TCP segmentation and checksum
>>>>>> calculation.
>>>>>>
>>>>>> The first 3 patches are not really part of TSO support, but they are
>>>>>> required to make sure everything works.
>>>>>>
>>>>>> The TSO is only enabled in the vhost-user ports in client mode which
>>>>>> means DPDK is required. The other ports (dpdk or linux) can receive
>>>>>> those packets.
>>>>>>
>>>>>> This patchset is based on branch dpdk-latest (v19.11 required).
>>>>>>
>>>>>> The numbers look good and I noticed no regression so far. However, my
>>>>>> environment is tuned but not finely tuned, so consider those as
>>>>>> ballpark numbers only.
>>>>>>
>>>>>> This is VM-to-external host (Gbits/sec)
>>>>>>                 | No patch  |  TSO off  |  TSO on
>>>>>>                ---------------------------------------------
>>>>>> TCPv4          |    10     |    10     |   38  (line rate)
>>>>>> -----------------------------------------------------------
>>>>>> TCPv6          |     9     |     9     |   38  (line rate)
>>>>>> -----------------------------------------------------------
>>>>>> V4 Conntrack   |     1     |     1     |   33
>>>>>>
>>>>>>
>>>>>> This is VM-to-VM in the same host (Gbits/sec)
>>>>>>                 | No patch  |  TSO off  |  TSO on
>>>>>>                ---------------------------------------------
>>>>>> TCPv4          |     9.4   |    9.4    |   31
>>>>>> -----------------------------------------------------------
>>>>>> TCPv6          |     8     |    8      |   30
>>>>>> -----------------------------------------------------------
>>>>>>
>>>>>> There are good improvements sending to veth pairs or tap devices too.
>>>>>>
>>>>>> Flavio Leitner (4):
>>>>>>    dp-packet: preserve headroom when cloning a pkt batch
>>>>>>    vhost: Disable multi-segmented buffers
>>>>>>    dp-packet: handle new dpdk buffer flags
>>>>>>    netdev-dpdk: Add TCP Segmentation Offload support
>>>>>>
>>>>>>   Documentation/automake.mk           |   1 +
>>>>>>   Documentation/topics/dpdk/index.rst |   1 +
>>>>>>   Documentation/topics/dpdk/tso.rst   |  83 +++++++++++++++++++++
>>>>>>   NEWS                                |   1 +
>>>>>>   lib/automake.mk                     |   2 +
>>>>>>   lib/dp-packet.c                     |  50 ++++++++++++-
>>>>>>   lib/dp-packet.h                     |  38 ++++++++--
>>>>>>   lib/netdev-bsd.c                    |   7 ++
>>>>>>   lib/netdev-dpdk.c                   | 112 +++++++++++++++++++++++-----
>>>>>>   lib/netdev-dummy.c                  |   7 ++
>>>>>>   lib/netdev-linux.c                  |  71 ++++++++++++++++--
>>>>>>   lib/tso.c                           |  54 ++++++++++++++
>>>>>>   lib/tso.h                           |  23 ++++++
>>>>>>   vswitchd/bridge.c                   |   2 +
>>>>>>   vswitchd/vswitch.xml                |  14 ++++
>>>>>>   15 files changed, 430 insertions(+), 36 deletions(-)
>>>>>>   create mode 100644 Documentation/topics/dpdk/tso.rst
>>>>>>   create mode 100644 lib/tso.c
>>>>>>   create mode 100644 lib/tso.h
>>>>>>
>>>>>
>>>>>
>>>>> Hi Flavio,
>>>>>
>>>>> Thanks for working on this.
>>>>> I didn't read the code carefully, just a couple of first-look points:
>>>>>
>>>>> * Patch-set needs to have 'dpdk-latest' in a subject prefix.
>>>>
>>>> Ok.
>>>>
>>>>> * First patch seems to be OK even without TSO: batch_clone usually
>>>>>    happens on a clone action that implies further packet modifications,
>>>>>    so it might make sense to have headroom for that.
>>>>
>>>> It's in the cover letter that the first three patches are somewhat
>>>> unrelated, but needed to make sure everything works ok.
>>>>
>>>> Also that it preserves the original headroom and not the default
>>>> headroom used in the new packets.
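As a rough standalone sketch of that idea (all names here are illustrative, not the patch's; the real code operates on struct dp_packet in lib/dp-packet.c): the clone reserves the source packet's headroom rather than a default, so later header pushes don't force a reallocation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of patch 1: clone a packet, reserving the same headroom the
 * source had instead of the default, so that subsequent header pushes
 * after a clone action do not trigger a reallocation. */
struct pkt {
    uint8_t *base;   /* Start of the allocated buffer. */
    uint8_t *data;   /* Start of packet data; headroom = data - base. */
    size_t size;     /* Bytes of packet data. */
};

static size_t
pkt_headroom(const struct pkt *p)
{
    return p->data - p->base;
}

static struct pkt
pkt_clone_with_headroom(const struct pkt *src)
{
    size_t headroom = pkt_headroom(src);
    struct pkt dst;

    dst.base = malloc(headroom + src->size);
    dst.data = dst.base + headroom;
    dst.size = src->size;
    memcpy(dst.data, src->data, src->size);
    return dst;
}
```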
>>>>
>>>>> * Maybe we need to enable LINEARBUF for the usual vhost-user too, just to
>>>>>    avoid receiving multi-segment mbufs?
>>>>
>>>> That's what the patch does, right? The flag is always passed
>>>> regardless of TSO being enabled or disabled.
>>>
>>> I meant dpdkvhostuser vs dpdkvhostuserclient.
>>
>> OK, I thought dpdkvhostuser ports are deprecated since 2017 and they
>> are going to be removed (next release perhaps?).
>>
> 
> I've raised its removal a few times, but there was always feedback that some in the community are still using it or have to support it. Maybe we can check again and see if it is acceptable now.

It might still be in use for container networking with virtio-user on the
container side.  The basic case is that it's easier to mount a socket file
inside the container than to ask the application to create a socket with a
specific name so that OVS will be able to find it.  The replacement could be
to use a DPDK memif vdev or vhost vdev, but at least the vhost vdev requires
some event handlers on the OVS side to track LSC and such.  So, I'm not sure.

> 
> Aside: if we do intend to remove it, I think we should flag it in NEWS for one release, stating that it will be removed completely by 'OVS 2.X'.
> 
>> If not, then I agree I missed a warning in case someone tries
>> to use it today.
>>
> This is OK with me if it's being kept for the moment; we could always remove it later if/when dpdkvhostuser is removed.
> 
>>>>> * Third patch is not needed.  Similar patch was already merged recently.
>>>>
>>>> Cool, but that patch wasn't in dpdk-latest, so I can't drop that yet.
>>>>
>>>>> * I don't see the call to rte_eth_tx_prepare().  It is required to prepare
>>>>>    packets for TSO on Intel NICs that need the pseudo-header checksum
>>>>>    precalculated.
>>>>
>>>> Good catch, will fix in v2.
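For reference, the fixup rte_eth_tx_prepare() applies for such NICs boils down to seeding the TCP checksum field with the pseudo-header sum, leaving out the TCP length since the hardware rewrites it per segment.  A minimal standalone sketch of that sum (hypothetical helper names; IPv4 addresses taken as host-order integers for simplicity):

```c
#include <stdint.h>

/* Fold a 32-bit ones'-complement sum into 16 bits. */
static uint16_t
csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Pseudo-header sum for TSO over IPv4/TCP: source and destination
 * addresses plus the protocol number.  The TCP length is deliberately
 * omitted because the NIC recomputes it for every segment. */
static uint16_t
tso_pseudo_hdr_sum(uint32_t ip_src, uint32_t ip_dst, uint8_t proto)
{
    uint32_t sum = 0;

    sum += ip_src >> 16;
    sum += ip_src & 0xffff;
    sum += ip_dst >> 16;
    sum += ip_dst & 0xffff;
    sum += proto;
    return csum_fold(sum);
}
```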
>>>>
>>>>> * Guest is allowed to disable offloading.  In this case we're not allowed
>>>>>    to send incomplete packets (no checksum, oversized); they will just be
>>>>>    dropped by the guest.
>>>>
>>>> Good point, will fix in v2.
>>>>
>>>>> * Some NICs may not support TSO.  I see that you're moving this issue to the
>>>>>    user by forcing them to be sure that TSO is supported.  Most HW NICs
>>>>>    nowadays should support TSO, so it might not be a big deal, but
>>>>>    what about the software NICs that DPDK implements?  I'm not sure.
>>>>
>>>> What kind of interface do you have in mind? Since I will need to
>>>> handle VMs with TSO disabled, most probably the same solution
>>>> applies to those software devices.
>>>
>>> Looking at the feature support table, there are even a few HW NICs that
>>> don't support TSO: https://doc.dpdk.org/guides/nics/overview.html
>>>
>>> For example, axgbe and new hns3.
> 
> @Ilya, I understand the concern here, but as TSO would be experimental I think it would be OK to flag this to the user that ports deployed for use with TSO must support the feature (similar to what was done with Partial HWOL).
> 
> It may well be supported for these in the future (or maybe never, in some cases), but I would hope that documenting it should be OK initially.

Yes, sure.

> 
>>>
>>> Among virtual ports, the most obvious is afxdp.
>>
>> Do you have a use case in mind that would use TSO and not have
>> support in the NIC?

I thought about use cases a bit and it seems that they are kind of specific.
One possible case is VM chaining, where the first VM re-assembles TCP frames
and then passes them to VMs on the same host.  But, yes, we could just
document HW support as a requirement for now, as it's an experimental feature.

>>
>>>>> * Don't include virtio headers in dp-packet code.  This will break non-Linux
>>>>>    builds.  In fact, construction of the virtio-net header is only needed for
>>>>>    netdev-linux, so it should be implemented there.
>>>>
>>>> Isn't virtio available in non-Linux environments? I initially had it there,
>>>> but it breaks because it has DPDK dependencies as well.
>>>
>>> I believe that "linux/virtio_net.h" is not available outside of linux.
>>
>> Sure, I was talking about the virtio structures; then it would be a
>> matter of selecting the right one.  If we want to mess with
>> csum and offloading, most probably we will need those fields anyway,
>> or to translate them.
>>
>>> BTW, you could always try to check your patch-set on CirrusCI with FreeBSD
>>> or on Windows with AppVeyor.
>>
>> I will try that, thanks.
>>
>>> We'll have to move this code if someday we'll want to use DPDK on BSD
>>> or Windows.  Right now netdev-dpdk depends on Linux, but mostly because
>>> of the vhost-user support.
>>>
> 
> @Ilya, just to clarify: is the concern that we should not introduce this until DPDK is supported in OVS for Windows and BSD?

No, I just want this code to be moved to netdev-linux, i.e. to the only user.

> 
> I guess my concern here is that I'm not sure of the effort required to enable these (is there a requirement to do so?), especially in the 2.13 timeline.

There was a request from a person who tried to build OVS with DPDK on FreeBSD,
but I think it wasn't a hard requirement for anything specific.
So, there is no hurry here.

Regarding effort, there shouldn't be much trouble for FreeBSD support.  We need
to compile out vhost-user support from netdev-dpdk, as it depends on Linux, and
we need to implement a few things in the ovs-numa module.  But I don't think
that anybody has time for that now.

> 
> As DPDK is only supported with OVS on Linux, I don't think this should block.

I believe that '#include <linux/virtio_net.h>' will block compiling on non-Linux
systems.  The vhost library in DPDK depends on Linux, so it might not make much
sense to enable TSO on other operating systems anyway.  So, moving it to
netdev-linux seems to be a better solution than guarding code that doesn't
depend on DPDK with '#ifdef DPDK_NETDEV'.
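To illustrate why only netdev-linux needs that header, here is a hedged, Linux-only sketch (illustrative function name and offsets, assuming an Ethernet/IPv4/TCP packet; not the patch's actual code) of the virtio-net header a tap write would carry for a TSO packet:

```c
#include <linux/virtio_net.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the virtio-net header netdev-linux would prepend when handing
 * an oversized TCP packet to a tap device.  'l4_ofs' is the offset of the
 * TCP header from the start of the frame and 'mss' is the segment size
 * the kernel should cut the payload into. */
static struct virtio_net_hdr
tso_vnet_hdr(uint16_t l4_ofs, uint16_t tcp_hdr_len, uint16_t mss)
{
    struct virtio_net_hdr vnet;

    memset(&vnet, 0, sizeof vnet);
    vnet.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
    vnet.csum_start = l4_ofs;    /* Checksumming starts at the TCP header. */
    vnet.csum_offset = 16;       /* Offset of the checksum field in TCP. */
    vnet.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
    vnet.gso_size = mss;
    vnet.hdr_len = l4_ofs + tcp_hdr_len;   /* Eth + IP + TCP headers. */
    return vnet;
}
```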

> 
> If in time effort to support Windows and BSD come about with DPDK then sure, we should revisit it. I just haven't heard a request as of yet for this.
> 
>>>>
>>>>> * netdev-afxdp and netdev-windows are not handled by the patches.
>>>>
>>>> That's correct.
>>>>
>>>>> * I'm not sure if we need separate tso.{c,h} files.
>>>>
>>>> The tso_init() is called from vswitchd/bridge.c, so I am open to
>>>> ideas.
>>>
>>> Need to think about this.
>>>
>>>>
>>>>> * Docs and log messages are very confusing because they give the impression
>>>>>    that the system datapath doesn't support TSO.  Also, why can't we have TSO
>>>>>    without DPDK?  You have a virtio header for netdev-linux, so we can use it.
>>>>
>>>> OK, the doc is under the dpdk/ subdirectory, but the vswitch.xml
>>>> and the other log messages could be clearer.
>>>>
>>>>> * Is it safe to just drop virtio header on packet receive?
>>>>
>>>> We don't support TSO from other devices yet, so we drop that and
>>>> let the usual checking drop the packets if they lack a csum or are
>>>> oversized.
>>>>
>>>>> * Not sure if it's fully safe to say that the checksum is valid while it's
>>>>>    not calculated.  Have you checked this from the ipf and conntrack points
>>>>>    of view?
>>>>
>>>> Could you elaborate? I didn't get the question.
>>>
>>> I meant that ipf and conntrack in userspace use the checksum_valid()
>>> functions, and it might be bad if they think the checksum is correct while
>>> it's not.  Just wanted to be sure that you've checked the callers.
>>
>> For that, yes: the csum is valid either if the HW RX side indicates that or
>> if the TX side indicates offloaded csum calculation. It trusts that there is
>> no corruption while the packet is within the host.
>>
>> But there are places where the csum is recalculated using the old
>> value, which should also be okay, but I haven't tested that scenario yet.
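(The recalculation from the old value mentioned here is the standard RFC 1624 incremental update.  A standalone sketch with an illustrative name, similar in spirit to OVS's recalc_csum16() helper in lib/csum.h:)

```c
#include <stdint.h>

/* RFC 1624 incremental checksum update: when a 16-bit field changes
 * from 'old_val' to 'new_val', fix the ones'-complement checksum
 * without re-summing the whole packet:  HC' = ~(~HC + ~m + m'). */
static uint16_t
csum_update16(uint16_t csum, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t) ~csum;

    sum += (uint16_t) ~old_val;
    sum += new_val;
    while (sum >> 16) {                     /* Fold carries back in. */
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}
```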
>>
> +1
>>
>>> One more point:
>>>
>>> * I see that you're claiming that TSO is not supported with tunneling, so
>>>    it might make sense to fail the encapsulation for packets with wrong
>>>    checksums if we're not going to have a software TSO implementation.
>>
>> I will add a log warning if that's the case.
> 
> Probably good to document as a known limitation. It's something that can be worked towards as we seek to remove the experimental tag.
> 
> Thanks
> Ian
>>
>>> BTW, something strange happens to your mail account:
>>> """
>>> An error occurred while sending mail. The mail server responded:
>>> 550 5.1.2 <fbl at sysclose.com>: Recipient address rejected: Domain not found.
>>> Please check the message recipient "fbl at sysclose.com" and try again.
>>> """
>>> So, I changed it back to .org one.
>>
>> Oops, I fixed here, thanks!
>>

