[ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev

Roni Bar Yanai roniba at mellanox.com
Mon Oct 28 20:24:51 UTC 2019


Hi Ilya,
Please see inline.

>-----Original Message-----
>From: Ilya Maximets <i.maximets at ovn.org>
>Sent: Monday, October 28, 2019 3:46 PM
>To: Noa Levy <noae at mellanox.com>; ovs-dev at openvswitch.org; Roni Bar Yanai
><roniba at mellanox.com>
>Cc: Oz Shlomo <ozsh at mellanox.com>; Majd Dibbiny <majd at mellanox.com>;
>Ameer Mahagneh <ameerm at mellanox.com>; Eli Britstein
><elibr at mellanox.com>; William Tu <u9012063 at gmail.com>; Simon Horman
><simon.horman at netronome.com>
>Subject: Re: [ovs-dev] [PATCH ovs v3 0/2] Introduce dpdkvdpa netdev
>
>On 17.10.2019 13:16, Noa Ezra wrote:
>> There are two approaches to communicate with a guest, using virtIO or
>> SR-IOV.
>> SR-IOV allows working with port representor which is attached to the
>> OVS and a matching VF is given with pass-through to the VM.
>> HW rules can process packets from up-link and direct them to the VF
>> without going through SW (OVS) and therefore SR-IOV gives the best
>> performance.
>> However, SR-IOV architecture requires that the guest will use a driver
>> which is specific to the underlying HW. Specific HW driver has two
>> main
>> drawbacks:
>> 1. Breaks virtualization in some sense (VM aware of the HW), can also
>>     limit the type of images supported.
>> 2. Less natural support for live migration.
>>
>> Using virtIO interface solves both problems, but reduces performance
>> and causes losing of some functionality, for example, for some HW
>> offload, working directly with virtIO cannot be supported.
>> In order to solve this conflict, we created a new netdev type-dpdkvdpa.
>> The new netdev is basically similar to a regular dpdk netdev, but it
>> has some additional functionality for transferring packets from virtIO
>> guest (VM) to a VF and vice versa. With this solution we can benefit
>> both SR-IOV and virtIO.
>> vDPA netdev is designed to support both SW and HW use-cases.
>> HW mode will be used to configure vDPA capable devices. The support
>> for this mode is on progress in the dpdk community.
>> SW acceleration is used to leverage SR-IOV offloads to virtIO guests
>> by relaying packets between VF and virtio devices and as a pre-step
>> for supporting vDPA in HW mode.
>>
>> Running example:
>> 1. Configure OVS bridge and ports:
>> ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
>> ovs-vsctl add-port br0-ovs pf -- set Interface pf type=dpdk \
>>          options:dpdk-devargs=<pf pci id>
>> ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
>>          options:vdpa-socket-path=<sock path> \
>>          options:vdpa-accelerator-devargs=<vf pci id> \
>>          options:dpdk-devargs=<pf pci id>,representor=[id]
>> 2. Run a virtIO guest (VM) in server mode that creates the socket of
>>     the vDPA port.
>> 3. Send traffic.
>>
>> Noa Ezra (2):
>>    netdev-dpdk-vdpa: Introduce dpdkvdpa netdev
>>    netdev-dpdk: Add dpdkvdpa port
>>
>>   Documentation/automake.mk           |   1 +
>>   Documentation/topics/dpdk/index.rst |   1 +
>>   Documentation/topics/dpdk/vdpa.rst  |  90 +++++
>>   NEWS                                |   1 +
>>   lib/automake.mk                     |   4 +-
>>   lib/netdev-dpdk-vdpa.c              | 750 ++++++++++++++++++++++++++++++++++++
>>   lib/netdev-dpdk-vdpa.h              |  54 +++
>>   lib/netdev-dpdk.c                   | 162 ++++++++
>>   vswitchd/vswitch.xml                |  25 ++
>>   9 files changed, 1087 insertions(+), 1 deletion(-)
>>   create mode 100644 Documentation/topics/dpdk/vdpa.rst
>>   create mode 100755 lib/netdev-dpdk-vdpa.c
>>   create mode 100644 lib/netdev-dpdk-vdpa.h
>>
>
>Hi, everyone.
>
>So, I have a few questions (mostly to Roni?):
>
>1. What happened to the idea of implementing this as a DPDK vdev?
We wanted to solve both the OVS-kernel and the OVS-DPDK issue.
The main argument against it was that it allows defining ports that are not
OF ports on the switch. Regardless of how useful we think this feature can be,
it breaks something fundamental, and looking back I agree.
The vdev was a workaround, but when we spoke with some of our DPDK
experts, they raised similar arguments, this time for the DPDK world: you create
a port that never receives packets and never sends packets. This
doesn't make sense, even if it can be useful in some cases. Once we remove
the kernel from the equation, we are left with vDPA.
In fact, we followed your suggestion of having SW vDPA and HW vDPA.
This makes total sense: for example, OpenStack can configure vDPA and
let OVS probe the device and go with HW or SW, keeping the same functionality
and the same configuration.
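
To illustrate the probe-then-select idea described above, here is a minimal C sketch. All names in it (vdpa_mode, vdpa_dev_caps, vdpa_select_mode, vdpa_mode_name) are hypothetical and are not taken from the patch; it only assumes that probing yields a single "HW vDPA capable" flag and that the port configuration is identical in both modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is taken from the patch. */
enum vdpa_mode { VDPA_MODE_SW, VDPA_MODE_HW };

struct vdpa_dev_caps {
    bool hw_vdpa_capable;   /* Assumed outcome of probing the device. */
};

/* Pick the mode from the probed capabilities.  The caller's port
 * configuration is the same either way; only the data path differs. */
static enum vdpa_mode
vdpa_select_mode(const struct vdpa_dev_caps *caps)
{
    assert(caps != NULL);
    return caps->hw_vdpa_capable ? VDPA_MODE_HW : VDPA_MODE_SW;
}

/* Human-readable mode name, e.g. for a log message. */
static const char *
vdpa_mode_name(enum vdpa_mode mode)
{
    return mode == VDPA_MODE_HW ? "hw" : "sw";
}
```

With this shape, an orchestrator would always request the same dpdkvdpa port, and the HW/SW decision stays internal to the switch.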

>
>2. What were the results of "direct output optimization" patch [1] testing?
>    At this point I also have to mention that OVS is going to have TSO support
>    in one of the next releases (at least we have some progress on that, thanks
>    to the idea of using extbuf).  This way, in combination with patch [1], there
>    should be no benefit from having a separate netdev for forwarding purposes.
>    Even without this patch, having TSO alone should provide good performance
>    benefits, making it reasonable to consider not having any extra netdev.

>
>    [1]
>https://patchwork.ozlabs.org/patch/1105880/
Unfortunately the improvement was minor, a few percent. The performance difference
between direct output and SW vDPA is dramatic. I also think that, as a system, vDPA is
much easier to follow: you configure one port instead of three ports plus a direct rule.
We really hope TSO will be supported soon; it can be a solution (with significantly lower
performance) for kernel-OVS using direct output, and maybe it can be optimized further.
>
>
>3. Regarding implementation, I expected less code duplication. It's unclear
>    why you're not reusing hotplug capabilities of the existing dpdk netdev.
>    I expected that this new netdev would call netdev_open() like 3 times
>    with a few tweaks for enabling TSO or stripping/adding some options.
>
I'll let Noa answer here.
>4. This code doesn't seem to make sense in SW mode without full HW offloading.
>    i.e. it doesn't make sense to use this netdev while VF representor is
>    attached to userspace datapath.  This is just because, according to the
>    solution architecture, all the traffic will be copied firstly from the
>    VM to VF (parsed on the fly to enable TSO), and will immediately appear
>    on VF representor to be handled by OVS (same parsing with partial offload
>    + actions execution).  I'm completely unsure if it is any faster than just
>    sending packets from VM to VF without bypassing OVS processing.  Maybe
>    even slower and like 3 times heavier for PCI bandwidth.
>
Agreed, but internally we already have full hardware offload for VXLAN and for
VXLAN + connection tracking. We are planning to submit VXLAN full
offload first; as far as I know, it is planned for the next few weeks. We are
putting a lot of effort into pushing it ASAP.
I think we had better start reviewing vDPA now, since review takes time and the
commits do not conflict. vDPA and full offload are two features that add up to a full
solution, and neither can work standalone (unless you are willing to use SR-IOV for the guest).
Of course you won't use vDPA if you don't have full hardware offload; it will be slower.
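
The "3 times heavier for PCI bandwidth" concern in point 4 can be made concrete with a rough count of PCIe crossings per VM-originated packet headed to the uplink. The C sketch below is only one reading of that argument, not a measurement: the model, the enum, and the function name pcie_crossings are all hypothetical, and it assumes that with full HW offload the NIC forwards from the VF to the uplink without returning the packet to the host.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration only: counts how many times one
 * VM-originated packet crosses the PCIe bus on its way to the uplink. */
enum guest_path {
    PATH_VHOST_USER,   /* VM -> OVS in host memory -> uplink. */
    PATH_SW_VDPA       /* VM -> SW relay -> VF -> (OVS or HW) -> uplink. */
};

static int
pcie_crossings(enum guest_path path, bool full_hw_offload)
{
    assert(path == PATH_VHOST_USER || path == PATH_SW_VDPA);

    if (path == PATH_VHOST_USER) {
        return 1;             /* OVS transmits on the uplink: host -> NIC. */
    }

    /* SW vDPA: the relay always transmits on the VF: host -> NIC. */
    int crossings = 1;

    if (!full_hw_offload) {
        crossings += 1;       /* Packet appears on the VF representor: NIC -> host. */
        crossings += 1;       /* OVS transmits on the uplink: host -> NIC. */
    }
    return crossings;
}
```

Under these assumptions SW vDPA costs three crossings without full offload versus one for the plain vhost-user path, and drops back to one once the rule is fully offloaded, which is why the two features only make sense together.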
>5. "HW mode will be used to configure vDPA capable devices. The support
>     for this mode is on progress in the dpdk community."
>    AFAIU, host side of vDPA support is done already for some time in DPDK.
>    API is prepared and there is a vhost-vdpa example that allows you to
>    check the functionality.  As I understand, the missing part in dpdk right
>    now is a DPDK virtio driver for guest, but this is not a limitation for
>    not implementing vDPA support from the host side.  Am I missing something?
Why is this not a limitation? How could it be used without it?
Anyway, I'll check again internally whether we are ready.
>
>Best regards, Ilya Maximets.

