[ovs-dev] Re: Re: [PATCH V3 2/4] Add GSO support for DPDK data path

Ilya Maximets i.maximets at ovn.org
Tue Feb 9 10:51:46 UTC 2021


On 2/8/21 8:54 PM, William Tu wrote:
> On Mon, Feb 8, 2021 at 6:36 AM Ilya Maximets <i.maximets at ovn.org> wrote:
>>
>> On 2/8/21 1:56 AM, Yi Yang (杨燚)-云服务集团 wrote:
>>> Yes, GSO is ok, but GRO may have that issue.  I didn't see that issue in my OpenStack environment, so it would be great if we could have a test case to trigger it.
>>>
>>> -----Original Message-----
>>> From: William Tu [mailto:u9012063 at gmail.com]
>>> Sent: February 7, 2021 23:46
>>> To: Yi Yang (杨燚)-云服务集团 <yangyi01 at inspur.com>
>>> Cc: i.maximets at ovn.org; yang_y_yi at 163.com; ovs-dev at openvswitch.org; fbl at sysclose.org
>>> Subject: Re: [ovs-dev] Re: [PATCH V3 2/4] Add GSO support for DPDK data path
>>>
>>> On Tue, Oct 27, 2020 at 6:02 PM Yi Yang (杨燚)-云服务集团 <yangyi01 at inspur.com> wrote:
>>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:ovs-dev-bounces at openvswitch.org] on behalf of Ilya Maximets
>>>> Sent: October 27, 2020 21:03
>>>> To: yang_y_yi at 163.com; ovs-dev at openvswitch.org
>>>> Cc: fbl at sysclose.org; i.maximets at ovn.org
>>>> Subject: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path
>>>>
>>>> On 8/7/20 12:56 PM, yang_y_yi at 163.com wrote:
>>>>> From: Yi Yang <yangyi01 at inspur.com>
>>>>>
>>>>> GSO (Generic Segmentation Offload) can segment large UDP and TCP
>>>>> packets into smaller packets that fit the MTU of the destination.
>>>>> Especially in the case where the physical NIC can't do hardware
>>>>> VXLAN TSO and VXLAN UFO offload, GSO makes sure userspace TSO still
>>>>> works instead of dropping packets.
>>>>>
>>>>> In addition, GSO can help improve UDP performance when UFO is enabled
>>>>> in the VM.
>>>>>
>>>>> GSO supports TCP, UDP, VXLAN TCP and VXLAN UDP; it is done in the Tx
>>>>> function of the physical NIC.
>>>>>
>>>>> Signed-off-by: Yi Yang <yangyi01 at inspur.com>
>>>>> ---
>>>>>  lib/dp-packet.h    |  21 +++-
>>>>>  lib/netdev-dpdk.c  | 358 +++++++++++++++++++++++++++++++++++++++++++++++++----
>>>>>  lib/netdev-linux.c |  17 ++-
>>>>>  lib/netdev.c       |  67 +++++++---
>>>>>  4 files changed, 417 insertions(+), 46 deletions(-)
>>>
>>> snip
>>>
>>>>>
>>>>> @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
>>>>>      return cnt;
>>>>>  }
>>>>>
>>>>> -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
>>>>> - * 'pkts', even in case of failure.
>>>>> - *
>>>>> - * Returns the number of packets that weren't transmitted. */
>>>>>  static inline int
>>>>> -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>>>>> -                         struct rte_mbuf **pkts, int cnt)
>>>>> +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>>>>> +                           struct rte_mbuf **pkts, int cnt)
>>>>>  {
>>>>>      uint32_t nb_tx = 0;
>>>>> -    uint16_t nb_tx_prep = cnt;
>>>>> +    uint32_t nb_tx_prep;
>>>>>
>>>>> -    if (userspace_tso_enabled()) {
>>>>> -        nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>>>> -        if (nb_tx_prep != cnt) {
>>>>> -            VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. "
>>>>> -                         "Only %u/%u are valid: %s", dev->up.name, nb_tx_prep,
>>>>> -                         cnt, rte_strerror(rte_errno));
>>>>> -        }
>>>>> +    nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
>>>>> +    if (nb_tx_prep != cnt) {
>>>>> +        VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. "
>>>>> +                          "Only %u/%u are valid: %s",
>>>>> +                     dev->up.name, nb_tx_prep,
>>>>> +                     cnt, rte_strerror(rte_errno));
>>>>>      }
>>>>>
>>>>>      while (nb_tx != nb_tx_prep) {
>>>>> @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>>>>>      return cnt - nb_tx;
>>>>>  }
>>>>>
>>>>> +static inline void
>>>>> +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf)
>>>>
>>>> I didn't review the patch, only had a quick glance, but this part bothers me.  OVS doesn't support multi-segment mbufs, so it should not be possible for such mbufs to be transmitted by OVS.  So, I do not understand why this function needs to work with such mbufs.
>>>>
>>>> [Yi Yang] Only the DPDK driver/Tx function will use it, not OVS.  set_multiseg_udptcp_cksum is called in the GSO part, as the last step before the Tx function.  Before rte_gso_segment it is a big external mbuf, not a multi-segmented mbuf.
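>>>> For illustration only (this is a sketch, not the patch's actual code): a software L4 checksum can be filled in right before Tx even when the mbuf is chained, e.g. by linearizing it first and then using the standard DPDK helper.
>>>>
>>>>     #include <rte_mbuf.h>
>>>>     #include <rte_ip.h>
>>>>     #include <rte_tcp.h>
>>>>
>>>>     /* Hypothetical sketch: software TCP checksum for an IPv4 packet held in
>>>>      * a possibly multi-segment mbuf, applied just before rte_eth_tx_burst().
>>>>      * Assumes m->l2_len and m->l3_len were set by the caller. */
>>>>     static int
>>>>     sw_tcp4_cksum(struct rte_mbuf *m)
>>>>     {
>>>>         struct rte_ipv4_hdr *ip;
>>>>         struct rte_tcp_hdr *tcp;
>>>>
>>>>         /* Collapse the chain into segment 0 (needs enough tailroom there). */
>>>>         if (rte_pktmbuf_linearize(m) < 0) {
>>>>             return -1;
>>>>         }
>>>>
>>>>         ip = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *, m->l2_len);
>>>>         tcp = (struct rte_tcp_hdr *) ((char *) ip + m->l3_len);
>>>>         tcp->cksum = 0;
>>>>         tcp->cksum = rte_ipv4_udptcp_cksum(ip, tcp);
>>>>         return 0;
>>>>     }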
>>>>
>>>
>>> Hi Ilya,
>>>
>>> Now I understand Yi Yang's point better and I agree with him.
>>> Looks like the patch does GSO in the DPDK Tx function.
>>> It creates multi-seg mbufs with rte_gso_segment(), but immediately sends
>>> them out to the DPDK port, without traversing any other part of the OVS
>>> code.  I guess in this case it should work OK?
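>>> Something roughly like this, as a sketch (not the patch code itself; the rte_gso_ctx field names come from rte_gso.h, while the helper names, the fixed segment-array size, and the handling of the input mbuf's ownership after segmentation are assumptions that vary by DPDK release):
>>>
>>>     #include <rte_ethdev.h>
>>>     #include <rte_gso.h>
>>>     #include <rte_mbuf.h>
>>>     #include <rte_mempool.h>
>>>
>>>     #define GSO_MAX_SEGS 64        /* Arbitrary bound for this sketch. */
>>>
>>>     /* One-time, per-port context for the DPDK GSO library. */
>>>     static void
>>>     gso_ctx_init(struct rte_gso_ctx *ctx, struct rte_mempool *mp,
>>>                  uint16_t max_seg_size)
>>>     {
>>>         ctx->direct_pool = mp;
>>>         ctx->indirect_pool = mp;
>>>         ctx->flag = 0;
>>>         ctx->gso_types = DEV_TX_OFFLOAD_TCP_TSO
>>>                          | DEV_TX_OFFLOAD_VXLAN_TNL_TSO;
>>>         ctx->gso_size = max_seg_size;   /* Max output segment size. */
>>>     }
>>>
>>>     /* Segment one oversized packet and push the results straight to the
>>>      * NIC, so the (possibly chained) output mbufs never re-enter OVS.
>>>      * Assumes 'pkt' already carries TSO metadata (PKT_TX_TCP_SEG,
>>>      * l2/l3/l4_len, tso_segsz). */
>>>     static uint16_t
>>>     tx_with_gso(uint16_t port_id, uint16_t queue_id,
>>>                 struct rte_gso_ctx *ctx, struct rte_mbuf *pkt)
>>>     {
>>>         struct rte_mbuf *segs[GSO_MAX_SEGS];
>>>         int nb_segs;
>>>
>>>         nb_segs = rte_gso_segment(pkt, ctx, segs, GSO_MAX_SEGS);
>>>         if (nb_segs <= 0) {
>>>             /* < 0 is an error; 0 (on newer DPDK) means no segmentation
>>>              * was needed and 'pkt' should be sent as-is - not handled
>>>              * in this sketch. */
>>>             rte_pktmbuf_free(pkt);
>>>             return 0;
>>>         }
>>>
>>>         /* Requires DEV_TX_OFFLOAD_MULTI_SEGS on the Tx side of the port. */
>>>         return rte_eth_tx_burst(port_id, queue_id, segs, nb_segs);
>>>     }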
>>
>> Hi.  GSO itself should be possible to implement, but we should not
>> enable support for multi-segment mbufs in the NIC, since this might
>> end up with multi-segment packets being received by OVS, causing
>> lots of trouble (for example, dp_packet_clone() uses a simple
>> memcpy() that would result in invalid memory reads).
> 
> I see your point, thanks!
> I guess you're saying that we have to do something like:
> nic->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS
> to enable the NIC's multi-segment support for GSO.
> And once it is enabled, OVS might receive multi-segment
> mbufs from the NIC, which OVS doesn't support.

OK.  This is probably not true.  I missed the fact that MULTI_SEGS
is enabled only for Tx offloads.  So, this should work, I guess.
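For example, something like this should be enough on the port-configuration side (a minimal sketch against the plain rte_ethdev API, not OVS code; the helper name is invented):

    #include <string.h>
    #include <rte_ethdev.h>

    static int
    port_configure_tx_multiseg(uint16_t port_id, uint16_t n_rxq, uint16_t n_txq)
    {
        struct rte_eth_dev_info info;
        struct rte_eth_conf conf;

        memset(&conf, 0, sizeof conf);
        if (rte_eth_dev_info_get(port_id, &info) != 0) {
            return -1;
        }

        /* Tx-only offload: lets the PMD accept chained mbufs on transmit. */
        if (info.tx_offload_capa & DEV_TX_OFFLOAD_MULTI_SEGS) {
            conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
        }

        /* Nothing is requested on the Rx side (no scatter, no LRO), so
         * received packets still arrive as single-segment mbufs. */
        return rte_eth_dev_configure(port_id, n_rxq, n_txq, &conf);
    }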

But still, a software fallback should not be implemented only for
netdev-dpdk.  It's needed for other port types too, and it should
not depend on DPDK.
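For illustration, a hypothetical sketch of what such a generic fallback could look like at the dp_packet level (the function name is invented, TCP options are ignored, and the per-segment header fix-ups are only described in comments):

    #include "dp-packet.h"
    #include "packets.h"
    #include "util.h"

    /* Hypothetical generic GSO fallback: split one oversized IPv4/TCP
     * 'dp_packet' into MSS-sized dp_packets and append them to 'out'. */
    static void
    netdev_gso_tcp4_fallback(const struct dp_packet *pkt, size_t mss,
                             struct dp_packet_batch *out)
    {
        const char *data = dp_packet_data(pkt);
        size_t hdr_len = (const char *) dp_packet_l4(pkt) - data + TCP_HEADER_LEN;
        const char *payload = data + hdr_len;
        size_t payload_len = dp_packet_size(pkt) - hdr_len;

        for (size_t off = 0; off < payload_len; off += mss) {
            size_t chunk = MIN(mss, payload_len - off);
            struct dp_packet *seg = dp_packet_new(hdr_len + chunk);

            dp_packet_put(seg, data, hdr_len);         /* L2/L3/L4 headers. */
            dp_packet_put(seg, payload + off, chunk);  /* Payload slice.    */

            /* A real implementation would also, per segment: refresh the
             * l2/l3/l4 offsets, fix the IPv4 total length, ID and checksum,
             * advance the TCP sequence number by 'off', clear FIN/PSH on
             * all but the last segment, and recompute (or offload) the TCP
             * checksum. */

            dp_packet_batch_add(out, seg);
        }
    }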

> 
>>
>> GSO should be implemented in the same way TSO is implemented.  Right
>> now the TSO implementation has no software fallback to segment large
>> packets if the netdev doesn't support them - it simply drops them.
>> However, a software fallback should be implemented someday, and it
>> should be implemented in a generic way for all the netdev types
>> instead of only for one (netdev-dpdk).  A brief look at the patch
>> tells me that the generic check in netdev_send_prepare_packet() was
>> removed.  This means that huge packets could reach e.g. netdev-afxdp,
>> which will not behave correctly.
>> A generic software fallback should produce a sequence of 'dp_packet's
>> instead of multi-segment mbufs, so there should be no need to enable
>> support for multi-segment mbufs in the DPDK PMD.
>>
> I agree, thanks!
> So which one do we need for dp_packet:
> 1) GSO, 2) tunneled GSO, e.g. VXLAN and Geneve, or 3) both?
> 
> I feel we can assume all HW NICs today support TSO, but not
> tunneled packets' TSO, so we only need to implement (2).

Well, we need everything.  We can't depend on the fact that all ports
support even plain TSO, since in the case of a VM attached via a
vhost-user port this is under the control of the VM configuration.
AF_XDP ports don't support TSO, and some DPDK vdevs probably don't
support TSO either.
If OVS is running inside a VM, ports might not support TSO either,
since this depends on the underlying port implementation and
how the VM was configured.

