[ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW checksum offload support for DPDK pnic

Gao Zhenyu sysugaozhenyu at gmail.com
Wed Jun 21 08:32:21 UTC 2017


I get it.  Maybe calculating it in the OVS part is doable as well.
So, how about adding more options to let people choose HW-tcp-cksum (reduces
CPU cycles) or SW-tcp-cksum (may give better performance)?
Then we have NO-TCP-CKSUM, SW-TCP-CKSUM, HW-TCP-CKSUM.
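
A minimal sketch of how these three modes could be represented, assuming
hypothetical names: tx_tcp_cksum_mode and the "none"/"sw"/"hw" option
strings are illustrative only and are not part of the patch or of OVS.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical three-way TX TCP checksum mode (names are not in the patch). */
enum tx_tcp_cksum_mode {
    TX_TCP_CKSUM_NONE,  /* NO-TCP-CKSUM: leave the checksum untouched. */
    TX_TCP_CKSUM_SW,    /* SW-TCP-CKSUM: compute it in OVS on the CPU. */
    TX_TCP_CKSUM_HW,    /* HW-TCP-CKSUM: ask the NIC to fill it in. */
};

/* Parses a hypothetical option value ("none", "sw" or "hw").
 * Returns true and stores the mode in '*mode' on success.  Returns false
 * and leaves '*mode' unchanged if 's' is NULL or not a known value. */
static bool
tx_tcp_cksum_mode_from_string(const char *s, enum tx_tcp_cksum_mode *mode)
{
    if (!s) {
        return false;
    }
    if (!strcmp(s, "none")) {
        *mode = TX_TCP_CKSUM_NONE;
    } else if (!strcmp(s, "sw")) {
        *mode = TX_TCP_CKSUM_SW;
    } else if (!strcmp(s, "hw")) {
        *mode = TX_TCP_CKSUM_HW;
    } else {
        return false;
    }
    return true;
}
```

Rejecting unknown strings instead of silently falling back keeps a mistyped
option from quietly disabling checksumming.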

BTW, when will DPDK support tx checksum offload with vectorization?

Thanks
Zhenyu Gao


2017-06-21 16:03 GMT+08:00 Chandran, Sugesh <sugesh.chandran at intel.com>:

>
> *Regards*
>
> *_Sugesh*
>
>
>
> *From:* Gao Zhenyu [mailto:sysugaozhenyu at gmail.com]
> *Sent:* Monday, June 19, 2017 1:23 PM
> *To:* Chandran, Sugesh <sugesh.chandran at intel.com>
> *Cc:* blp at ovn.org; u9012063 at gmail.com; ktraynor at redhat.com; Kavanagh,
> Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org
> *Subject:* Re: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW
> checksum offload support for DPDK pnic
>
> Thanks for the comments.
>
> [Sugesh] Any reason why this patch does only the TCP checksum offload?
> The command line option says tx_checksum offload (it could be mistaken
> for full checksum offload).
>
> [Zhenyu Gao] DPDK NICs support many HW offload features such as IPv4, IPv6,
> TCP, UDP, VXLAN and GRE. I would like to make them work step by step. A huge
> patch may introduce more potential issues.
>
> TCP offload is a basic and essential feature so I prefer to implement it
> first.
>
> *[Sugesh] Ok, Fine!*
>
>
>
> [Sugesh] What is the performance improvement offered with this feature? Do
> you have any numbers to share?
> I think DPDK uses non-vector functions when Tx checksum
> offload is enabled. Will it give enough performance improvement to mitigate
> that cost?
>
> [Zhenyu Gao] It is a draft patch to collect advice and suggestions. In my
> draft testing, it shows neither improvement nor regression.
>
> In an ovs-dpdk + veth environment, veth enables TCP checksum offload by
> default. This causes TCP connection issues: veth assumes checksum offload is
> supported and leaves the checksum to OVS, but the DPDK side doesn't do the
> offloading.
>
> So with the original ovs-dpdk I have to run "ethtool -K eth1 tx off" to
> disable all TX offloading. That means we cannot use TSO either.
>
> *[Sugesh] This is a concern. We have to consider other use cases as well.
> Most high-performance ovs-dpdk applications don't use any kernel/veth pair
> interfaces in the OVS-DPDK datapath.*
>
> It is an ovs-dpdk + veth environment, so the ovs-dpdk side uses
> sendmsg/recvmsg for TX/RX. netperf was executed on the ovs-dpdk + veth side.
> The veth side had tx-tcp HW cksum enabled and TSO disabled. The bottleneck
> was not in cksum, and running the test in a vhost VM would be more reasonable.
>
> *[Sugesh] I agree with you. But it's worthwhile to know what the
> performance delta is. Also, if the cost of vectorization is high, we may
> consider doing the checksum calculation in software itself. I feel x86
> instructions can do checksum calculation pretty efficiently. Have you
> considered that option?*
>
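The software option raised above amounts to the standard Internet checksum
(RFC 1071) over the TCP pseudo-header and segment. A minimal scalar sketch,
assuming IPv4 and hypothetical helper names (csum_add, csum_finish and
tcp_ipv4_csum are illustrative, not OVS or DPDK functions). It says nothing
about how this compares with NIC offload in speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds 'len' bytes at 'data' to a running one's-complement sum, reading the
 * bytes as big-endian 16-bit words.  Assumes the total input stays below
 * 64 kB so the 32-bit accumulator cannot overflow. */
static uint32_t
csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t) data[len - 1] << 8;  /* Pad the odd byte with 0. */
    }
    return sum;
}

/* Folds the carries back into 16 bits and returns the complement. */
static uint16_t
csum_finish(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Computes the TCP checksum of an IPv4 segment.  'src' and 'dst' are the
 * addresses in network byte order; 'tcp' points to the TCP header followed
 * by the payload, 'tcp_len' bytes in total, with the checksum field zeroed.
 * The result is in host order and must be stored big-endian. */
static uint16_t
tcp_ipv4_csum(const uint8_t src[4], const uint8_t dst[4],
              const uint8_t *tcp, uint16_t tcp_len)
{
    uint8_t pseudo[12];
    uint32_t sum;

    memcpy(pseudo, src, 4);
    memcpy(pseudo + 4, dst, 4);
    pseudo[8] = 0;                  /* Reserved. */
    pseudo[9] = 6;                  /* IP protocol number for TCP. */
    pseudo[10] = tcp_len >> 8;
    pseudo[11] = tcp_len & 0xff;

    sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, tcp, tcp_len);
    return csum_finish(sum);
}
```

Running tcp_ipv4_csum() again over a segment that already carries its
checksum returns 0, which is how a receiver validates it.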
>
> [root at 16ee46e4b793 ~]# netperf -H 10.100.85.247 -t TCP_RR -l 10
> MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> to 10.100.85.247 () port 0 AF_INET : first burst 0
> Local /Remote
> Socket Size   Request  Resp.   Elapsed  Trans.
> Send   Recv   Size     Size    Time     Rate
> bytes  Bytes  bytes    bytes   secs.    per sec
>
> 16384  87380  1        1       10.00    15001.87 (HW tcp-cksum)
>                                         15062.72 (No HW tcp-cksum)
> 16384  87380
>
>
> [root at 16ee46e4b793 ~]#     netperf -H 10.100.85.247 -t TCP_STREAM -l 10
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 10.100.85.247 () port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.02     263.41 (HW tcp-cksum)
>                                   265.31 (No HW tcp-cksum)
>
>
>
> I would like to keep it disabled by default unless we implement
> more TX offloading like TSO. (Do you have any concerns about that?)  BTW, I
> think I can rename NETDEV_TX_CHECKSUM_OFFLOAD to
> NETDEV_TX_TCP_CHECKSUM_OFFLOAD.
>
> Please let me know if you have any questions. :)
>
> *[Sugesh] The Rx checksum offload case works with vector instructions.
> The latest DPDK supports Rx checksum offload with vectorization.*
>
> Thanks
>
>
>
> 2017-06-19 17:26 GMT+08:00 Chandran, Sugesh <sugesh.chandran at intel.com>:
>
> Hi Zhenyu,
>
> Thank you for working on this.
> I have a couple of questions about this patch.
>
> Regards
> _Sugesh
>
>
> > -----Original Message-----
> > From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> > bounces at openvswitch.org] On Behalf Of Zhenyu Gao
> > Sent: Friday, June 16, 2017 1:54 PM
> > To: blp at ovn.org; u9012063 at gmail.com; ktraynor at redhat.com; Kavanagh,
> > Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org
> > Subject: [ovs-dev] [RFC PATCH v1] net-dpdk: Introducing TX tcp HW
> > checksum offload support for DPDK pnic
> >
> > This patch introduces TX tcp-checksum offload support for DPDK pnic.
> > The feature is disabled by default and can be enabled by setting
> > tx-checksum-offload, like this:
> > ovs-vsctl set Interface dpdk-eth3 \
> >  options:tx-checksum-offload=true
> > ---
> >  lib/netdev-dpdk.c    | 112 +++++++++++++++++++++++++++++++++++++++++++++++----
> >  vswitchd/vswitch.xml |  13 ++++--
> >  2 files changed, 115 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index bba4de3..5a68a48 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -32,6 +32,7 @@
> >  #include <rte_mbuf.h>
> >  #include <rte_meter.h>
> >  #include <rte_virtio_net.h>
> > +#include <rte_ip.h>
> >
> >  #include "dirs.h"
> >  #include "dp-packet.h"
> > @@ -328,6 +329,7 @@ struct ingress_policer {
> >
> >  enum dpdk_hw_ol_features {
> >      NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
> > +    NETDEV_TX_CHECKSUM_OFFLOAD = 1 << 1,
> >  };
> >
> >  struct netdev_dpdk {
> > @@ -649,6 +651,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >      int diag = 0;
> >      int i;
> >      struct rte_eth_conf conf = port_conf;
> > +    struct rte_eth_txconf *txconf;
> > +    struct rte_eth_dev_info dev_info;
> >
> >      if (dev->mtu > ETHER_MTU) {
> >          conf.rxmode.jumbo_frame = 1;
> > @@ -676,9 +680,16 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> >              break;
> >          }
> >
> > +        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > +        txconf = &dev_info.default_txconf;
> > +        if (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) {
> > +            /*Enable tx offload feature on pnic*/
> > +            txconf->txq_flags = 0;
> > +        }
> > +
> >          for (i = 0; i < n_txq; i++) {
> >              diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
> > -                                          dev->socket_id, NULL);
> > +                                          dev->socket_id, txconf);
> >              if (diag) {
> >                  VLOG_INFO("Interface %s txq(%d) setup error: %s",
> >                            dev->up.name, i, rte_strerror(-diag));
> > @@ -724,11 +735,15 @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> >  {
> >      struct rte_eth_dev_info info;
> >      bool rx_csum_ol_flag = false;
> > +    bool tx_csum_ol_flag = false;
> >      uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
> >                                       DEV_RX_OFFLOAD_TCP_CKSUM |
> >                                       DEV_RX_OFFLOAD_IPV4_CKSUM;
> > +    uint32_t tx_chksm_offload_capa = DEV_TX_OFFLOAD_TCP_CKSUM;
>
> [Sugesh] Any reason why this patch does only the TCP checksum offload?
> The command line option says tx_checksum offload (it could be mistaken
> for full checksum offload).
>
> > +
> >      rte_eth_dev_info_get(dev->port_id, &info);
> >      rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> > +    tx_csum_ol_flag = (dev->hw_ol_features & NETDEV_TX_CHECKSUM_OFFLOAD) != 0;
> >
> >      if (rx_csum_ol_flag &&
> >          (info.rx_offload_capa & rx_chksm_offload_capa) !=
> > @@ -736,9 +751,15 @@ dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> >          VLOG_WARN_ONCE("Rx checksum offload is not supported on device %"PRIu8,
> >                         dev->port_id);
> >          dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> > -        return;
> > +    } else if (tx_csum_ol_flag &&
> > +               (info.tx_offload_capa & tx_chksm_offload_capa) !=
> > +                tx_chksm_offload_capa) {
> > +        VLOG_WARN_ONCE("Tx checksum offload is not supported on device %"PRIu8,
> > +                       dev->port_id);
> > +        dev->hw_ol_features &= ~NETDEV_TX_CHECKSUM_OFFLOAD;
> > +    } else {
> > +        netdev_request_reconfigure(&dev->up);
> >      }
> > -    netdev_request_reconfigure(&dev->up);
> >  }
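In the hunk above, the Tx capability test is an else-if branch of the Rx
test, so it is skipped whenever the Rx warning branch is taken, and
netdev_request_reconfigure() is reached only when neither warning fires. A
minimal sketch of checking the two directions independently, using stand-in
names and caller-supplied bit masks rather than the DPDK DEV_*_OFFLOAD_*
flags (filter_offload_features and the enum below are hypothetical):

```c
#include <stdint.h>

/* Stand-ins for the NETDEV_*_CHECKSUM_OFFLOAD feature bits in the patch. */
enum hw_ol_feature {
    RX_CHECKSUM_OFFLOAD = 1 << 0,
    TX_CHECKSUM_OFFLOAD = 1 << 1,
};

/* Returns 'requested' with every feature cleared whose required capability
 * bits are not all present.  Rx and Tx are evaluated independently, so an
 * unsupported Rx offload does not hide the Tx check, and vice versa. */
static uint32_t
filter_offload_features(uint32_t requested,
                        uint32_t rx_capa, uint32_t rx_needed,
                        uint32_t tx_capa, uint32_t tx_needed)
{
    uint32_t features = requested;

    if ((features & RX_CHECKSUM_OFFLOAD)
        && (rx_capa & rx_needed) != rx_needed) {
        features &= ~(uint32_t) RX_CHECKSUM_OFFLOAD;
    }
    if ((features & TX_CHECKSUM_OFFLOAD)
        && (tx_capa & tx_needed) != tx_needed) {
        features &= ~(uint32_t) TX_CHECKSUM_OFFLOAD;
    }
    return features;
}
```

A caller could then compare the returned set with the requested one to
decide which warnings to log, and request reconfiguration separately from
the capability checks.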
> >
> > --
>
> [Sugesh] What is the performance improvement offered with this feature? Do
> you have any numbers to share?
> I think DPDK uses non-vector functions when Tx checksum offload is
> enabled. Will it give enough performance improvement to mitigate that cost?
>
> Eventually Rx checksum offload is going to be a default option (there won't be
> any configuration option to enable/disable it; Kevin's patch for the support
> is already acked and waiting to be merged).  Similarly, can't we enable it by
> default when it is supported?
>
>
>
> > 1.8.3.1
>
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
>

