[ovs-dev] [PATCH] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.

Chandran, Sugesh sugesh.chandran at intel.com
Wed Aug 24 15:18:29 UTC 2016


There is a typo in the subject line.
This is an RFC patch, and the subject should read:

"[RFC PATCHv4] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports."
Sorry for the oversight.



Regards
_Sugesh


> -----Original Message-----
> From: Chandran, Sugesh
> Sent: Wednesday, August 24, 2016 3:54 PM
> To: dev at openvswitch.org; jesse at kernel.org
> Cc: Chandran, Sugesh <sugesh.chandran at intel.com>
> Subject: [PATCH] netdev-dpdk: Enable Rx checksum offloading feature on
> DPDK physical ports.
> 
> Add Rx checksum offloading feature support on DPDK physical ports. By
> default, Rx checksum offloading is enabled if the NIC supports it. However,
> checksum offloading can be turned OFF either while adding a new DPDK
> physical port to OVS or at runtime.
> 
> Rx checksum offloading can be turned off by setting the
> 'rx-checksum-offload' option to 'false'. For example, to disable Rx
> checksum offloading when adding a port:
> 
>  'ovs-vsctl add-port br0 dpdk0 -- \
>   set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
> 
> OR (to disable at run time, after the port has been added to OVS)
> 
> 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
> 
> Similarly, to turn ON Rx checksum offloading at run time,
> 
> 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
> 
> This is an RFC patch, as the new checksum offload flags
> 'PKT_RX_L4_CKSUM_GOOD' and 'PKT_RX_IP_CKSUM_GOOD' will be available only in
> the DPDK 16.11 release. OVS must be compiled against the DPDK 16.11 release
> to use the checksum offloading feature.
> 
> The Tx checksum offloading support is not implemented for the following
> reasons.
> 
> 1) Checksum offloading and vectorization are mutually exclusive in the DPDK
> poll mode driver. Vector packet processing is turned OFF when checksum
> offloading is enabled, which causes a significant performance drop on the
> Tx side.
> 
> 2) Normally, OVS generates the checksum for tunnel packets in software at the
> 'tunnel push' operation, where the tunnel headers are created. However,
> enabling Tx checksum offloading involves:
> 
>   *) Marking every packet for Tx checksum offloading at 'tunnel_push' and
>   recirculating.
>   *) At the time of xmit, validating the same flag and instructing the NIC to
>   do the checksum calculation. If the NIC doesn't support Tx checksum
>   offloading, the checksum calculation has to be done in software before
>   sending out the packets.
> 
> No significant performance improvement was noticed with Tx checksum
> offloading, due to the overhead of the additional validations plus
> non-vector packet processing. In some test scenarios, it even introduces a
> performance drop.
> 
> Rx checksum offloading still offers an 8-9% improvement on VxLAN tunnel
> decapsulation, even though the SSE vector Rx function is disabled in the
> DPDK poll mode driver.
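
As a rough illustration of the Rx fast path described above, here is a minimal
standalone sketch. It is not the patch code: RX_IP_CKSUM_GOOD is a
hypothetical stand-in for the DPDK flag PKT_RX_IP_CKSUM_GOOD, and inet_csum()
and ip_header_ok() are made-up helper names. The idea is only that the
software checksum is computed when the NIC has not already marked the header
as good.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DPDK mbuf flag PKT_RX_IP_CKSUM_GOOD. */
#define RX_IP_CKSUM_GOOD (1u << 0)

/* One's-complement Internet checksum over 'len' bytes.  When run over a
 * header that already contains a correct checksum, the result is 0. */
static uint16_t
inet_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t) data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t) data[0] << 8;   /* Pad the odd trailing byte. */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold the carries. */
    }
    return (uint16_t) ~sum;
}

/* Returns true if the IP header checksum can be trusted.  The software
 * check runs only when the NIC did not report the checksum as good. */
static bool
ip_header_ok(const uint8_t *hdr, size_t hdr_len, uint32_t ol_flags)
{
    assert(hdr != NULL);

    if (ol_flags & RX_IP_CKSUM_GOOD) {
        return true;                      /* Validated in hardware. */
    }
    return inet_csum(hdr, hdr_len) == 0;  /* Software fallback. */
}
```

With offload enabled and a NIC that sets the flag, the per-packet checksum
loop is skipped entirely, which is where the decapsulation gain would come
from.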
> 
> Signed-off-by: Sugesh Chandran <sugesh.chandran at intel.com>
> 
> ---
> v4
> - Unconditionally clear the checksum flags once in the pop operation rather
> than separately in the IP and UDP layers.
> 
> v3
> - Reset the checksum offload flags in tunnel pop operation after the
> validation.
> - Reconfigure the dpdk port with rx checksum offload only if the new
> configuration is different from the current one.
> 
> v2
> - Set Rx checksum enabled by default.
> - Modified commit message, explaining the tradeoff with tx checksum
> offloading.
> - Use dpdk mbuf checksum offload flags instead of defining a new
> metadata field in OVS dp_packet.
> - Validate the udp checksum mbuf flag only if the checksum is present in
> the packet.
> - Doc update with Rx checksum offloading feature.
> ---
>  INSTALL.DPDK-ADVANCED.md | 18 ++++++++++++++++--
>  lib/dp-packet.h          | 29 +++++++++++++++++++++++++++++
>  lib/netdev-dpdk.c        | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  lib/netdev-native-tnl.c  | 38 +++++++++++++++++++++++---------------
>  vswitchd/vswitch.xml     | 13 +++++++++++++
>  5 files changed, 127 insertions(+), 17 deletions(-)
> 
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 857c805..6cc42d9 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -14,7 +14,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>  9. [Flow Control](#fc)
>  10. [Pdump](#pdump)
>  11. [Jumbo Frames](#jumbo)
> -12. [Vsperf](#vsperf)
> +12. [Rx Checksum Offload](#rx_csum)
> +13. [Vsperf](#vsperf)
> 
>  ## <a name="overview"></a> 1. Overview
> 
> @@ -834,7 +835,20 @@ vhost ports:
>       ifconfig eth1 mtu 9000
>       ```
> 
> -## <a name="vsperf"></a> 12. Vsperf
> +## <a name="rx_csum"></a> 12. Rx Checksum Offload
> +By default, DPDK physical ports are enabled with Rx checksum offload. Rx
> +checksum offload can be configured on a DPDK physical port either when adding
> +or at run time.
> +
> +e.g. To disable Rx checksum offload when adding a DPDK port dpdk0:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
> +
> +e.g. To disable the Rx checksum offloading on an existing DPDK port dpdk0:
> +
> +`ovs-vsctl set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
> +
> +## <a name="vsperf"></a> 13. Vsperf
> 
>  Vsperf project goal is to develop vSwitch test framework that can be used to
>  validate the suitability of different vSwitch implementations in a Telco deployment
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 7c1e637..ee601d0 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -592,6 +592,35 @@ dp_packet_rss_invalidate(struct dp_packet *p)
>  #endif
>  }
> 
> +static inline bool
> +dp_packet_ip_checksum_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> +    return p->mbuf.ol_flags & PKT_RX_IP_CKSUM_GOOD;
> +#else
> +    return 0;
> +#endif
> +}
> +
> +static inline bool
> +dp_packet_l4_checksum_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> +    return p->mbuf.ol_flags & PKT_RX_L4_CKSUM_GOOD;
> +#else
> +    return 0;
> +#endif
> +}
> +
> +static inline void
> +reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> +    p->mbuf.ol_flags &= ~(PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
> +                          PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD);
> +#endif
> +}
> +
>  enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
> 
>  struct dp_packet_batch {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6d334db..46c4045 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -326,6 +326,10 @@ struct ingress_policer {
>      rte_spinlock_t policer_lock;
>  };
> 
> +enum dpdk_hw_ol_features {
> +    NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
> +};
> +
>  struct netdev_dpdk {
>      struct netdev up;
>      int port_id;
> @@ -387,6 +391,10 @@ struct netdev_dpdk {
> 
>      /* DPDK-ETH Flow control */
>      struct rte_eth_fc_conf fc_conf;
> +
> +    /* DPDK-ETH hardware offload features,
> +     * from the enum set 'dpdk_hw_ol_features' */
> +    uint32_t hw_ol_features;
>  };
> 
>  struct netdev_rxq_dpdk {
> @@ -624,6 +632,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>          conf.rxmode.jumbo_frame = 0;
>          conf.rxmode.max_rx_pkt_len = 0;
>      }
> +    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
> +                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
>      /* A device may report more queues than it makes available (this has
>       * been observed for Intel xl710, which reserves some of them for
>       * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
> @@ -684,6 +694,28 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  }
> 
>  static void
> +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> +    OVS_REQUIRES(dev->mutex)
> +{
> +    struct rte_eth_dev_info info;
> +    bool rx_csum_ol_flag = false;
> +    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
> +                                     DEV_RX_OFFLOAD_TCP_CKSUM |
> +                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
> +    rte_eth_dev_info_get(dev->port_id, &info);
> +    rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> +
> +    if (rx_csum_ol_flag &&
> +        (info.rx_offload_capa & rx_chksm_offload_capa) !=
> +         rx_chksm_offload_capa) {
> +        VLOG_WARN("Failed to enable Rx checksum offload on device %d",
> +                   dev->port_id);
> +        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> +    }
> +    netdev_request_reconfigure(&dev->up);
> +}
> +
> +static void
>  dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
>  {
>      if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
> @@ -838,6 +870,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
> 
>      /* Initialize the flow control to NULL */
>      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> +
> +    /* Initialize the hardware offload flags to 0 */
> +    dev->hw_ol_features = 0;
>      if (type == DPDK_DEV_ETH) {
>          err = dpdk_eth_dev_init(dev);
>          if (err) {
> @@ -1071,6 +1106,8 @@ static int
>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    bool rx_chksm_ofld;
> +    bool temp_flag;
> 
>      ovs_mutex_lock(&dev->mutex);
> 
> @@ -1090,6 +1127,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
> 
>      dpdk_eth_flow_ctrl_setup(dev);
> 
> +    /* Rx checksum offload configuration */
> +    /* By default the Rx checksum offload is ON */
> +    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
> +    temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD)
> +                        != 0;
> +    if (temp_flag != rx_chksm_ofld) {
> +        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
> +        dpdk_eth_checksum_offload_configure(dev);
> +    }
>      ovs_mutex_unlock(&dev->mutex);
> 
>      return 0;
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index ce2582f..31a12d6 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
> 
>          ovs_be32 ip_src, ip_dst;
> 
> -        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> -            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> -            return NULL;
> +        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
> +            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> +                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> +                return NULL;
> +            }
>          }
> 
>          if (ntohs(ip->ip_tot_len) > l3_size) {
> @@ -179,20 +181,26 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>      }
> 
>      if (udp->udp_csum) {
> -        uint32_t csum;
> -        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> -            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> -        } else {
> -            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> -        }
> -
> -        csum = csum_continue(csum, udp, dp_packet_size(packet) -
> -                             ((const unsigned char *)udp -
> -                              (const unsigned char *)dp_packet_l2(packet)));
> -        if (csum_finish(csum)) {
> -            return NULL;
> +        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
> +            uint32_t csum;
> +            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> +                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> +            } else {
> +                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> +            }
> +
> +            csum = csum_continue(csum, udp, dp_packet_size(packet) -
> +                                 ((const unsigned char *)udp -
> +                                  (const unsigned char *)dp_packet_l2(packet)));
> +            if (csum_finish(csum)) {
> +                return NULL;
> +            }
>          }
>          tnl->flags |= FLOW_TNL_F_CSUM;
> +
> +        /* Reset the checksum offload flags if present, to avoid wrong
> +         * interpretation in the further packet processing when recirculated.*/
> +        reset_dp_packet_checksum_ol_flags(packet);
>      }
> 
>      tnl->tp_src = udp->udp_src;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 69b5592..19d5a4b 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3193,6 +3193,19 @@
>        </column>
>      </group>
> 
> +    <group title="Rx Checksum Offload Configuration">
> +      <p>
> +        The checksum validation on incoming packets is performed on the NIC
> +        using the Rx checksum offload feature. Implemented only for
> +        <code>dpdk</code> physical interfaces.
> +      </p>
> +
> +      <column name="options" key="rx-checksum-offload" type='{"type": "boolean"}'>
> +        Set to <code>false</code> to disable Rx checksum offloading on
> +        <code>dpdk</code> physical ports. By default, Rx checksum offload is
> +        enabled.
> +      </column>
> +    </group>
> +
>      <group title="Common Columns">
>        The overall purpose of these columns is described under <code>Common
>        Columns</code> at the beginning of this document.
> --
> 2.5.0
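
For readers skimming the netdev_dpdk_set_config() hunk, the reconcile step
can be sketched in isolation as below. This is a simplified variation under
stated assumptions, not the code in the patch: struct port, the CAPA_* bits
and port_set_rx_csum() are hypothetical names standing in for struct
netdev_dpdk, the DEV_RX_OFFLOAD_* capability bits and the logic spread across
set_config and dpdk_eth_checksum_offload_configure(). Unlike the patch, the
sketch checks the NIC capability before flipping the bit, so a reconfigure is
only reported when the stored state really changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for NETDEV_RX_CHECKSUM_OFFLOAD. */
enum hw_ol_features {
    RX_CHECKSUM_OFFLOAD = 1u << 0,
};

/* Hypothetical stand-ins for the DEV_RX_OFFLOAD_*_CKSUM capability bits. */
#define CAPA_IPV4_CKSUM   (1u << 0)
#define CAPA_UDP_CKSUM    (1u << 1)
#define CAPA_TCP_CKSUM    (1u << 2)
#define CAPA_RX_CKSUM_ALL (CAPA_IPV4_CKSUM | CAPA_UDP_CKSUM | CAPA_TCP_CKSUM)

/* Hypothetical, reduced view of a port: the stored feature bits and the
 * capabilities the NIC reports. */
struct port {
    uint32_t hw_ol_features;
    uint32_t rx_offload_capa;
};

/* Applies the requested rx-checksum-offload setting.  Returns true when
 * the stored state changed and the port would need a reconfigure. */
static bool
port_set_rx_csum(struct port *p, bool requested)
{
    bool current;

    assert(p != NULL);
    current = (p->hw_ol_features & RX_CHECKSUM_OFFLOAD) != 0;

    if (current == requested) {
        return false;                 /* Nothing to do. */
    }
    if (requested
        && (p->rx_offload_capa & CAPA_RX_CKSUM_ALL) != CAPA_RX_CKSUM_ALL) {
        return false;                 /* NIC cannot do it; stay disabled. */
    }
    p->hw_ol_features ^= RX_CHECKSUM_OFFLOAD;   /* Flip the single bit. */
    return true;
}
```

Calling port_set_rx_csum(&p, true) on a port whose NIC reports all three
capabilities sets the bit and returns true; repeating the call returns false,
mirroring the "reconfigure only if the new configuration differs" note in v3.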



