[ovs-dev] [RFC PATCHv5] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.

Sugesh Chandran sugesh.chandran at intel.com
Thu Aug 25 09:43:38 UTC 2016


Add Rx checksum offloading support on DPDK physical ports. By default, Rx
checksum offloading is enabled if the NIC supports it. However, it can be
turned off either when adding a new DPDK physical port to OVS or at run time.

Rx checksum offloading is turned off by setting the 'rx-checksum-offload'
option to 'false'. For example, to disable it when adding a port:

 'ovs-vsctl add-port br0 dpdk0 -- \
  set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'

or, to disable it at run time after the port has been added to OVS:

'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'

Similarly, to turn Rx checksum offloading back on at run time:

'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
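
A run-time change such as the ones above only needs to trigger a port
reconfiguration when the requested value differs from the current one. The
following is a minimal, self-contained sketch of that idea, not the code in
this patch: 'struct sketch_port', 'sketch_set_rx_csum' and the feature bit are
hypothetical names, and the NIC capability is assumed to have been probed
elsewhere. As a simplification, an unsupported request is simply dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit, for illustration only. */
#define SKETCH_RX_CSUM_OFFLOAD (UINT32_C(1) << 0)

/* Hypothetical port state. */
struct sketch_port {
    uint32_t hw_features;       /* Offload features currently requested. */
    bool nic_supports_rx_csum;  /* Assumed to be probed elsewhere. */
};

/* Applies the 'rx-checksum-offload' option value 'enable'.  Returns true
 * only if the stored setting changed, that is, if the port would need to
 * be reconfigured. */
static bool
sketch_set_rx_csum(struct sketch_port *port, bool enable)
{
    assert(port);
    bool current = (port->hw_features & SKETCH_RX_CSUM_OFFLOAD) != 0;

    if (current == enable) {
        return false;           /* Nothing to do, avoid a reconfigure. */
    }
    if (enable && !port->nic_supports_rx_csum) {
        return false;           /* Request dropped, bit stays cleared. */
    }
    port->hw_features ^= SKETCH_RX_CSUM_OFFLOAD;
    return true;
}
```

Setting the same value twice in a row returns false the second time, so a
repeated 'ovs-vsctl set' with an unchanged value would be a no-op here.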

This is an RFC patch because the new checksum offload flags
'PKT_RX_L4_CKSUM_GOOD' and 'PKT_RX_IP_CKSUM_GOOD' will only be available in
the DPDK 16.11 release. OVS must be compiled against DPDK 16.11 to use the
checksum offloading feature.
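
The way a "checksum good" flag is meant to be consumed can be sketched as
follows: trust the flag when the NIC set it, and fall back to software
validation otherwise. This is a standalone illustration under stated
assumptions. 'SKETCH_RX_IP_CKSUM_GOOD' is a stand-in bit, not the real DPDK
value, and 'struct sketch_pkt' and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DPDK mbuf flag; the real bit is defined
 * by DPDK and may differ. */
#define SKETCH_RX_IP_CKSUM_GOOD (UINT64_C(1) << 0)

/* Hypothetical minimal packet: offload flags plus a raw IPv4 header. */
struct sketch_pkt {
    uint64_t ol_flags;
    const uint8_t *ip_hdr;
    size_t ip_hdr_len;          /* In bytes; always even for IPv4. */
};

/* Ones' complement checksum (RFC 1071) over 'len' bytes.  Returns 0 when
 * the header, including its checksum field, is consistent. */
static uint16_t
sketch_ip_csum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(len % 2 == 0);
    for (size_t i = 0; i < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Skips the software checksum when the NIC already reported it as good. */
static bool
sketch_ip_header_ok(const struct sketch_pkt *pkt)
{
    if (pkt->ol_flags & SKETCH_RX_IP_CKSUM_GOOD) {
        return true;
    }
    return sketch_ip_csum(pkt->ip_hdr, pkt->ip_hdr_len) == 0;
}
```

With the flag clear, the software path runs on every packet, which is the
behaviour without offload. With the flag set, the header bytes are not read
for validation at all.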

Tx checksum offloading is not implemented, for the following reasons.

1) Checksum offloading and vectorization are mutually exclusive in the DPDK
poll mode driver. Vector packet processing is turned off when checksum
offloading is enabled, which causes a significant performance drop on the Tx
side.

2) Normally, OVS generates the checksum for tunnel packets in software during
the 'tunnel push' operation, where the tunnel headers are created. Enabling
Tx checksum offloading would instead involve:

  *) Marking every packet for Tx checksum offloading at 'tunnel_push' and
  recirculating.
  *) At transmit time, checking that flag and instructing the NIC to do the
  checksum calculation. If the NIC doesn't support Tx checksum offloading,
  the checksum has to be calculated in software before the packets are sent
  out.
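
The two Tx steps above can be sketched as follows. This only illustrates the
extra per-packet branching and is not code from this patch or from OVS: the
flag, the struct, and both functions are hypothetical, and the software
checksum is represented by a placeholder field rather than a real calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "Tx checksum requested" mark, for illustration only. */
#define SKETCH_TX_CSUM_REQUESTED (UINT64_C(1) << 0)

struct sketch_tx_pkt {
    uint64_t ol_flags;
    bool sw_csum_done;          /* Placeholder for a software checksum. */
};

enum sketch_tx_action {
    SKETCH_TX_NO_CSUM,          /* Packet was never marked. */
    SKETCH_TX_HW_CSUM,          /* NIC computes the checksum. */
    SKETCH_TX_SW_CSUM           /* Checksum computed before transmit. */
};

/* Step 1: at 'tunnel_push', only mark the packet; compute nothing. */
static void
sketch_tunnel_push_mark(struct sketch_tx_pkt *pkt)
{
    assert(pkt);
    pkt->ol_flags |= SKETCH_TX_CSUM_REQUESTED;
}

/* Step 2: at transmit time, check the mark on every packet and decide
 * who computes the checksum. */
static enum sketch_tx_action
sketch_xmit_prepare(struct sketch_tx_pkt *pkt, bool nic_has_tx_csum)
{
    if (!(pkt->ol_flags & SKETCH_TX_CSUM_REQUESTED)) {
        return SKETCH_TX_NO_CSUM;
    }
    if (nic_has_tx_csum) {
        return SKETCH_TX_HW_CSUM;   /* Mark stays set for the NIC. */
    }
    /* No hardware support: clear the mark and fall back to software. */
    pkt->ol_flags &= ~SKETCH_TX_CSUM_REQUESTED;
    pkt->sw_csum_done = true;
    return SKETCH_TX_SW_CSUM;
}
```

Even in this reduced form, every transmitted packet pays for the flag check
in step 2, which is the validation overhead referred to in this message.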

No significant performance improvement was observed with Tx checksum
offloading, due to the overhead of the additional validations plus non-vector
packet processing. In some test scenarios it even introduced a performance
drop.

Rx checksum offloading still offers an 8-9% improvement on VxLAN tunnel
decapsulation, even though the SSE vector Rx function is disabled in the DPDK
poll mode driver.

Signed-off-by: Sugesh Chandran <sugesh.chandran at intel.com>

---
v5
- Reset the checksum flag in the common tunnel pop function rather than in
the 'udp_extract_tnl_md' function.

v4
- Unconditionally clear the checksum flags once in the pop operation rather
than separately in the IP and UDP layers.

v3
- Reset the checksum offload flags in tunnel pop operation after the validation.
- Reconfigure the DPDK port with Rx checksum offload only if the new
configuration differs from the current one.

v2
- Set Rx checksum enabled by default.
- Modified commit message, explaining the tradeoff with tx checksum offloading.
- Use DPDK mbuf checksum offload flags instead of defining a new
metadata field in OVS dp_packet.
- Validate the UDP checksum mbuf flag only if a checksum is present in the
packet.
- Doc update with Rx checksum offloading feature.
---
 INSTALL.DPDK-ADVANCED.md | 18 ++++++++++++++++--
 lib/dp-packet.h          | 29 +++++++++++++++++++++++++++++
 lib/netdev-dpdk.c        | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/netdev-native-tnl.c  | 34 +++++++++++++++++++---------------
 lib/netdev.c             |  4 ++++
 vswitchd/vswitch.xml     | 13 +++++++++++++
 6 files changed, 127 insertions(+), 17 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 857c805..6cc42d9 100755
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -14,7 +14,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
 9. [Flow Control](#fc)
 10. [Pdump](#pdump)
 11. [Jumbo Frames](#jumbo)
-12. [Vsperf](#vsperf)
+12. [Rx Checksum Offload](#rx_csum)
+13. [Vsperf](#vsperf)
 
 ## <a name="overview"></a> 1. Overview
 
@@ -834,7 +835,20 @@ vhost ports:
      ifconfig eth1 mtu 9000
      ```
 
-## <a name="vsperf"></a> 12. Vsperf
+## <a name="rx_csum"></a> 12. Rx Checksum Offload
+By default, Rx checksum offload is enabled on DPDK physical ports. It can be
+configured on a DPDK physical port either when the port is added or at run
+time.
+
+e.g. To disable Rx checksum offload when adding a DPDK port dpdk0:
+
+`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
+
+e.g. To disable the Rx checksum offloading on an existing DPDK port dpdk0:
+
+`ovs-vsctl set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
+
+## <a name="vsperf"></a> 13. Vsperf
 
 Vsperf project goal is to develop vSwitch test framework that can be used to
 validate the suitability of different vSwitch implementations in a Telco deployment
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 7c1e637..ee601d0 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -592,6 +592,35 @@ dp_packet_rss_invalidate(struct dp_packet *p)
 #endif
 }
 
+static inline bool
+dp_packet_ip_checksum_valid(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    return p->mbuf.ol_flags & PKT_RX_IP_CKSUM_GOOD;
+#else
+    return 0;
+#endif
+}
+
+static inline bool
+dp_packet_l4_checksum_valid(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    return p->mbuf.ol_flags & PKT_RX_L4_CKSUM_GOOD;
+#else
+    return 0;
+#endif
+}
+
+static inline void
+reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
+{
+#ifdef DPDK_NETDEV
+    p->mbuf.ol_flags &= ~(PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
+                          PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD);
+#endif
+}
+
 enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
 
 struct dp_packet_batch {
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 6d334db..46c4045 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -326,6 +326,10 @@ struct ingress_policer {
     rte_spinlock_t policer_lock;
 };
 
+enum dpdk_hw_ol_features {
+    NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
+};
+
 struct netdev_dpdk {
     struct netdev up;
     int port_id;
@@ -387,6 +391,10 @@ struct netdev_dpdk {
 
     /* DPDK-ETH Flow control */
     struct rte_eth_fc_conf fc_conf;
+
+    /* DPDK-ETH hardware offload features,
+     * from the enum set 'dpdk_hw_ol_features' */
+    uint32_t hw_ol_features;
 };
 
 struct netdev_rxq_dpdk {
@@ -624,6 +632,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
         conf.rxmode.jumbo_frame = 0;
         conf.rxmode.max_rx_pkt_len = 0;
     }
+    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
+                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -684,6 +694,28 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 }
 
 static void
+dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dev->mutex)
+{
+    struct rte_eth_dev_info info;
+    bool rx_csum_ol_flag = false;
+    uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
+                                     DEV_RX_OFFLOAD_TCP_CKSUM |
+                                     DEV_RX_OFFLOAD_IPV4_CKSUM;
+    rte_eth_dev_info_get(dev->port_id, &info);
+    rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
+
+    if (rx_csum_ol_flag &&
+        (info.rx_offload_capa & rx_chksm_offload_capa) !=
+         rx_chksm_offload_capa) {
+        VLOG_WARN("Failed to enable Rx checksum offload on device %d",
+                   dev->port_id);
+        dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
+    }
+    netdev_request_reconfigure(&dev->up);
+}
+
+static void
 dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
 {
     if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
@@ -838,6 +870,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
     /* Initialize the flow control to NULL */
     memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
+
+    /* Initialize the hardware offload flags to 0. */
+    dev->hw_ol_features = 0;
     if (type == DPDK_DEV_ETH) {
         err = dpdk_eth_dev_init(dev);
         if (err) {
@@ -1071,6 +1106,8 @@ static int
 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    bool rx_chksm_ofld;
+    bool temp_flag;
 
     ovs_mutex_lock(&dev->mutex);
 
@@ -1090,6 +1127,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 
     dpdk_eth_flow_ctrl_setup(dev);
 
+    /* Rx checksum offload configuration */
+    /* By default the Rx checksum offload is ON */
+    rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
+    temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD)
+                        != 0;
+    if (temp_flag != rx_chksm_ofld) {
+        dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
+        dpdk_eth_checksum_offload_configure(dev);
+    }
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
index ce2582f..23e987c 100644
--- a/lib/netdev-native-tnl.c
+++ b/lib/netdev-native-tnl.c
@@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
 
         ovs_be32 ip_src, ip_dst;
 
-        if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
-            VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
-            return NULL;
+        if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
+            if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
+                VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
+                return NULL;
+            }
         }
 
         if (ntohs(ip->ip_tot_len) > l3_size) {
@@ -179,18 +181,20 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
     }
 
     if (udp->udp_csum) {
-        uint32_t csum;
-        if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
-            csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
-        } else {
-            csum = packet_csum_pseudoheader(dp_packet_l3(packet));
-        }
-
-        csum = csum_continue(csum, udp, dp_packet_size(packet) -
-                             ((const unsigned char *)udp -
-                              (const unsigned char *)dp_packet_l2(packet)));
-        if (csum_finish(csum)) {
-            return NULL;
+        if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
+            uint32_t csum;
+            if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
+                csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
+            } else {
+                csum = packet_csum_pseudoheader(dp_packet_l3(packet));
+            }
+
+            csum = csum_continue(csum, udp, dp_packet_size(packet) -
+                                 ((const unsigned char *)udp -
+                                  (const unsigned char *)dp_packet_l2(packet)));
+            if (csum_finish(csum)) {
+                return NULL;
+            }
         }
         tnl->flags |= FLOW_TNL_F_CSUM;
     }
diff --git a/lib/netdev.c b/lib/netdev.c
index 10f2d0f..626e1a4 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -740,6 +740,10 @@ netdev_pop_header(struct netdev *netdev, struct dp_packet_batch *batch)
     for (i = 0; i < batch->count; i++) {
         buffers[i] = netdev->netdev_class->pop_header(buffers[i]);
         if (buffers[i]) {
+            /* Reset the checksum offload flags, if present, to avoid
+             * misinterpretation in further packet processing when the
+             * packet is recirculated. */
+            reset_dp_packet_checksum_ol_flags(buffers[i]);
             buffers[n_cnt++] = buffers[i];
         }
     }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 69b5592..19d5a4b 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3193,6 +3193,19 @@
       </column>
     </group>
 
+    <group title="Rx Checksum Offload Configuration">
+      <p>
+        Checksum validation of incoming packets is performed on the NIC
+        using the Rx checksum offload feature. Implemented only for
+        <code>dpdk</code> physical interfaces.
+      </p>
+
+      <column name="options" key="rx-checksum-offload" type='{"type": "boolean"}'>
+        Set to <code>false</code> to disable Rx checksum offloading on
+        <code>dpdk</code> physical ports. By default, it is enabled.
+      </column>
+    </group>
+
     <group title="Common Columns">
       The overall purpose of these columns is described under <code>Common
       Columns</code> at the beginning of this document.
-- 
2.5.0



