[ovs-dev] [PATCH] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.
Chandran, Sugesh
sugesh.chandran at intel.com
Wed Aug 24 15:18:29 UTC 2016
Typo in the subject line.
This is an RFC patch, and the subject should have been:
"[RFC PATCHv4] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports."
Sorry for missing it.
Regards
_Sugesh
> -----Original Message-----
> From: Chandran, Sugesh
> Sent: Wednesday, August 24, 2016 3:54 PM
> To: dev at openvswitch.org; jesse at kernel.org
> Cc: Chandran, Sugesh <sugesh.chandran at intel.com>
> Subject: [PATCH] netdev-dpdk: Enable Rx checksum offloading feature on DPDK physical ports.
>
> Add Rx checksum offloading support on DPDK physical ports. By default,
> Rx checksum offloading is enabled if the NIC supports it. However,
> checksum offloading can be turned off either when adding a new DPDK
> physical port to OVS or at run time.
>
> Rx checksum offloading can be turned off by setting the parameter to
> 'false'. For example, to disable Rx checksum offloading when adding a port:
>
> 'ovs-vsctl add-port br0 dpdk0 -- \
> set Interface dpdk0 type=dpdk options:rx-checksum-offload=false'
>
> Or, to disable it at run time after the port has been added to OVS:
>
> 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=false'
>
> Similarly, to turn Rx checksum offloading back on at run time:
>
> 'ovs-vsctl set Interface dpdk0 options:rx-checksum-offload=true'
>
> This is an RFC patch because the new checksum offload flags
> 'PKT_RX_L4_CKSUM_GOOD' and 'PKT_RX_IP_CKSUM_GOOD' will be available only in
> the DPDK 16.11 release. OVS must be compiled against DPDK 16.11 to use the
> checksum offloading feature.
>
> Tx checksum offloading support is not implemented, for the following
> reasons:
>
> 1) Checksum offloading and vectorization are mutually exclusive in the DPDK
> poll mode driver. Vector packet processing is turned off when checksum
> offloading is enabled, which causes a significant performance drop on the
> Tx side.
>
> 2) Normally, OVS generates the checksum for tunnel packets in software during
> the 'tunnel push' operation, where the tunnel headers are created. Enabling
> Tx checksum offloading instead involves:
>
> *) Marking every packet for Tx checksum offloading at 'tunnel_push' and
> recirculating it.
> *) At transmit time, checking the same flag and instructing the NIC to do the
> checksum calculation. If the NIC doesn't support Tx checksum offloading,
> the checksum has to be calculated in software before the packets are sent
> out.
>
> No significant performance improvement was observed with Tx checksum
> offloading, due to the overhead of the additional validations plus non-vector
> packet processing. In some test scenarios, it even causes a performance drop.
>
> Rx checksum offloading still offers an 8-9% improvement in VxLAN tunnel
> decapsulation, even though the SSE vector Rx function is disabled in the DPDK
> poll mode driver.
>
> Signed-off-by: Sugesh Chandran <sugesh.chandran at intel.com>
>
> ---
> v4
> - Unconditionally clear the checksum flags once in the pop operation rather
>   than separately in the IP and UDP layers.
>
> v3
> - Reset the checksum offload flags in the tunnel pop operation after
>   validation.
> - Reconfigure the DPDK port with Rx checksum offload only if the new
>   configuration is different from the current one.
>
> v2
> - Set Rx checksum offload enabled by default.
> - Modified the commit message, explaining the trade-off with Tx checksum
>   offloading.
> - Use DPDK mbuf checksum offload flags instead of defining a new
>   metadata field in the OVS dp_packet.
> - Validate the UDP checksum mbuf flag only if a checksum is present in the
>   packet.
> - Documentation update for the Rx checksum offloading feature.
> ---
> INSTALL.DPDK-ADVANCED.md | 18 ++++++++++++++++--
> lib/dp-packet.h | 29 +++++++++++++++++++++++++++++
> lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> lib/netdev-native-tnl.c | 38 +++++++++++++++++++++++---------------
> vswitchd/vswitch.xml | 13 +++++++++++++
> 5 files changed, 127 insertions(+), 17 deletions(-)
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index 857c805..6cc42d9 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -14,7 +14,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
> 9. [Flow Control](#fc)
> 10. [Pdump](#pdump)
> 11. [Jumbo Frames](#jumbo)
> -12. [Vsperf](#vsperf)
> +12. [Rx Checksum Offload](#rx_csum)
> +13. [Vsperf](#vsperf)
>
> ## <a name="overview"></a> 1. Overview
>
> @@ -834,7 +835,20 @@ vhost ports:
> ifconfig eth1 mtu 9000
> ```
>
> -## <a name="vsperf"></a> 12. Vsperf
> +## <a name="rx_csum"></a> 12. Rx Checksum Offload
> +By default, Rx checksum offload is enabled on DPDK physical ports. Rx
> +checksum offload can be configured on a DPDK physical port either when adding
> +it or at run time.
> +
> +e.g. To disable Rx checksum offload when adding a DPDK port dpdk0:
> +
> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
> +
> +e.g. To disable Rx checksum offload on an existing DPDK port dpdk0:
> +
> +`ovs-vsctl set Interface dpdk0 type=dpdk options:rx-checksum-offload=false`
> +
> +## <a name="vsperf"></a> 13. Vsperf
>
> Vsperf project goal is to develop vSwitch test framework that can be used to
> validate the suitability of different vSwitch implementations in a Telco
> deployment
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 7c1e637..ee601d0 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -592,6 +592,35 @@ dp_packet_rss_invalidate(struct dp_packet *p)
> #endif
> }
>
> +static inline bool
> +dp_packet_ip_checksum_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> + return p->mbuf.ol_flags & PKT_RX_IP_CKSUM_GOOD;
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline bool
> +dp_packet_l4_checksum_valid(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> + return p->mbuf.ol_flags & PKT_RX_L4_CKSUM_GOOD;
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline void
> +reset_dp_packet_checksum_ol_flags(struct dp_packet *p)
> +{
> +#ifdef DPDK_NETDEV
> + p->mbuf.ol_flags &= ~(PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD);
> +#endif
> +}
> +
> enum { NETDEV_MAX_BURST = 32 }; /* Maximum number packets in a batch. */
>
> struct dp_packet_batch {
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6d334db..46c4045 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -326,6 +326,10 @@ struct ingress_policer {
> rte_spinlock_t policer_lock;
> };
>
> +enum dpdk_hw_ol_features {
> + NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
> +};
> +
> struct netdev_dpdk {
> struct netdev up;
> int port_id;
> @@ -387,6 +391,10 @@ struct netdev_dpdk {
>
> /* DPDK-ETH Flow control */
> struct rte_eth_fc_conf fc_conf;
> +
> + /* DPDK-ETH hardware offload features,
> + * from the enum set 'dpdk_hw_ol_features' */
> + uint32_t hw_ol_features;
> };
>
> struct netdev_rxq_dpdk {
> @@ -624,6 +632,8 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> conf.rxmode.jumbo_frame = 0;
> conf.rxmode.max_rx_pkt_len = 0;
> }
> + conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
> + NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> /* A device may report more queues than it makes available (this has
> * been observed for Intel xl710, which reserves some of them for
> * SRIOV): rte_eth_*_queue_setup will fail if a queue is not
> @@ -684,6 +694,28 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> }
>
> static void
> +dpdk_eth_checksum_offload_configure(struct netdev_dpdk *dev)
> + OVS_REQUIRES(dev->mutex)
> +{
> + struct rte_eth_dev_info info;
> + bool rx_csum_ol_flag = false;
> + uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
> + DEV_RX_OFFLOAD_TCP_CKSUM |
> + DEV_RX_OFFLOAD_IPV4_CKSUM;
> + rte_eth_dev_info_get(dev->port_id, &info);
> + rx_csum_ol_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
> +
> + if (rx_csum_ol_flag &&
> + (info.rx_offload_capa & rx_chksm_offload_capa) !=
> + rx_chksm_offload_capa) {
> + VLOG_WARN("Failed to enable Rx checksum offload on device %d",
> + dev->port_id);
> + dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD;
> + }
> + netdev_request_reconfigure(&dev->up);
> +}
> +
> +static void
> dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex)
> {
> if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) {
> @@ -838,6 +870,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
>
> /* Initialize the flow control to NULL */
> memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
> +
> + /* Initialize the hardware offload flags to 0. */
> + dev->hw_ol_features = 0;
> if (type == DPDK_DEV_ETH) {
> err = dpdk_eth_dev_init(dev);
> if (err) {
> @@ -1071,6 +1106,8 @@ static int
> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
> {
> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + bool rx_chksm_ofld;
> + bool temp_flag;
>
> ovs_mutex_lock(&dev->mutex);
>
> @@ -1090,6 +1127,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>
> dpdk_eth_flow_ctrl_setup(dev);
>
> + /* Rx checksum offload configuration */
> + /* By default the Rx checksum offload is ON */
> + rx_chksm_ofld = smap_get_bool(args, "rx-checksum-offload", true);
> + temp_flag = (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD)
> + != 0;
> + if (temp_flag != rx_chksm_ofld) {
> + dev->hw_ol_features ^= NETDEV_RX_CHECKSUM_OFFLOAD;
> + dpdk_eth_checksum_offload_configure(dev);
> + }
> ovs_mutex_unlock(&dev->mutex);
>
> return 0;
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index ce2582f..31a12d6 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -85,9 +85,11 @@ netdev_tnl_ip_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
>
> ovs_be32 ip_src, ip_dst;
>
> - if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> - VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> - return NULL;
> + if (OVS_UNLIKELY(!dp_packet_ip_checksum_valid(packet))) {
> + if (csum(ip, IP_IHL(ip->ip_ihl_ver) * 4)) {
> + VLOG_WARN_RL(&err_rl, "ip packet has invalid checksum");
> + return NULL;
> + }
> }
>
> if (ntohs(ip->ip_tot_len) > l3_size) {
> @@ -179,20 +181,26 @@ udp_extract_tnl_md(struct dp_packet *packet, struct flow_tnl *tnl,
> }
>
> if (udp->udp_csum) {
> - uint32_t csum;
> - if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> - csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> - } else {
> - csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> - }
> -
> - csum = csum_continue(csum, udp, dp_packet_size(packet) -
> - ((const unsigned char *)udp -
> - (const unsigned char *)dp_packet_l2(packet)));
> - if (csum_finish(csum)) {
> - return NULL;
> + if (OVS_UNLIKELY(!dp_packet_l4_checksum_valid(packet))) {
> + uint32_t csum;
> + if (netdev_tnl_is_header_ipv6(dp_packet_data(packet))) {
> + csum = packet_csum_pseudoheader6(dp_packet_l3(packet));
> + } else {
> + csum = packet_csum_pseudoheader(dp_packet_l3(packet));
> + }
> +
> + csum = csum_continue(csum, udp, dp_packet_size(packet) -
> + ((const unsigned char *)udp -
> + (const unsigned char *)dp_packet_l2(packet)));
> + if (csum_finish(csum)) {
> + return NULL;
> + }
> }
> tnl->flags |= FLOW_TNL_F_CSUM;
> +
> + /* Reset the checksum offload flags, if present, so that they are not
> + * misinterpreted in later processing after recirculation. */
> + reset_dp_packet_checksum_ol_flags(packet);
> }
>
> tnl->tp_src = udp->udp_src;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 69b5592..19d5a4b 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3193,6 +3193,19 @@
> </column>
> </group>
>
> + <group title="Rx Checksum Offload Configuration">
> + <p>
> + Checksum validation of incoming packets is performed on the NIC using
> + the Rx checksum offload feature. Implemented only for
> + <code>dpdk</code> physical interfaces.
> + </p>
> +
> + <column name="options" key="rx-checksum-offload" type='{"type": "boolean"}'>
> + Set to <code>false</code> to disable Rx checksum offloading on
> + <code>dpdk</code> physical ports. By default, Rx checksum offload is enabled.
> + </column>
> + </group>
> +
> <group title="Common Columns">
> The overall purpose of these columns is described under <code>Common
> Columns</code> at the beginning of this document.
> --
> 2.5.0