[ovs-dev] [dpdk-hwol PATCH v6 1/2] netdev-dpdk: Upgrade to dpdk v18.08

Ophir Munk ophirmu at mellanox.com
Thu Oct 25 16:24:47 UTC 2018


1. Enable compilation and linkage with dpdk 18.08.0
The following dpdk commits which were introduced after dpdk 17.11.x
require OVS updates to accommodate to the dpdk changes.
- ce17edde ("ethdev: introduce Rx queue offloads API")
- ab3ce1e0 ("ethdev: remove old offload API")
- c06ddf96 ("meter: add configuration profile")
- e58638c3 ("ethdev: fix TPID handling in flow API")
- cd8c7c7c ("ethdev: replace bus specific struct with generic dev")
- ac8d22de ("ethdev: flatten RSS configuration in flow API")

2. Limit configured rss hash functions to only those supported
by the eth device.

3. Set default RSS key in struct action_rss_data, required by OVS commit
- e8a2b5bf ("netdev-dpdk: implement flow offload with rte flow")
when configured with "other_config:hw-offload=true"
Remark: calling RSS with 0 length (default) key is rejected
in DPDK 18.08 and will be enabled in DPDK 18.11. It has no effect
when running in a "hw-offload=false" configuration.

4. Update references to DPDK version 18.08 in Documentation and in
travis linux-build script

5. There are currently warnings on DPDK deprecated functions calls:
- rte_eth_dev_attach
- rte_eth_dev_detach
- rte_eth_devargs_parse
The deprecated functions calls replacements will be added to
DPDK 18.11.

Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
---
v1:
First version

v2:
Avoid seg faults cases as described in 
https://patchwork.ozlabs.org/patch/965451/
by using the patch in:
https://github.com/kevintraynor/ovs-dpdk-
master/commit/88f46cc5ab338eb4f3ca5db1eacd0effefe4fa0c

v3:
- Rebase on latest dpdk-hwol branch
- Updates based on latest reviews to versions v1 & v2

v4:
This patch got lost in mailing list server due to administrative issues and
is now obsolete

v5:
- updated commit message
- Address all reviews (some skipped by mistake) from recent versions
- it is suggested to ignore deprecated functions warnings as the functions 
replacements are missing in DPDK 18.08 and will be added to DPDK 18.11 

v6:
- Rebase on latest dpdk-hwol branch
- Updates based on patch reviews
https://patchwork.ozlabs.org/project/openvswitch/list/?series=70067

 .travis/linux-build.sh                   |   2 +-
 Documentation/intro/install/dpdk.rst     |  14 +--
 Documentation/topics/dpdk/vhost-user.rst |   6 +-
 lib/netdev-dpdk.c                        | 142 ++++++++++++++++++++-----------
 4 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
index 1fe5bbf..4c9e952 100755
--- a/.travis/linux-build.sh
+++ b/.travis/linux-build.sh
@@ -83,7 +83,7 @@ fi
 
 if [ "$DPDK" ]; then
     if [ -z "$DPDK_VER" ]; then
-        DPDK_VER="17.11.4"
+        DPDK_VER="18.08"
     fi
     install_dpdk $DPDK_VER
     if [ "$CC" = "clang" ]; then
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 312b1a8..73610ef 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
 In addition to the requirements described in :doc:`general`, building Open
 vSwitch with DPDK will require the following:
 
-- DPDK 17.11.4
+- DPDK 18.08.0
 
 - A `DPDK supported NIC`_
 
@@ -71,9 +71,9 @@ Install DPDK
 #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
 
        $ cd /usr/src/
-       $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
-       $ tar xf dpdk-17.11.4.tar.xz
-       $ export DPDK_DIR=/usr/src/dpdk-stable-17.11.4
+       $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
+       $ tar xf dpdk-18.08.tar.xz
+       $ export DPDK_DIR=/usr/src/dpdk-stable-18.08
        $ cd $DPDK_DIR
 
 #. (Optional) Configure DPDK as a shared library
@@ -283,9 +283,9 @@ with either the ovs-vswitchd logs, or by running either of the commands::
 
   $ ovs-vswitchd --version
   ovs-vswitchd (Open vSwitch) 2.9.0
-  DPDK 17.11.0
+  DPDK 18.08.0
   $ ovs-vsctl get Open_vSwitch . dpdk_version
-  "DPDK 17.11.0"
+  "DPDK 18.08.0"
 
 At this point you can use ovs-vsctl to set up bridges and other Open vSwitch
 features. Seeing as we've configured the DPDK datapath, we will use DPDK-type
@@ -673,7 +673,7 @@ Limitations
   The latest list of validated firmware versions can be found in the `DPDK
   release notes`_.
 
-.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_17_11.html
+.. _DPDK release notes: http://dpdk.org/doc/guides/rel_notes/release_18_08.html
 
 - Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
   given MTU value is calculated e.g. i40e driver includes 2 x vlan headers in
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 176bf17..56f58ba 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -320,9 +320,9 @@ To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
 DPDK sources to VM and build DPDK::
 
     $ cd /root/dpdk/
-    $ wget http://fast.dpdk.org/rel/dpdk-17.11.4.tar.xz
-    $ tar xf dpdk-17.11.4.tar.xz
-    $ export DPDK_DIR=/root/dpdk/dpdk-stable-17.11.4
+    $ wget http://fast.dpdk.org/rel/dpdk-18.08.tar.xz
+    $ tar xf dpdk-18.08.tar.xz
+    $ export DPDK_DIR=/root/dpdk/dpdk-stable-18.08
     $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
     $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
     $ cd $DPDK_DIR
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f91aa27..adfad29 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -168,11 +168,7 @@ static const struct rte_eth_conf port_conf = {
     .rxmode = {
         .mq_mode = ETH_MQ_RX_RSS,
         .split_hdr_size = 0,
-        .header_split   = 0, /* Header Split disabled */
-        .hw_ip_checksum = 0, /* IP checksum offload disabled */
-        .hw_vlan_filter = 0, /* VLAN filtering disabled */
-        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
-        .hw_strip_crc   = 0,
+        .offloads = 0,
     },
     .rx_adv_conf = {
         .rss_conf = {
@@ -364,6 +360,7 @@ struct dpdk_ring {
 struct ingress_policer {
     struct rte_meter_srtcm_params app_srtcm_params;
     struct rte_meter_srtcm in_policer;
+    struct rte_meter_srtcm_profile in_prof;
     rte_spinlock_t policer_lock;
 };
 
@@ -894,6 +891,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     struct rte_eth_dev_info info;
     uint16_t conf_mtu;
 
+    rte_eth_dev_info_get(dev->port_id, &info);
+
     /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
      * scatter to support jumbo RX. Checking the offload capabilities
      * is not an option as PMDs are not required yet to report
@@ -901,20 +900,25 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
      * (testing or code review). Listing all such PMDs feels harder
      * than highlighting the one known not to need scatter */
     if (dev->mtu > ETHER_MTU) {
-        rte_eth_dev_info_get(dev->port_id, &info);
         if (strncmp(info.driver_name, "net_nfp", 7)) {
-            conf.rxmode.enable_scatter = 1;
+            conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
         }
     }
 
     conf.intr_conf.lsc = dev->lsc_interrupt_mode;
-    conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
-                                  NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
+
+    if (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) {
+        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CHECKSUM;
+    }
 
     if (dev->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) {
-        conf.rxmode.hw_strip_crc = 1;
+        conf.rxmode.offloads |= DEV_RX_OFFLOAD_CRC_STRIP;
     }
 
+    /* Limit configured rss hash functions to only those supported
+     * by the eth device. */
+    conf.rx_adv_conf.rss_conf.rss_hf &= info.flow_type_rss_offloads;
+
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -1932,16 +1936,18 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
 
 static inline bool
 netdev_dpdk_policer_pkt_handle(struct rte_meter_srtcm *meter,
+                               struct rte_meter_srtcm_profile *profile,
                                struct rte_mbuf *pkt, uint64_t time)
 {
     uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct ether_hdr);
 
-    return rte_meter_srtcm_color_blind_check(meter, time, pkt_len) ==
-                                                e_RTE_METER_GREEN;
+    return rte_meter_srtcm_color_blind_check(meter, profile, time, pkt_len) ==
+                                             e_RTE_METER_GREEN;
 }
 
 static int
 netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
+                        struct rte_meter_srtcm_profile *profile,
                         struct rte_mbuf **pkts, int pkt_cnt,
                         bool should_steal)
 {
@@ -1953,7 +1959,8 @@ netdev_dpdk_policer_run(struct rte_meter_srtcm *meter,
     for (i = 0; i < pkt_cnt; i++) {
         pkt = pkts[i];
         /* Handle current packet */
-        if (netdev_dpdk_policer_pkt_handle(meter, pkt, current_time)) {
+        if (netdev_dpdk_policer_pkt_handle(meter, profile,
+                pkt, current_time)) {
             if (cnt != i) {
                 pkts[cnt] = pkt;
             }
@@ -1975,8 +1982,8 @@ ingress_policer_run(struct ingress_policer *policer, struct rte_mbuf **pkts,
     int cnt = 0;
 
     rte_spinlock_lock(&policer->policer_lock);
-    cnt = netdev_dpdk_policer_run(&policer->in_policer, pkts,
-                                  pkt_cnt, should_steal);
+    cnt = netdev_dpdk_policer_run(&policer->in_policer, &policer->in_prof,
+                                  pkts, pkt_cnt, should_steal);
     rte_spinlock_unlock(&policer->policer_lock);
 
     return cnt;
@@ -2767,8 +2774,12 @@ netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst)
     policer->app_srtcm_params.cir = rate_bytes;
     policer->app_srtcm_params.cbs = burst_bytes;
     policer->app_srtcm_params.ebs = 0;
-    err = rte_meter_srtcm_config(&policer->in_policer,
-                                    &policer->app_srtcm_params);
+    err = rte_meter_srtcm_profile_config(&policer->in_prof,
+                                         &policer->app_srtcm_params);
+    if (!err) {
+        err = rte_meter_srtcm_config(&policer->in_policer,
+                                     &policer->in_prof);
+    }
     if (err) {
         VLOG_ERR("Could not create rte meter for ingress policer");
         free(policer);
@@ -3017,9 +3028,23 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
         return ENODEV;
     }
 
+    ovs_mutex_lock(&dpdk_mutex);
     ovs_mutex_lock(&dev->mutex);
     rte_eth_dev_info_get(dev->port_id, &dev_info);
     ovs_mutex_unlock(&dev->mutex);
+    const struct rte_bus *bus;
+    const struct rte_pci_device *pci_dev;
+    uint16_t vendor_id = PCI_ANY_ID;
+    uint16_t device_id = PCI_ANY_ID;
+    bus = rte_bus_find_by_device(dev_info.device);
+    if (bus && !strcmp(bus->name, "pci")) {
+        pci_dev = RTE_DEV_TO_PCI(dev_info.device);
+        if (pci_dev) {
+            vendor_id = pci_dev->id.vendor_id;
+            device_id = pci_dev->id.device_id;
+        }
+    }
+    ovs_mutex_unlock(&dpdk_mutex);
 
     smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
     smap_add_format(args, "numa_id", "%d",
@@ -3042,14 +3067,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD);
     smap_add_format(args, "if_descr", "%s %s", rte_version(),
                                                dev_info.driver_name);
-
-    if (dev_info.pci_dev) {
-        smap_add_format(args, "pci-vendor_id", "0x%x",
-                        dev_info.pci_dev->id.vendor_id);
-        smap_add_format(args, "pci-device_id", "0x%x",
-                        dev_info.pci_dev->id.device_id);
-    }
-
+    smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id);
+    smap_add_format(args, "pci-device_id", "0x%x", device_id);
     return 0;
 }
 
@@ -3727,6 +3746,7 @@ struct egress_policer {
     struct qos_conf qos_conf;
     struct rte_meter_srtcm_params app_srtcm_params;
     struct rte_meter_srtcm egress_meter;
+    struct rte_meter_srtcm_profile egress_prof;
 };
 
 static void
@@ -3749,11 +3769,17 @@ egress_policer_qos_construct(const struct smap *details,
     policer = xmalloc(sizeof *policer);
     qos_conf_init(&policer->qos_conf, &egress_policer_ops);
     egress_policer_details_to_param(details, &policer->app_srtcm_params);
-    err = rte_meter_srtcm_config(&policer->egress_meter,
-                                 &policer->app_srtcm_params);
+    err = rte_meter_srtcm_profile_config(&policer->egress_prof,
+                                         &policer->app_srtcm_params);
+    if (!err) {
+        err = rte_meter_srtcm_config(&policer->egress_meter,
+                                     &policer->egress_prof);
+    }
+
     if (!err) {
         *conf = &policer->qos_conf;
     } else {
+        VLOG_ERR("Could not create rte meter for egress policer");
         free(policer);
         *conf = NULL;
         err = -err;
@@ -3803,7 +3829,8 @@ egress_policer_run(struct qos_conf *conf, struct rte_mbuf **pkts, int pkt_cnt,
     struct egress_policer *policer =
         CONTAINER_OF(conf, struct egress_policer, qos_conf);
 
-    cnt = netdev_dpdk_policer_run(&policer->egress_meter, pkts,
+    cnt = netdev_dpdk_policer_run(&policer->egress_meter,
+                                  &policer->egress_prof, pkts,
                                   pkt_cnt, should_steal);
 
     return cnt;
@@ -3888,7 +3915,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
     if (!err) {
         /* A new mempool was created or re-used. */
         netdev_change_seq_changed(&dev->up);
-    } else if (err != EEXIST){
+    } else if (err != EEXIST) {
         return err;
     }
     if (netdev_dpdk_get_vid(dev) >= 0) {
@@ -4103,15 +4130,15 @@ dump_flow_pattern(struct rte_flow_item *item)
 
         VLOG_DBG("rte flow vlan pattern:\n");
         if (vlan_spec) {
-            VLOG_DBG("  Spec: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
-                     ntohs(vlan_spec->tpid), ntohs(vlan_spec->tci));
+            VLOG_DBG("  Spec: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
+                     ntohs(vlan_spec->inner_type), ntohs(vlan_spec->tci));
         } else {
             VLOG_DBG("  Spec = null\n");
         }
 
         if (vlan_mask) {
-            VLOG_DBG("  Mask: tpid=0x%"PRIx16", tci=0x%"PRIx16"\n",
-                     vlan_mask->tpid, vlan_mask->tci);
+            VLOG_DBG("  Mask: inner_type=0x%"PRIx16", tci=0x%"PRIx16"\n",
+                     ntohs(vlan_mask->inner_type), ntohs(vlan_mask->tci));
         } else {
             VLOG_DBG("  Mask = null\n");
         }
@@ -4281,27 +4308,39 @@ add_flow_action(struct flow_actions *actions, enum rte_flow_action_type type,
     actions->cnt++;
 }
 
+struct action_rss_data {
+    struct rte_flow_action_rss conf;
+    uint16_t queue[0];
+};
+
 static struct rte_flow_action_rss *
 add_flow_rss_action(struct flow_actions *actions,
                     struct netdev *netdev) {
     int i;
-    struct rte_flow_action_rss *rss;
-
-    rss = xmalloc(sizeof(*rss) + sizeof(uint16_t) * netdev->n_rxq);
-    /*
-     * Setting it to NULL will let the driver use the default RSS
-     * configuration we have set: &port_conf.rx_adv_conf.rss_conf.
-     */
-    rss->rss_conf = NULL;
-    rss->num = netdev->n_rxq;
+    struct action_rss_data *rss_data;
+
+    rss_data = xmalloc(sizeof(struct action_rss_data) +
+                       sizeof(uint16_t) * netdev->n_rxq);
+    *rss_data = (struct action_rss_data) {
+        .conf = (struct rte_flow_action_rss) {
+            .func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+            .level = 0,
+            .types = 0,
+            .queue_num = netdev->n_rxq,
+            .queue = rss_data->queue,
+            .key_len = 0,
+            .key  = NULL
+        },
+    };
 
-    for (i = 0; i < rss->num; i++) {
-        rss->queue[i] = i;
+    /* Override queue array with default */
+    for (i = 0; i < netdev->n_rxq; i++) {
+       rss_data->queue[i] = i;
     }
 
-    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, rss);
+    add_flow_action(actions, RTE_FLOW_ACTION_TYPE_RSS, &rss_data->conf);
 
-    return rss;
+    return &rss_data->conf;
 }
 
 static int
@@ -4365,7 +4404,7 @@ netdev_dpdk_add_rte_flow_offload(struct netdev *netdev,
         vlan_mask.tci  = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
         /* match any protocols */
-        vlan_mask.tpid = 0;
+        vlan_mask.inner_type = 0;
 
         add_flow_pattern(&patterns, RTE_FLOW_ITEM_TYPE_VLAN,
                          &vlan_spec, &vlan_mask);
@@ -4520,7 +4559,14 @@ end_proto_check:
 
     flow = rte_flow_create(dev->port_id, &flow_attr, patterns.items,
                            actions.actions, &error);
-    free(rss);
+    /*
+     * 'rss' points to a memory (struct rte_flow_action_rss) that is contained
+     * in a bigger memory (struct action_rss_data) that was allocated in
+     * function add_flow_rss_actions(). The bigger memory holds additional
+     * space for the RSS queues. Given the 'rss' pointer we are freeing the
+     * bigger memory by using the CONTAINER_OF() macro.
+     */
+    free(CONTAINER_OF(rss, struct action_rss_data, conf));
     if (!flow) {
         VLOG_ERR("rte flow creat error: %u : message : %s\n",
                  error.type, error.message);
-- 
1.8.3.1



More information about the dev mailing list