[ovs-dev] [PATCH] netdev-afxdp: Best-effort configuration of XDP mode.

Ilya Maximets i.maximets at ovn.org
Thu Nov 7 10:01:22 UTC 2019


Until now there were only two options for XDP mode in OVS: SKB or DRV,
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux support native XDP, but
don't support zero-copy mode.  This case cannot be covered by the
existing API, so we have to use the slower generic XDP for such devices.
There are a few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, although it is
supported in native mode.

This change introduces the ability to use native XDP without zero-copy,
along with a best-effort configuration option that is enabled by default.
In the best-effort case OVS will sequentially try different modes, starting
from the fastest one, and will choose the first one acceptable for the
current interface.  This selects the fastest mode the interface supports.
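
As a rough illustration of the best-effort selection described above, the
sketch below probes modes from fastest to slowest and returns the first one
that a probe callback accepts.  It is not the code from this patch: the
enum, 'pick_best_mode' and the probe callbacks are hypothetical names used
only for this example, and a real probe would try to create an AF_XDP
socket instead of returning a fixed answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode identifiers, ordered from fastest to slowest so that
 * iterating in enum order matches the best-effort probing order. */
enum sketch_mode {
    SKETCH_NATIVE_ZC,   /* native XDP with zero-copy */
    SKETCH_NATIVE,      /* native XDP, copy mode */
    SKETCH_GENERIC,     /* generic (skb) XDP */
    SKETCH_MODE_MAX,
};

/* Hypothetical probe callback: returns true if 'mode' works on the
 * interface described by 'aux'. */
typedef bool (*probe_fn)(enum sketch_mode mode, void *aux);

/* Tries every mode starting from the fastest one and returns the first
 * mode accepted by 'probe', or SKETCH_MODE_MAX if none is accepted. */
static enum sketch_mode
pick_best_mode(probe_fn probe, void *aux)
{
    for (int m = SKETCH_NATIVE_ZC; m < SKETCH_MODE_MAX; m++) {
        if (probe((enum sketch_mode) m, aux)) {
            return (enum sketch_mode) m;
        }
    }
    return SKETCH_MODE_MAX;
}

/* Imitates a veth-like device: native XDP works, zero-copy does not. */
static bool
veth_like_probe(enum sketch_mode mode, void *aux)
{
    (void) aux;
    return mode != SKETCH_NATIVE_ZC;
}

/* Imitates a device on which no XDP mode can be configured. */
static bool
no_xdp_probe(enum sketch_mode mode, void *aux)
{
    (void) mode;
    (void) aux;
    return false;
}
```

With the veth-like probe, pick_best_mode(veth_like_probe, NULL) yields
SKETCH_NATIVE, since zero-copy is rejected and native is the next mode
tried.  With no_xdp_probe it yields SKETCH_MODE_MAX, which the caller
would treat as a configuration failure.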

If a user wants to choose a specific mode, this is still possible by
setting 'options:xdp-mode'.

This change also modifies the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available of the
                           3 modes above

Since 'best-effort' is the default mode, users will not need to
explicitly set 'xdp-mode' in most cases.
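
To make the naming and the default concrete, here is a minimal sketch of
how an 'xdp-mode' value could be mapped to a mode.  It only mirrors the
idea of a name table with a case-insensitive lookup; 'parse_xdp_mode' and
the 'SK_' identifiers are hypothetical and are not taken from OVS.  A NULL
argument stands for "option not set" and falls back to 'best-effort'.

```c
#include <stddef.h>
#include <strings.h>    /* strcasecmp() */

/* Hypothetical identifiers for the four user-visible modes. */
enum sk_mode {
    SK_MODE_BEST_EFFORT,
    SK_MODE_NATIVE_ZC,
    SK_MODE_NATIVE,
    SK_MODE_GENERIC,
    SK_MODE_COUNT,
};

/* User-visible names, indexed by mode. */
static const char *const sk_mode_names[SK_MODE_COUNT] = {
    [SK_MODE_BEST_EFFORT] = "best-effort",
    [SK_MODE_NATIVE_ZC]   = "native-with-zerocopy",
    [SK_MODE_NATIVE]      = "native",
    [SK_MODE_GENERIC]     = "generic",
};

/* Returns the mode whose name matches 'value' case-insensitively.
 * A NULL 'value' means the option was not set and selects the default,
 * SK_MODE_BEST_EFFORT.  Unknown names return SK_MODE_COUNT. */
static int
parse_xdp_mode(const char *value)
{
    if (!value) {
        return SK_MODE_BEST_EFFORT;
    }
    for (int m = 0; m < SK_MODE_COUNT; m++) {
        if (!strcasecmp(value, sk_mode_names[m])) {
            return m;
        }
    }
    return SK_MODE_COUNT;
}
```

In this sketch the former names are not accepted: parse_xdp_mode("drv")
returns SK_MODE_COUNT, so a caller would report an incorrect 'xdp-mode'
rather than silently translating the old value.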

TCP-related tests are re-enabled in the system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
---

With this patch I modified the user-visible API, but I think it's OK
since it's still an experimental netdev.  Comments are welcome.

 Documentation/intro/install/afxdp.rst |  54 ++++---
 NEWS                                  |  12 +-
 lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
 lib/netdev-afxdp.h                    |   9 ++
 lib/netdev-linux-private.h            |   8 +-
 tests/system-afxdp-macros.at          |   7 -
 vswitchd/vswitch.xml                  |  38 +++--
 7 files changed, 227 insertions(+), 124 deletions(-)

diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index a136db0c9..937770ad0 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
   make check-afxdp TESTSUITEFLAGS='1'
 
 .. note::
-   Not all test cases pass at this time. Currenly all TCP related
-   tests, ex: using wget or http, are skipped due to XDP limitations
-   on veth. cvlan test is also skipped.
+   Not all test cases pass at this time. Currently all cvlan tests are skipped
+   due to kernel issues.
 
 If a test case fails, check the log at::
 
@@ -177,33 +176,35 @@ in :doc:`general`::
   ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
 
 Make sure your device driver support AF_XDP, netdev-afxdp supports
-the following additional options (see man ovs-vswitchd.conf.db for
+the following additional options (see ``man ovs-vswitchd.conf.db`` for
 more details):
 
- * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
+ * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
+   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
+   supported modes, so in most cases you don't need to change it.
 
- * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
+ * ``use-need-wakeup``: default ``true`` if libbpf supports it,
+   otherwise ``false``.
 
 For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
-configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
-The **xdpmode** can be "drv" or "skb"::
+configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
+``n_rxq``::
 
   ethtool -L enp2s0 combined 1
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=1 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:4"
+                                   other_config:pmd-rxq-affinity="0:4"
 
 Or, use 4 pmds/cores and 4 queues by doing::
 
   ethtool -L enp2s0 combined 4
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=4 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
 
 .. note::
-   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
+   ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
+   ``n_rxq`` equals ``1`` by default.
 
 To validate that the bridge has successfully instantiated, you can use the::
 
@@ -214,12 +215,21 @@ Should show something like::
   Port "ens802f0"
    Interface "ens802f0"
       type: afxdp
-      options: {n_rxq="1", xdpmode=drv}
+      options: {n_rxq="1"}
 
 Otherwise, enable debugging by::
 
   ovs-appctl vlog/set netdev_afxdp::dbg
 
+To check which XDP mode was chosen by ``best-effort``, you can look for
+``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
+
+  # ovs-appctl dpctl/show
+  netdev@ovs-netdev:
+    <...>
+    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
+                      xdp-mode=best-effort,
+                      xdp-mode-in-use=native-with-zerocopy)
 
 References
 ----------
@@ -323,8 +333,11 @@ Limitations/Known Issues
 #. Most of the tests are done using i40e single port. Multiple ports and
    also ixgbe driver also needs to be tested.
 #. No latency test result (TODO items)
-#. Due to limitations of current upstream kernel, TCP and various offloading
+#. Due to limitations of current upstream kernel, various offloading
    (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
+   Also, TCP is not working over virtual interfaces in generic XDP mode.
+   Some more information and possible workaround available `here
+   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
 
 
 PVP using tap device
@@ -335,8 +348,7 @@ First, start OVS, then add physical port::
   ethtool -L enp2s0 combined 1
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=1 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:4"
+    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
 
 Start a VM with virtio and tap device::
 
@@ -414,13 +426,11 @@ Create namespace and veth peer devices::
 
 Attach the veth port to br0 (linux kernel mode)::
 
-  ovs-vsctl add-port br0 afxdp-p0 -- \
-    set interface afxdp-p0 options:n_rxq=1
+  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
 
-Or, use AF_XDP with skb mode::
+Or, use AF_XDP::
 
-  ovs-vsctl add-port br0 afxdp-p0 -- \
-    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
+  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
 
 Setup the OpenFlow rules::
 
diff --git a/NEWS b/NEWS
index 0d65d5a7f..100d7b6a8 100644
--- a/NEWS
+++ b/NEWS
@@ -5,11 +5,19 @@ Post-v2.12.0
        separate project. You can find it at
        https://github.com/ovn-org/ovn.git
    - Userspace datapath:
+     * Add option to enable, disable and query TCP sequence checking in
+       conntrack.
+   - AF_XDP:
      * New option 'use-need-wakeup' for netdev-afxdp to control enabling
        of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
        if supported by libbpf.
-     * Add option to enable, disable and query TCP sequence checking in
-       conntrack.
+     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
+       Modes also updated.  New values:
+         native-with-zerocopy  - former DRV
+         native                - new one, DRV without zero-copy
+         generic               - former SKB
+         best-effort [default] - new one, chooses the best available from
+                                 3 above modes
    - DPDK:
      * DPDK pdump packet capture support disabled by default. New configure
        option '--enable-dpdk-pdump' to enable it.
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index af654d498..74dde219d 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
 #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
 
 static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
-                                             int mode, bool use_need_wakeup);
-static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+                                             enum afxdp_mode mode,
+                                             bool use_need_wakeup,
+                                             bool report_socket_failures);
+static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
 static void xsk_destroy(struct xsk_socket_info *xsk);
 static int xsk_configure_all(struct netdev *netdev);
 static void xsk_destroy_all(struct netdev *netdev);
 
+static struct {
+    const char *name;
+    uint32_t bind_flags;
+    uint32_t xdp_flags;
+} xdp_modes[] = {
+    [OVS_AF_XDP_MODE_UNSPEC] = {
+        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
+    },
+    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
+        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
+    },
+    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
+        .name = "native-with-zerocopy",
+        .bind_flags = XDP_ZEROCOPY,
+        .xdp_flags = XDP_FLAGS_DRV_MODE,
+    },
+    [OVS_AF_XDP_MODE_NATIVE] = {
+        .name = "native",
+        .bind_flags = XDP_COPY,
+        .xdp_flags = XDP_FLAGS_DRV_MODE,
+    },
+    [OVS_AF_XDP_MODE_GENERIC] = {
+        .name = "generic",
+        .bind_flags = XDP_COPY,
+        .xdp_flags = XDP_FLAGS_SKB_MODE,
+    },
+};
+
 struct unused_pool {
     struct xsk_umem_info *umem_info;
     int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
@@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
 }
 
 static struct xsk_umem_info *
-xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
+xsk_configure_umem(void *buffer, uint64_t size)
 {
     struct xsk_umem_config uconfig;
     struct xsk_umem_info *umem;
@@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
     ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
                            &uconfig);
     if (ret) {
-        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
-                 ovs_strerror(errno),
-                 xdpmode == XDP_COPY ? "SKB": "DRV");
+        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
         free(umem);
         return NULL;
     }
@@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
 
 static struct xsk_socket_info *
 xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
-                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
+                     uint32_t queue_id, enum afxdp_mode mode,
+                     bool use_need_wakeup, bool report_socket_failures)
 {
     struct xsk_socket_config cfg;
     struct xsk_socket_info *xsk;
@@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
     cfg.rx_size = CONS_NUM_DESCS;
     cfg.tx_size = PROD_NUM_DESCS;
     cfg.libbpf_flags = 0;
-
-    if (xdpmode == XDP_ZEROCOPY) {
-        cfg.bind_flags = XDP_ZEROCOPY;
-        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
-    } else {
-        cfg.bind_flags = XDP_COPY;
-        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
-    }
+    cfg.bind_flags = xdp_modes[mode].bind_flags;
+    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 #ifdef HAVE_XDP_NEED_WAKEUP
     if (use_need_wakeup) {
@@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
     ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
                              &xsk->rx, &xsk->tx, &cfg);
     if (ret) {
-        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
-                 "use-need-wakeup: %s qid: %d",
-                 ovs_strerror(errno),
-                 xdpmode == XDP_COPY ? "SKB": "DRV",
-                 use_need_wakeup ? "true" : "false",
-                 queue_id);
+        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
+             "xsk_socket__create failed (%s) mode: %s, "
+             "use-need-wakeup: %s, qid: %d",
+             ovs_strerror(errno), xdp_modes[mode].name,
+             use_need_wakeup ? "true" : "false", queue_id);
         free(xsk);
         return NULL;
     }
@@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
 }
 
 static struct xsk_socket_info *
-xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
-              bool use_need_wakeup)
+xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
+              bool use_need_wakeup, bool report_socket_failures)
 {
     struct xsk_socket_info *xsk;
     struct xsk_umem_info *umem;
@@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
     memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
 
     /* Create AF_XDP socket. */
-    umem = xsk_configure_umem(bufs,
-                              NUM_FRAMES * FRAME_SIZE,
-                              xdpmode);
+    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
     if (!umem) {
         free_pagealign(bufs);
         return NULL;
@@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
 
     VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
 
-    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
-                               use_need_wakeup);
+    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
+                               use_need_wakeup, report_socket_failures);
     if (!xsk) {
         /* Clean up umem and xpacket pool. */
         if (xsk_umem__delete(umem->umem)) {
@@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
     return xsk;
 }
 
+static int
+xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
+                    enum afxdp_mode mode, bool report_socket_failures)
+{
+    struct xsk_socket_info *xsk_info;
+
+    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
+             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
+             dev->use_need_wakeup ? "true" : "false");
+    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
+                             report_socket_failures);
+    if (!xsk_info) {
+        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
+             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
+             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
+        dev->xsks[queue_id] = NULL;
+        return -1;
+    }
+    dev->xsks[queue_id] = xsk_info;
+    atomic_init(&xsk_info->tx_dropped, 0);
+    xsk_info->outstanding_tx = 0;
+    xsk_info->available_rx = PROD_NUM_DESCS;
+    return 0;
+}
+
+
 static int
 xsk_configure_all(struct netdev *netdev)
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
-    struct xsk_socket_info *xsk_info;
     int i, ifindex, n_rxq, n_txq;
+    int qid = 0;
 
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
@@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
     n_rxq = netdev_n_rxq(netdev);
     dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
 
-    /* Configure each queue. */
-    for (i = 0; i < n_rxq; i++) {
-        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
-                 netdev_get_name(netdev), i,
-                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
-                 dev->use_need_wakeup ? "true" : "false");
-        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
-                                 dev->use_need_wakeup);
-        if (!xsk_info) {
-            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
-            dev->xsks[i] = NULL;
+    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
+        /* Trying to configure first queue with different modes to
+         * find the most suitable. */
+        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
+            if (!xsk_configure_queue(dev, ifindex, qid, i,
+                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
+                dev->xdp_mode_in_use = i;
+                VLOG_INFO("%s: %s XDP mode will be in use.",
+                          netdev_get_name(netdev), xdp_modes[i].name);
+                break;
+            }
+        }
+        if (i == OVS_AF_XDP_MODE_MAX) {
+            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
+                     netdev_get_name(netdev));
+            goto err;
+        }
+        qid++;
+    } else {
+        dev->xdp_mode_in_use = dev->xdp_mode;
+    }
+
+    /* Configure remaining queues. */
+    for (; qid < n_rxq; qid++) {
+        if (xsk_configure_queue(dev, ifindex, qid,
+                                dev->xdp_mode_in_use, true)) {
+            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
+                     netdev_get_name(netdev), qid);
             goto err;
         }
-        dev->xsks[i] = xsk_info;
-        atomic_init(&xsk_info->tx_dropped, 0);
-        xsk_info->outstanding_tx = 0;
-        xsk_info->available_rx = PROD_NUM_DESCS;
     }
 
     n_txq = netdev_n_txq(netdev);
@@ -500,7 +559,7 @@ xsk_destroy_all(struct netdev *netdev)
             if (dev->xsks[i]) {
                 xsk_destroy(dev->xsks[i]);
                 dev->xsks[i] = NULL;
-                VLOG_INFO("Destroyed xsk[%d].", i);
+                VLOG_DBG("%s: Destroyed xsk[%d].", netdev_get_name(netdev), i);
             }
         }
 
@@ -510,7 +569,7 @@ xsk_destroy_all(struct netdev *netdev)
 
     VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
 
     if (dev->tx_locks) {
         for (i = 0; i < netdev_n_txq(netdev); i++) {
@@ -526,9 +585,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
                         char **errp OVS_UNUSED)
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
-    const char *str_xdpmode;
-    int xdpmode, new_n_rxq;
+    const char *str_xdp_mode;
+    enum afxdp_mode xdp_mode;
     bool need_wakeup;
+    int new_n_rxq;
 
     ovs_mutex_lock(&dev->mutex);
     new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@@ -539,14 +599,17 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
         return EINVAL;
     }
 
-    str_xdpmode = smap_get_def(args, "xdpmode", "skb");
-    if (!strcasecmp(str_xdpmode, "drv")) {
-        xdpmode = XDP_ZEROCOPY;
-    } else if (!strcasecmp(str_xdpmode, "skb")) {
-        xdpmode = XDP_COPY;
-    } else {
-        VLOG_ERR("%s: Incorrect xdpmode (%s).",
-                 netdev_get_name(netdev), str_xdpmode);
+    str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
+    for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
+         xdp_mode < OVS_AF_XDP_MODE_MAX;
+         xdp_mode++) {
+        if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
+            break;
+        }
+    }
+    if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
+        VLOG_ERR("%s: Incorrect xdp-mode (%s).",
+                 netdev_get_name(netdev), str_xdp_mode);
         ovs_mutex_unlock(&dev->mutex);
         return EINVAL;
     }
@@ -560,10 +623,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
 #endif
 
     if (dev->requested_n_rxq != new_n_rxq
-        || dev->requested_xdpmode != xdpmode
+        || dev->requested_xdp_mode != xdp_mode
         || dev->requested_need_wakeup != need_wakeup) {
         dev->requested_n_rxq = new_n_rxq;
-        dev->requested_xdpmode = xdpmode;
+        dev->requested_xdp_mode = xdp_mode;
         dev->requested_need_wakeup = need_wakeup;
         netdev_request_reconfigure(netdev);
     }
@@ -578,8 +641,9 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
 
     ovs_mutex_lock(&dev->mutex);
     smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
-    smap_add_format(args, "xdpmode", "%s",
-                    dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
+    smap_add_format(args, "xdp-mode", "%s", xdp_modes[dev->xdp_mode].name);
+    smap_add_format(args, "xdp-mode-in-use", "%s",
+                    xdp_modes[dev->xdp_mode_in_use].name);
     smap_add_format(args, "use-need-wakeup", "%s",
                     dev->use_need_wakeup ? "true" : "false");
     ovs_mutex_unlock(&dev->mutex);
@@ -596,7 +660,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
     ovs_mutex_lock(&dev->mutex);
 
     if (netdev->n_rxq == dev->requested_n_rxq
-        && dev->xdpmode == dev->requested_xdpmode
+        && dev->xdp_mode == dev->requested_xdp_mode
         && dev->use_need_wakeup == dev->requested_need_wakeup
         && dev->xsks) {
         goto out;
@@ -607,9 +671,9 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
     netdev->n_rxq = dev->requested_n_rxq;
     netdev->n_txq = netdev->n_rxq;
 
-    dev->xdpmode = dev->requested_xdpmode;
+    dev->xdp_mode = dev->requested_xdp_mode;
     VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
-              dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB");
+              xdp_modes[dev->xdp_mode].name);
 
     if (setrlimit(RLIMIT_MEMLOCK, &r)) {
         VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
@@ -618,7 +682,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
 
     err = xsk_configure_all(netdev);
     if (err) {
-        VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev));
+        VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
+                 netdev_get_name(netdev));
     }
     netdev_change_seq_changed(netdev);
 out:
@@ -638,17 +703,9 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
 }
 
 static void
-xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
 {
-    uint32_t flags;
-
-    flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
-
-    if (xdpmode == XDP_COPY) {
-        flags |= XDP_FLAGS_SKB_MODE;
-    } else if (xdpmode == XDP_ZEROCOPY) {
-        flags |= XDP_FLAGS_DRV_MODE;
-    }
+    uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
 
     bpf_set_link_xdp_fd(ifindex, -1, flags);
 }
@@ -662,7 +719,7 @@ signal_remove_xdp(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     VLOG_WARN("Force removing xdp program.");
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
 }
 
 static struct dp_packet_afxdp *
@@ -782,7 +839,8 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
 }
 
 static inline int
-kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
+kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
+        bool use_need_wakeup)
 {
     int ret, retries;
     static const int KERNEL_TX_BATCH_SIZE = 16;
@@ -791,11 +849,11 @@ kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
         return 0;
     }
 
-    /* In SKB_MODE packet transmission is synchronous, and the kernel xmits
+    /* In generic mode packet transmission is synchronous, and the kernel xmits
      * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
      * So, we have to kick the kernel (n_packets / 16) times to be sure that
      * all packets are transmitted. */
-    retries = (xdpmode == XDP_COPY)
+    retries = (mode == OVS_AF_XDP_MODE_GENERIC)
               ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
               : 0;
 kick_retry:
@@ -962,7 +1020,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
                            &orig);
         COVERAGE_INC(afxdp_tx_full);
         afxdp_complete_tx(xsk_info);
-        kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
+        kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
         error = ENOMEM;
         goto out;
     }
@@ -986,7 +1044,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
     xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch));
     xsk_info->outstanding_tx += dp_packet_batch_size(batch);
 
-    ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
+    ret = kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
     if (OVS_UNLIKELY(ret)) {
         VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
                      netdev_get_name(netdev), ovs_strerror(ret));
@@ -1052,10 +1110,11 @@ netdev_afxdp_construct(struct netdev *netdev)
     /* Queues should not be used before the first reconfiguration. Clearing. */
     netdev->n_rxq = 0;
     netdev->n_txq = 0;
-    dev->xdpmode = 0;
+    dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
+    dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
 
     dev->requested_n_rxq = NR_QUEUE;
-    dev->requested_xdpmode = XDP_COPY;
+    dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
     dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
 
     dev->xsks = NULL;
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
index e2f400b72..4fe861d2d 100644
--- a/lib/netdev-afxdp.h
+++ b/lib/netdev-afxdp.h
@@ -25,6 +25,15 @@
 /* These functions are Linux AF_XDP specific, so they should be used directly
  * only by Linux-specific code. */
 
+enum afxdp_mode {
+    OVS_AF_XDP_MODE_UNSPEC,
+    OVS_AF_XDP_MODE_BEST_EFFORT,
+    OVS_AF_XDP_MODE_NATIVE_ZC,
+    OVS_AF_XDP_MODE_NATIVE,
+    OVS_AF_XDP_MODE_GENERIC,
+    OVS_AF_XDP_MODE_MAX,
+};
+
 struct netdev;
 struct xsk_socket_info;
 struct xdp_umem;
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index c14f2fb81..8873caa9d 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -100,10 +100,14 @@ struct netdev_linux {
     /* AF_XDP information. */
     struct xsk_socket_info **xsks;
     int requested_n_rxq;
-    int xdpmode;                /* AF_XDP running mode: driver or skb. */
-    int requested_xdpmode;
+
+    enum afxdp_mode xdp_mode;               /* Configured AF_XDP mode. */
+    enum afxdp_mode requested_xdp_mode;     /* Requested  AF_XDP mode. */
+    enum afxdp_mode xdp_mode_in_use;        /* Effective  AF_XDP mode. */
+
     bool use_need_wakeup;
     bool requested_need_wakeup;
+
     struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
 #endif
 };
diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
index f0683c0a9..5ee2ceb1a 100644
--- a/tests/system-afxdp-macros.at
+++ b/tests/system-afxdp-macros.at
@@ -30,10 +30,3 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
      AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
     ]
 )
-
-# OVS_START_L7([namespace], [protocol])
-#
-# AF_XDP doesn't work with TCP over virtual interfaces for now.
-#
-m4_define([OVS_START_L7],
-   [AT_SKIP_IF([:])])
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index efdfb83bb..02a68deb1 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3107,18 +3107,38 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
-      <column name="options" key="xdpmode"
+      <column name="options" key="xdp-mode"
               type='{"type": "string",
-                     "enum": ["set", ["skb", "drv"]]}'>
+                     "enum": ["set", ["best-effort", "native-with-zerocopy",
+                                      "native", "generic"]]}'>
         <p>
           Specifies the operational mode of the XDP program.
-          If "drv", the XDP program is loaded into the device driver with
-          zero-copy RX and TX enabled. This mode requires device driver with
-          AF_XDP support and has the best performance.
-          If "skb", the XDP program is using generic XDP mode in kernel with
-          extra data copying between userspace and kernel. No device driver
-          support is needed. Note that this is afxdp netdev type only.
-          Defaults to "skb" mode.
+          <p>
+            In <code>native-with-zerocopy</code> mode the XDP program is loaded
+            into the device driver with zero-copy RX and TX enabled.  This mode
+            requires device driver support and has the best performance because
+            there should be no copying of packets.
+          </p>
+          <p>
+            <code>native</code> is the same as
+            <code>native-with-zerocopy</code>, but without zero-copy
+            capability.  This requires at least one copy between kernel and the
+            userspace. This mode also requires support from device driver.
+          </p>
+          <p>
+            In <code>generic</code> case the XDP program in kernel works after
+            skb allocation on early stages of packet processing inside the
+            network stack.  This mode doesn't require driver support, but has
+            much lower performance.
+          </p>
+          <p>
+            <code>best-effort</code> tries to detect and choose the best
+            (fastest) from the available modes for current interface.
+          </p>
+          <p>
+            Note that this option is specific to netdev-afxdp.
+            Defaults to <code>best-effort</code> mode.
+          </p>
         </p>
       </column>
 
-- 
2.17.1


