[ovs-dev] [PATCHv5] netdev-afxdp: Add need_wakeup supprt.

William Tu u9012063 at gmail.com
Sat Oct 19 00:31:30 UTC 2019


On Fri, Oct 18, 2019 at 05:14:46PM +0200, Ilya Maximets wrote:
> On 26.09.2019 21:29, William Tu wrote:
> >The patch adds support for using need_wakeup flag in AF_XDP rings.
> >A new option, use_need_wakeup, is added. When this option is used,
> >it means that OVS has to explicitly wake up the kernel RX, using poll()
> >syscall and wake up TX, using sendto() syscall. This feature improves
> >the performance by avoiding unnecessary sendto syscalls for TX.
> >For RX, instead of kernel always busy-spinning on fille queue, OVS wakes
> >up the kernel RX processing when fill queue is replenished.
> >
> >The need_wakeup feature is merged into Linux kernel bpf-next tee with commit
> >77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings") and
> >OVS enables it by default. Running the feature before this version causes
> >xsk bind fails, please use options:use_need_wakeup=false to disable it.
> >If users enable it but runs in an older version of libbpf, then the
> >need_wakeup feature has no effect, and a warning message is logged.
> >
> >For virtual interface, it's better set use_need_wakeup=false, since
> >the virtual device's AF_XDP xmit is synchronous: the sendto syscall
> >enters kernel and process the TX packet on tx queue directly.
> >
> >On Intel Xeon E5-2620 v3 2.4GHz system, performance of physical port
> >to physical port improves from 6.1Mpps to 7.3Mpps.
> >
> >Suggested-by: Ilya Maximets <i.maximets at ovn.org>
> >Signed-off-by: William Tu <u9012063 at gmail.com>
> >---
> >v5:
> >- address feedback from Ilya
> >   - update commit msg about kernel version using bpf-next tree
> >   - remove __func__, and use DBG in log_
> >   - fix alignment
> >   - remove the kernel version requirement for need_wakeup, we can
> >     wait until tag is available
> >
> >v4:
> >- move use_need_wakeup check inside xsk_rx_wakeup_if_needed
> >
> >v3:
> >- add warning when user enables it but libbpf not support it
> >- revise documentation
> >
> >v2:
> >- address feedbacks from Ilya and Eelco
> >- add options:use_need_wakeup, default to true
> >- remove poll timeout=1sec, make poll() return immediately
> >- naming change: rename to xsk_rx_wakeup_if_needing
> >- fix indents and return value for errno
> >---
> >  Documentation/intro/install/afxdp.rst |  15 ++++-
> >  acinclude.m4                          |   5 ++
> >  lib/netdev-afxdp.c                    | 108 ++++++++++++++++++++++++++++++----
> >  lib/netdev-linux-private.h            |   2 +
> >  vswitchd/vswitch.xml                  |  12 ++++
> >  5 files changed, 126 insertions(+), 16 deletions(-)
> >
> >diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> >index 820e9d993d8f..545516b2bbec 100644
> >--- a/Documentation/intro/install/afxdp.rst
> >+++ b/Documentation/intro/install/afxdp.rst
> >@@ -176,9 +176,18 @@ in :doc:`general`::
> >    ovs-vswitchd ...
> >    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> >-Make sure your device driver support AF_XDP, and to use 1 PMD (on core 4)
> >-on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
> >-pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> >+Make sure your device driver support AF_XDP, netdev-afxdp supports
> >+the following additional options (see man ovs-vswitchd.conf.db for
> >+more details):
> >+
> >+ * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
> >+
> >+ * **use_need_wakeup**: disable by setting to "false", otherwise default
> >+   is "true"
> >+
> >+For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
> >+configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
> >+The **xdpmode** can be "drv" or "skb"::
> >    ethtool -L enp2s0 combined 1
> >    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> >diff --git a/acinclude.m4 b/acinclude.m4
> >index f0e38898b17a..f1ecb86adf7c 100644
> >--- a/acinclude.m4
> >+++ b/acinclude.m4
> >@@ -276,6 +276,11 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> >                [Define to 1 if AF_XDP support is available and enabled.])
> >      LIBBPF_LDADD=" -lbpf -lelf"
> >      AC_SUBST([LIBBPF_LDADD])
> >+
> >+    AC_CHECK_DECL([xsk_ring_prod__needs_wakeup], [
> >+      AC_DEFINE([HAVE_XDP_NEED_WAKEUP], [1],
> >+        [XDP need wakeup support detected in xsk.h.])
> >+    ], [], [[#include <bpf/xsk.h>]])
> >    fi
> >    AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
> >  ])
> >diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> >index 6e01803272aa..fee9413bfd0a 100644
> >--- a/lib/netdev-afxdp.c
> >+++ b/lib/netdev-afxdp.c
> >@@ -26,6 +26,7 @@
> >  #include <linux/rtnetlink.h>
> >  #include <linux/if_xdp.h>
> >  #include <net/if.h>
> >+#include <poll.h>
> >  #include <stdlib.h>
> >  #include <sys/resource.h>
> >  #include <sys/socket.h>
> >@@ -82,7 +83,7 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
> >  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> >  static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
> >-                                             int mode);
> >+                                             int mode, bool use_need_wakeup);
> >  static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> >  static void xsk_destroy(struct xsk_socket_info *xsk);
> >  static int xsk_configure_all(struct netdev *netdev);
> >@@ -117,6 +118,54 @@ struct xsk_socket_info {
> >      atomic_uint64_t tx_dropped;
> >  };
> >+#ifdef HAVE_XDP_NEED_WAKEUP
> >+static inline void
> >+xsk_rx_wakeup_if_needed(struct xsk_umem_info *umem,
> >+                        struct netdev *netdev, int fd)
> >+{
> >+    struct netdev_linux *dev = netdev_linux_cast(netdev);
> >+    struct pollfd pfd;
> >+    int ret;
> >+
> >+    if (!dev->use_need_wakeup) {
> >+        return;
> >+    }
> >+
> >+    if (xsk_ring_prod__needs_wakeup(&umem->fq)) {
> >+        pfd.fd = fd;
> >+        pfd.events = POLLIN;
> >+
> >+        ret = poll(&pfd, 1, 0);
> >+        if (OVS_UNLIKELY(ret < 0)) {
> >+            VLOG_WARN_RL(&rl, "%s: error polling rx fd: %s.",
> >+                         netdev_get_name(netdev),
> >+                         ovs_strerror(errno));
> >+        }
> >+    }
> >+}
> >+
> >+static inline bool
> >+xsk_tx_need_wakeup(struct xsk_socket_info *xsk_info)
> >+{
> >+    return xsk_ring_prod__needs_wakeup(&xsk_info->tx);
> >+}
> >+
> >+#else /* !HAVE_XDP_NEED_WAKEUP */
> >+static inline void
> >+xsk_rx_wakeup_if_needed(struct xsk_umem_info *umem OVS_UNUSED,
> >+                        struct netdev *netdev OVS_UNUSED,
> >+                        int fd OVS_UNUSED)
> >+{
> >+    /* Nothing. */
> >+}
> >+
> >+static inline bool
> >+xsk_tx_need_wakeup(struct xsk_socket_info *xsk_info OVS_UNUSED)
> >+{
> >+    return true;
> >+}
> >+#endif /* HAVE_XDP_NEED_WAKEUP */
> >+
> >  static void
> >  netdev_afxdp_cleanup_unused_pool(struct unused_pool *pool)
> >  {
> >@@ -234,7 +283,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> >  static struct xsk_socket_info *
> >  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >-                     uint32_t queue_id, int xdpmode)
> >+                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
> >  {
> >      struct xsk_socket_config cfg;
> >      struct xsk_socket_info *xsk;
> >@@ -257,6 +306,12 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >          cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> >      }
> >+    if (use_need_wakeup) {
> >+#ifdef HAVE_XDP_NEED_WAKEUP
> >+        cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
> >+#endif
> >+    }
> >+
> >      if (if_indextoname(ifindex, devname) == NULL) {
> >          VLOG_ERR("ifindex %d to devname failed (%s)",
> >                   ifindex, ovs_strerror(errno));
> >@@ -267,9 +322,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> >                               &xsk->rx, &xsk->tx, &cfg);
> >      if (ret) {
> >-        VLOG_ERR("xsk_socket__create failed (%s) mode: %s qid: %d",
> >+        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
> >+                 "use_need_wakeup: %s qid: %d",
> >                   ovs_strerror(errno),
> >                   xdpmode == XDP_COPY ? "SKB": "DRV",
> >+                 use_need_wakeup ? "true" : "false",
> >                   queue_id);
> >          free(xsk);
> >          return NULL;
> >@@ -311,7 +368,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >  }
> >  static struct xsk_socket_info *
> >-xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
> >+xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> >+              bool use_need_wakeup)
> >  {
> >      struct xsk_socket_info *xsk;
> >      struct xsk_umem_info *umem;
> >@@ -334,7 +392,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
> >      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
> >-    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode);
> >+    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> >+                               use_need_wakeup);
> >      if (!xsk) {
> >          /* Clean up umem and xpacket pool. */
> >          if (xsk_umem__delete(umem->umem)) {
> >@@ -365,9 +424,12 @@ xsk_configure_all(struct netdev *netdev)
> >      /* Configure each queue. */
> >      for (i = 0; i < n_rxq; i++) {
> >-        VLOG_INFO("%s: configure queue %d mode %s", __func__, i,
> >-                  dev->xdpmode == XDP_COPY ? "SKB" : "DRV");
> >-        xsk_info = xsk_configure(ifindex, i, dev->xdpmode);
> >+        VLOG_DBG("%s: configure queue %d mode %s use_need_wakeup %s.",
> >+                 netdev_get_name(netdev), i,
> >+                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> >+                 dev->use_need_wakeup ? "true" : "false");
> >+        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> >+                                 dev->use_need_wakeup);
> >          if (!xsk_info) {
> >              VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
> >              dev->xsks[i] = NULL;
> >@@ -459,6 +521,7 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> >      struct netdev_linux *dev = netdev_linux_cast(netdev);
> >      const char *str_xdpmode;
> >      int xdpmode, new_n_rxq;
> >+    bool need_wakeup;
> >      ovs_mutex_lock(&dev->mutex);
> >      new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> >@@ -481,10 +544,19 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> >          return EINVAL;
> >      }
> >+    need_wakeup = smap_get_bool(args, "use_need_wakeup", true);
> >+    if (need_wakeup) {
> >+#ifndef HAVE_XDP_NEED_WAKEUP
> >+        VLOG_WARN("XDP need_wakeup is not supported in libbpf.");
> 
> This warning breaks all the tests on system without need_wakeup support.
> It seems better to enable this feature by default only if supported by libbpf.

Thank you. I think that's a better idea.

> 
> One more thing that we need to actually drop need_wakeup to false here,
> otherwise get_config() and some log messages will report wrong information.
> 
> Suggesting following incremental:
> 
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index a516cc727..60259bdd3 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -68,6 +68,12 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  #define PROD_NUM_DESCS      XSK_RING_PROD__DEFAULT_NUM_DESCS
>  #define CONS_NUM_DESCS      XSK_RING_CONS__DEFAULT_NUM_DESCS
> +#ifdef HAVE_XDP_NEED_WAKEUP
> +#define NEED_WAKEUP_DEFAULT true
> +#else
> +#define NEED_WAKEUP_DEFAULT false
> +#endif
> +
>  /* The worst case is all 4 queues TX/CQ/RX/FILL are full + some packets
>   * still on processing in threads. Number of packets currently in OVS
>   * processing is hard to estimate because it depends on number of ports.
> @@ -307,11 +313,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>          cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
>      }
> -    if (use_need_wakeup) {
>  #ifdef HAVE_XDP_NEED_WAKEUP
> +    if (use_need_wakeup) {
>          cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
> -#endif
>      }
> +#endif
>      if (if_indextoname(ifindex, devname) == NULL) {
>          VLOG_ERR("ifindex %d to devname failed (%s)",
> @@ -545,12 +551,13 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>          return EINVAL;
>      }
> -    need_wakeup = smap_get_bool(args, "use_need_wakeup", true);
> -    if (need_wakeup) {
> +    need_wakeup = smap_get_bool(args, "use_need_wakeup", NEED_WAKEUP_DEFAULT);
>  #ifndef HAVE_XDP_NEED_WAKEUP
> +    if (need_wakeup) {
>          VLOG_WARN("XDP need_wakeup is not supported in libbpf.");
> -#endif
> +        need_wakeup = false;
>      }
> +#endif
>      if (dev->requested_n_rxq != new_n_rxq
>          || dev->requested_xdpmode != xdpmode
> @@ -1049,7 +1056,7 @@ netdev_afxdp_construct(struct netdev *netdev)
>      dev->requested_n_rxq = NR_QUEUE;
>      dev->requested_xdpmode = XDP_COPY;
> -    dev->requested_need_wakeup = true;
> +    dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
>      dev->xsks = NULL;
>      dev->tx_locks = NULL;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 24d061891..decccfe84 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3130,7 +3130,7 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>            syscall and wakes up TX, using sendto() syscall. For physical
>            devices, this feature improves the performance by avoiding
>            unnecessary sendto syscalls.
> -          Defaults is true.
> +          Defaults to true if supported by libbpf.
>          </p>
>        </column>
> ---
> 
> Some updates about default value also will be needed in commit message and,
> probably, in afxdp docs.

OK. the diff works fine.
I will re-do the performance number and send out next version.

Regards,
William


> 
> Best regards, Ilya Maximets.


More information about the dev mailing list