[ovs-dev] [PATCHv3] netdev-afxdp: Enable loading XDP program.

William Tu u9012063 at gmail.com
Wed Nov 6 23:24:49 UTC 2019


On Tue, Nov 05, 2019 at 05:06:10PM +0100, Eelco Chaudron wrote:
> Hi William,
> 
> See some comments inline.
> 
> //Eelco
> 
> 
Hi Eelco,

Thanks for your feedback.

> On 1 Nov 2019, at 21:14, William Tu wrote:
> 
> >Now netdev-afxdp always forwards all packets to userspace because
> >it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
> >There are some cases when users want to keep packets in kernel instead
> >of sending to userspace, for example, management traffic such as SSH
> >should be processed in kernel.
> >
> >The patch enables loading the user-provide XDP program by doing
> >  $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
> >
> >So users can implement their filtering logic or traffic steering idea
> >in their XDP program, and rest of the traffic passes to AF_XDP socket
> >handled by OVS.
> >
> >Signed-off-by: William Tu <u9012063 at gmail.com>
> >---
> >v3:
> >    Feedbacks from Eelco.
> >    - keep using xdpobj not xdp-obj (because we alread use xdpmode)
> >      or we change both to xdp-obj and xdp-mode?
> >    - log a info message when using external program for better debugging
> >    - combine some failure messages
> >    - update doc
> >    NEW:
> >    - add options:xdpobj=__default__, to set back to libbpf default prog
> >    - Tested-at:
> >https://travis-ci.org/williamtu/ovs-travis/builds/606153231
> >
> >v2:
> >    A couple fixes and remove RFC
> >---
> > Documentation/intro/install/afxdp.rst |  49 +++++++++++
> > lib/netdev-afxdp.c                    | 148
> >+++++++++++++++++++++++++++++-----
> > lib/netdev-linux-private.h            |   2 +
> > 3 files changed, 181 insertions(+), 18 deletions(-)
> >
> >diff --git a/Documentation/intro/install/afxdp.rst
> >b/Documentation/intro/install/afxdp.rst
> >index a136db0c950a..89bc97da9548 100644
> >--- a/Documentation/intro/install/afxdp.rst
> >+++ b/Documentation/intro/install/afxdp.rst
> >@@ -273,6 +273,55 @@ Or, use OVS pmd tool::
> >   ovs-appctl dpif-netdev/pmd-stats-show
> >
> >
> >+Loading Custom XDP Program
> >+--------------------------
> >+By defailt, netdev-afxdp always forwards all packets to userspace because
> >+it is using libbpf's default XDP program. There are some cases when users
> >+want to keep packets in kernel instead of sending to userspace, for
> >example,
> >+management traffic such as SSH should be processed in kernel. This can be
> >+done by loading the user-provided XDP program::
> >+
> >+  ovs-vsctl -- set int afxdp-p0 options:xdpobj=<path/to/xdp/obj>
> >+
> >+So users can implement their filtering logic or traffic steering idea
> >+in their XDP program, and rest of the traffic passes to AF_XDP socket
> >+handled by OVS. To set it back to default, use::
> >+
> >+  ovs-vsctl -- set int afxdp-p0 options:xdpobj=__default__
> >+
> >+Below is a sample C program compiled under kernel's samples/bpf/.
> >+
> >+.. code-block:: c
> >+
> >+  #include <uapi/linux/bpf.h>
> >+  #include "bpf_helpers.h"
> >+
> 
> You mentioned you would add the below but did not, any reason, or just
> forgot about it?
> 
> #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
> 
> /* Kernel version before 5.3 needed an additional map */
> struct bpf_map_def SEC("maps") qidconf_map = {
>         .type = BPF_MAP_TYPE_ARRAY,
>         .key_size = sizeof(int),
>         .value_size = sizeof(int),
>         .max_entries = 64,
> };
> #endif

Yes, I forgot...

> 
> >+  /* OVS will associate map 'xsks_map' to xsk socket. */
> >+  struct bpf_map_def SEC("maps") xsks_map = {
> >+      .type = BPF_MAP_TYPE_XSKMAP,
> >+      .key_size = sizeof(int),
> >+      .value_size = sizeof(int),
> >+      .max_entries = 32,
> >+  };
> >+
> >+  SEC("xdp_sock")
> >+  int xdp_sock_prog(struct xdp_md *ctx)
> >+  {
> >+      int index = ctx->rx_queue_index;
> >+
> >+      /* Customized by user.
> >+       * For example
> >+       * 1) filter out all SSH traffic and return XDP_PASS
> >+       *    for kernel to process.
> >+       * 2) Drop unwanted packet by returning XDP_DROP.
> >+       */
> >+
> >+      /* Rest of packets goes to AF_XDP. */
> >+      return bpf_redirect_map(&xsks_map, index, 0);
> >+  }
> >+  char _license[] SEC("license") = "GPL";
> >+
> >+
> > Example Script
> > --------------
> >
> >diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> >index af654d498a88..be74527946a4 100644
> >--- a/lib/netdev-afxdp.c
> >+++ b/lib/netdev-afxdp.c
> >@@ -21,6 +21,7 @@
> > #include "netdev-afxdp.h"
> > #include "netdev-afxdp-pool.h"
> >
> >+#include <bpf/bpf.h>
> > #include <errno.h>
> > #include <inttypes.h>
> > #include <linux/rtnetlink.h>
> >@@ -30,6 +31,7 @@
> > #include <stdlib.h>
> > #include <sys/resource.h>
> > #include <sys/socket.h>
> >+#include <sys/stat.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> >
> >@@ -88,8 +90,11 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
> >
> > #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> >
> >+#define LIBBPF_XDP_PROGRAM "__default__"
> >+
> > static struct xsk_socket_info *xsk_configure(int ifindex, int
> >xdp_queue_id,
> >-                                             int mode, bool
> >use_need_wakeup);
> >+                                             int mode, bool
> >use_need_wakeup,
> >+                                             bool use_default_xdp);
> > static void dest(uint32_t ifindex, int xdpmode);
> > static void xsk_destroy(struct xsk_socket_info *xsk);
> > static int xsk_configure_all(struct netdev *netdev);
> >@@ -213,6 +218,50 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
> >     ovs_mutex_unlock(&unused_pools_mutex);
> > }
> >
> >+static int
> >+xsk_load_prog(const char *path, struct bpf_object **obj,
> >+              int *prog_fd)
> >+{
> >+    struct bpf_prog_load_attr attr = {
> >+        .prog_type = BPF_PROG_TYPE_XDP,
> >+        .file = path,
> >+    };
> >+
> >+    if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
> >+        VLOG_ERR("Can't load XDP program at '%s'", path);
> >+        return EINVAL;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> >+static int
> >+xsk_configure_map(struct bpf_object *obj, struct xsk_socket_info *xsk,
> >+                  int queue_id)
> >+{
> >+    struct bpf_map *map;
> >+    int xsks_map_fd;
> >+    int xsk_fd;
> >+    int ret;
> >+
> >+    map = bpf_object__find_map_by_name(obj, "xsks_map");
> >+    if (!map) {
> >+        VLOG_WARN("XDP map 'xsks_map' not found");
> >+        return EINVAL;
> >+    }
> >+
> >+    xsks_map_fd = bpf_map__fd(map);
> >+    xsk_fd = xsk_socket__fd(xsk->xsk);
> >+
> >+    ret = bpf_map_update_elem(xsks_map_fd, &queue_id, &xsk_fd, 0);
> >+    if (ret) {
> >+        VLOG_WARN("Can't set XSK map, map fd: %d", xsks_map_fd);
> 
> I think this VLOG and the one above should be set to ERR, because when this
> happens we end up in the re-configure loop, and no idea why (until you
> change the log level).

OK, will do it.

> 
> >+        return EINVAL;
> >+    }
> >+
> >+    return 0;
> >+}
> >+
> > static struct xsk_umem_info *
> > xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> > {
> >@@ -290,7 +339,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int
> >xdpmode)
> >
> > static struct xsk_socket_info *
> > xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >-                     uint32_t queue_id, int xdpmode, bool
> >use_need_wakeup)
> >+                     uint32_t queue_id, int xdpmode, bool
> >use_need_wakeup,
> >+                     bool use_default_xdp)
> > {
> >     struct xsk_socket_config cfg;
> >     struct xsk_socket_info *xsk;
> >@@ -319,6 +369,10 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> >     }
> > #endif
> >
> >+    if (!use_default_xdp) {
> >+        cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
> 
> Instead of using this flag, and the bpf_map_update_elem() function above,
> would it not be easier just let the API take care of this?
> See earlier comment,
> xsk_socket__create()->xsk_setup_xdp_prog()->xsk_lookup_bpf_maps(), however I
> won’t mind keeping it as is.

I see your porint now.
So I can load custom XDP program, then create xsk.
In this case, I don't have to set this flag, and also no need to
call bpf_map_update_elem(). I will fix it.

> 
> Maybe this also takes care of cleaning up the program on unload/cleanup?
> This is a link to an example:
> 
> https://github.com/chaudron/xdp-tutorial/blob/master/advanced03-AF_XDP/af_xdp_user.c
> 

Yes, that's easier.

> >+    }
> >+
> >     if (if_indextoname(ifindex, devname) == NULL) {
> >         VLOG_ERR("ifindex %d to devname failed (%s)",
> >                  ifindex, ovs_strerror(errno));
> >@@ -339,17 +393,19 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> >         return NULL;
> >     }
> >
> >-    /* Make sure the built-in AF_XDP program is loaded. */
> >-    ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
> >-    if (ret || !prog_id) {
> >-        if (ret) {
> >-            VLOG_ERR("Get XDP prog ID failed (%s)", ovs_strerror(errno));
> >-        } else {
> >-            VLOG_ERR("No XDP program is loaded at ifindex %d", ifindex);
> >+    if (use_default_xdp) {
> >+        /* Make sure the built-in AF_XDP program is loaded. */
> >+        ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
> >+        if (ret || !prog_id) {
> >+            if (ret) {
> >+                VLOG_ERR("Get XDP prog ID failed (%s)",
> >ovs_strerror(errno));
> >+            } else {
> >+                VLOG_ERR("No XDP program is loaded at ifindex %d",
> >ifindex);
> >+            }
> >+            xsk_socket__delete(xsk->xsk);
> >+            free(xsk);
> >+            return NULL;
> >         }
> >-        xsk_socket__delete(xsk->xsk);
> >-        free(xsk);
> >-        return NULL;
> >     }
> >
> >     while (!xsk_ring_prod__reserve(&xsk->umem->fq,
> >@@ -376,7 +432,7 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> >
> > static struct xsk_socket_info *
> > xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> >-              bool use_need_wakeup)
> >+              bool use_need_wakeup, bool use_default_xdp)
> > {
> >     struct xsk_socket_info *xsk;
> >     struct xsk_umem_info *umem;
> >@@ -400,7 +456,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int
> >xdpmode,
> >     VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
> >
> >     xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> >-                               use_need_wakeup);
> >+                               use_need_wakeup, use_default_xdp);
> >     if (!xsk) {
> >         /* Clean up umem and xpacket pool. */
> >         if (xsk_umem__delete(umem->umem)) {
> >@@ -420,6 +476,10 @@ xsk_configure_all(struct netdev *netdev)
> >     struct netdev_linux *dev = netdev_linux_cast(netdev);
> >     struct xsk_socket_info *xsk_info;
> >     int i, ifindex, n_rxq, n_txq;
> >+    bool use_default_xdp;
> >+    struct bpf_object *obj;
> >+    int prog_fd = 0;
> >+    int ret;
> >
> >     ifindex = linux_get_ifindex(netdev_get_name(netdev));
> >
> >@@ -428,6 +488,7 @@ xsk_configure_all(struct netdev *netdev)
> >
> >     n_rxq = netdev_n_rxq(netdev);
> >     dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
> >+    use_default_xdp = dev->xdpobj ? false : true;
> >
> >     /* Configure each queue. */
> >     for (i = 0; i < n_rxq; i++) {
> >@@ -436,7 +497,7 @@ xsk_configure_all(struct netdev *netdev)
> >                  dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> >                  dev->use_need_wakeup ? "true" : "false");
> >         xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> >-                                 dev->use_need_wakeup);
> >+                                 dev->use_need_wakeup, use_default_xdp);
> >         if (!xsk_info) {
> >             VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
> >             dev->xsks[i] = NULL;
> >@@ -446,6 +507,28 @@ xsk_configure_all(struct netdev *netdev)
> >         atomic_init(&xsk_info->tx_dropped, 0);
> >         xsk_info->outstanding_tx = 0;
> >         xsk_info->available_rx = PROD_NUM_DESCS;
> >+
> >+        if (dev->xdpobj) {
> >+            if (prog_fd == 0) {
> >+                /* XDP program is per-netdev, so all queues share
> >+                   the same XDP program. */
> >+                ret = xsk_load_prog(dev->xdpobj, &obj, &prog_fd);
> >+                if (ret) {
> >+                    goto err;
> >+                }
> >+            }
> >+            ret = xsk_configure_map(obj, xsk_info, i);
> >+            if (ret) {
> >+                goto err;
> >+            }
> >+
> >+            /* Clear existing XDP program and its associated map. */
> >+            bpf_set_link_xdp_fd(ifindex, -1, dev->xdpmode);
> >+
> >+            bpf_set_link_xdp_fd(ifindex, prog_fd, dev->xdpmode);
> >+            VLOG_INFO("%s: Load custom XDP program at %s.",
> >+                      netdev_get_name(netdev), dev->xdpobj);
> >+        }
> >     }
> >
> >     n_txq = netdev_n_txq(netdev);
> >@@ -519,6 +602,8 @@ xsk_destroy_all(struct netdev *netdev)
> >         free(dev->tx_locks);
> >         dev->tx_locks = NULL;
> >     }
> >+    free(CONST_CAST(char *, dev->xdpobj));
> >+    dev->xdpobj = NULL;
> > }
> 
> We need to make sure we close all the open BPF programs or else they will
> all remain in memory.
> I did a lot of tries when debugging and this is what I see with bpftool:
> 
> $ bpftool prog
> 2023: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:04-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2027: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:04-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2031: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:04-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2035: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:04-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2039: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:05-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2043: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:05-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2047: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:05-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 2051: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-05T10:54:05-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> ...
> ...
> 
> Guess same thing for the maps:
> 
> $ bpftool map
> 1094: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 1097: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 1098: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 63  memlock 4096B
> ...
> ...

Thanks for testing it.
Let me try to use the way you suggest and make sure maps
are cleaned up!

Regards,
William



More information about the dev mailing list