[ovs-dev] [PATCHv3] netdev-afxdp: Enable loading XDP program.
William Tu
u9012063 at gmail.com
Wed Nov 6 23:24:49 UTC 2019
On Tue, Nov 05, 2019 at 05:06:10PM +0100, Eelco Chaudron wrote:
> Hi William,
>
> See some comments inline.
>
> //Eelco
>
>
Hi Eelco,
Thanks for your feedback.
> On 1 Nov 2019, at 21:14, William Tu wrote:
>
> >Now netdev-afxdp always forwards all packets to userspace because
> >it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
> >There are some cases when users want to keep packets in kernel instead
> >of sending to userspace, for example, management traffic such as SSH
> >should be processed in kernel.
> >
> >The patch enables loading the user-provide XDP program by doing
> > $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
> >
> >So users can implement their filtering logic or traffic steering idea
> >in their XDP program, and rest of the traffic passes to AF_XDP socket
> >handled by OVS.
> >
> >Signed-off-by: William Tu <u9012063 at gmail.com>
> >---
> >v3:
> > Feedbacks from Eelco.
> > - keep using xdpobj not xdp-obj (because we alread use xdpmode)
> > or we change both to xdp-obj and xdp-mode?
> > - log a info message when using external program for better debugging
> > - combine some failure messages
> > - update doc
> > NEW:
> > - add options:xdpobj=__default__, to set back to libbpf default prog
> > - Tested-at:
> >https://travis-ci.org/williamtu/ovs-travis/builds/606153231
> >
> >v2:
> > A couple fixes and remove RFC
> >---
> > Documentation/intro/install/afxdp.rst | 49 +++++++++++
> > lib/netdev-afxdp.c | 148
> >+++++++++++++++++++++++++++++-----
> > lib/netdev-linux-private.h | 2 +
> > 3 files changed, 181 insertions(+), 18 deletions(-)
> >
> >diff --git a/Documentation/intro/install/afxdp.rst
> >b/Documentation/intro/install/afxdp.rst
> >index a136db0c950a..89bc97da9548 100644
> >--- a/Documentation/intro/install/afxdp.rst
> >+++ b/Documentation/intro/install/afxdp.rst
> >@@ -273,6 +273,55 @@ Or, use OVS pmd tool::
> > ovs-appctl dpif-netdev/pmd-stats-show
> >
> >
> >+Loading Custom XDP Program
> >+--------------------------
> >+By defailt, netdev-afxdp always forwards all packets to userspace because
> >+it is using libbpf's default XDP program. There are some cases when users
> >+want to keep packets in kernel instead of sending to userspace, for
> >example,
> >+management traffic such as SSH should be processed in kernel. This can be
> >+done by loading the user-provided XDP program::
> >+
> >+ ovs-vsctl -- set int afxdp-p0 options:xdpobj=<path/to/xdp/obj>
> >+
> >+So users can implement their filtering logic or traffic steering idea
> >+in their XDP program, and rest of the traffic passes to AF_XDP socket
> >+handled by OVS. To set it back to default, use::
> >+
> >+ ovs-vsctl -- set int afxdp-p0 options:xdpobj=__default__
> >+
> >+Below is a sample C program compiled under kernel's samples/bpf/.
> >+
> >+.. code-block:: c
> >+
> >+ #include <uapi/linux/bpf.h>
> >+ #include "bpf_helpers.h"
> >+
>
> You mentioned you would add the below but did not, any reason, or just
> forgot about it?
>
> #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
>
> /* Kernel version before 5.3 needed an additional map */
> struct bpf_map_def SEC("maps") qidconf_map = {
> .type = BPF_MAP_TYPE_ARRAY,
> .key_size = sizeof(int),
> .value_size = sizeof(int),
> .max_entries = 64,
> };
> #endif
Yes, I forgot...
>
> >+ /* OVS will associate map 'xsks_map' to xsk socket. */
> >+ struct bpf_map_def SEC("maps") xsks_map = {
> >+ .type = BPF_MAP_TYPE_XSKMAP,
> >+ .key_size = sizeof(int),
> >+ .value_size = sizeof(int),
> >+ .max_entries = 32,
> >+ };
> >+
> >+ SEC("xdp_sock")
> >+ int xdp_sock_prog(struct xdp_md *ctx)
> >+ {
> >+ int index = ctx->rx_queue_index;
> >+
> >+ /* Customized by user.
> >+ * For example
> >+ * 1) filter out all SSH traffic and return XDP_PASS
> >+ * for kernel to process.
> >+ * 2) Drop unwanted packet by returning XDP_DROP.
> >+ */
> >+
> >+ /* Rest of packets goes to AF_XDP. */
> >+ return bpf_redirect_map(&xsks_map, index, 0);
> >+ }
> >+ char _license[] SEC("license") = "GPL";
> >+
> >+
> > Example Script
> > --------------
> >
> >diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> >index af654d498a88..be74527946a4 100644
> >--- a/lib/netdev-afxdp.c
> >+++ b/lib/netdev-afxdp.c
> >@@ -21,6 +21,7 @@
> > #include "netdev-afxdp.h"
> > #include "netdev-afxdp-pool.h"
> >
> >+#include <bpf/bpf.h>
> > #include <errno.h>
> > #include <inttypes.h>
> > #include <linux/rtnetlink.h>
> >@@ -30,6 +31,7 @@
> > #include <stdlib.h>
> > #include <sys/resource.h>
> > #include <sys/socket.h>
> >+#include <sys/stat.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> >
> >@@ -88,8 +90,11 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
> >
> > #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> >
> >+#define LIBBPF_XDP_PROGRAM "__default__"
> >+
> > static struct xsk_socket_info *xsk_configure(int ifindex, int
> >xdp_queue_id,
> >- int mode, bool
> >use_need_wakeup);
> >+ int mode, bool
> >use_need_wakeup,
> >+ bool use_default_xdp);
> > static void dest(uint32_t ifindex, int xdpmode);
> > static void xsk_destroy(struct xsk_socket_info *xsk);
> > static int xsk_configure_all(struct netdev *netdev);
> >@@ -213,6 +218,50 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
> > ovs_mutex_unlock(&unused_pools_mutex);
> > }
> >
> >+static int
> >+xsk_load_prog(const char *path, struct bpf_object **obj,
> >+ int *prog_fd)
> >+{
> >+ struct bpf_prog_load_attr attr = {
> >+ .prog_type = BPF_PROG_TYPE_XDP,
> >+ .file = path,
> >+ };
> >+
> >+ if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
> >+ VLOG_ERR("Can't load XDP program at '%s'", path);
> >+ return EINVAL;
> >+ }
> >+
> >+ return 0;
> >+}
> >+
> >+static int
> >+xsk_configure_map(struct bpf_object *obj, struct xsk_socket_info *xsk,
> >+ int queue_id)
> >+{
> >+ struct bpf_map *map;
> >+ int xsks_map_fd;
> >+ int xsk_fd;
> >+ int ret;
> >+
> >+ map = bpf_object__find_map_by_name(obj, "xsks_map");
> >+ if (!map) {
> >+ VLOG_WARN("XDP map 'xsks_map' not found");
> >+ return EINVAL;
> >+ }
> >+
> >+ xsks_map_fd = bpf_map__fd(map);
> >+ xsk_fd = xsk_socket__fd(xsk->xsk);
> >+
> >+ ret = bpf_map_update_elem(xsks_map_fd, &queue_id, &xsk_fd, 0);
> >+ if (ret) {
> >+ VLOG_WARN("Can't set XSK map, map fd: %d", xsks_map_fd);
>
> I think this VLOG and the one above should be set to ERR, because when this
> happens we end up in the re-configure loop, and no idea why (until you
> change the log level).
OK, will do it.
>
> >+ return EINVAL;
> >+ }
> >+
> >+ return 0;
> >+}
> >+
> > static struct xsk_umem_info *
> > xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> > {
> >@@ -290,7 +339,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int
> >xdpmode)
> >
> > static struct xsk_socket_info *
> > xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> >- uint32_t queue_id, int xdpmode, bool
> >use_need_wakeup)
> >+ uint32_t queue_id, int xdpmode, bool
> >use_need_wakeup,
> >+ bool use_default_xdp)
> > {
> > struct xsk_socket_config cfg;
> > struct xsk_socket_info *xsk;
> >@@ -319,6 +369,10 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> > }
> > #endif
> >
> >+ if (!use_default_xdp) {
> >+ cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
>
> Instead of using this flag, and the bpf_map_update_elem() function above,
> would it not be easier just let the API take care of this?
> See earlier comment,
> xsk_socket__create()->xsk_setup_xdp_prog()->xsk_lookup_bpf_maps(), however I
> won’t mind keeping it as is.
I see your porint now.
So I can load custom XDP program, then create xsk.
In this case, I don't have to set this flag, and also no need to
call bpf_map_update_elem(). I will fix it.
>
> Maybe this also takes care of cleaning up the program on unload/cleanup?
> This is a link to an example:
>
> https://github.com/chaudron/xdp-tutorial/blob/master/advanced03-AF_XDP/af_xdp_user.c
>
Yes, that's easier.
> >+ }
> >+
> > if (if_indextoname(ifindex, devname) == NULL) {
> > VLOG_ERR("ifindex %d to devname failed (%s)",
> > ifindex, ovs_strerror(errno));
> >@@ -339,17 +393,19 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> > return NULL;
> > }
> >
> >- /* Make sure the built-in AF_XDP program is loaded. */
> >- ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
> >- if (ret || !prog_id) {
> >- if (ret) {
> >- VLOG_ERR("Get XDP prog ID failed (%s)", ovs_strerror(errno));
> >- } else {
> >- VLOG_ERR("No XDP program is loaded at ifindex %d", ifindex);
> >+ if (use_default_xdp) {
> >+ /* Make sure the built-in AF_XDP program is loaded. */
> >+ ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
> >+ if (ret || !prog_id) {
> >+ if (ret) {
> >+ VLOG_ERR("Get XDP prog ID failed (%s)",
> >ovs_strerror(errno));
> >+ } else {
> >+ VLOG_ERR("No XDP program is loaded at ifindex %d",
> >ifindex);
> >+ }
> >+ xsk_socket__delete(xsk->xsk);
> >+ free(xsk);
> >+ return NULL;
> > }
> >- xsk_socket__delete(xsk->xsk);
> >- free(xsk);
> >- return NULL;
> > }
> >
> > while (!xsk_ring_prod__reserve(&xsk->umem->fq,
> >@@ -376,7 +432,7 @@ xsk_configure_socket(struct xsk_umem_info *umem,
> >uint32_t ifindex,
> >
> > static struct xsk_socket_info *
> > xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> >- bool use_need_wakeup)
> >+ bool use_need_wakeup, bool use_default_xdp)
> > {
> > struct xsk_socket_info *xsk;
> > struct xsk_umem_info *umem;
> >@@ -400,7 +456,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int
> >xdpmode,
> > VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
> >
> > xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> >- use_need_wakeup);
> >+ use_need_wakeup, use_default_xdp);
> > if (!xsk) {
> > /* Clean up umem and xpacket pool. */
> > if (xsk_umem__delete(umem->umem)) {
> >@@ -420,6 +476,10 @@ xsk_configure_all(struct netdev *netdev)
> > struct netdev_linux *dev = netdev_linux_cast(netdev);
> > struct xsk_socket_info *xsk_info;
> > int i, ifindex, n_rxq, n_txq;
> >+ bool use_default_xdp;
> >+ struct bpf_object *obj;
> >+ int prog_fd = 0;
> >+ int ret;
> >
> > ifindex = linux_get_ifindex(netdev_get_name(netdev));
> >
> >@@ -428,6 +488,7 @@ xsk_configure_all(struct netdev *netdev)
> >
> > n_rxq = netdev_n_rxq(netdev);
> > dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
> >+ use_default_xdp = dev->xdpobj ? false : true;
> >
> > /* Configure each queue. */
> > for (i = 0; i < n_rxq; i++) {
> >@@ -436,7 +497,7 @@ xsk_configure_all(struct netdev *netdev)
> > dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> > dev->use_need_wakeup ? "true" : "false");
> > xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> >- dev->use_need_wakeup);
> >+ dev->use_need_wakeup, use_default_xdp);
> > if (!xsk_info) {
> > VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
> > dev->xsks[i] = NULL;
> >@@ -446,6 +507,28 @@ xsk_configure_all(struct netdev *netdev)
> > atomic_init(&xsk_info->tx_dropped, 0);
> > xsk_info->outstanding_tx = 0;
> > xsk_info->available_rx = PROD_NUM_DESCS;
> >+
> >+ if (dev->xdpobj) {
> >+ if (prog_fd == 0) {
> >+ /* XDP program is per-netdev, so all queues share
> >+ the same XDP program. */
> >+ ret = xsk_load_prog(dev->xdpobj, &obj, &prog_fd);
> >+ if (ret) {
> >+ goto err;
> >+ }
> >+ }
> >+ ret = xsk_configure_map(obj, xsk_info, i);
> >+ if (ret) {
> >+ goto err;
> >+ }
> >+
> >+ /* Clear existing XDP program and its associated map. */
> >+ bpf_set_link_xdp_fd(ifindex, -1, dev->xdpmode);
> >+
> >+ bpf_set_link_xdp_fd(ifindex, prog_fd, dev->xdpmode);
> >+ VLOG_INFO("%s: Load custom XDP program at %s.",
> >+ netdev_get_name(netdev), dev->xdpobj);
> >+ }
> > }
> >
> > n_txq = netdev_n_txq(netdev);
> >@@ -519,6 +602,8 @@ xsk_destroy_all(struct netdev *netdev)
> > free(dev->tx_locks);
> > dev->tx_locks = NULL;
> > }
> >+ free(CONST_CAST(char *, dev->xdpobj));
> >+ dev->xdpobj = NULL;
> > }
>
> We need to make sure we close all the open BPF programs or else they will
> all remain in memory.
> I did a lot of tries when debugging and this is what I see with bpftool:
>
> $ bpftool prog
> 2023: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:04-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2027: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:04-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2031: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:04-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2035: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:04-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2039: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:05-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2043: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:05-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2047: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:05-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> 2051: xdp name xdp_prog_simple tag 3b185187f1855c4c gpl
> loaded_at 2019-11-05T10:54:05-0500 uid 0
> xlated 16B jited 35B memlock 4096B
> ...
> ...
>
> Guess same thing for the maps:
>
> $ bpftool map
> 1094: (null) name xsks_map flags 0x0
> key 4B value 4B max_entries 32 memlock 4096B
> 1097: (null) name xsks_map flags 0x0
> key 4B value 4B max_entries 32 memlock 4096B
> 1098: (null) name xsks_map flags 0x0
> key 4B value 4B max_entries 63 memlock 4096B
> ...
> ...
Thanks for testing it.
Let me try to use the way you suggest and make sure maps
are cleaned up!
Regards,
William
More information about the dev
mailing list