[ovs-dev] [PATCHv13] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Fri Jun 21 14:24:54 UTC 2019


On 19.06.2019 22:51, William Tu wrote:
> The patch introduces experimental AF_XDP support for OVS netdev.
> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> type built upon the eBPF and XDP technology.  It is aims to have comparable
> performance to DPDK but cooperate better with existing kernel's networking
> stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> attached to the netdev, by-passing a couple of Linux kernel's subsystems
> As a result, AF_XDP socket shows much better performance than AF_PACKET
> For more details about AF_XDP, please see linux kernel's
> Documentation/networking/af_xdp.rst. Note that by default, this feature is
> not compiled in.
> 
> Signed-off-by: William Tu <u9012063 at gmail.com>
> ---
> v1->v2:
> - add a list to maintain unused umem elements
> - remove copy from rx umem to ovs internal buffer
> - use hugetlb to reduce misses (not much difference)
> - use pmd mode netdev in OVS (huge performance improve)
> - remove malloc dp_packet, instead put dp_packet in umem
> 
> v2->v3:
> - rebase on the OVS master, 7ab4b0653784
>   ("configure: Check for more specific function to pull in pthread library.")
> - remove the dependency on libbpf and dpif-bpf.
>   instead, use the built-in XDP_ATTACH feature.
> - data structure optimizations for better performance, see[1]
> - more test cases support
> v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
> 
> v3->v4:
> - Use AF_XDP API provided by libbpf
> - Remove the dependency on XDP_ATTACH kernel patch set
> - Add documentation, bpf.rst
> 
> v4->v5:
> - rebase to master
> - remove rfc, squash all into a single patch
> - add --enable-afxdp, so by default, AF_XDP is not compiled
> - add options: xdpmode=drv,skb
> - add multiple queue and multiple PMD support, with options: n_rxq
> - improve documentation, rename bpf.rst to af_xdp.rst
> 
> v5->v6
> - rebase to master, commit 0cdd5b13de91b98
> - address errors from sparse and clang
> - pass travis-ci test
> - address feedback from Ben
> - fix issues reported by 0-day robot
> - improved documentation
> 
> v6-v7
> - rebase to master, commit abf11558c1515bf3b1
> - address feedbacks from Ilya, Ben, and Eelco, see:
>   https://www.mail-archive.com/ovs-dev@openvswitch.org/msg32357.html
> - add XDP mode change, implement get/set_config, reconfigure
> - Fix reconfiguration/crash issue caused by libbpf, see patch:
>   [PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown
> - perf optimization for batching umem_push/pop
> - perf optimization for batching kick_tx
> - test build with dpdk
> - fix/refactor atomic operation
> - make AF_XDP x86 specific, otherwise fail at build time
> - lots of code refactoring
> - add PVP setup in documentation
> 
> v7-v8:
> - Address feedback from Ilya at:
>   https://patchwork.ozlabs.org/patch/1095019/
> - add netdev-linux-private.h
> - fix afxdp reconfigure issue
> - sort include headers
> - remove unnecessary OVS_UNUSED
> - coding style fixes
> - error case handling and memory leak
> 
> v8-v9:
> - rebase to master 180bbbed3a3867d52
> - Address review feedback from Ben, Ilya and Eelco, at:
>   https://patchwork.ozlabs.org/patch/1097740/
> - == From Ilya ==
> - Optimize the reconfiguration logic
> - Implement .rxq_recv and .send for afxdp
> - Remove system-afxdp-traffic.at, reuse existing code
> - Use Ilya's rdtsc code
> - remove --disable-system
> - == From Eelco ==
> - Fix bug when remove br0, util(revalidator49)|EMER|lib/poll-loop.c:111:
>   assertion !fd != !wevent failed
> - Fix bug and use default value from libbpf, ex: XSK_RING_PROD__DEFAULT...
> - Clear xdp program when receive signal, ctrl+c
> - Add options to vswitch.xml, set xdpmode default to skb-mode
> - No support for ARM and PPC, now x86_64 only
> - remove redundant header includes and function/macro definitions
> - remove some ifdef HAVE_AF_XDP
> - == From others/both about afxdp rx and tx ==
> - Several umem push/pop error handling improvement/fixes
> - add lock to address concurrent_txq case
> - improve error handling
> - add stats
> - Things that are not done yet
> - MTU limitation
> - n_txq_desc/n_rxq_desc option.
> 
> v9-v10
> - remove x86_64 limitation, suggested by Ben and Eelco
> - add xmalloc_pagealign, free_pagealign
> - minor refector
> 
> v10-v11
> - address feedback from Ilya at
>   https://patchwork.ozlabs.org/patch/1106495/
> - fix typos, and some refactoring
> - refactor existing code and introduce xmalloc pagealign
> - fix a couple of error handling case
> - allocate per-txq lock
> - dynamic allocate xsk array
> - fix cycle_counter_update() for non-x86/non-linux case
> 
> v11-v12
> - mainly address a couple of crashes reported by Eelco
>   https://patchwork.ozlabs.org/patch/1110729/
> - fix cleanup xdp program problem when ovs-vswtichd restarts
> - following cases should remove xdp program
>   - kill `pidof ovs-vswitchd`
>   - ovs-appctl -t ovs-vswtichd exit --cleanup
>   - note: ovs-ctl restart does not have "--cleanup" so still an issue
> - work around issues of xsk_ring_cons__peek at libbpf, reported at
>   https://marc.info/?l=xdp-newbies&m=156055471727857&w=2
> - variable name refactoring
> - there are some performance degradation, but let's make sure
>   everything works first
> 
> v12-v13
> - rebase to master
> - add coverage counter afxdp_cq_emtpy, afxdp_fq_full
> - minor refactoring


Hi!
I finally managed to successfully run 'make check-afxdp' with the
following results:

  ERROR: 76 tests were run,
  6 failed unexpectedly.
  48 tests were skipped.

Failed tests are IP fragmentation expiry for conntrack (I'll send a
separate e-mail about this issue) and NSH tests which are broken for
a while now (not related to XDP).

However, here is the list of issues I faced and had to fix/workaround
to make in work:

1. Abort while trying to push any header:

  Thread 10 "pmd8" received signal SIGABRT, Aborted.
  [Switching to Thread 0x7ff3ab9d9700 (LWP 12151)]
  0x00007ff3b311793f in raise () from /lib64/libc.so.6
  (gdb) bt
  #0  0x00007ff3b311793f in raise () from /lib64/libc.so.6
  #1  0x00007ff3b3101c95 in abort () from /lib64/libc.so.6
  #2  0x0000000000683930 in dp_packet_resize__ (b=0x7ff3a9d346b0, new_headroom=64, new_tailroom=<out>) at ./lib/dp-packet.h:613
  #3  0x000000000068547c in dp_packet_prealloc_headroom (b=<out>, size=4) at lib/dp-packet.c:315
  #4  dp_packet_push_uninit (b=<out>, size=4) at lib/dp-packet.c:427
  #5  dp_packet_resize_l2_5 (b=0x7ff3a9d346b0, increment=4) at lib/dp-packet.c:493
  #6  0x000000000089a841 in push_mpls (packet=0x2, ethtype=<>, lse=1076953088) at lib/packets.c:391
  #7  0x0000000000762b5d in odp_execute_actions  at lib/odp-execute.c:875
  #8  0x00000000006a4493 in dp_netdev_execute_actions  at lib/dpif-netdev.c:7264
  #9  handle_packet_upcall  at lib/dpif-netdev.c:6545
  #10 fast_path_processing  at lib/dpif-netdev.c:6641
  #11 0x00000000006a2b6f in dp_netdev_input__  at lib/dpif-netdev.c:6729
  #12 0x000000000069f973 in dp_netdev_input  at lib/dpif-netdev.c:6767
  #13 dp_netdev_process_rxq_port  at lib/dpif-netdev.c:4277
  #14 0x000000000069ba0b in pmd_thread_main  at lib/dpif-netdev.c:5451
  #15 0x000000000085f6b0 in ovsthread_wrapper  at lib/ovs-thread.c:352
  #16 0x00007ff3b3e532de in start_thread () from /lib64/libpthread.so.0
  #17 0x00007ff3b31dca63 in clone () from /lib64/libc.so.6

Reason: Wrong headroom management. In current implementation headroom of
the dp-packet is always zero. This leads to resize attempts failures on
OVS_NOT_REACHED().

Here is the patch that could solve the issue:

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index e6a794707..c9593515a 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -65,10 +65,11 @@ dp_packet_use(struct dp_packet *b, void *base, size_t allocated)
  * memory starting at AF_XDP umem base.
  */
 void
-dp_packet_use_afxdp(struct dp_packet *b, void *base, size_t allocated)
+dp_packet_use_afxdp(struct dp_packet *b, void *data,
+                    size_t allocated, size_t headroom)
 {
-    dp_packet_set_base(b, base);
-    dp_packet_set_data(b, base);
+    dp_packet_set_base(b, (char *) data - headroom);
+    dp_packet_set_data(b, data);
     dp_packet_set_size(b, 0);
 
     dp_packet_set_allocated(b, allocated);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index e3438226e..47ea14b94 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -132,7 +132,7 @@ void dp_packet_use(struct dp_packet *, void *, size_t);
 void dp_packet_use_stub(struct dp_packet *, void *, size_t);
 void dp_packet_use_const(struct dp_packet *, const void *, size_t);
 #if HAVE_AF_XDP
-void dp_packet_use_afxdp(struct dp_packet *, void *, size_t);
+void dp_packet_use_afxdp(struct dp_packet *, void *, size_t, size_t);
 #endif
 void dp_packet_init_dpdk(struct dp_packet *);
 
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 33d861215..518389a58 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -591,7 +591,7 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
         packet = &xpacket->packet;
 
         /* Initialize the struct dp_packet */
-        dp_packet_use_afxdp(packet, pkt, FRAME_SIZE - FRAME_HEADROOM);
+        dp_packet_use_afxdp(packet, pkt, FRAME_SIZE, FRAME_HEADROOM);
         dp_packet_set_size(packet, len);
 
         /* Add packet into batch, increase batch->count */
@@ -646,7 +646,7 @@ free_afxdp_buf(struct dp_packet *p)
     if (xpacket->mpool) {
         void *base = dp_packet_base(p);
 
-        addr = (uintptr_t)base & (~FRAME_SHIFT_MASK);
+        addr = ((uintptr_t)base + FRAME_HEADROOM) & (~FRAME_SHIFT_MASK);
         umem_elem_push(xpacket->mpool, (void *)addr);
     }
 }
@@ -664,7 +664,7 @@ free_afxdp_buf_batch(struct dp_packet_batch *batch)
         if (xpacket->mpool) {
             void *base = dp_packet_base(packet);
 
-            addr = (uintptr_t)base & (~FRAME_SHIFT_MASK);
+            addr = ((uintptr_t)base + FRAME_HEADROOM) & (~FRAME_SHIFT_MASK);
             elems[i] = (void *)addr;
         }
     }
---

Works for me. Please, re-check. I might miss something.



2. vlan tests fails due to kernel issues. We need to disable vlan offloading
along with tx offloading, otherwise packets dissapears somewhere inside the
kernel. I didn't find the root cause. In v8 of this patch you disabled 'rxvlan'
and 'txvlan' for tests, but these bits are missing in newer versions.
Most probably we only need to disable 'txvlan'.



3. cvlan tests fails due to kernel issues. I didn't found a workaround for this.
Disabling the vlan offloading doesn't work. I had to force afxdp testsuite to
skip these tests:

diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
index 1e6f7a46b..91f2ef91c 100644
--- a/tests/system-afxdp-macros.at
+++ b/tests/system-afxdp-macros.at
@@ -18,3 +18,6 @@ m4_define([ADD_VETH],
       on_exit 'ip link del ovs-$1'
     ]
 )
+
+m4_define([OVS_CHECK_8021AD],
+    [AT_SKIP_IF([:])])
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index d23ee897b..22f814fba 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -71,6 +71,7 @@ AT_CLEANUP
 
 AT_SETUP([datapath - ping between two ports on cvlan])
 OVS_TRAFFIC_VSWITCHD_START()
+OVS_CHECK_8021AD()
 
 AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 
@@ -161,6 +162,7 @@ AT_CLEANUP
 
 AT_SETUP([datapath - ping6 between two ports on cvlan])
 OVS_TRAFFIC_VSWITCHD_START()
+OVS_CHECK_8021AD()
 
 AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
 
---



4. 'system-afxdp-testsuite' doesn't re-build on 'system-userspace-macros.at'
changes. Missing file dependencies in automake:

diff --git a/tests/automake.mk b/tests/automake.mk
index 131564bb0..f0449c395 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -163,6 +163,7 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
        tests/system-userspace-packet-type-aware.at
 
 SYSTEM_AFXDP_TESTSUITE_AT = \
+       tests/system-userspace-macros.at \
        tests/system-afxdp-testsuite.at \
        tests/system-afxdp-macros.at
 
---


5. 'make check-afxdp' executes 'make install' which is unwanted:

diff --git a/tests/automake.mk b/tests/automake.mk
index 131564bb0..d6ab51732 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -325,7 +326,6 @@ check-system-userspace: all
        "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
 
 check-afxdp: all
-       $(MAKE) install
        set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
        "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
 
---


6. TCP doesn't work with XDP over veth interfaces. This is a known kernel
issue that could be workarounded by the following patch to a kernel:
    https://github.com/cilium/cilium/issues/3077#issuecomment-430801467 .
Until proper solution implemented in kernel we probably should skip all the
tests that involves TCP.


7. It's a known issue that tunneling is not working right now in system-traffic
userspace tests. Could be workarounded by removing '--disable-system' from
OVS_TRAFFIC_VSWITCHD_START in tests/system-userspace-macros.at. I'm going to
prepare a patch for this issue in a near future.


8. As I said, I'll send a separate main about IP fragmented conntrack issues.


P.S. We probably should mention TCP, vlan and 8021ad issues on veth interfaces
     somewhere in docs, so users will be aware of them.

Best regards, Ilya Maximets.


More information about the dev mailing list