[ovs-dev] [PATCH v5] netdev-afxdp: add new netdev type for AF_XDP.
William Tu
u9012063 at gmail.com
Fri Apr 19 20:22:14 UTC 2019
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP is a new address family working together with eBPF/XDP.
A socket with AF_XDP family can receive and send raw packets
from an eBPF/XDP program attached to the netdev.
For details introduction and configuration, see
Documentation/intro/install/afxdp.rst
Signed-off-by: William Tu <u9012063 at gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei at gmail.com>
Cc: Tim Rozet <trozet at redhat.com>
Cc: Eelco Chaudron <echaudro at redhat.com>
---
Documentation/automake.mk | 1 +
Documentation/index.rst | 1 +
Documentation/intro/install/afxdp.rst | 299 +++++++++++
Documentation/intro/install/index.rst | 1 +
acinclude.m4 | 23 +
configure.ac | 1 +
lib/automake.mk | 7 +-
lib/dp-packet.c | 13 +
lib/dp-packet.h | 32 +-
lib/dpif-netdev-perf.h | 13 +
lib/netdev-afxdp.c | 592 ++++++++++++++++++++
lib/netdev-afxdp.h | 46 ++
lib/netdev-linux.c | 89 +++-
lib/netdev-linux.h | 1 +
lib/netdev-provider.h | 1 +
lib/netdev.c | 1 +
lib/xdpsock.c | 211 ++++++++
lib/xdpsock.h | 133 +++++
tests/automake.mk | 17 +
tests/system-afxdp-macros.at | 153 ++++++
tests/system-afxdp-testsuite.at | 26 +
tests/system-afxdp-traffic.at | 978 ++++++++++++++++++++++++++++++++++
22 files changed, 2633 insertions(+), 6 deletions(-)
create mode 100644 Documentation/intro/install/afxdp.rst
create mode 100644 lib/netdev-afxdp.c
create mode 100644 lib/netdev-afxdp.h
create mode 100644 lib/xdpsock.c
create mode 100644 lib/xdpsock.h
create mode 100644 tests/system-afxdp-macros.at
create mode 100644 tests/system-afxdp-testsuite.at
create mode 100644 tests/system-afxdp-traffic.at
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e09a33..11cc59efc881 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -10,6 +10,7 @@ DOC_SOURCE = \
Documentation/intro/why-ovs.rst \
Documentation/intro/install/index.rst \
Documentation/intro/install/bash-completion.rst \
+ Documentation/intro/install/afxdp.rst \
Documentation/intro/install/debian.rst \
Documentation/intro/install/documentation.rst \
Documentation/intro/install/distributions.rst \
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 46261235c732..aa9e7c49f179 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -59,6 +59,7 @@ vSwitch? Start here.
:doc:`intro/install/windows` |
:doc:`intro/install/xenserver` |
:doc:`intro/install/dpdk` |
+ :doc:`intro/install/afxdp` |
:doc:`Installation FAQs <faq/releases>`
- **Tutorials:** :doc:`tutorials/faucet` |
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
new file mode 100644
index 000000000000..db55d1bc4dd7
--- /dev/null
+++ b/Documentation/intro/install/afxdp.rst
@@ -0,0 +1,299 @@
+..
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Convention for heading levels in Open vSwitch documentation:
+
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+
+ Avoid deeper levels because they do not render well.
+
+
+========================
+Open vSwitch with AF_XDP
+========================
+
+This document describes how to build and install Open vSwitch using
+AF_XDP netdev.
+
+.. warning::
+ The AF_XDP support of Open vSwitch is considered 'experimental',
+ it is not compiled in by default.
+
+Introduction
+------------
+AF_XDP, Address Family eXpress Data Path, is a new address family
+working together with eBPF/XDP. An AF_XDP socket can receive and
+send packets from an eBPF/XDP program attached to the netdev and
+have much better performance than AF_PACKET.
+For more details about AF_XDP, please see linux kernel's
+Documentation/networking/af_xdp.rst
+
+AF_XDP Netdev
+-------------
+OVS has a couple of netdev types, i.e., system, tap, or
+internal. The AF_XDP feature adds a new netdev types called
+"afxdp", and implement its configuration, packet reception,
+and transmit functions. Since the AF_XDP socket, xsk,
+operates in userspace, once ovs-vswitchd receives packets
+from xsk, the proposed architecture re-uses the existing
+userspace dpif-netdev datapath. As a result, most of
+the packet processing happens at the userspace instead of
+linux kernel.
+
+::
+
+ | +-------------------+
+ | | ovs-vswitchd |<-->ovsdb-server
+ | +-------------------+
+ | | ofproto |<-->OpenFlow controllers
+ | +--------+-+--------+
+ | | netdev | |ofproto-|
+ userspace | +--------+ | dpif |
+ | | afxdp | +--------+
+ | | netdev | | dpif |
+ | +---||---+ +--------+
+ | || | dpif- |
+ | || | netdev |
+ |_ || +--------+
+ ||
+ _ +---||-----+--------+
+ | | AF_XDP prog + |
+ kernel | | xsk_map |
+ |_ +--------||---------+
+ ||
+ physical
+ NIC
+
+
+Build requirements
+------------------
+
+In addition to the requirements described in :doc:`general`, building Open
+vSwitch with AF_XDP will require the following:
+
+- libbpf from kernel source tree (kernel 5.0.0 or later)
+
+- Linux kernel XDP support, with the following options
+ ``_CONFIG_BPF=y``
+
+ ``_CONFIG_BPF_SYSCALL=y``
+
+ ``_CONFIG_XDP_SOCKETS=y``
+
+
+- The following optional Kconfig options are also recommended, but not required:
+
+ ``_CONFIG_BPF_JIT=y``
+
+ ``_CONFIG_HAVE_BPF_JIT=y``
+
+ ``_CONFIG_XDP_SOCKETS_DIAG=y``
+
+- If possible, run **./xdpsock -r -N -z -i <your device>** under linux/samples/bpf.
+ This is the OVS indepedent benchmark tools for AF_XDP. It makes sure your basic
+ kernel requirements are met for AF_XDP.
+
+
+Installing
+----------
+For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
+Fist, clone a recent version of Linux bpf-next tree::
+
+ git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+
+Second, go into the Linux source directory and build libbpf in the tools directory::
+
+ cd bpf-next/
+ cd tools/lib/bpf/
+ make && make install
+ make install_headers
+
+.. note::
+ Make sure xsk.h is install in system's library path
+
+Make sure the libbpf.so is installed correctly::
+
+ ldconfig -p | grep libbpf
+
+
+Third, ensure the standard OVS requirements are installed and bootstrap/configure the package::
+
+ ./boot.sh && ./configure --enable-afxdp
+
+Finally, build and install OVS::
+
+ make && make install
+
+To kick start end-to-end autotesting::
+
+ make check-afxdp
+
+if a test case fails, check the log at::
+
+ cat tests/system-afxdp-testsuite.dir/<failed number>/system-afxdp-testsuite.log
+
+
+Setup AF_XDP netdev
+-------------------
+Before running OVS with AF_XDP, make sure the libbpf and libelf are set-up right::
+
+ ldd vswitchd/ovs-vswitchd
+
+Open vSwitch should be started using userspace datapath as described in :doc:`general`::
+
+ ovs-vswitchd --disable-system
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+.. note::
+ OVS AF_XDP netdev is using the userspace datapath, the same datapath
+ as used by OVS-DPDK. So it requires --disable-system for ovs-vswitchd
+ and datapath_type=netdev when adding a new bridge.
+
+Make sure your device support AF_XDP, and to use 1 PMD (on core 4)
+on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
+pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
+
+ ethtool -L enp2s0 combined 1
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=1 options:xdpmode=drv other_config:pmd-rxq-affinity="0:4"
+
+Or, use 4 pmds/cores and 4 queues by doing::
+
+ ethtool -L enp2s0 combined 4
+ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
+ ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+ options:n_rxq=4 options:xdpmode=drv other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+
+To validate that the bridge has successfully instantiated, you can use the::
+
+ ovs-vsctl show
+
+should show something like::
+
+ Port "ens802f0"
+ Interface "ens802f0"
+ type: afxdp
+ options: {n_rxq="1", xdpmode=drv}
+
+Otherwise, enable debug by::
+
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+
+References
+----------
+Most of the design details are described in the paper presented at
+Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
+section 4, and slides[2][4].
+"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
+about AF_XDP current and future work.
+
+
+[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
+
+[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
+
+[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
+
+[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
+
+
+Performance Tuning
+------------------
+The name of the game is to keep your CPU running in userspace, allowing PMD to
+keep polling the AF_XDP queues without any interferences from kernel.
+
+#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd
+ running cores, device plug-in slot)
+
+#. Isolate your CPU by doing isolcpu at grub configure.
+
+#. IRQ should not set to pmd running core.
+
+#. The Spectre and Meltdown fixes increase the overhead of system calls.
+
+Debugging performance issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While running the traffic, use linux perf tool to see where your cpu spends its cycle::
+
+ cd bpf-next/tools/perf
+ make
+ ./perf record -p `pidof ovs-vswitchd` sleep 10
+ ./perf report
+
+Or, use OVS pmd tool::
+
+ ovs-appctl dpif-netdev/pmd-stats-show
+
+
+Example Script
+--------------
+
+Below is a script using namespaces and veth peer::
+
+ #!/bin/bash
+ ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl --disable-system --detach
+ ovs-vsctl -- add-br br0 -- set Bridge br0 \
+ protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 \
+ fail-mode=secure datapath_type=netdev
+ ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+ ip netns add at_ns0
+ ovs-appctl vlog/set netdev_afxdp::dbg
+
+ ip link add p0 type veth peer name afxdp-p0
+ ip link set p0 netns at_ns0
+ ip link set dev afxdp-p0 up
+ ovs-vsctl add-port br0 afxdp-p0 -- \
+ set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
+
+ ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.1/24" dev p0
+ ip link set dev p0 up
+ NS_EXEC_HEREDOC
+
+ ip netns add at_ns1
+ ip link add p1 type veth peer name afxdp-p1
+ ip link set p1 netns at_ns1
+ ip link set dev afxdp-p1 up
+
+ ovs-vsctl add-port br0 afxdp-p1 -- \
+ set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
+ ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
+ ip addr add "10.1.1.2/24" dev p1
+ ip link set dev p1 up
+ NS_EXEC_HEREDOC
+
+ ip netns exec at_ns0 ping -i .2 10.1.1.2
+
+
+Limitations/Known Issues
+------------------------
+#. Device's numa ID is always 0, need a way to find numa id from a netdev.
+#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A possible
+ work-around is to use OpenFlow meter action.
+#. AF_XDP device added to bridge, remove, and added again will fail.
+#. Most of the tests are done using i40e single port. Multiple ports and
+ also ixgbe driver also needs to be tested.
+#. No latency test result (TODO items)
+
+
+Bug Reporting
+-------------
+
+Please report problems to dev at openvswitch.org.
diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst
index 3193c736cf17..c27a9c9d16ff 100644
--- a/Documentation/intro/install/index.rst
+++ b/Documentation/intro/install/index.rst
@@ -45,6 +45,7 @@ Installation from Source
xenserver
userspace
dpdk
+ afxdp
Installation from Packages
--------------------------
diff --git a/acinclude.m4 b/acinclude.m4
index 1607d5f4b1d9..9ff981e28c32 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
])
])
+dnl OVS_CHECK_LINUX_AF_XDP
+dnl
+dnl Check both Linux kernel AF_XDP and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
+ AC_MSG_CHECKING([whether AF_XDP is supported])
+ AC_ARG_ENABLE([afxdp],
+ [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
+ [], [enable_afxdp=no])
+ AC_CHECK_HEADER([bpf/libbpf.h],
+ [HAVE_LIBBPF=yes],
+ [HAVE_LIBBPF=no])
+ AC_CHECK_HEADER([linux/if_xdp.h],
+ [HAVE_IF_XDP=yes],
+ [HAVE_IF_XDP=no])
+ AM_CONDITIONAL([SUPPORT_AF_XDP],
+ [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
+ AM_COND_IF([SUPPORT_AF_XDP], [
+ AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
+ LIBBPF_LDADD=" -lbpf -lelf"
+ AC_SUBST([LIBBPF_LDADD])
+ ])
+])
+
dnl OVS_CHECK_DPDK
dnl
dnl Configure DPDK source tree
diff --git a/configure.ac b/configure.ac
index 505e3d041e93..29c90b73f836 100644
--- a/configure.ac
+++ b/configure.ac
@@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
OVS_CHECK_DOT
OVS_CHECK_IF_DL
OVS_CHECK_STRTOK_R
+OVS_CHECK_LINUX_AF_XDP
AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
[], [], [[#include <sys/stat.h>]])
diff --git a/lib/automake.mk b/lib/automake.mk
index cc5dccf39d6b..8b9df5635bbe 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
+lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
if WIN32
lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
@@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \
lib/lldp/lldpd.c \
lib/lldp/lldpd.h \
lib/lldp/lldpd-structs.c \
- lib/lldp/lldpd-structs.h
+ lib/lldp/lldpd-structs.h \
+ lib/xdpsock.c \
+ lib/xdpsock.h \
+ lib/netdev-afxdp.c \
+ lib/netdev-afxdp.h
if WIN32
lib_libopenvswitch_la_SOURCES += \
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0976a35e758b..5db1aa4b67c0 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -122,6 +122,16 @@ dp_packet_uninit(struct dp_packet *b)
* created as a dp_packet */
free_dpdk_buf((struct dp_packet*) b);
#endif
+ } else if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+ struct dp_packet_afxdp *xpacket;
+
+ xpacket = dp_packet_cast_afxdp(b);
+ if (xpacket->mpool) {
+ umem_elem_push(xpacket->mpool, dp_packet_base(b));
+ }
+#endif
+ return;
}
}
}
@@ -248,6 +258,8 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
case DPBUF_STACK:
OVS_NOT_REACHED();
+ case DPBUF_AFXDP:
+ OVS_NOT_REACHED();
case DPBUF_STUB:
b->source = DPBUF_MALLOC;
new_base = xmalloc(new_allocated);
@@ -433,6 +445,7 @@ dp_packet_steal_data(struct dp_packet *b)
{
void *p;
ovs_assert(b->source != DPBUF_DPDK);
+ ovs_assert(b->source != DPBUF_AFXDP);
if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
p = dp_packet_data(b);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a5e9ade1244a..78f92f0be583 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -30,6 +30,7 @@
#include "packets.h"
#include "util.h"
#include "flow.h"
+#include "lib/xdpsock.h"
#ifdef __cplusplus
extern "C" {
@@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
DPBUF_DPDK, /* buffer data is from DPDK allocated memory.
* ref to dp_packet_init_dpdk() in dp-packet.c.
*/
+ DPBUF_AFXDP, /* buffer data from XDP frame */
};
#define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +91,20 @@ struct dp_packet {
};
};
+struct dp_packet_afxdp {
+ struct umem_pool *mpool;
+ struct dp_packet packet;
+};
+
+#if HAVE_AF_XDP
+static struct dp_packet_afxdp *
+dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
+{
+ ovs_assert(d->source == DPBUF_AFXDP);
+ return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
+}
+#endif
+
static inline void *dp_packet_data(const struct dp_packet *);
static inline void dp_packet_set_data(struct dp_packet *, void *);
static inline void *dp_packet_base(const struct dp_packet *);
@@ -183,7 +199,21 @@ dp_packet_delete(struct dp_packet *b)
free_dpdk_buf((struct dp_packet*) b);
return;
}
-
+ if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+ struct dp_packet_afxdp *xpacket;
+
+ /* if a packet is received from afxdp port,
+ * and tx to a system port. Then we need to
+ * push the rx umem back here
+ */
+ xpacket = dp_packet_cast_afxdp(b);
+ if (xpacket->mpool) {
+ umem_elem_push(xpacket->mpool, dp_packet_base(b));
+ }
+#endif
+ return;
+ }
dp_packet_uninit(b);
free(b);
}
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..e47cf73bf3c9 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s)
{
#ifdef DPDK_NETDEV
return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP
+ union {
+ uint64_t tsc_64;
+ struct {
+ uint32_t lo_32;
+ uint32_t hi_32;
+ };
+ } tsc;
+ asm volatile("rdtsc" :
+ "=a" (tsc.lo_32),
+ "=d" (tsc.hi_32));
+
+ return s->last_tsc = tsc.tsc_64;
#else
return s->last_tsc = 0;
#endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..56f313606190
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,592 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#include "netdev-linux.h"
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <inttypes.h>
+#include <linux/filter.h>
+#include <linux/gen_stats.h>
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <linux/if_xdp.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/utsname.h>
+#include <netpacket/packet.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <net/route.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "coverage.h"
+#include "dp-packet.h"
+#include "dpif-netlink.h"
+#include "dpif-netdev.h"
+#include "openvswitch/dynamic-string.h"
+#include "fatal-signal.h"
+#include "hash.h"
+#include "openvswitch/hmap.h"
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "netlink-notifier.h"
+#include "netlink-socket.h"
+#include "netlink.h"
+#include "netnsid.h"
+#include "openvswitch/ofpbuf.h"
+#include "openflow/openflow.h"
+#include "ovs-atomic.h"
+#include "packets.h"
+#include "openvswitch/poll-loop.h"
+#include "rtnetlink.h"
+#include "openvswitch/shash.h"
+#include "socket-util.h"
+#include "sset.h"
+#include "tc.h"
+#include "timer.h"
+#include "unaligned.h"
+#include "openvswitch/vlog.h"
+#include "util.h"
+#include "xdpsock.h"
+#include "netdev-afxdp.h"
+
+#ifdef HAVE_AF_XDP
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
+#define UMEM2XPKT(base, i) \
+ (struct dp_packet_afxdp *)((char *)base + i * sizeof(struct dp_packet_afxdp))
+
+static uint32_t opt_xdp_bind_flags = XDP_COPY;
+static uint32_t opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+#if 0
+static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
+static uint32_t opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+#endif
+static uint32_t prog_id;
+
+static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size)
+{
+ struct xsk_umem_info *umem;
+ int ret;
+ int i;
+
+ umem = xcalloc(1, sizeof(*umem));
+ if (!umem) {
+ VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));
+ }
+
+ ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
+ NULL);
+
+ if (ret) {
+ VLOG_FATAL("xsk umem create failed (%s) mode: %s",
+ ovs_strerror(errno),
+ opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");
+ }
+
+ umem->buffer = buffer;
+
+ /* set-up umem pool */
+ umem_pool_init(&umem->mpool, NUM_FRAMES);
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct umem_elem *elem;
+
+ elem = (struct umem_elem *)((char *)umem->buffer
+ + i * FRAME_SIZE);
+ umem_elem_push(&umem->mpool, elem);
+ }
+
+ /* set-up metadata */
+ xpacket_pool_init(&umem->xpool, NUM_FRAMES);
+
+ VLOG_DBG("%s xpacket pool from %p to %p", __func__,
+ umem->xpool.array,
+ (char *)umem->xpool.array +
+ NUM_FRAMES * sizeof(struct dp_packet_afxdp));
+
+ for (i = NUM_FRAMES - 1; i >= 0; i--) {
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ xpacket = UMEM2XPKT(umem->xpool.array, i);
+ xpacket->mpool = &umem->mpool;
+
+ packet = &xpacket->packet;
+ packet->source = DPBUF_AFXDP;
+ }
+
+ return umem;
+}
+
+static struct xsk_socket_info *
+xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
+ uint32_t queue_id)
+{
+ struct xsk_socket_config cfg;
+ struct xsk_socket_info *xsk;
+ char devname[IF_NAMESIZE];
+ uint32_t idx;
+ int ret;
+ int i;
+
+ xsk = xcalloc(1, sizeof(*xsk));
+ if (!xsk) {
+ VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));
+ }
+
+ xsk->umem = umem;
+ cfg.rx_size = CONS_NUM_DESCS;
+ cfg.tx_size = PROD_NUM_DESCS;
+ cfg.libbpf_flags = 0;
+ cfg.xdp_flags = opt_xdp_flags;
+ cfg.bind_flags = opt_xdp_bind_flags;
+
+ if (if_indextoname(ifindex, devname) == NULL) {
+ VLOG_FATAL("ifindex %d devname failed (%s)",
+ ifindex, ovs_strerror(errno));
+ }
+
+ ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
+ &xsk->rx, &xsk->tx, &cfg);
+ if (ret) {
+ VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
+ ovs_strerror(errno),
+ opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
+ queue_id);
+ }
+
+ /* make sure the XDP program is there */
+ ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
+ if (ret) {
+ VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
+ }
+
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq,
+ PROD_NUM_DESCS,
+ &idx);
+ if (ret != PROD_NUM_DESCS) {
+ VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
+ }
+
+ for (i = 0;
+ i < PROD_NUM_DESCS * FRAME_SIZE;
+ i += FRAME_SIZE) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ elem = umem_elem_pop(&xsk->umem->mpool);
+ addr = UMEM2DESC(elem, xsk->umem->buffer);
+
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
+ }
+
+ xsk_ring_prod__submit(&xsk->umem->fq,
+ PROD_NUM_DESCS);
+ return xsk;
+}
+
+struct xsk_socket_info *
+xsk_configure(int ifindex, int xdp_queue_id)
+{
+ struct xsk_socket_info *xsk;
+ struct xsk_umem_info *umem;
+ void *bufs;
+ int ret;
+
+ ret = posix_memalign(&bufs, getpagesize(),
+ NUM_FRAMES * FRAME_SIZE);
+ ovs_assert(!ret);
+
+ /* Create sockets... */
+ umem = xsk_configure_umem(bufs,
+ NUM_FRAMES * FRAME_SIZE);
+ xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
+ return xsk;
+}
+
+static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
+{
+ struct ds ds = DS_EMPTY_INITIALIZER;
+ ds_put_hex_dump(&ds, buf, count, 0, false);
+ VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
+ ds_destroy(&ds);
+}
+
+void
+xsk_destroy(struct xsk_socket_info *xsk)
+{
+ struct xsk_umem *umem;
+
+ if (!xsk) {
+ return;
+ }
+
+ umem = xsk->umem->umem;
+ xsk_socket__delete(xsk->xsk);
+ (void)xsk_umem__delete(umem);
+
+ /* cleanup umem pool */
+ umem_pool_cleanup(&xsk->umem->mpool);
+
+ /* cleanup metadata pool */
+ xpacket_pool_cleanup(&xsk->umem->xpool);
+
+ return;
+}
+
+static inline void
+print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
+ struct xdp_statistics stat;
+ socklen_t optlen;
+
+ optlen = sizeof(stat);
+ ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
+ &stat, &optlen) == 0);
+
+ VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
+ stat.rx_dropped, stat.rx_invalid_descs, stat.tx_invalid_descs);
+ return;
+}
+
+int
+netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+ char **errp OVS_UNUSED)
+{
+ const char *xdpmode;
+ int new_n_rxq;
+
+ /* TODO: add mutex lock */
+ new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
+
+ if (netdev->n_rxq != new_n_rxq) {
+
+ if (new_n_rxq > MAX_XSKQ) {
+ VLOG_WARN("set n_rxq %d too large", new_n_rxq);
+ goto out;
+ }
+
+ netdev->n_rxq = new_n_rxq;
+ VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name, new_n_rxq);
+ netdev_request_reconfigure(netdev);
+ }
+
+ xdpmode = smap_get(args, "xdpmode");
+ if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
+ if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
+ opt_xdp_bind_flags = XDP_ZEROCOPY;
+ opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+ }
+ VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
+ } else {
+ opt_xdp_bind_flags = XDP_COPY;
+ opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+ VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
+ }
+
+out:
+ return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
+{
+ /* TODO: add mutex lock */
+ smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
+ smap_add_format(args, "xdpmode", "%s",
+ opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
+
+ return 0;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev)
+{
+ /* FIXME: Get netdev's PCIe device ID, then find
+ * its NUMA node id.
+ */
+ VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
+ return 0;
+}
+
+void
+xsk_remove_xdp_program(uint32_t ifindex)
+{
+ uint32_t curr_prog_id = 0;
+
+ /* remove_xdp_program() */
+ if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
+ bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
+ }
+ if (prog_id == curr_prog_id) {
+ bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
+ } else if (!curr_prog_id) {
+ VLOG_WARN("couldn't find a prog id on a given interface");
+ } else {
+ VLOG_WARN("program on interface changed, not removing");
+ }
+
+ return;
+}
+
+/* Receive packet from AF_XDP socket */
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch)
+{
+ unsigned int rcvd, i;
+ uint32_t idx_rx = 0, idx_fq = 0;
+ int ret = 0;
+
+ /* See if there is any packet on RX queue,
+ * if yes, idx_rx is the index having the packet.
+ */
+ rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
+ if (!rcvd) {
+ return 0;
+ }
+
+ /* Form a dp_packet batch from descriptor in RX queue */
+ for (i = 0; i < rcvd; i++) {
+ uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+ uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
+ char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+ uint64_t index;
+
+ struct dp_packet_afxdp *xpacket;
+ struct dp_packet *packet;
+
+ index = addr >> FRAME_SHIFT;
+ xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
+
+ packet = &xpacket->packet;
+ xpacket->mpool = &xsk->umem->mpool;
+
+ if (packet->source != DPBUF_AFXDP) {
+ /* FIXME: might be a bug */
+ continue;
+ }
+
+ /* Initialize the struct dp_packet */
+ if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
+ dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
+ } else {
+ /* SKB mode */
+ dp_packet_set_base(packet, pkt);
+ }
+ dp_packet_set_data(packet, pkt);
+ dp_packet_set_size(packet, len);
+
+ /* Add packet into batch, increase batch->count */
+ dp_packet_batch_add(batch, packet);
+
+ idx_rx++;
+ }
+
+ /* We've consume rcvd packets in RX, now re-fill the
+ * same number back to FILL queue.
+ */
+ for (i = 0; i < rcvd; i++) {
+ uint64_t index;
+ struct umem_elem *elem;
+
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+ while (ret == 0) {
+ /* The FILL queue is full, so retry. (or skip)? */
+ ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+ }
+
+ /* Get one free umem, program it into FILL queue */
+ elem = umem_elem_pop(&xsk->umem->mpool);
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ ovs_assert((index & FRAME_SHIFT_MASK) == 0);
+ *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
+
+ idx_fq++;
+ }
+ xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+
+ /* Release the RX queue */
+ xsk_ring_cons__release(&xsk->rx, rcvd);
+ xsk->rx_npkts += rcvd;
+
+#ifdef AFXDP_DEBUG
+ print_xsk_stat(xsk);
+#endif
+ return 0;
+}
+
+static void kick_tx(struct xsk_socket_info *xsk)
+{
+ int ret;
+
+ ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
+ if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) {
+ return;
+ }
+}
+
+/*
+ * A dp_packet might come from
+ * 1) AFXDP buffer
+ * 2) non-AFXDP buffer, ex: send from tap device
+ */
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch)
+{
+ uint32_t tx_done, idx_cq = 0;
+ struct dp_packet *packet;
+ uint32_t idx;
+ int j;
+
+ /* Make sure we have enough TX descs */
+ if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
+ return -EAGAIN;
+ }
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ struct dp_packet_afxdp *xpacket;
+ struct umem_elem *elem;
+ uint64_t index;
+
+ elem = umem_elem_pop(&xsk->umem->mpool);
+ if (!elem) {
+ return -EAGAIN;
+ }
+
+ memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
+
+ index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
+ xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
+ = dp_packet_size(packet);
+
+ if (packet->source == DPBUF_AFXDP) {
+ xpacket = dp_packet_cast_afxdp(packet);
+ umem_elem_push(xpacket->mpool, dp_packet_base(packet));
+ /* Avoid freeing it twice at dp_packet_uninit */
+ xpacket->mpool = NULL;
+ }
+ }
+ xsk_ring_prod__submit(&xsk->tx, batch->count);
+ xsk->outstanding_tx += batch->count;
+
+retry:
+ kick_tx(xsk);
+
+ /* Process CQ */
+ tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
+ if (tx_done > 0) {
+ xsk->outstanding_tx -= tx_done;
+ xsk->tx_npkts += tx_done;
+ }
+
+ /* Recycle back to umem pool */
+ for (j = 0; j < tx_done; j++) {
+ struct umem_elem *elem;
+ uint64_t addr;
+
+ addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+
+ elem = (struct umem_elem *)((char *)xsk->umem->buffer + addr);
+ umem_elem_push(&xsk->umem->mpool, elem);
+ }
+ xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+
+ if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
+ /* If there are still a lot not transmitted,
+ * try harder.
+ */
+ goto retry;
+ }
+
+ return 0;
+}
+
+#else
+struct xsk_socket_info *
+xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
+{
+ return NULL;
+}
+
+void
+xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
+{
+ return;
+}
+
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
+ struct dp_packet_batch *batch OVS_UNUSED)
+{
+ return 0;
+}
+
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
+ struct dp_packet_batch *batch OVS_UNUSED)
+{
+ return 0;
+}
+
+int
+netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
+ const struct smap *args OVS_UNUSED,
+ char **errp OVS_UNUSED)
+{
+ return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
+ struct smap *args OVS_UNUSED)
+{
+ return 0;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
+{
+ return 0;
+}
+#endif
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
new file mode 100644
index 000000000000..dd78135c77dc
--- /dev/null
+++ b/lib/netdev-afxdp.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2018 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_AFXDP_H
+#define NETDEV_AFXDP_H 1
+
+#include <stdint.h>
+#include <stdbool.h>
+
+/* These functions are Linux AF_XDP specific, so they should be used directly
+ * only by Linux-specific code. */
+#define MAX_XSKQ 16
+struct netdev;
+struct xsk_socket_info;
+struct xdp_umem;
+struct dp_packet_batch;
+
+struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
+void xsk_destroy(struct xsk_socket_info *xsk);
+
+int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch);
+
+int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+ struct dp_packet_batch *batch);
+
+void xsk_remove_xdp_program(uint32_t ifindex);
+int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+ char **errp);
+int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
+int netdev_afxdp_get_numa_id(const struct netdev *netdev);
+
+#endif /* netdev-afxdp.h */
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 84e2f52dc7b8..d4058d3b1d18 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -75,6 +75,7 @@
#include "unaligned.h"
#include "openvswitch/vlog.h"
#include "util.h"
+#include "netdev-afxdp.h"
VLOG_DEFINE_THIS_MODULE(netdev_linux);
@@ -531,6 +532,7 @@ struct netdev_linux {
/* LAG information. */
bool is_lag_master; /* True if the netdev is a LAG master. */
+ struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
};
struct netdev_rxq_linux {
@@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class *netdev_class)
}
static bool
+is_afxdp_netdev(const struct netdev *netdev)
+{
+ return netdev_get_class(netdev) == &netdev_afxdp_class;
+}
+
+static bool
is_tap_netdev(const struct netdev *netdev)
{
return netdev_get_class(netdev) == &netdev_tap_class;
}
-static struct netdev_linux *
+struct netdev_linux *
netdev_linux_cast(const struct netdev *netdev)
{
ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
@@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_)
atomic_count_dec(&miimon_cnt);
}
+#if HAVE_AF_XDP
+ if (is_afxdp_netdev(netdev_)) {
+ int ifindex;
+ int ret, i;
+
+ ret = get_ifindex(netdev_, &ifindex);
+ if (ret) {
+ VLOG_ERR("get ifindex error");
+ } else {
+ for (i = 0; i < MAX_XSKQ; i++) {
+ if (netdev->xsk[i]) {
+ VLOG_INFO("destroy xsk[%d]", i);
+ xsk_destroy(netdev->xsk[i]);
+ }
+ }
+ xsk_remove_xdp_program(ifindex);
+ }
+ }
+#endif
ovs_mutex_destroy(&netdev->mutex);
}
@@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
rx->is_tap = is_tap_netdev(netdev_);
if (rx->is_tap) {
rx->fd = netdev->tap_fd;
+ } else if (is_afxdp_netdev(netdev_)) {
+#if HAVE_AF_XDP
+ struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+ int ifindex;
+ int xdp_queue_id = rxq_->queue_id;
+ struct xsk_socket_info *xsk;
+
+ if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+ VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
+ ovs_strerror(errno));
+ ovs_assert(0);
+ }
+
+ VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
+ __func__, netdev_->name, xdp_queue_id);
+
+ /* Get ethernet device index. */
+ error = get_ifindex(&netdev->up, &ifindex);
+ if (error) {
+ goto error;
+ }
+
+ xsk = xsk_configure(ifindex, xdp_queue_id);
+ netdev->xsk[xdp_queue_id] = xsk;
+ rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll */
+#endif
} else {
struct sockaddr_ll sll;
int ifindex, val;
@@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
{
struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
struct netdev *netdev = rx->up.netdev;
- struct dp_packet *buffer;
+ struct dp_packet *buffer = NULL;
ssize_t retval;
int mtu;
+ struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
+
+ if (is_afxdp_netdev(netdev)) {
+ int qid = rxq_->queue_id;
+
+ return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
+ }
if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
mtu = ETH_PAYLOAD_MAX;
@@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
/* Assume Ethernet port. No need to set packet_type. */
buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
DP_NETDEV_HEADROOM);
+
retval = (rx->is_tap
? netdev_linux_rxq_recv_tap(rx->fd, buffer)
: netdev_linux_rxq_recv_sock(rx->fd, buffer));
@@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,
* The kernel maintains a packet transmission queue, so the caller is not
* expected to do additional queuing of packets. */
static int
-netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
+netdev_linux_send(struct netdev *netdev_, int qid,
struct dp_packet_batch *batch,
bool concurrent_txq OVS_UNUSED)
{
int error = 0;
int sock = 0;
- if (!is_tap_netdev(netdev_)) {
+ if (!is_tap_netdev(netdev_) &&
+ !is_afxdp_netdev(netdev_)) {
if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
error = EOPNOTSUPP;
goto free_batch;
@@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
}
error = netdev_linux_sock_batch_send(sock, ifindex, batch);
+ } else if (is_afxdp_netdev(netdev_)) {
+ struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+ error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
} else {
error = netdev_linux_tap_batch_send(netdev_, batch);
}
@@ -3322,6 +3388,7 @@ const struct netdev_class netdev_linux_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "system",
+ .is_pmd = false,
.construct = netdev_linux_construct,
.get_stats = netdev_linux_get_stats,
.get_features = netdev_linux_get_features,
@@ -3332,6 +3399,7 @@ const struct netdev_class netdev_linux_class = {
const struct netdev_class netdev_tap_class = {
NETDEV_LINUX_CLASS_COMMON,
.type = "tap",
+ .is_pmd = false,
.construct = netdev_linux_construct_tap,
.get_stats = netdev_tap_get_stats,
.get_features = netdev_linux_get_features,
@@ -3342,10 +3410,23 @@ const struct netdev_class netdev_internal_class = {
NETDEV_LINUX_CLASS_COMMON,
LINUX_FLOW_OFFLOAD_API,
.type = "internal",
+ .is_pmd = false,
.construct = netdev_linux_construct,
.get_stats = netdev_internal_get_stats,
.get_status = netdev_internal_get_status,
};
+
+const struct netdev_class netdev_afxdp_class = {
+ NETDEV_LINUX_CLASS_COMMON,
+ .type = "afxdp",
+ .is_pmd = true,
+ .construct = netdev_linux_construct,
+ .get_stats = netdev_linux_get_stats,
+ .get_status = netdev_linux_get_status,
+ .set_config = netdev_afxdp_set_config,
+ .get_config = netdev_afxdp_get_config,
+ .get_numa_id = netdev_afxdp_get_numa_id,
+};
#define CODEL_N_QUEUES 0x0000
diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
index 17ca9120168a..afcb20ee8d0a 100644
--- a/lib/netdev-linux.h
+++ b/lib/netdev-linux.h
@@ -28,6 +28,7 @@ struct netdev;
int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
const char *flag_name, bool enable);
int linux_get_ifindex(const char *netdev_name);
+struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
#define LINUX_FLOW_OFFLOAD_API \
.flow_flush = netdev_tc_flow_flush, \
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index fb0c27e6e8e8..5bf041316503 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class;
#endif
extern const struct netdev_class netdev_internal_class;
extern const struct netdev_class netdev_tap_class;
+extern const struct netdev_class netdev_afxdp_class;
#ifdef __cplusplus
}
diff --git a/lib/netdev.c b/lib/netdev.c
index 7d7ecf6f0946..c30016b34033 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -145,6 +145,7 @@ netdev_initialize(void)
netdev_register_provider(&netdev_linux_class);
netdev_register_provider(&netdev_internal_class);
netdev_register_provider(&netdev_tap_class);
+ netdev_register_provider(&netdev_afxdp_class);
netdev_vport_tunnel_register();
#endif
#if defined(__FreeBSD__) || defined(__NetBSD__)
diff --git a/lib/xdpsock.c b/lib/xdpsock.c
new file mode 100644
index 000000000000..9bd574e61774
--- /dev/null
+++ b/lib/xdpsock.c
@@ -0,0 +1,211 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syslog.h>
+#include <time.h>
+#include <unistd.h>
+#include "openvswitch/vlog.h"
+#include "async-append.h"
+#include "coverage.h"
+#include "dirs.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/ofpbuf.h"
+#include "ovs-thread.h"
+#include "sat-math.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "syslog-direct.h"
+#include "syslog-libc.h"
+#include "syslog-provider.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+#include "ovs-atomic.h"
+#include "xdpsock.h"
+#include "openvswitch/compiler.h"
+#include "dp-packet.h"
+
+#ifdef HAVE_AF_XDP
+static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+ sl->locked = 0;
+}
+
+static inline void ovs_spin_lock(ovs_spinlock_t *sl)
+{
+ int exp = 0;
+
+ while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+ __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+ while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
+ ;
+ }
+ exp = 0;
+ }
+}
+
+static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+ __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
+}
+
+static inline int ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+ int exp = 0;
+ return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+ 0, /* disallow spurious failure */
+ __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
+}
+
+void
+__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
+{
+ void *ptr;
+
+ if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
+ OVS_NOT_REACHED();
+ }
+
+ ptr = &umemp->array[umemp->index];
+ memcpy(ptr, addrs, n * sizeof(void *));
+ umemp->index += n;
+}
+
+inline void
+__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
+{
+ umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
+{
+
+ if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
+ /* stack is full */
+ /* it's possible that one umem gets pushed twice,
+ * because actions=1,2,3... multiple ports?
+ */
+ OVS_NOT_REACHED();
+ }
+
+ ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+ ovs_spin_lock(&umemp->mutex);
+ __umem_elem_push(umemp, addr);
+ ovs_spin_unlock(&umemp->mutex);
+}
+
+void
+__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
+{
+ void *ptr;
+
+ umemp->index -= n;
+
+ if (OVS_UNLIKELY(umemp->index < 0)) {
+ OVS_NOT_REACHED();
+ }
+
+ ptr = &umemp->array[umemp->index];
+ memcpy(addrs, ptr, n * sizeof(void *));
+}
+
+inline void *
+__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
+{
+ return umemp->array[--umemp->index];
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
+{
+ void *ptr;
+
+ ovs_spin_lock(&umemp->mutex);
+ ptr = __umem_elem_pop(umemp);
+ ovs_spin_unlock(&umemp->mutex);
+
+ return ptr;
+}
+
+void **
+__umem_pool_alloc(unsigned int size)
+{
+ void *bufs;
+
+ ovs_assert(posix_memalign(&bufs, getpagesize(),
+ size * sizeof(void *)) == 0);
+ memset(bufs, 0, size * sizeof(void *));
+ return (void **)bufs;
+}
+
+unsigned int
+umem_elem_count(struct umem_pool *mpool)
+{
+ return mpool->index;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
+{
+ umemp->array = __umem_pool_alloc(size);
+ if (!umemp->array) {
+ OVS_NOT_REACHED();
+ }
+
+ umemp->size = size;
+ umemp->index = 0;
+ ovs_spinlock_init(&umemp->mutex);
+ return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
+{
+ free(umemp->array);
+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+ void *bufs;
+
+ ovs_assert(posix_memalign(&bufs, getpagesize(),
+ size * sizeof(struct dp_packet_afxdp)) == 0);
+ memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+ xp->array = bufs;
+ xp->size = size;
+ return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+ free(xp->array);
+}
+#else /* !HAVE_AF_XDP below */
+#endif
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..cb64befe7dba
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,133 @@
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <net/if.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <net/ethernet.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <locale.h>
+#include <sys/types.h>
+#include <poll.h>
+#include <bpf/libbpf.h>
+
+#include "ovs-atomic.h"
+#include "openvswitch/thread.h"
+
+/* bpf/xsk.h uses the following macros not defined in OVS,
+ * so re-define them before include.
+ */
+#define unlikely OVS_UNLIKELY
+#define likely OVS_LIKELY
+#define barrier() __asm__ __volatile__("": : :"memory")
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#include <bpf/xsk.h>
+
+#define FRAME_HEADROOM XDP_PACKET_HEADROOM
+#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE NETDEV_MAX_BURST
+#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK ((1<<FRAME_SHIFT)-1)
+
+#define NUM_FRAMES 1024
+#define PROD_NUM_DESCS 128
+#define CONS_NUM_DESCS 128
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif
+
+typedef struct {
+ volatile int locked;
+} ovs_spinlock_t;
+
+/* LIFO ptr_array */
+struct umem_pool {
+ int index; /* point to top */
+ unsigned int size;
+ ovs_spinlock_t mutex;
+ void **array; /* a pointer array */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+ unsigned int size;
+ struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+ struct umem_pool mpool;
+ struct xpacket_pool xpool;
+ struct xsk_ring_prod fq;
+ struct xsk_ring_cons cq;
+ struct xsk_umem *umem;
+ void *buffer;
+};
+
+struct xsk_socket_info {
+ struct xsk_ring_cons rx;
+ struct xsk_ring_prod tx;
+ struct xsk_umem_info *umem;
+ struct xsk_socket *xsk;
+ unsigned long rx_npkts;
+ unsigned long tx_npkts;
+ unsigned long prev_rx_npkts;
+ unsigned long prev_tx_npkts;
+ uint32_t outstanding_tx;
+};
+
+struct umem_elem_head {
+ unsigned int index;
+ struct ovs_mutex mutex;
+ uint32_t n;
+};
+
+struct umem_elem {
+ struct umem_elem *next;
+};
+
+void __umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+void *__umem_elem_pop(struct umem_pool *umemp);
+void *umem_elem_pop(struct umem_pool *umemp);
+void **__umem_pool_alloc(unsigned int size);
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+unsigned int umem_elem_count(struct umem_pool *mpool);
+void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
+void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
+int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
+void xpacket_pool_cleanup(struct xpacket_pool *xp);
+
+#endif
diff --git a/tests/automake.mk b/tests/automake.mk
index 017d2d416156..b591405c1a21 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,12 +4,14 @@ EXTRA_DIST += \
$(SYSTEM_TESTSUITE_AT) \
$(SYSTEM_KMOD_TESTSUITE_AT) \
$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+ $(SYSTEM_AFXDP_TESTSUITE_AT) \
$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
$(SYSTEM_DPDK_TESTSUITE_AT) \
$(OVSDB_CLUSTER_TESTSUITE_AT) \
$(TESTSUITE) \
$(SYSTEM_KMOD_TESTSUITE) \
$(SYSTEM_USERSPACE_TESTSUITE) \
+ $(SYSTEM_AFXDP_TESTSUITE) \
$(SYSTEM_OFFLOADS_TESTSUITE) \
$(SYSTEM_DPDK_TESTSUITE) \
$(OVSDB_CLUSTER_TESTSUITE) \
@@ -157,6 +159,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
tests/system-userspace-macros.at \
tests/system-userspace-packet-type-aware.at
+SYSTEM_AFXDP_TESTSUITE_AT = \
+ tests/system-afxdp-testsuite.at \
+ tests/system-afxdp-traffic.at \
+ tests/system-afxdp-macros.at
+
SYSTEM_TESTSUITE_AT = \
tests/system-common-macros.at \
tests/system-ovn.at \
@@ -181,6 +188,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -314,6 +322,11 @@ check-system-userspace: all
set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+check-afxdp: all
+ $(MAKE) install
+ set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
check-offloads: all
set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
@@ -351,6 +364,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+ $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+ $(AM_V_at)mv $@.tmp $@
+
$(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
$(AM_V_at)mv $@.tmp $@
diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
new file mode 100644
index 000000000000..2c58c2d6554b
--- /dev/null
+++ b/tests/system-afxdp-macros.at
@@ -0,0 +1,153 @@
+# _ADD_BR([name])
+#
+# Expands into the proper ovs-vsctl commands to create a bridge with the
+# appropriate type and properties
+m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
+
+# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
+#
+# Creates a database and starts ovsdb-server, starts ovs-vswitchd
+# connected to that database, calls ovs-vsctl to create a bridge named
+# br0 with predictable settings, passing 'vsctl-args' as additional
+# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide
+# output (e.g. because it includes "create" commands) then 'vsctl-output'
+# specifies the expected output after filtering through uuidfilt.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+ [
+ export OVS_PKGDATADIR=$(`pwd`)
+ _OVS_VSWITCHD_START([--disable-system])
+ AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
+# for messages with severity WARN or higher and signaling an error if any
+# is present. The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+# OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed afte OVS_VSWITCHD_STOP() is
+# invoked. They can be used to perform additional cleanups such as name space
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+ [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
+/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
+/dpif(revalidator.*)|WARN|netdev at ovs-netdev: failed to put/d
+"])
+ AT_CHECK([:; $2])
+ ])
+
+m4_define([ADD_VETH_AFXDP],
+ [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
+ CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
+ AT_CHECK([ip link set $1 netns $2])
+ AT_CHECK([ip link set dev ovs-$1 up])
+ AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+ set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
+ NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+ NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+ if test -n "$5"; then
+ NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+ fi
+ if test -n "$6"; then
+ NS_CHECK_EXEC([$2], [ip route add default via $6])
+ fi
+ on_exit 'ip link del ovs-$1'
+ ]
+)
+
+# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
+m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
+ [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
+ AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
+ AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
+ ]
+)
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths. The userspace datapath uses the AF_PACKET
+# socket to receive packets for veths. Unfortunately, the AF_PACKET socket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and therefore discarded.
+# 2. Packets with offloaded partial checksum are received with the wrong
+# checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer we make sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads are properly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+ [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+ [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The userspace
+# supports FTP and TFTP.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentations tests.
+# The userspace doesn't support fragmentation yet, so skip the tests.
+m4_define([CHECK_CONNTRACK_FRAG],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local stack.
+# While the kernel connection tracker automatically passes all the connection
+# tracking state from an internal port to the OpenvSwitch kernel module, there
+# is simply no way of doing that with the userspace, so skip the tests.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The userspace
+# datapath supports NAT.
+#
+m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack by
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+ AT_SKIP_IF([:])
+])
+
+# CHECK_CT_DPIF_SET_GET_MAXCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
+# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
+
+# CHECK_CT_DPIF_GET_NCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
+# userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_GET_NCONNS])
diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
new file mode 100644
index 000000000000..538c0d15d556
--- /dev/null
+++ b/tests/system-afxdp-testsuite.at
@@ -0,0 +1,26 @@
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
+m4_include([tests/system-ovn.at])
diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..26f72acf48ef
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,978 @@
+AT_BANNER([AF_XDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ulimit -l unlimited
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
+ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
+ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_VXLAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+ [id 0 dstport 4789])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_VXLAN_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
+ [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
+ [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
+ [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
+ [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
+ [10.1.1.1/24],
+ [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve tunnel])
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+ [vni 0])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve6 tunnel])
+OVS_CHECK_GENEVE_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24],
+ [vni 0 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - clone action])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
+ -- set interface ovs-p1 ofport_request=2])
+
+AT_DATA([flows.txt], [dnl
+priority=1 actions=NORMAL
+priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2
+priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - basic truncate action])
+AT_SKIP_IF([test $HAVE_NC = no])
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-ofctl del-flows br0])
+
+dnl Create p0 and ovs-p0(1)
+ADD_NAMESPACES(at_ns0)
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
+NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
+
+dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
+AT_CHECK([ip link add p1 type veth peer name ovs-p1])
+on_exit 'ip link del ovs-p1'
+AT_CHECK([ip link set dev ovs-p1 up])
+AT_CHECK([ip link set dev p1 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2])
+dnl Use p1 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3])
+
+dnl Create p2(5) and ovs-p2(4)
+AT_CHECK([ip link add p2 type veth peer name ovs-p2])
+on_exit 'ip link del ovs-p2'
+AT_CHECK([ip link set dev ovs-p2 up])
+AT_CHECK([ip link set dev p2 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4])
+dnl Use p2 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5])
+
+dnl basic test
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+dnl use this file as payload file for ncat
+AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null])
+on_exit 'rm -f payload200.bin'
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl packet with truncated size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=100
+])
+dnl packet with original size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=242
+])
+
+dnl more complicated output actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+dnl SLOW_ACTION: disable kernel datapath truncate support
+dnl Repeat the test above, but exercise the SLOW_ACTION code path
+AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
+
+dnl SLOW_ACTION test1: check datapatch actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout])
+AT_CHECK([tail -3 stdout], [0],
+[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
+This flow is handled by the userspace slow path because it:
+ - Uses action(s) not supported by datapath.
+])
+
+dnl SLOW_ACTION test2: check actual packet truncate
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
+AT_BANNER([conntrack])
+
+AT_SETUP([conntrack - controller])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - force commit])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(force,commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
+table=1,in_port=2,ct_state=+trk,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+dnl
+dnl Check that the directionality has been changed by force commit.
+dnl
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
+])
+
+dnl OK, now send another packet from port 1 and see that it switches again
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - ct flush by 5-tuple])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),2
+priority=100,in_port=2,udp,action=ct(zone=5,commit),1
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Test UDP from port 1
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl
+])
+
+dnl Test UDP from port 2
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+])
+
+dnl Test ICMP traffic
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout])
+AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
+icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
+])
+
+ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
+ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - get_nconns and get/set_maxconns])
+CHECK_CONNTRACK()
+CHECK_CT_DPIF_SET_GET_MAXCONNS()
+CHECK_CT_DPIF_GET_NCONNS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+1
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+3000000
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
+setting maxconns successful
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+0
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1. The rest of the traffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl The above ping creates state in the connection tracker. We're not
+dnl interested in that state.
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
--
2.7.4
More information about the dev
mailing list