[ovs-dev] [PATCHv8] netdev-afxdp: add new netdev type for AF_XDP.
Ilya Maximets
i.maximets at samsung.com
Mon May 13 17:48:54 UTC 2019
On 10.05.2019 2:54, William Tu wrote:
> The patch introduces experimental AF_XDP support for OVS netdev.
> AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
> built upon the eBPF and XDP technology. It aims to have comparable
> performance to DPDK but to cooperate better with the existing kernel
> networking stack. An AF_XDP socket receives and sends packets from an
> eBPF/XDP program attached to the netdev, bypassing a couple of the Linux
> kernel's subsystems. As a result, an AF_XDP socket shows much better
> performance than AF_PACKET. For more details about AF_XDP, please see the
> Linux kernel's Documentation/networking/af_xdp.rst. Note that by default,
> this is not compiled in.
>
> Signed-off-by: William Tu <u9012063 at gmail.com>
>
> ---
> v1->v2:
> - add a list to maintain unused umem elements
> - remove copy from rx umem to ovs internal buffer
> - use hugetlb to reduce misses (not much difference)
> - use pmd mode netdev in OVS (huge performance improvement)
> - remove malloc dp_packet, instead put dp_packet in umem
>
> v2->v3:
> - rebase on the OVS master, 7ab4b0653784
> ("configure: Check for more specific function to pull in pthread library.")
> - remove the dependency on libbpf and dpif-bpf.
> instead, use the built-in XDP_ATTACH feature.
> - data structure optimizations for better performance, see[1]
> - more test cases support
> v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
>
> v3->v4:
> - Use AF_XDP API provided by libbpf
> - Remove the dependency on XDP_ATTACH kernel patch set
> - Add documentation, bpf.rst
>
> v4->v5:
> - rebase to master
> - remove rfc, squash all into a single patch
> - add --enable-afxdp, so by default, AF_XDP is not compiled
> - add options: xdpmode=drv,skb
> - add multiple queue and multiple PMD support, with options: n_rxq
> - improve documentation, rename bpf.rst to af_xdp.rst
>
> v5->v6
> - rebase to master, commit 0cdd5b13de91b98
> - address errors from sparse and clang
> - pass travis-ci test
> - address feedback from Ben
> - fix issues reported by 0-day robot
> - improved documentation
>
> v6-v7
> - rebase to master, commit abf11558c1515bf3b1
> - address feedback from Ilya, Ben, and Eelco, see:
> https://www.mail-archive.com/ovs-dev@openvswitch.org/msg32357.html
> - add XDP mode change, implement get/set_config, reconfigure
> - Fix reconfiguration/crash issue caused by libbpf, see patch:
> [PATCH bpf 0/2] libbpf: fixes for AF_XDP teardown
> - perf optimization for batching umem_push/pop
> - perf optimization for batching kick_tx
> - test build with dpdk
> - fix/refactor atomic operation
> - make AF_XDP x86 specific, otherwise fail at build time
> - lots of code refactoring
> - add PVP setup in documentation
>
> v7-v8:
> - Address feedback from Ilya at:
> https://patchwork.ozlabs.org/patch/1095019/
> - add netdev-linux-private.h
> - fix afxdp reconfigure issue
> - sort include headers
> - remove unnecessary OVS_UNUSED
> - coding style fixes
> - error case handling and memory leak
> ---
> Documentation/automake.mk | 1 +
> Documentation/index.rst | 1 +
> Documentation/intro/install/afxdp.rst | 479 +++++++++++++++++
> Documentation/intro/install/index.rst | 1 +
> acinclude.m4 | 32 ++
> configure.ac | 1 +
> lib/automake.mk | 13 +
> lib/dp-packet.c | 33 ++
> lib/dp-packet.h | 22 +-
> lib/dpif-netdev-perf.h | 14 +
> lib/netdev-afxdp.c | 727 +++++++++++++++++++++++++
> lib/netdev-afxdp.h | 53 ++
> lib/netdev-linux-private.h | 124 +++++
> lib/netdev-linux.c | 137 +++--
> lib/netdev-linux.h | 14 +
> lib/netdev-provider.h | 4 +-
> lib/netdev.c | 3 +
> lib/xdpsock.c | 239 +++++++++
> lib/xdpsock.h | 123 +++++
> tests/automake.mk | 17 +
> tests/system-afxdp-macros.at | 153 ++++++
> tests/system-afxdp-testsuite.at | 26 +
> tests/system-afxdp-traffic.at | 978 ++++++++++++++++++++++++++++++++++
> 23 files changed, 3137 insertions(+), 58 deletions(-)
> create mode 100644 Documentation/intro/install/afxdp.rst
> create mode 100644 lib/netdev-afxdp.c
> create mode 100644 lib/netdev-afxdp.h
> create mode 100644 lib/netdev-linux-private.h
> create mode 100644 lib/xdpsock.c
> create mode 100644 lib/xdpsock.h
> create mode 100644 tests/system-afxdp-macros.at
> create mode 100644 tests/system-afxdp-testsuite.at
> create mode 100644 tests/system-afxdp-traffic.at
>
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index 082438e09a33..11cc59efc881 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -10,6 +10,7 @@ DOC_SOURCE = \
> Documentation/intro/why-ovs.rst \
> Documentation/intro/install/index.rst \
> Documentation/intro/install/bash-completion.rst \
> + Documentation/intro/install/afxdp.rst \
> Documentation/intro/install/debian.rst \
> Documentation/intro/install/documentation.rst \
> Documentation/intro/install/distributions.rst \
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 46261235c732..aa9e7c49f179 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -59,6 +59,7 @@ vSwitch? Start here.
> :doc:`intro/install/windows` |
> :doc:`intro/install/xenserver` |
> :doc:`intro/install/dpdk` |
> + :doc:`intro/install/afxdp` |
> :doc:`Installation FAQs <faq/releases>`
>
> - **Tutorials:** :doc:`tutorials/faucet` |
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> new file mode 100644
> index 000000000000..1222b433dbbb
> --- /dev/null
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -0,0 +1,479 @@
> +..
> + Licensed under the Apache License, Version 2.0 (the "License"); you may
> + not use this file except in compliance with the License. You may obtain
> + a copy of the License at
> +
> + http://www.apache.org/licenses/LICENSE-2.0
> +
> + Unless required by applicable law or agreed to in writing, software
> + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> + License for the specific language governing permissions and limitations
> + under the License.
> +
> + Convention for heading levels in Open vSwitch documentation:
> +
> + ======= Heading 0 (reserved for the title in a document)
> + ------- Heading 1
> + ~~~~~~~ Heading 2
> + +++++++ Heading 3
> + ''''''' Heading 4
> +
> + Avoid deeper levels because they do not render well.
> +
> +
> +========================
> +Open vSwitch with AF_XDP
> +========================
> +
> +This document describes how to build and install Open vSwitch using
> +the AF_XDP netdev.
> +
> +.. warning::
> + The AF_XDP support of Open vSwitch is considered 'experimental',
> + and it is not compiled in by default.
> +
> +Introduction
> +------------
> +AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
> +built upon the eBPF and XDP technology. It aims to have comparable
> +performance to DPDK but to cooperate better with the existing kernel
> +networking stack. An AF_XDP socket receives and sends packets from an
> +eBPF/XDP program attached to the netdev, bypassing a couple of the Linux
> +kernel's subsystems. As a result, an AF_XDP socket shows much better
> +performance than AF_PACKET. For more details about AF_XDP, please see the
> +Linux kernel's Documentation/networking/af_xdp.rst.
> +
> +
> +AF_XDP Netdev
> +-------------
> +OVS has several netdev types, e.g., system, tap, and internal. The
> +AF_XDP feature adds a new netdev type called "afxdp" and implements
> +its configuration, packet reception, and transmit functions. Since
> +the AF_XDP socket, xsk, operates in userspace, once ovs-vswitchd
> +receives packets from the xsk, the proposed architecture re-uses the
> +existing userspace dpif-netdev datapath. As a result, most of the
> +packet processing happens in userspace instead of in the Linux
> +kernel.
> +
> +::
> +
> + | +-------------------+
> + | | ovs-vswitchd |<-->ovsdb-server
> + | +-------------------+
> + | | ofproto |<-->OpenFlow controllers
> + | +--------+-+--------+
> + | | netdev | |ofproto-|
> + userspace | +--------+ | dpif |
> + | | afxdp | +--------+
> + | | netdev | | dpif |
> + | +---||---+ +--------+
> + | || | dpif- |
> + | || | netdev |
> + |_ || +--------+
> + ||
> + _ +---||-----+--------+
> + | | AF_XDP prog + |
> + kernel | | xsk_map |
> + |_ +--------||---------+
> + ||
> + physical
> + NIC
> +
> +
> +Build requirements
> +------------------
> +
> +In addition to the requirements described in :doc:`general`, building Open
> +vSwitch with AF_XDP will require the following:
> +
> +- libbpf from kernel source tree (kernel 5.0.0 or later)
> +
> +- Linux kernel XDP support, with the following options (required)
> +
> + * CONFIG_BPF=y
> +
> + * CONFIG_BPF_SYSCALL=y
> +
> + * CONFIG_XDP_SOCKETS=y
> +
> +
> +- The following optional Kconfig options are also recommended, but not
> + required:
> +
> + * CONFIG_BPF_JIT=y (Performance)
> +
> + * CONFIG_HAVE_BPF_JIT=y (Performance)
> +
> + * CONFIG_XDP_SOCKETS_DIAG=y (Debugging)
> +
> +- If possible, run **./xdpsock -r -N -z -i <your device>** under
> + linux/samples/bpf. This is an OVS-independent benchmark tool for AF_XDP.
> + It verifies that your kernel meets the basic requirements for AF_XDP.
> +
> +
> +Installing
> +----------
> +For OVS to use the AF_XDP netdev, it has to be configured with libbpf support.
> +First, clone a recent version of the Linux bpf-next tree::
> +
> + git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> +
> +Second, go into the Linux source directory and build libbpf in the tools
> +directory::
> +
> + cd bpf-next/
> + cd tools/lib/bpf/
> + make && make install
> + make install_headers
> +
> +.. note::
> + Make sure xsk.h and bpf.h are installed in the system's include path,
> + e.g. /usr/local/include/bpf/ or /usr/include/bpf/
> +
> +Make sure libbpf.so is installed correctly::
> +
> + ldconfig
> + ldconfig -p | grep libbpf
> +
> +
> +Third, ensure the standard OVS requirements are installed and
> +bootstrap/configure the package::
> +
> + ./boot.sh && ./configure --enable-afxdp
> +
> +Finally, build and install OVS::
> +
> + make && make install
> +
> +To kick off end-to-end autotesting::
> +
> + uname -a # make sure the kernel is 5.0 or later
> + make check-afxdp
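
The 5.0+ requirement here is checked by eye from the uname output. A minimal sketch of doing the same comparison in code, assuming the release string has the usual "major.minor..." shape (for example the release field that uname(2) fills in). The function name kernel_at_least is hypothetical and not part of this patch:

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a kernel release string such as "5.0.0-13-generic" is at
 * least 'major'.'minor', 0 if it is older, and -1 if the string does not
 * start with "<number>.<number>". */
int
kernel_at_least(const char *release, int major, int minor)
{
    int k_major, k_minor;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &k_major, &k_minor) != 2) {
        return -1;
    }
    if (k_major != major) {
        return k_major > major;
    }
    return k_minor >= minor;
}
```

Something like this could guard the test suite and skip it on older kernels instead of letting every test case fail.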
> +
> +If a test case fails, check the log at::
> +
> + cat tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
> +
> +
> +Setup AF_XDP netdev
> +-------------------
> +Before running OVS with AF_XDP, make sure libbpf and libelf are
> +set up correctly::
> +
> + ldd vswitchd/ovs-vswitchd
> +
> +Open vSwitch should be started using userspace datapath as described
> +in :doc:`general`::
> +
> + ovs-vswitchd --disable-system
> + ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> +
> +.. note::
> + The OVS AF_XDP netdev uses the userspace datapath, the same datapath
> + used by OVS-DPDK. It therefore requires --disable-system for
> + ovs-vswitchd and datapath_type=netdev when adding a new bridge.
> +
> +Make sure your device driver supports AF_XDP. To use 1 PMD (on core 4)
> +on a device with 1 queue (queue 0), configure these options: **pmd-cpu-mask,
> +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> +
> + ethtool -L enp2s0 combined 1
> + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> + options:n_rxq=1 options:xdpmode=drv \
> + other_config:pmd-rxq-affinity="0:4"
> +
> +Or, use 4 pmds/cores and 4 queues by doing::
> +
> + ethtool -L enp2s0 combined 4
> + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1e
> + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> + options:n_rxq=4 options:xdpmode=drv \
> + other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
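
pmd-cpu-mask is a hex bitmap in which bit N selects core N, so the mask has to cover every core named in pmd-rxq-affinity. A minimal sketch of that relationship, assuming the "<queue>:<core>" list format used in the examples above. The helper name affinity_to_cpu_mask is hypothetical and not an OVS function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parses a pmd-rxq-affinity style string, "<queue>:<core>[,<queue>:<core>]...",
 * and returns a bitmap with bit N set for every core N that appears in it.
 * Returns 0 on malformed input or if a core ID does not fit in 64 bits. */
uint64_t
affinity_to_cpu_mask(const char *affinity)
{
    uint64_t mask = 0;
    const char *p = affinity;

    assert(affinity != NULL);
    while (*p) {
        unsigned int queue, core;
        int n = 0;

        if (sscanf(p, "%u:%u%n", &queue, &core, &n) != 2 || core >= 64) {
            return 0;
        }
        mask |= UINT64_C(1) << core;
        p += n;
        if (*p == ',') {
            p++;
        } else if (*p != '\0') {
            return 0;
        }
    }
    return mask;
}
```

With this, "0:4" gives 0x10 and "0:1,1:2,2:3,3:4" gives 0x1e, which is a quick way to cross-check the mask and affinity values in an example.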
> +
> +To validate that the bridge has been instantiated successfully, run::
> +
> + ovs-vsctl show
> +
> +The output should include something like::
> +
> + Port "ens802f0"
> + Interface "ens802f0"
> + type: afxdp
> + options: {n_rxq="1", xdpmode=drv}
> +
> +Otherwise, enable debug logging with::
> +
> + ovs-appctl vlog/set netdev_afxdp::dbg
> +
> +
> +References
> +----------
> +Most of the design details are described in the paper presented at the
> +Linux Plumbers Conference 2018, "Bringing the Power of eBPF to Open
> +vSwitch"[1], section 4, and in the slides[2][4].
> +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
> +to current and future AF_XDP work.
> +
> +
> +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> +
> +[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> +
> +[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> +
> +[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
> +
> +
> +Performance Tuning
> +------------------
> +The name of the game is to keep your CPU running in userspace, allowing the
> +PMD to keep polling the AF_XDP queues without any interference from the
> +kernel.
> +
> +#. Make sure everything is on the same NUMA node (memory used by AF_XDP,
> + cores running the PMDs, device plug-in slot).
> +
> +#. Isolate the PMD cores with the isolcpus kernel parameter in the GRUB
> + configuration.
> +
> +#. Do not route IRQs to cores that run a PMD.
> +
> +#. The Spectre and Meltdown fixes increase the overhead of system calls.
> +
> +Debugging performance issues
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +While running the traffic, use the Linux perf tool to see where your CPU
> +spends its cycles::
> +
> + cd bpf-next/tools/perf
> + make
> + ./perf record -p `pidof ovs-vswitchd` sleep 10
> + ./perf report
> +
> +Measure your system call rate by doing::
> +
> + pstree -p `pidof ovs-vswitchd`
> + strace -c -p <your pmd's PID>
> +
> +Or, use the OVS PMD statistics command::
> +
> + ovs-appctl dpif-netdev/pmd-stats-show
> +
> +
> +Example Script
> +--------------
> +
> +Below is a script using namespaces and veth peers::
> +
> + #!/bin/bash
> + ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
> + --disable-system --detach
> + ovs-vsctl -- add-br br0 -- set Bridge br0 \
> + protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
> + fail-mode=secure datapath_type=netdev
> +
> + ip netns add at_ns0
> + ovs-appctl vlog/set netdev_afxdp::dbg
> +
> + ip link add p0 type veth peer name afxdp-p0
> + ip link set p0 netns at_ns0
> + ip link set dev afxdp-p0 up
> + ovs-vsctl add-port br0 afxdp-p0 -- \
> + set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
> +
> + ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> + ip addr add "10.1.1.1/24" dev p0
> + ip link set dev p0 up
> + NS_EXEC_HEREDOC
> +
> + ip netns add at_ns1
> + ip link add p1 type veth peer name afxdp-p1
> + ip link set p1 netns at_ns1
> + ip link set dev afxdp-p1 up
> +
> + ovs-vsctl add-port br0 afxdp-p1 -- \
> + set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
> + ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
> + ip addr add "10.1.1.2/24" dev p1
> + ip link set dev p1 up
> + NS_EXEC_HEREDOC
> +
> + ip netns exec at_ns0 ping -i .2 10.1.1.2
> +
> +
> +Limitations/Known Issues
> +------------------------
> +#. The device's NUMA ID is always reported as 0; a way to find the NUMA ID
> + from a netdev is needed.
> +#. No QoS support, because the AF_XDP netdev bypasses the Linux TC layer. A
> + possible workaround is to use the OpenFlow meter action.
> +#. Adding an AF_XDP device to a bridge, removing it, and adding it again
> + fails.
> +#. Most of the tests were done using a single i40e port. Multiple ports and
> + the ixgbe driver also need to be tested.
> +#. No latency test results yet (TODO item).
> +
> +
> +make check-afxdp
> +----------------
> +When executing 'make check-afxdp', OVS creates namespaces, sets up AF_XDP on
> +veth devices, and kicks off the tests. So far we have the following test
> +cases::
> +
> + AF_XDP netdev datapath-sanity
> +
> + 1: datapath - ping between two ports ok
> + 2: datapath - ping between two ports on vlan ok
> + 3: datapath - ping6 between two ports ok
> + 4: datapath - ping6 between two ports on vlan ok
> + 5: datapath - ping over vxlan tunnel ok
> + 6: datapath - ping over vxlan6 tunnel ok
> + 7: datapath - ping over gre tunnel ok
> + 8: datapath - ping over erspan v1 tunnel ok
> + 9: datapath - ping over erspan v2 tunnel ok
> + 10: datapath - ping over ip6erspan v1 tunnel ok
> + 11: datapath - ping over ip6erspan v2 tunnel ok
> + 12: datapath - ping over geneve tunnel ok
> + 13: datapath - ping over geneve6 tunnel ok
> + 14: datapath - clone action ok
> + 15: datapath - basic truncate action ok
> +
> + conntrack
> +
> + 16: conntrack - controller ok
> + 17: conntrack - force commit ok
> + 18: conntrack - ct flush by 5-tuple ok
> + 19: conntrack - IPv4 ping ok
> + 20: conntrack - get_nconns and get/set_maxconns ok
> + 21: conntrack - IPv6 ping ok
> +
> + system-ovn
> +
> + 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
> + 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
> + 24: ovn -- multiple gateway routers, SNAT and DNAT ok
> + 25: ovn -- load-balancing ok
> + 26: ovn -- load-balancing - same subnet. ok
> + 27: ovn -- load balancing in gateway router ok
> + 28: ovn -- multiple gateway routers, load-balancing ok
> + 29: ovn -- load balancing in router with gateway router port ok
> + 30: ovn -- DNAT and SNAT on distributed router - N/S ok
> + 31: ovn -- DNAT and SNAT on distributed router - E/W ok
> +
> +PVP using tap device
> +--------------------
> +Assume you have enp2s0 as the physical NIC and a tap device connected to a VM.
> +First, start OVS, then add physical port::
> +
> + ethtool -L enp2s0 combined 1
> + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> + ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> + options:n_rxq=1 options:xdpmode=drv \
> + other_config:pmd-rxq-affinity="0:4"
> +
> +Start a VM with virtio and tap device::
> +
> + qemu-system-x86_64 -hda ubuntu1810.qcow \
> + -m 4096 \
> + -cpu host,+x2apic -enable-kvm \
> + -device virtio-net-pci,mac=00:02:00:00:00:01,netdev=net0,mq=on,\
> + vectors=10,mrg_rxbuf=on,rx_queue_size=1024 \
> + -netdev type=tap,id=net0,vhost=on,queues=8 \
> + -object memory-backend-file,id=mem,size=4096M,\
> + mem-path=/dev/hugepages,share=on \
> + -numa node,memdev=mem -mem-prealloc -smp 2
> +
> +Create OpenFlow rules::
> +
> + ovs-vsctl add-port br0 tap0
> + ovs-ofctl del-flows br0
> + ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:tap0"
> + ovs-ofctl add-flow br0 "in_port=tap0, actions=output:enp2s0"
> +
> +Inside the VM, use xdp_rxq_info to bounce back the traffic::
> +
> + ./xdp_rxq_info --dev ens3 --action XDP_TX
> +
> +The performance number I got is around 700Kpps.
> +This is due to using the kernel's tap interface, which requires copying
> +each packet from the umem buffer in userspace into the kernel.
> +
> +PVP using vhostuser device
> +--------------------------
> +First, build OVS with DPDK and AFXDP::
> +
> + ./configure --enable-afxdp --with-dpdk=<dpdk path>
> + make -j4 && make install
> +
> +Create a vhost-user port from OVS::
> +
> + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> + ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev \
> + other_config:pmd-cpu-mask=0xfff
> + ovs-vsctl add-port br0 vhost-user-1 \
> + -- set Interface vhost-user-1 type=dpdkvhostuser
> +
> +Start VM using vhost-user mode::
> +
> + qemu-system-x86_64 -hda ubuntu1810.qcow \
> + -m 4096 \
> + -cpu host,+x2apic -enable-kvm \
> + -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 \
> + -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce,queues=4 \
> + -device virtio-net-pci,mac=00:00:00:00:00:01,\
> + netdev=mynet1,mq=on,vectors=10 \
> + -object memory-backend-file,id=mem,size=4096M,\
> + mem-path=/dev/hugepages,share=on \
> + -numa node,memdev=mem -mem-prealloc -smp 2
> +
> +Set up the OpenFlow rules::
> +
> + ovs-ofctl del-flows br0
> + ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:vhost-user-1"
> + ovs-ofctl add-flow br0 "in_port=vhost-user-1, actions=output:enp2s0"
> +
> +Inside the VM, use xdp_rxq_info to drop or bounce back the traffic::
> +
> + ./xdp_rxq_info --dev ens3 --action XDP_DROP
> + ./xdp_rxq_info --dev ens3 --action XDP_TX
> +
> +Performance: for RX_DROP: 6.6Mpps, TX: 2.3Mpps
> +
> +PCP container using veth
> +------------------------
> +Create namespace and veth peer devices::
> +
> + ip netns add at_ns0
> + ip link add p0 type veth peer name afxdp-p0
> + ip link set p0 netns at_ns0
> + ip link set dev afxdp-p0 up
> + ip netns exec at_ns0 ip link set dev p0 up
> +
> +Attach the veth port to br0 (linux kernel mode)::
> +
> + ovs-vsctl add-port br0 afxdp-p0 -- \
> + set interface afxdp-p0 options:n_rxq=1 options:xdpmode=skb
> +
> +
> +Or, use AF_XDP with skb mode::
> +
> + ovs-vsctl add-port br0 afxdp-p0 -- \
> + set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
> +
> +Set up the OpenFlow rules::
> +
> + ovs-ofctl del-flows br0
> + ovs-ofctl add-flow br0 "in_port=enp2s0, actions=output:afxdp-p0"
> + ovs-ofctl add-flow br0 "in_port=afxdp-p0, actions=output:enp2s0"
> +
> +In the namespace, drop or bounce back the packets::
> +
> + ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_DROP
> + ip netns exec at_ns0 ./xdp_rxq_info --dev p0 --action XDP_TX
> +
> +Performance: for RX_DROP: 800Kpps, TX: 700Kpps
> +
> +Bug Reporting
> +-------------
> +
> +Please report problems to dev at openvswitch.org.
> diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst
> index 3193c736cf17..c27a9c9d16ff 100644
> --- a/Documentation/intro/install/index.rst
> +++ b/Documentation/intro/install/index.rst
> @@ -45,6 +45,7 @@ Installation from Source
> xenserver
> userspace
> dpdk
> + afxdp
>
> Installation from Packages
> --------------------------
> diff --git a/acinclude.m4 b/acinclude.m4
> index b532a4579266..5782f7e4bc2e 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -221,6 +221,38 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
> ])
> ])
>
> +dnl OVS_CHECK_LINUX_AF_XDP
> +dnl
> +dnl Check both Linux kernel AF_XDP and libbpf support
> +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> + AC_ARG_ENABLE([afxdp],
> + [AC_HELP_STRING([--enable-afxdp], [Enable AF_XDP support])],
> + [], [enable_afxdp=no])
> + AC_MSG_CHECKING([whether AF_XDP is enabled])
> + if test "$enable_afxdp" != yes; then
> + AC_MSG_RESULT([no])
> + AF_XDP_ENABLE=false
> + else
> + AC_MSG_RESULT([yes])
> + AF_XDP_ENABLE=true
> +
> + AC_CHECK_HEADER([bpf/libbpf.h], [],
> + [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
> +
> + AC_CHECK_HEADER([linux/if_xdp.h], [],
> + [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
> +
> + AC_CHECK_HEADER([bpf/xsk.h], [],
> + [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
> +
> + AC_DEFINE([HAVE_AF_XDP], [1],
> + [Define to 1 if AF_XDP support is available and enabled.])
> + LIBBPF_LDADD=" -lbpf -lelf"
> + AC_SUBST([LIBBPF_LDADD])
> + fi
> + AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
> +])
> +
> dnl OVS_CHECK_DPDK
> dnl
> dnl Configure DPDK source tree
> diff --git a/configure.ac b/configure.ac
> index 505e3d041e93..29c90b73f836 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
> OVS_CHECK_DOT
> OVS_CHECK_IF_DL
> OVS_CHECK_STRTOK_R
> +OVS_CHECK_LINUX_AF_XDP
> AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
> [], [], [[#include <sys/stat.h>]])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index cc5dccf39d6b..686e57f8c472 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -14,6 +14,10 @@ if WIN32
> lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
> endif
>
> +if HAVE_AF_XDP
> +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
> +endif
> +
> lib_libopenvswitch_la_LDFLAGS = \
> $(OVS_LTINFO) \
> -Wl,--version-script=$(top_builddir)/lib/libopenvswitch.sym \
> @@ -392,6 +396,7 @@ lib_libopenvswitch_la_SOURCES += \
> lib/if-notifier.h \
> lib/netdev-linux.c \
> lib/netdev-linux.h \
> + lib/netdev-linux-private.h \
> lib/netdev-tc-offloads.c \
> lib/netdev-tc-offloads.h \
> lib/netlink-conntrack.c \
> @@ -409,6 +414,14 @@ lib_libopenvswitch_la_SOURCES += \
> lib/tc.h
> endif
>
> +if HAVE_AF_XDP
> +lib_libopenvswitch_la_SOURCES += \
> + lib/xdpsock.c \
> + lib/xdpsock.h \
> + lib/netdev-afxdp.c \
> + lib/netdev-afxdp.h
> +endif
> +
> if DPDK_NETDEV
> lib_libopenvswitch_la_SOURCES += \
> lib/dpdk.c \
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index 0976a35e758b..7d086dc5e860 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -22,6 +22,9 @@
> #include "netdev-dpdk.h"
> #include "openvswitch/dynamic-string.h"
> #include "util.h"
> +#ifdef HAVE_AF_XDP
> +#include "netdev-afxdp.h"
> +#endif
>
> static void
> dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
> @@ -59,6 +62,27 @@ dp_packet_use(struct dp_packet *b, void *base, size_t allocated)
> dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
> }
>
> +#ifdef HAVE_AF_XDP
> +/* Initialize 'b' as an empty dp_packet that contains
> + * memory starting at AF_XDP umem base.
> + */
> +void
> +dp_packet_use_afxdp(struct dp_packet *b, void *base, size_t allocated)
> +{
> + dp_packet_set_base(b, base);
> + dp_packet_set_data(b, base);
> + dp_packet_set_size(b, 0);
> +
> + dp_packet_set_allocated(b, allocated);
> + b->source = DPBUF_AFXDP;
> + dp_packet_reset_offsets(b);
> + pkt_metadata_init(&b->md, 0);
> + dp_packet_reset_cutlen(b);
> + dp_packet_reset_offload(b);
> + b->packet_type = htonl(PT_ETH);
> +}
> +#endif
> +
> /* Initializes 'b' as an empty dp_packet that contains the 'allocated' bytes of
> * memory starting at 'base'. 'base' should point to a buffer on the stack.
> * (Nothing actually relies on 'base' being allocated on the stack. It could
> @@ -122,6 +146,11 @@ dp_packet_uninit(struct dp_packet *b)
> * created as a dp_packet */
> free_dpdk_buf((struct dp_packet*) b);
> #endif
> + } else if (b->source == DPBUF_AFXDP) {
> +#ifdef HAVE_AF_XDP
> + free_afxdp_buf(b);
> +#endif
> + return;
> }
> }
> }
> @@ -248,6 +277,9 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
> case DPBUF_STACK:
> OVS_NOT_REACHED();
>
> + case DPBUF_AFXDP:
> + OVS_NOT_REACHED();
> +
> case DPBUF_STUB:
> b->source = DPBUF_MALLOC;
> new_base = xmalloc(new_allocated);
> @@ -433,6 +465,7 @@ dp_packet_steal_data(struct dp_packet *b)
> {
> void *p;
> ovs_assert(b->source != DPBUF_DPDK);
> + ovs_assert(b->source != DPBUF_AFXDP);
>
> if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
> p = dp_packet_data(b);
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index a5e9ade1244a..0f533201f956 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -25,6 +25,10 @@
> #include <rte_mbuf.h>
> #endif
>
> +#ifdef HAVE_AF_XDP
> +#include "netdev-afxdp.h"
> +#endif
> +
> #include "netdev-dpdk.h"
> #include "openvswitch/list.h"
> #include "packets.h"
> @@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
> DPBUF_DPDK, /* buffer data is from DPDK allocated memory.
> * ref to dp_packet_init_dpdk() in dp-packet.c.
> */
> + DPBUF_AFXDP, /* buffer data from XDP frame */
> };
>
> #define DP_PACKET_CONTEXT_SIZE 64
> @@ -89,6 +94,13 @@ struct dp_packet {
> };
> };
>
> +#ifdef HAVE_AF_XDP
> +struct dp_packet_afxdp {
> + struct umem_pool *mpool;
> + struct dp_packet packet;
> +};
> +#endif
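
Because 'packet' is embedded in struct dp_packet_afxdp, code that only holds a struct dp_packet pointer can get back to the wrapper, and from there to the owning pool, by subtracting the member offset. A minimal sketch of that technique with stand-in types. The free path that would use it is not part of this excerpt, so treating it as the way free_afxdp_buf() works is an assumption, and dp_packet_to_afxdp is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real OVS types; only the layout matters here. */
struct umem_pool { int id; };
struct dp_packet { int source; };

struct dp_packet_afxdp {
    struct umem_pool *mpool;
    struct dp_packet packet;
};

/* Recovers the enclosing dp_packet_afxdp from a pointer to its embedded
 * 'packet' member by subtracting the member's offset. */
struct dp_packet_afxdp *
dp_packet_to_afxdp(struct dp_packet *p)
{
    assert(p != NULL);
    return (struct dp_packet_afxdp *)
        ((char *) p - offsetof(struct dp_packet_afxdp, packet));
}
```

In the OVS tree the CONTAINER_OF macro does the same subtraction, so a hand-written cast like the one above should not be needed there.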
> +
> static inline void *dp_packet_data(const struct dp_packet *);
> static inline void dp_packet_set_data(struct dp_packet *, void *);
> static inline void *dp_packet_base(const struct dp_packet *);
> @@ -122,7 +134,9 @@ static inline const void *dp_packet_get_nd_payload(const struct dp_packet *);
> void dp_packet_use(struct dp_packet *, void *, size_t);
> void dp_packet_use_stub(struct dp_packet *, void *, size_t);
> void dp_packet_use_const(struct dp_packet *, const void *, size_t);
> -
> +#ifdef HAVE_AF_XDP
> +void dp_packet_use_afxdp(struct dp_packet *, void *, size_t);
> +#endif
> void dp_packet_init_dpdk(struct dp_packet *);
>
> void dp_packet_init(struct dp_packet *, size_t);
> @@ -184,6 +198,12 @@ dp_packet_delete(struct dp_packet *b)
> return;
> }
>
> +#ifdef HAVE_AF_XDP
> + if (b->source == DPBUF_AFXDP) {
> + free_afxdp_buf(b);
> + return;
> + }
> +#endif
> dp_packet_uninit(b);
> free(b);
> }
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 859c05613ddf..cc91720fad6e 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -198,6 +198,20 @@ cycles_counter_update(struct pmd_perf_stats *s)
> {
> #ifdef DPDK_NETDEV
> return s->last_tsc = rte_get_tsc_cycles();
> +#elif HAVE_AF_XDP
> + /* These are x86-specific instructions. */
> + union {
> + uint64_t tsc_64;
> + struct {
> + uint32_t lo_32;
> + uint32_t hi_32;
> + };
> + } tsc;
> + asm volatile("rdtsc" :
> + "=a" (tsc.lo_32),
> + "=d" (tsc.hi_32));
> +
> + return s->last_tsc = tsc.tsc_64;
> #else
> return s->last_tsc = 0;
> #endif
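
RDTSC returns the counter in two 32-bit halves, the low half in EAX and the high half in EDX, and the union above glues them together by memory layout. A minimal sketch of the same combination done with shifts, which does not depend on byte order, next to the union form for comparison. Both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Combines the two 32-bit halves that RDTSC leaves in EAX ('lo') and
 * EDX ('hi') into one 64-bit counter value using shifts, which does not
 * depend on byte order. */
uint64_t
tsc_from_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Same combination through a union, as in the hunk above.  This matches
 * tsc_from_halves() only on little-endian targets such as x86. */
uint64_t
tsc_from_union(uint32_t lo, uint32_t hi)
{
    union {
        uint64_t tsc_64;
        struct {
            uint32_t lo_32;
            uint32_t hi_32;
        };
    } tsc;

    tsc.lo_32 = lo;
    tsc.hi_32 = hi;
    return tsc.tsc_64;
}
```

Since the patch already restricts AF_XDP to x86, the union form is fine there; the shift form is only shown to make the intended value explicit.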
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> new file mode 100644
> index 000000000000..cd1b9ca8be77
> --- /dev/null
> +++ b/lib/netdev-afxdp.c
> @@ -0,0 +1,727 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#if !defined(__i386__) && !defined(__x86_64__)
> +#error AF_XDP supported only for Linux on x86 or x86_64
> +#endif
> +
> +#include <config.h>
> +
> +#include "netdev-linux-private.h"
> +#include "netdev-linux.h"
> +#include "netdev-afxdp.h"
> +
> +#include <arpa/inet.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <inttypes.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_tun.h>
> +#include <linux/types.h>
> +#include <linux/ethtool.h>
> +#include <linux/mii.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/sockios.h>
> +#include <linux/if_xdp.h>
> +#include <net/if.h>
> +#include <net/if_arp.h>
> +#include <net/route.h>
> +#include <netinet/in.h>
> +#include <netpacket/packet.h>
> +#include <poll.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/utsname.h>
> +#include <unistd.h>
> +
> +#include "coverage.h"
> +#include "dp-packet.h"
> +#include "dpif-netlink.h"
> +#include "dpif-netdev.h"
> +#include "fatal-signal.h"
> +#include "hash.h"
> +#include "netdev-provider.h"
> +#include "netdev-tc-offloads.h"
> +#include "netdev-vport.h"
> +#include "netlink-notifier.h"
> +#include "netlink-socket.h"
> +#include "netlink.h"
> +#include "netnsid.h"
> +#include "openflow/openflow.h"
> +#include "openvswitch/dynamic-string.h"
> +#include "openvswitch/hmap.h"
> +#include "openvswitch/ofpbuf.h"
> +#include "openvswitch/poll-loop.h"
> +#include "openvswitch/vlog.h"
> +#include "openvswitch/shash.h"
> +#include "ovs-atomic.h"
> +#include "packets.h"
> +#include "rtnetlink.h"
> +#include "socket-util.h"
> +#include "sset.h"
> +#include "tc.h"
> +#include "timer.h"
> +#include "unaligned.h"
> +#include "util.h"
> +#include "xdpsock.h"
> +
> +#ifndef SOL_XDP
> +#define SOL_XDP 283
> +#endif
> +#ifndef AF_XDP
> +#define AF_XDP 44
> +#endif
> +#ifndef PF_XDP
> +#define PF_XDP AF_XDP
> +#endif
> +
> +VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +
> +#define UMEM2DESC(elem, base) ((uint64_t)((char *)(elem) - (char *)(base)))
> +#define UMEM2XPKT(base, i) \
> + ALIGNED_CAST(struct dp_packet_afxdp *, (char *)(base) + \
> + (i) * sizeof(struct dp_packet_afxdp))
> +
> +static uint32_t prog_id;
> +static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
> + int mode);
> +static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> +static void xsk_destroy(struct xsk_socket_info *xsk);
> +
> +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size,
> + int xdpmode)
> +{
> + struct xsk_umem_info *umem;
> + int ret;
> + int i;
> +
> + umem = xcalloc(1, sizeof(*umem));
> + ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> + NULL);
> +
> + if (ret) {
> + VLOG_ERR("xsk umem create failed (%s) mode: %s",
> + ovs_strerror(errno),
> + xdpmode == XDP_COPY ? "SKB": "DRV");
> + free(umem);
> + return NULL;
> + }
> +
> + umem->buffer = buffer;
> +
> + /* set-up umem pool */
> + umem_pool_init(&umem->mpool, NUM_FRAMES);
> +
> + for (i = NUM_FRAMES - 1; i >= 0; i--) {
> + struct umem_elem *elem;
> +
> + elem = ALIGNED_CAST(struct umem_elem *,
> + (char *)umem->buffer + i * FRAME_SIZE);
> + umem_elem_push(&umem->mpool, elem);
> + }
> +
> + /* set-up metadata */
> + xpacket_pool_init(&umem->xpool, NUM_FRAMES);
> +
> + VLOG_DBG("%s xpacket pool from %p to %p", __func__,
> + umem->xpool.array,
> + (char *)umem->xpool.array +
> + NUM_FRAMES * sizeof(struct dp_packet_afxdp));
> +
> + for (i = NUM_FRAMES - 1; i >= 0; i--) {
> + struct dp_packet_afxdp *xpacket;
> + struct dp_packet *packet;
> +
> + xpacket = UMEM2XPKT(umem->xpool.array, i);
> + xpacket->mpool = &umem->mpool;
> +
> + packet = &xpacket->packet;
> + packet->source = DPBUF_AFXDP;
> + }
> +
> + return umem;
> +}
> +
> +static struct xsk_socket_info *
> +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> + uint32_t queue_id, int xdpmode)
> +{
> + struct xsk_socket_config cfg;
> + struct xsk_socket_info *xsk;
> + char devname[IF_NAMESIZE];
> + uint32_t idx = 0;
> + int ret;
> + int i;
> +
> + xsk = xcalloc(1, sizeof(*xsk));
> + xsk->umem = umem;
> + cfg.rx_size = CONS_NUM_DESCS;
> + cfg.tx_size = PROD_NUM_DESCS;
> + cfg.libbpf_flags = 0;
> +
> + if (xdpmode == XDP_ZEROCOPY) {
> + cfg.bind_flags = XDP_ZEROCOPY;
> + cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> + } else {
> + cfg.bind_flags = XDP_COPY;
> + cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> + }
> +
> + if (if_indextoname(ifindex, devname) == NULL) {
> + VLOG_ERR("ifindex %d to devname failed (%s)",
> + ifindex, ovs_strerror(errno));
> + free(xsk);
> + return NULL;
> + }
> +
> + ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> + &xsk->rx, &xsk->tx, &cfg);
> + if (ret) {
> + VLOG_ERR("xsk_socket_create failed (%s) mode: %s qid: %d",
> + ovs_strerror(errno),
> + xdpmode == XDP_COPY ? "SKB": "DRV",
> + queue_id);
> + free(xsk);
> + return NULL;
> + }
> +
> + /* Make sure the built-in AF_XDP program is loaded */
> + ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
> + if (ret) {
> + VLOG_ERR("get XDP prog ID failed (%s)", ovs_strerror(errno));
> + xsk_socket__delete(xsk->xsk);
> + free(xsk);
> + return NULL;
> + }
> +
> + xsk_ring_prod__reserve(&xsk->umem->fq, PROD_NUM_DESCS, &idx);
> +
> + for (i = 0;
> + i < PROD_NUM_DESCS * FRAME_SIZE;
> + i += FRAME_SIZE) {
> + struct umem_elem *elem;
> + uint64_t addr;
> +
> + elem = umem_elem_pop(&xsk->umem->mpool);
> + addr = UMEM2DESC(elem, xsk->umem->buffer);
> +
> + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
> + }
> +
> + xsk_ring_prod__submit(&xsk->umem->fq,
> + PROD_NUM_DESCS);
> + return xsk;
> +}
> +
> +static struct xsk_socket_info *
> +xsk_configure(int ifindex, int xdp_queue_id, int xdpmode)
> +{
> + struct xsk_socket_info *xsk;
> + struct xsk_umem_info *umem;
> + void *bufs;
> + int ret;
> +
> + /* umem memory region */
> + ret = posix_memalign(&bufs, get_page_size(),
> + NUM_FRAMES * FRAME_SIZE);
> + memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
> + ovs_assert(!ret);
> +
> + /* create AF_XDP socket */
> + umem = xsk_configure_umem(bufs,
> + NUM_FRAMES * FRAME_SIZE,
> + xdpmode);
> + if (!umem) {
> + free(bufs);
> + return NULL;
> + }
> +
> + xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode);
> + if (!xsk) {
> + /* clean up umem and xpacket pool */
> + (void)xsk_umem__delete(umem->umem);
> + free(bufs);
> + umem_pool_cleanup(&umem->mpool);
> + xpacket_pool_cleanup(&umem->xpool);
> + free(umem);
> + }
> + return xsk;
> +}
> +
> +int
> +xsk_configure_all(struct netdev *netdev)
> +{
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> + struct xsk_socket_info *xsk;
> + int i, ifindex;
> +
> + ifindex = linux_get_ifindex(netdev_get_name(netdev));
> +
> + /* configure each queue */
> + for (i = 0; i < netdev->n_rxq; i++) {
> + VLOG_INFO("%s configure queue %d mode %s", __func__, i,
> + dev->xdpmode == XDP_COPY ? "SKB" : "DRV");
> + xsk = xsk_configure(ifindex, i, dev->xdpmode);
> + if (!xsk) {
> + VLOG_ERR("failed to create AF_XDP socket on queue %d", i);
> + goto err;
> + }
> + dev->xsk[i] = xsk;
> + }
> +
> + return 0;
> +
> +err:
> + xsk_destroy_all(netdev);
> + return EINVAL;
> +}
> +
> +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
> +{
> + struct ds ds = DS_EMPTY_INITIALIZER;
> + ds_put_hex_dump(&ds, buf, count, 0, false);
> + VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
> + ds_destroy(&ds);
> +}
> +
> +static void
> +xsk_destroy(struct xsk_socket_info *xsk)
> +{
> + struct xsk_umem *umem;
> +
> + if (!xsk) {
> + return;
> + }
> +
> + umem = xsk->umem->umem;
> + xsk_socket__delete(xsk->xsk);
> + (void)xsk_umem__delete(umem);
> +
> + /* free the packet buffer */
> + free(xsk->umem->buffer);
> +
> + /* cleanup umem pool */
> + umem_pool_cleanup(&xsk->umem->mpool);
> +
> + /* cleanup metadata pool */
> + xpacket_pool_cleanup(&xsk->umem->xpool);
> +
> + free(xsk->umem);
> + free(xsk);
> +}
> +
> +void
> +xsk_destroy_all(struct netdev *netdev)
> +{
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> + int i, ifindex;
> +
> + ifindex = linux_get_ifindex(netdev_get_name(netdev));
> +
> + for (i = 0; i < MAX_XSKQ; i++) {
> + if (dev->xsk[i]) {
> + VLOG_INFO("destroy xsk[%d]", i);
> + xsk_destroy(dev->xsk[i]);
> + dev->xsk[i] = NULL;
> + }
> + }
> + VLOG_INFO("remove xdp program");
> + xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +}
> +
> +static inline void OVS_UNUSED
> +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
> + struct xdp_statistics stat;
> + socklen_t optlen;
> +
> + optlen = sizeof stat;
> + ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
> + &stat, &optlen) == 0);
> +
> + VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
> + stat.rx_dropped,
> + stat.rx_invalid_descs,
> + stat.tx_invalid_descs);
> +}
> +
> +int
> +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> + char **errp OVS_UNUSED)
> +{
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> + const char *xdpmode;
> + int new_n_rxq;
> +
> + ovs_mutex_lock(&dev->mutex);
> +
> + new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> + if (new_n_rxq > MAX_XSKQ) {
> + ovs_mutex_unlock(&dev->mutex);
> + return EINVAL;
> + }
> +
> + if (new_n_rxq != netdev->n_rxq) {
> + dev->requested_n_rxq = new_n_rxq;
> + netdev_request_reconfigure(netdev);
> + }
> +
> + xdpmode = smap_get(args, "xdpmode");
> + if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
> + dev->requested_xdpmode = XDP_ZEROCOPY;
> + if (dev->xdpmode != dev->requested_xdpmode) {
> + netdev_request_reconfigure(netdev);
> + }
> + } else {
> + dev->requested_xdpmode = XDP_COPY;
> + if (dev->xdpmode != dev->requested_xdpmode) {
> + netdev_request_reconfigure(netdev);
> + }
> + }
The code above will keep requesting reconfiguration on every set_config()
call until the reconfiguration has actually finished. This could cause multiple
reconfigurations in a row for the same configuration change. A better version
could look like this:
    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
    if (new_n_rxq > MAX_XSKQ) {
        ovs_mutex_unlock(&dev->mutex);
        VLOG_ERR("%s: Too big 'n_rxq' (%d > %d).",
                 netdev_get_name(netdev), new_n_rxq, MAX_XSKQ);
        return EINVAL;
    }

    str_xdpmode = smap_get_def(args, "xdpmode", "skb");
    if (!strcasecmp(str_xdpmode, "drv")) {
        xdpmode = XDP_ZEROCOPY;
    } else if (!strcasecmp(str_xdpmode, "skb")) {
        xdpmode = XDP_COPY;
    } else {
        VLOG_ERR("%s: Incorrect xdpmode (%s).",
                 netdev_get_name(netdev), str_xdpmode);
        ovs_mutex_unlock(&dev->mutex);
        return EINVAL;
    }

    if (dev->requested_n_rxq != new_n_rxq
        || dev->requested_xdpmode != xdpmode) {
        dev->requested_n_rxq = new_n_rxq;
        dev->requested_xdpmode = xdpmode;
        netdev_request_reconfigure(netdev);
    }
The main difference is that "new" is compared with "requested", not with
"current". This allows us to request reconfiguration only once for each
change. I also made a few cosmetic changes which you may find useful; however,
it's up to you.
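
To show the effect in isolation, here is a minimal standalone sketch. The
names (toy_dev, set_config_vs_current, set_config_vs_requested,
count_requests) are hypothetical and only model the fields involved; this is
not OVS code. It counts how many reconfiguration requests are raised when the
same setting arrives several times before any reconfiguration has run.

```c
/* Hypothetical model of the relevant fields; not the real netdev_linux. */
struct toy_dev {
    int n_rxq;            /* Value that is currently applied. */
    int requested_n_rxq;  /* Value waiting for the next reconfiguration. */
    int reconf_requests;  /* Number of reconfiguration requests raised. */
};

/* Behaviour of the patch: compare "new" with "current". */
static void
set_config_vs_current(struct toy_dev *dev, int new_n_rxq)
{
    if (new_n_rxq != dev->n_rxq) {
        dev->requested_n_rxq = new_n_rxq;
        dev->reconf_requests++;
    }
}

/* Suggested behaviour: compare "new" with "requested". */
static void
set_config_vs_requested(struct toy_dev *dev, int new_n_rxq)
{
    if (new_n_rxq != dev->requested_n_rxq) {
        dev->requested_n_rxq = new_n_rxq;
        dev->reconf_requests++;
    }
}

/* Applies the same new value 'calls' times while no reconfiguration runs
 * in between, and returns how many requests were raised. */
static int
count_requests(void (*set_config)(struct toy_dev *, int), int calls)
{
    struct toy_dev dev = { 1, 1, 0 };
    int i;

    for (i = 0; i < calls; i++) {
        set_config(&dev, 4);
    }
    return dev.reconf_requests;
}
```

With three identical calls, the first variant raises a request each time,
while the second raises it only once.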
> + ovs_mutex_unlock(&dev->mutex);
> + return 0;
> +}
> +
> +int
> +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
> +{
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> +
> + ovs_mutex_lock(&dev->mutex);
> + smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> + smap_add_format(args, "xdpmode", "%s",
> + dev->xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
> + ovs_mutex_unlock(&dev->mutex);
> + return 0;
> +}
> +
> +int
> +netdev_afxdp_reconfigure(struct netdev *netdev)
> +{
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> + int err = 0;
> +
> + ovs_mutex_lock(&dev->mutex);
> +
> + if (netdev->n_rxq == dev->requested_n_rxq
> + && dev->xdpmode == dev->requested_xdpmode) {
> + goto out;
> + }
> +
> + xsk_destroy_all(netdev);
> + netdev->n_rxq = dev->requested_n_rxq;
> +
> + if (dev->requested_xdpmode == XDP_ZEROCOPY) {
> + VLOG_INFO("AF_XDP device %s in DRV mode", netdev_get_name(netdev));
> + /* From SKB mode to DRV mode */
> + dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> + dev->xdp_bind_flags = XDP_ZEROCOPY;
> + dev->xdpmode = XDP_ZEROCOPY;
> +
> + if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> + VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK): %s",
> + ovs_strerror(errno));
> + }
> + } else {
> + VLOG_INFO("AF_XDP device %s in SKB mode", netdev_get_name(netdev));
> + /* From DRV mode to SKB mode */
> + dev->xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> + dev->xdp_bind_flags = XDP_COPY;
> + dev->xdpmode = XDP_COPY;
> + /* TODO: set rlimit back to previous value
> + * when no device is in DRV mode.
> + */
> + }
> +
> + err = xsk_configure_all(netdev);
> + if (err) {
> + VLOG_ERR("AF_XDP device %s reconfig fails", netdev_get_name(netdev));
> + }
> + netdev_change_seq_changed(netdev);
> +out:
> + ovs_mutex_unlock(&dev->mutex);
> + return err;
> +}
> +
> +int
> +netdev_afxdp_get_numa_id(const struct netdev *netdev)
> +{
> + /* FIXME: Get netdev's PCIe device ID, then find
> + * its NUMA node id.
> + */
> + VLOG_INFO("FIXME: Device %s always use numa id 0",
> + netdev_get_name(netdev));
> + return 0;
> +}
> +
> +void
> +xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
> +{
> + uint32_t curr_prog_id = 0;
> + uint32_t flags;
> +
> + /* remove_xdp_program() */
> + if (xdpmode == XDP_COPY) {
> + flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> + } else {
> + flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> + }
> +
> + if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, flags)) {
> + bpf_set_link_xdp_fd(ifindex, -1, flags);
> + }
> + if (prog_id == curr_prog_id) {
> + bpf_set_link_xdp_fd(ifindex, -1, flags);
> + } else if (!curr_prog_id) {
> + VLOG_INFO("couldn't find a prog id on a given interface");
> + } else {
> + VLOG_INFO("program on interface changed, not removing");
> + }
> +}
> +
> +struct dp_packet_afxdp *
> +dp_packet_cast_afxdp(const struct dp_packet *d)
> +{
> + ovs_assert(d->source == DPBUF_AFXDP);
> + return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
> +}
> +
> +void
> +free_afxdp_buf(struct dp_packet *p)
> +{
> + struct dp_packet_afxdp *xpacket;
> + unsigned long addr;
> +
> + xpacket = dp_packet_cast_afxdp(p);
> + if (xpacket->mpool) {
> + void *base = dp_packet_base(p);
> +
> + addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
> + umem_elem_push(xpacket->mpool, (void *)addr);
> + }
> +}
> +
> +void
> +free_afxdp_buf_batch(struct dp_packet_batch *batch)
> +{
> + struct dp_packet_afxdp *xpacket = NULL;
> + struct dp_packet *packet;
> + void *elems[BATCH_SIZE];
> + unsigned long addr;
> +
> + /* all packets are AF_XDP, so handles its own delete in batch */
> + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> + xpacket = dp_packet_cast_afxdp(packet);
> + if (xpacket->mpool) {
> + void *base = dp_packet_base(packet);
> +
> + addr = (unsigned long)base & (~FRAME_SHIFT_MASK);
> + elems[i] = (void *)addr;
> + }
> + }
> + umem_elem_push_n(xpacket->mpool, batch->count, elems);
> + dp_packet_batch_init(batch);
> +}
> +
> +/* Receive packet from AF_XDP socket */
> +int
> +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> + struct dp_packet_batch *batch)
> +{
> + struct umem_elem *elems[BATCH_SIZE];
> + uint32_t idx_rx = 0, idx_fq = 0;
> + unsigned int rcvd, i;
> + int ret = 0;
> +
> + /* See if there is any packet on RX queue,
> + * if yes, idx_rx is the index having the packet.
> + */
> + rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> + if (!rcvd) {
> + return 0;
> + }
> +
> + /* Form a dp_packet batch from descriptor in RX queue */
s/From/To/ ?
> + for (i = 0; i < rcvd; i++) {
> + uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> + uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
> + char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> + uint64_t index;
> +
> + struct dp_packet_afxdp *xpacket;
> + struct dp_packet *packet;
> +
> + index = addr >> FRAME_SHIFT;
> + xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
> +
> + packet = &xpacket->packet;
> + xpacket->mpool = &xsk->umem->mpool;
> +
> + /* Initialize the struct dp_packet */
> + dp_packet_use_afxdp(packet, pkt, FRAME_SIZE - FRAME_HEADROOM);
> + dp_packet_set_size(packet, len);
> +
> + /* Add packet into batch, increase batch->count */
> + dp_packet_batch_add(batch, packet);
> +
> + idx_rx++;
> + }
> +
> + /* We've consume rcvd packets in RX, now re-fill the
> + * same number back to FILL queue.
> + */
> + ret = umem_elem_pop_n(&xsk->umem->mpool, rcvd, (void **)elems);
> + if (OVS_UNLIKELY(ret)) {
> + return -ENOMEM;
> + }
Can this be done before actually receiving packets? I.e., don't receive
anything if we can't refill.
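
Something along these lines, as a rough standalone sketch. The toy_pool and
toy_ring types and their helpers are made up for illustration; they only
mimic the roles of the umem pool and the RX ring and are not the libbpf or
OVS API. The point is just the ordering: reserve the refill buffers first,
and touch the RX ring only if that succeeded.

```c
#define TOY_BATCH 32

/* Hypothetical stand-ins for the umem pool and the RX ring. */
struct toy_pool { unsigned int n_free; };
struct toy_ring { unsigned int n_ready; };

/* Takes 'n' elements from the pool, or nothing at all.
 * Returns 0 on success, -1 if the pool has fewer than 'n' elements. */
static int
toy_pool_pop_n(struct toy_pool *pool, unsigned int n)
{
    if (pool->n_free < n) {
        return -1;
    }
    pool->n_free -= n;
    return 0;
}

/* Returns 'n' elements to the pool. */
static void
toy_pool_push_n(struct toy_pool *pool, unsigned int n)
{
    pool->n_free += n;
}

/* Consumes up to 'max' descriptors from the ring, returns how many. */
static unsigned int
toy_ring_consume(struct toy_ring *ring, unsigned int max)
{
    unsigned int n = ring->n_ready < max ? ring->n_ready : max;

    ring->n_ready -= n;
    return n;
}

/* Reserve refill buffers first, receive second. */
static unsigned int
toy_rx(struct toy_pool *pool, struct toy_ring *rx)
{
    unsigned int rcvd;

    if (toy_pool_pop_n(pool, TOY_BATCH)) {
        return 0;   /* Can't refill, so leave the RX ring untouched. */
    }

    rcvd = toy_ring_consume(rx, TOY_BATCH);

    /* 'rcvd' buffers go to the FILL queue; give the rest back. */
    toy_pool_push_n(pool, TOY_BATCH - rcvd);
    return rcvd;
}
```

Reserving a full batch up front is the simplest variant; it refuses to
receive whenever fewer than TOY_BATCH buffers are free, even if fewer packets
are actually waiting.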
> +
> + for (i = 0; i < rcvd; i++) {
> + uint64_t index;
> + struct umem_elem *elem;
> +
> + ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> + while (OVS_UNLIKELY(ret == 0)) {
> + /* The FILL queue is full, so retry. (or skip)? */
> + ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> + }
> +
> + /* Get one free umem, program it into FILL queue */
> + elem = elems[i];
> + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> + ovs_assert((index & FRAME_SHIFT_MASK) == 0);
> + *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
> +
> + idx_fq++;
> + }
> + xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> +
> + /* Release the RX queue */
> + xsk_ring_cons__release(&xsk->rx, rcvd);
> + xsk->rx_npkts += rcvd;
> +
> +#ifdef AFXDP_DEBUG
> + print_xsk_stat(xsk);
> +#endif
> + return 0;
> +}
> +
> +static inline int kick_tx(struct xsk_socket_info *xsk)
> +{
> + int ret;
> +
> + /* This causes system call into kernel's xsk_sendmsg, and
> + * xsk_generic_xmit (skb mode) or xsk_async_xmit (driver mode).
> + */
> + ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
> + if (OVS_UNLIKELY(ret < 0)) {
> + if (errno == ENXIO || errno == ENOBUFS || errno == EOPNOTSUPP) {
> + return errno;
> + }
> + }
> + /* no error, or EBUSY or EAGAIN */
> + return 0;
> +}
> +
> +int
> +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> + struct dp_packet_batch *batch)
> +{
> + struct umem_elem *elems_pop[BATCH_SIZE];
> + struct umem_elem *elems_push[BATCH_SIZE];
> + uint32_t tx_done, idx_cq = 0;
> + struct dp_packet *packet;
> + uint32_t idx = 0;
> + int j, ret, retry_count = 0;
> + const int max_retry = 4;
> +
> + ret = umem_elem_pop_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + if (OVS_UNLIKELY(ret)) {
> + return EAGAIN;
> + }
> +
> + /* Make sure we have enough TX descs */
> + ret = xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx);
> + if (OVS_UNLIKELY(ret == 0)) {
> + umem_elem_push_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + return EAGAIN;
> + }
> +
> + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> + struct umem_elem *elem;
> + uint64_t index;
> +
> + elem = elems_pop[i];
> + /* Copy the packet to the umem we just pop from umem pool.
> + * We can avoid this copy if the packet and the pop umem
> + * are located in the same umem.
> + */
> + memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> +
> + index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> + xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> + = dp_packet_size(packet);
> + }
> + xsk_ring_prod__submit(&xsk->tx, batch->count);
> + xsk->outstanding_tx += batch->count;
> +
> + ret = kick_tx(xsk);
> + if (OVS_UNLIKELY(ret)) {
> + umem_elem_push_n(&xsk->umem->mpool, batch->count, (void **)elems_pop);
> + VLOG_WARN_RL(&rl, "error sending AF_XDP packet: %s",
> + ovs_strerror(ret));
> + return ret;
> + }
> +
> +retry:
> + /* Process CQ */
> + tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
> + if (tx_done > 0) {
> + xsk->outstanding_tx -= tx_done;
> + xsk->tx_npkts += tx_done;
> + }
> +
> + /* Recycle back to umem pool */
> + for (j = 0; j < tx_done; j++) {
> + struct umem_elem *elem;
> + uint64_t addr;
> +
> + addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> +
> + elem = ALIGNED_CAST(struct umem_elem *,
> + (char *)xsk->umem->buffer + addr);
> + elems_push[j] = elem;
> + }
> +
> + ret = umem_elem_push_n(&xsk->umem->mpool, tx_done, (void **)elems_push);
> + ovs_assert(ret == 0);
> +
> + xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> +
> + if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> + /* If there are still a lot not transmitted, try harder. */
> + if (retry_count++ > max_retry) {
> + return 0;
> + }
> + goto retry;
> + }
> +
> + return 0;
> +}
> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> new file mode 100644
> index 000000000000..6518d8fca0b5
> --- /dev/null
> +++ b/lib/netdev-afxdp.h
> @@ -0,0 +1,53 @@
> +/*
> + * Copyright (c) 2018 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef NETDEV_AFXDP_H
> +#define NETDEV_AFXDP_H 1
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +/* These functions are Linux AF_XDP specific, so they should be used directly
> + * only by Linux-specific code. */
> +#define MAX_XSKQ 16
> +struct netdev;
> +struct xsk_socket_info;
> +struct xdp_umem;
> +struct dp_packet_batch;
> +struct smap;
> +struct dp_packet;
> +
> +struct dp_packet_afxdp * dp_packet_cast_afxdp(const struct dp_packet *d);
> +
> +int xsk_configure_all(struct netdev *netdev);
> +
> +void xsk_destroy_all(struct netdev *netdev);
> +
> +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> + struct dp_packet_batch *batch);
> +
> +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> + struct dp_packet_batch *batch);
> +
> +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> + char **errp);
> +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
> +int netdev_afxdp_get_numa_id(const struct netdev *netdev);
> +
> +void free_afxdp_buf(struct dp_packet *p);
> +void free_afxdp_buf_batch(struct dp_packet_batch *batch);
> +int netdev_afxdp_reconfigure(struct netdev *netdev);
> +#endif /* netdev-afxdp.h */
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> new file mode 100644
> index 000000000000..3dd3d902b3c4
> --- /dev/null
> +++ b/lib/netdev-linux-private.h
> @@ -0,0 +1,124 @@
> +/*
> + * Copyright (c) 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef NETDEV_LINUX_PRIVATE_H
> +#define NETDEV_LINUX_PRIVATE_H 1
> +
> +#include <config.h>
> +
> +#include <linux/filter.h>
> +#include <linux/gen_stats.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_tun.h>
> +#include <linux/types.h>
> +#include <linux/ethtool.h>
> +#include <linux/mii.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +#include "netdev-provider.h"
> +#include "netdev-tc-offloads.h"
> +#include "netdev-vport.h"
> +#include "openvswitch/thread.h"
> +#include "ovs-atomic.h"
> +#include "timer.h"
> +
> +#if HAVE_AF_XDP
> +#include "netdev-afxdp.h"
> +#endif
> +
> +/* These functions are Linux specific, so they should be used directly only by
> + * Linux-specific code. */
> +
> +struct netdev;
> +
> +int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
> + const char *flag_name, bool enable);
> +int linux_get_ifindex(const char *netdev_name);
> +
> +#define LINUX_FLOW_OFFLOAD_API \
> + .flow_flush = netdev_tc_flow_flush, \
> + .flow_dump_create = netdev_tc_flow_dump_create, \
> + .flow_dump_destroy = netdev_tc_flow_dump_destroy, \
> + .flow_dump_next = netdev_tc_flow_dump_next, \
> + .flow_put = netdev_tc_flow_put, \
> + .flow_get = netdev_tc_flow_get, \
> + .flow_del = netdev_tc_flow_del, \
> + .init_flow_api = netdev_tc_init_flow_api
> +
> +struct netdev_linux {
> + struct netdev up;
> +
> + /* Protects all members below. */
> + struct ovs_mutex mutex;
> +
> + unsigned int cache_valid;
> +
> + bool miimon; /* Link status of last poll. */
> + long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
> + struct timer miimon_timer;
> +
> + int netnsid; /* Network namespace ID. */
> + /* The following are figured out "on demand" only. They are only valid
> + * when the corresponding VALID_* bit in 'cache_valid' is set. */
> + int ifindex;
> + struct eth_addr etheraddr;
> + int mtu;
> + unsigned int ifi_flags;
> + long long int carrier_resets;
> + uint32_t kbits_rate; /* Policing data. */
> + uint32_t kbits_burst;
> + int vport_stats_error; /* Cached error code from vport_get_stats().
> + 0 or an errno value. */
> + int netdev_mtu_error; /* Cached error code from SIOCGIFMTU
> + * or SIOCSIFMTU.
> + */
> + int ether_addr_error; /* Cached error code from set/get etheraddr. */
> + int netdev_policing_error; /* Cached error code from set policing. */
> + int get_features_error; /* Cached error code from ETHTOOL_GSET. */
> + int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
> +
> + enum netdev_features current; /* Cached from ETHTOOL_GSET. */
> + enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
> + enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
> +
> + struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
> + struct tc *tc;
> +
> + /* For devices of class netdev_tap_class only. */
> + int tap_fd;
> + bool present; /* If the device is present in the namespace */
> + uint64_t tx_dropped; /* tap device can drop if the iface is down */
> +
> + /* LAG information. */
> + bool is_lag_master; /* True if the netdev is a LAG master. */
> +
> + /* AF_XDP information */
> +#ifdef HAVE_AF_XDP
> + struct xsk_socket_info *xsk[MAX_XSKQ];
> + int requested_n_rxq;
> + int xdpmode, requested_xdpmode; /* detect mode changed */
> + int xdp_flags, xdp_bind_flags;
> +#endif
> +};
> +
> +static struct netdev_linux *
> +netdev_linux_cast(const struct netdev *netdev)
> +{
> + return CONTAINER_OF(netdev, struct netdev_linux, up);
> +}
> +
> +#endif /* netdev-linux-private.h */
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index f75d73fd39f8..1f190406d145 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -17,6 +17,7 @@
> #include <config.h>
>
> #include "netdev-linux.h"
> +#include "netdev-linux-private.h"
>
> #include <errno.h>
> #include <fcntl.h>
> @@ -54,6 +55,7 @@
> #include "fatal-signal.h"
> #include "hash.h"
> #include "openvswitch/hmap.h"
> +#include "netdev-afxdp.h"
> #include "netdev-provider.h"
> #include "netdev-tc-offloads.h"
> #include "netdev-vport.h"
> @@ -487,51 +489,6 @@ static int tc_calc_cell_log(unsigned int mtu);
> static void tc_fill_rate(struct tc_ratespec *rate, uint64_t bps, int mtu);
> static int tc_calc_buffer(unsigned int Bps, int mtu, uint64_t burst_bytes);
>
> -struct netdev_linux {
> - struct netdev up;
> -
> - /* Protects all members below. */
> - struct ovs_mutex mutex;
> -
> - unsigned int cache_valid;
> -
> - bool miimon; /* Link status of last poll. */
> - long long int miimon_interval; /* Miimon Poll rate. Disabled if <= 0. */
> - struct timer miimon_timer;
> -
> - int netnsid; /* Network namespace ID. */
> - /* The following are figured out "on demand" only. They are only valid
> - * when the corresponding VALID_* bit in 'cache_valid' is set. */
> - int ifindex;
> - struct eth_addr etheraddr;
> - int mtu;
> - unsigned int ifi_flags;
> - long long int carrier_resets;
> - uint32_t kbits_rate; /* Policing data. */
> - uint32_t kbits_burst;
> - int vport_stats_error; /* Cached error code from vport_get_stats().
> - 0 or an errno value. */
> - int netdev_mtu_error; /* Cached error code from SIOCGIFMTU or SIOCSIFMTU. */
> - int ether_addr_error; /* Cached error code from set/get etheraddr. */
> - int netdev_policing_error; /* Cached error code from set policing. */
> - int get_features_error; /* Cached error code from ETHTOOL_GSET. */
> - int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. */
> -
> - enum netdev_features current; /* Cached from ETHTOOL_GSET. */
> - enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
> - enum netdev_features supported; /* Cached from ETHTOOL_GSET. */
> -
> - struct ethtool_drvinfo drvinfo; /* Cached from ETHTOOL_GDRVINFO. */
> - struct tc *tc;
> -
> - /* For devices of class netdev_tap_class only. */
> - int tap_fd;
> - bool present; /* If the device is present in the namespace */
> - uint64_t tx_dropped; /* tap device can drop if the iface is down */
> -
> - /* LAG information. */
> - bool is_lag_master; /* True if the netdev is a LAG master. */
> -};
>
> struct netdev_rxq_linux {
> struct netdev_rxq up;
> @@ -579,18 +536,23 @@ is_netdev_linux_class(const struct netdev_class *netdev_class)
> return netdev_class->run == netdev_linux_run;
> }
>
> +#if HAVE_AF_XDP
> static bool
> -is_tap_netdev(const struct netdev *netdev)
> +is_afxdp_netdev(const struct netdev *netdev)
> {
> - return netdev_get_class(netdev) == &netdev_tap_class;
> + return netdev_get_class(netdev) == &netdev_afxdp_class;
> }
> -
> -static struct netdev_linux *
> -netdev_linux_cast(const struct netdev *netdev)
> +#else
> +static bool
> +is_afxdp_netdev(const struct netdev *netdev OVS_UNUSED)
> {
> - ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
> -
> - return CONTAINER_OF(netdev, struct netdev_linux, up);
> + return false;
> +}
> +#endif
> +static bool
> +is_tap_netdev(const struct netdev *netdev)
> +{
> + return netdev_get_class(netdev) == &netdev_tap_class;
> }
>
> static struct netdev_rxq_linux *
> @@ -1084,6 +1046,11 @@ netdev_linux_destruct(struct netdev *netdev_)
> atomic_count_dec(&miimon_cnt);
> }
>
> +#if HAVE_AF_XDP
> + if (is_afxdp_netdev(netdev_)) {
> + xsk_destroy_all(netdev_);
> + }
> +#endif
> ovs_mutex_destroy(&netdev->mutex);
> }
>
> @@ -1113,7 +1080,7 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
> rx->is_tap = is_tap_netdev(netdev_);
> if (rx->is_tap) {
> rx->fd = netdev->tap_fd;
> - } else {
> + } else if (!is_afxdp_netdev(netdev_)) {
> struct sockaddr_ll sll;
> int ifindex, val;
> /* Result of tcpdump -dd inbound */
> @@ -1318,10 +1285,18 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
> {
> struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
> struct netdev *netdev = rx->up.netdev;
> - struct dp_packet *buffer;
> + struct dp_packet *buffer = NULL;
> ssize_t retval;
> int mtu;
>
> +#if HAVE_AF_XDP
> + if (is_afxdp_netdev(netdev)) {
> + struct netdev_linux *dev = netdev_linux_cast(netdev);
> + int qid = rxq_->queue_id;
> +
> + return netdev_linux_rxq_xsk(dev->xsk[qid], batch);
> + }
Maybe it's better to just implement '.rxq_recv' inside netdev-afxdp.c?
Also, you missed clearing '*qfill'.
> +#endif
> if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
> mtu = ETH_PAYLOAD_MAX;
> }
> @@ -1329,6 +1304,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
> /* Assume Ethernet port. No need to set packet_type. */
> buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
> DP_NETDEV_HEADROOM);
> +
> retval = (rx->is_tap
> ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
> : netdev_linux_rxq_recv_sock(rx->fd, buffer));
> @@ -1480,7 +1456,8 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> int error = 0;
> int sock = 0;
>
> - if (!is_tap_netdev(netdev_)) {
> + if (!is_tap_netdev(netdev_) &&
> + !is_afxdp_netdev(netdev_)) {
> if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
> error = EOPNOTSUPP;
> goto free_batch;
> @@ -1499,6 +1476,36 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> }
>
> error = netdev_linux_sock_batch_send(sock, ifindex, batch);
> +#if HAVE_AF_XDP
> + } else if (is_afxdp_netdev(netdev_)) {
> + struct netdev_linux *dev = netdev_linux_cast(netdev_);
> + struct dp_packet_afxdp *xpacket;
> + struct umem_pool *first_mpool;
> + struct dp_packet *packet;
> +
> + error = netdev_linux_afxdp_batch_send(dev->xsk[qid], batch);
> +
> + /* all packets must come frome the same umem pool
> + * and has DPBUF_AFXDP type, otherwise free on-by-one
> + */
> + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> + if (packet->source != DPBUF_AFXDP) {
> + goto free_batch;
> + }
> +
> + xpacket = dp_packet_cast_afxdp(packet);
> + if (i == 0) {
> + first_mpool = xpacket->mpool;
> + continue;
> + }
> + if (xpacket->mpool != first_mpool) {
> + goto free_batch;
> + }
> + }
> + /* free in batch */
> + free_afxdp_buf_batch(batch);
> + return error;
There is a lot of afxdp-specific code here, and 'netdev_linux_send' doesn't
provide any magic, i.e. it has no real code that is common to all netdev types.
Maybe it's better to just implement a separate '.send' function inside
netdev-afxdp.c?
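
A minimal sketch of what I mean, with hypothetical toy_* types standing in
for 'struct netdev' and 'struct netdev_class' (the real structures have many
more members, and the real '.send' takes a packet batch). Each class supplies
its own callback, so the generic path needs no per-type branches.

```c
struct toy_netdev;

/* Hypothetical, trimmed-down model of a netdev class. */
struct toy_netdev_ops {
    const char *type;
    int (*send)(struct toy_netdev *dev, int qid);
};

struct toy_netdev {
    const struct toy_netdev_ops *ops;
    int sock_sent;   /* Packets sent through the AF_PACKET-like path. */
    int afxdp_sent;  /* Packets sent through the AF_XDP-like path. */
};

/* Would live in netdev-linux.c: knows nothing about AF_XDP. */
static int
toy_linux_send(struct toy_netdev *dev, int qid)
{
    (void) qid;
    dev->sock_sent++;
    return 0;
}

/* Would live in netdev-afxdp.c: all AF_XDP specifics stay here. */
static int
toy_afxdp_send(struct toy_netdev *dev, int qid)
{
    (void) qid;
    dev->afxdp_sent++;
    return 0;
}

static const struct toy_netdev_ops toy_linux_ops = {
    .type = "system",
    .send = toy_linux_send,
};

static const struct toy_netdev_ops toy_afxdp_ops = {
    .type = "afxdp",
    .send = toy_afxdp_send,
};

/* Generic caller: dispatches through the class, no type checks needed. */
static int
toy_netdev_send(struct toy_netdev *dev, int qid)
{
    return dev->ops->send(dev, qid);
}
```

This way the '#if HAVE_AF_XDP' blocks and the is_afxdp_netdev() checks could
go away from the common send path.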
> +#endif
> } else {
> error = netdev_linux_tap_batch_send(netdev_, batch);
> }
> @@ -3323,6 +3330,7 @@ const struct netdev_class netdev_linux_class = {
> NETDEV_LINUX_CLASS_COMMON,
> LINUX_FLOW_OFFLOAD_API,
> .type = "system",
> + .is_pmd = false,
> .construct = netdev_linux_construct,
> .get_stats = netdev_linux_get_stats,
> .get_features = netdev_linux_get_features,
> @@ -3333,6 +3341,7 @@ const struct netdev_class netdev_linux_class = {
> const struct netdev_class netdev_tap_class = {
> NETDEV_LINUX_CLASS_COMMON,
> .type = "tap",
> + .is_pmd = false,
> .construct = netdev_linux_construct_tap,
> .get_stats = netdev_tap_get_stats,
> .get_features = netdev_linux_get_features,
> @@ -3343,10 +3352,26 @@ const struct netdev_class netdev_internal_class = {
> NETDEV_LINUX_CLASS_COMMON,
> LINUX_FLOW_OFFLOAD_API,
> .type = "internal",
> + .is_pmd = false,
> .construct = netdev_linux_construct,
> .get_stats = netdev_internal_get_stats,
> .get_status = netdev_internal_get_status,
> };
> +
> +#ifdef HAVE_AF_XDP
> +const struct netdev_class netdev_afxdp_class = {
> + NETDEV_LINUX_CLASS_COMMON,
> + .type = "afxdp",
> + .is_pmd = true,
> + .construct = netdev_linux_construct,
> + .get_stats = netdev_linux_get_stats,
> + .get_status = netdev_linux_get_status,
> + .set_config = netdev_afxdp_set_config,
> + .get_config = netdev_afxdp_get_config,
> + .reconfigure = netdev_afxdp_reconfigure,
> + .get_numa_id = netdev_afxdp_get_numa_id,
> +};
> +#endif
>
>
> #define CODEL_N_QUEUES 0x0000
> diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
> index 17ca9120168a..b812e64cb078 100644
> --- a/lib/netdev-linux.h
> +++ b/lib/netdev-linux.h
> @@ -19,6 +19,20 @@
>
> #include <stdint.h>
> #include <stdbool.h>
> +#include <linux/filter.h>
> +#include <linux/gen_stats.h>
> +#include <linux/if_ether.h>
> +#include <linux/if_tun.h>
> +#include <linux/types.h>
> +#include <linux/ethtool.h>
> +#include <linux/mii.h>
> +
> +#include "netdev-provider.h"
> +#include "netdev-tc-offloads.h"
> +#include "netdev-vport.h"
> +#include "openvswitch/thread.h"
> +#include "ovs-atomic.h"
> +#include "timer.h"
>
> /* These functions are Linux specific, so they should be used directly only by
> * Linux-specific code. */
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index fb0c27e6e8e8..d433818f7064 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -902,7 +902,9 @@ extern const struct netdev_class netdev_linux_class;
> #endif
> extern const struct netdev_class netdev_internal_class;
> extern const struct netdev_class netdev_tap_class;
> -
> +#if HAVE_AF_XDP
> +extern const struct netdev_class netdev_afxdp_class;
> +#endif
> #ifdef __cplusplus
> }
> #endif
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 7d7ecf6f0946..e2fae37d5a5e 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -146,6 +146,9 @@ netdev_initialize(void)
> netdev_register_provider(&netdev_internal_class);
> netdev_register_provider(&netdev_tap_class);
> netdev_vport_tunnel_register();
> +#ifdef HAVE_AF_XDP
> + netdev_register_provider(&netdev_afxdp_class);
> +#endif
> #endif
> #if defined(__FreeBSD__) || defined(__NetBSD__)
> netdev_register_provider(&netdev_tap_class);
> diff --git a/lib/xdpsock.c b/lib/xdpsock.c
> new file mode 100644
> index 000000000000..2d80e74d69e4
> --- /dev/null
> +++ b/lib/xdpsock.c
> @@ -0,0 +1,239 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +#include <config.h>
> +
> +#include "xdpsock.h"
> +
> +#include <ctype.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <stdarg.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <syslog.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include "async-append.h"
> +#include "coverage.h"
> +#include "dirs.h"
> +#include "dp-packet.h"
> +#include "openvswitch/compiler.h"
> +#include "openvswitch/vlog.h"
> +#include "ovs-atomic.h"
> +#include "ovs-thread.h"
> +#include "sat-math.h"
> +#include "socket-util.h"
> +#include "svec.h"
> +#include "syslog-direct.h"
> +#include "syslog-libc.h"
> +#include "syslog-provider.h"
> +#include "timeval.h"
> +#include "unixctl.h"
> +#include "util.h"
> +
> +static inline void
> +ovs_spinlock_init(ovs_spinlock_t *sl)
> +{
> + atomic_init(&sl->locked, 0);
> +}
> +
> +static inline void
> +ovs_spin_lock(ovs_spinlock_t *sl)
> +{
> + int exp = 0, locked = 0;
> +
> + while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
> + memory_order_acquire,
> + memory_order_relaxed)) {
> + locked = 1;
> + while (locked) {
> + atomic_read_relaxed(&sl->locked, &locked);
> + }
> + exp = 0;
> + }
> +}
> +
> +static inline void
> +ovs_spin_unlock(ovs_spinlock_t *sl)
> +{
> + atomic_store_explicit(&sl->locked, 0, memory_order_release);
> +}
> +
> +static inline int OVS_UNUSED
> +ovs_spin_trylock(ovs_spinlock_t *sl)
> +{
> + int exp = 0;
> + return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
> + memory_order_acquire,
> + memory_order_relaxed);
> +}
> +
> +inline int
> +__umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
> +{
> + void *ptr;
> +
> + if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
> + return -ENOMEM;
> + }
> +
> + ptr = &umemp->array[umemp->index];
> + memcpy(ptr, addrs, n * sizeof(void *));
> + umemp->index += n;
> +
> + return 0;
> +}
> +
> +int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs)
> +{
> + int ret;
> +
> + ovs_spin_lock(&umemp->mutex);
> + ret = __umem_elem_push_n(umemp, n, addrs);
> + ovs_spin_unlock(&umemp->mutex);
> +
> + return ret;
> +}
> +
> +inline void
> +__umem_elem_push(struct umem_pool *umemp, void *addr)
> +{
> + umemp->array[umemp->index++] = addr;
> +}
> +
> +void
> +umem_elem_push(struct umem_pool *umemp, void *addr)
> +{
> +
> + if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
> + /* stack is overflow, this should not happen */
> + OVS_NOT_REACHED();
> + }
> +
> + ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
> +
> + ovs_spin_lock(&umemp->mutex);
> + __umem_elem_push(umemp, addr);
> + ovs_spin_unlock(&umemp->mutex);
> +}
> +
> +inline int
> +__umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
> +{
> + void *ptr;
> +
> + if (OVS_UNLIKELY(umemp->index - n < 0)) {
> + return -ENOMEM;
> + }
> +
> + umemp->index -= n;
> + ptr = &umemp->array[umemp->index];
> + memcpy(addrs, ptr, n * sizeof(void *));
> +
> + return 0;
> +}
> +
> +int
> +umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs)
> +{
> + int ret;
> +
> + ovs_spin_lock(&umemp->mutex);
> + ret = __umem_elem_pop_n(umemp, n, addrs);
> + ovs_spin_unlock(&umemp->mutex);
> +
> + return ret;
> +}
> +
> +inline void *
> +__umem_elem_pop(struct umem_pool *umemp)
> +{
> + return umemp->array[--umemp->index];
> +}
> +
> +void *
> +umem_elem_pop(struct umem_pool *umemp)
> +{
> + void *ptr;
> +
> + ovs_spin_lock(&umemp->mutex);
> + ptr = __umem_elem_pop(umemp);
> + ovs_spin_unlock(&umemp->mutex);
> +
> + return ptr;
> +}
> +
> +void **
> +__umem_pool_alloc(unsigned int size)
> +{
> + void *bufs;
> +
> + ovs_assert(posix_memalign(&bufs, getpagesize(),
> + size * sizeof(void *)) == 0);
> + memset(bufs, 0, size * sizeof(void *));
> + return (void **)bufs;
> +}
> +
> +unsigned int
> +umem_elem_count(struct umem_pool *mpool)
> +{
> + return mpool->index;
> +}
> +
> +int
> +umem_pool_init(struct umem_pool *umemp, unsigned int size)
> +{
> + umemp->array = __umem_pool_alloc(size);
> + if (!umemp->array) {
> + OVS_NOT_REACHED();
> + }
> +
> + umemp->size = size;
> + umemp->index = 0;
> + ovs_spinlock_init(&umemp->mutex);
> + return 0;
> +}
> +
> +void
> +umem_pool_cleanup(struct umem_pool *umemp)
> +{
> + free(umemp->array);
> +}
> +
> +/* AF_XDP metadata init/destroy */
> +int
> +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
> +{
> + void *bufs;
> +
> + /* TODO: check HAVE_POSIX_MEMALIGN */
> + ovs_assert(posix_memalign(&bufs, getpagesize(),
> + size * sizeof(struct dp_packet_afxdp)) == 0);
> + memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
> +
> + xp->array = bufs;
> + xp->size = size;
> + return 0;
> +}
> +
> +void
> +xpacket_pool_cleanup(struct xpacket_pool *xp)
> +{
> + free(xp->array);
> +}
> diff --git a/lib/xdpsock.h b/lib/xdpsock.h
> new file mode 100644
> index 000000000000..aabaa8e5df24
> --- /dev/null
> +++ b/lib/xdpsock.h
> @@ -0,0 +1,123 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef XDPSOCK_H
> +#define XDPSOCK_H 1
> +
> +#include <bpf/libbpf.h>
> +#include <bpf/xsk.h>
> +#include <errno.h>
> +#include <getopt.h>
> +#include <libgen.h>
> +#include <linux/bpf.h>
> +#include <linux/if_link.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_ether.h>
> +#include <locale.h>
> +#include <net/if.h>
> +#include <poll.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/resource.h>
> +#include <sys/socket.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <time.h>
> +#include <unistd.h>
> +
> +#include "openvswitch/thread.h"
> +#include "ovs-atomic.h"
> +
> +#define FRAME_HEADROOM XDP_PACKET_HEADROOM
> +#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
> +#define BATCH_SIZE NETDEV_MAX_BURST
> +#define FRAME_SHIFT XSK_UMEM__DEFAULT_FRAME_SHIFT
> +#define FRAME_SHIFT_MASK ((1 << FRAME_SHIFT) - 1)
> +
> +#define NUM_FRAMES 4096
> +#define PROD_NUM_DESCS 512
> +#define CONS_NUM_DESCS 512
> +
> +#ifdef USE_XSK_DEFAULT
> +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
> +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
> +#endif
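Shouldn't this be #ifdef/#else/#endif? As written, PROD_NUM_DESCS and
CONS_NUM_DESCS are already defined above and get redefined here whenever
USE_XSK_DEFAULT is set.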
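
For illustration, a sketch of the ifdef/else/endif form, so that each macro is
defined exactly once. The 2048 fallbacks are only stand-ins so the fragment
compiles without <bpf/xsk.h>; in the real header the values would come from
libbpf. The two accessor functions are hypothetical and exist only to make the
selected values easy to check.

```c
#include <assert.h>

/* Stand-ins so this sketch builds without libbpf headers; in xdpsock.h
 * these would come from <bpf/xsk.h> instead. */
#ifndef XSK_RING_PROD__DEFAULT_NUM_DESCS
#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
#endif
#ifndef XSK_RING_CONS__DEFAULT_NUM_DESCS
#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
#endif

/* Each macro is defined on exactly one branch, so there is no redefinition
 * whichever way USE_XSK_DEFAULT is set. */
#ifdef USE_XSK_DEFAULT
#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
#else
#define PROD_NUM_DESCS 512
#define CONS_NUM_DESCS 512
#endif

/* Hypothetical helpers, only here to expose the chosen values. */
unsigned int
xsk_prod_num_descs(void)
{
    return PROD_NUM_DESCS;
}

unsigned int
xsk_cons_num_descs(void)
{
    return CONS_NUM_DESCS;
}
```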
> +
> +typedef struct {
> + atomic_int locked;
> +} ovs_spinlock_t;
> +
> +/* LIFO ptr_array */
> +struct umem_pool {
> + int index; /* point to top */
> + unsigned int size;
> + ovs_spinlock_t mutex;
> + void **array; /* a pointer array, point to umem buf */
> +};
> +
> +/* array-based dp_packet_afxdp */
> +struct xpacket_pool {
> + unsigned int size;
> + struct dp_packet_afxdp **array;
> +};
> +
> +struct xsk_umem_info {
> + struct umem_pool mpool;
> + struct xpacket_pool xpool;
> + struct xsk_ring_prod fq;
> + struct xsk_ring_cons cq;
> + struct xsk_umem *umem;
> + void *buffer;
> +};
> +
> +struct xsk_socket_info {
> + struct xsk_ring_cons rx;
> + struct xsk_ring_prod tx;
> + struct xsk_umem_info *umem;
> + struct xsk_socket *xsk;
> + unsigned long rx_npkts;
> + unsigned long tx_npkts;
> + unsigned long prev_rx_npkts;
> + unsigned long prev_tx_npkts;
> + uint32_t outstanding_tx;
> +};
> +
> +struct umem_elem {
> + struct umem_elem *next;
> +};
> +
> +void __umem_elem_push(struct umem_pool *umemp, void *addr);
> +void umem_elem_push(struct umem_pool *umemp, void *addr);
> +int __umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
> +int umem_elem_push_n(struct umem_pool *umemp, int n, void **addrs);
> +
> +void *__umem_elem_pop(struct umem_pool *umemp);
> +void *umem_elem_pop(struct umem_pool *umemp);
> +int __umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
> +int umem_elem_pop_n(struct umem_pool *umemp, int n, void **addrs);
> +
> +void **__umem_pool_alloc(unsigned int size);
> +int umem_pool_init(struct umem_pool *umemp, unsigned int size);
> +void umem_pool_cleanup(struct umem_pool *umemp);
> +unsigned int umem_elem_count(struct umem_pool *mpool);
> +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
> +void xpacket_pool_cleanup(struct xpacket_pool *xp);
> +
> +#endif
> diff --git a/tests/automake.mk b/tests/automake.mk
> index ea16532dd2a0..715cef9a6b3b 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -4,12 +4,14 @@ EXTRA_DIST += \
> $(SYSTEM_TESTSUITE_AT) \
> $(SYSTEM_KMOD_TESTSUITE_AT) \
> $(SYSTEM_USERSPACE_TESTSUITE_AT) \
> + $(SYSTEM_AFXDP_TESTSUITE_AT) \
> $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
> $(SYSTEM_DPDK_TESTSUITE_AT) \
> $(OVSDB_CLUSTER_TESTSUITE_AT) \
> $(TESTSUITE) \
> $(SYSTEM_KMOD_TESTSUITE) \
> $(SYSTEM_USERSPACE_TESTSUITE) \
> + $(SYSTEM_AFXDP_TESTSUITE) \
> $(SYSTEM_OFFLOADS_TESTSUITE) \
> $(SYSTEM_DPDK_TESTSUITE) \
> $(OVSDB_CLUSTER_TESTSUITE) \
> @@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
> tests/system-userspace-macros.at \
> tests/system-userspace-packet-type-aware.at
>
> +SYSTEM_AFXDP_TESTSUITE_AT = \
> + tests/system-afxdp-testsuite.at \
> + tests/system-afxdp-traffic.at \
> + tests/system-afxdp-macros.at
> +
> SYSTEM_TESTSUITE_AT = \
> tests/system-common-macros.at \
> tests/system-ovn.at \
> @@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
> TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
> SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
> SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
> +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
> SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
> SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
> OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
> @@ -315,6 +323,11 @@ check-system-userspace: all
> set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
>
> +check-afxdp: all
> + $(MAKE) install
> + set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> +
> check-offloads: all
> set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> @@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
> $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> $(AM_V_at)mv $@.tmp $@
>
> +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
> + $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> + $(AM_V_at)mv $@.tmp $@
> +
> $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
> $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> $(AM_V_at)mv $@.tmp $@
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> new file mode 100644
> index 000000000000..2c58c2d6554b
> --- /dev/null
> +++ b/tests/system-afxdp-macros.at
> @@ -0,0 +1,153 @@
> +# _ADD_BR([name])
> +#
> +# Expands into the proper ovs-vsctl commands to create a bridge with the
> +# appropriate type and properties
> +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
> +
> +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
> +#
> +# Creates a database and starts ovsdb-server, starts ovs-vswitchd
> +# connected to that database, calls ovs-vsctl to create a bridge named
> +# br0 with predictable settings, passing 'vsctl-args' as additional
> +# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide
> +# output (e.g. because it includes "create" commands) then 'vsctl-output'
> +# specifies the expected output after filtering through uuidfilt.
> +m4_define([OVS_TRAFFIC_VSWITCHD_START],
> + [
> + export OVS_PKGDATADIR=$(`pwd`)
> + _OVS_VSWITCHD_START([--disable-system])
> + AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> +])
> +
> +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
> +#
> +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
> +# for messages with severity WARN or higher and signaling an error if any
> +# is present. The optional WHITELIST may contain shell-quoted "sed"
> +# commands to delete any warnings that are actually expected, e.g.:
> +#
> +# OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
> +#
> +# 'extra_cmds' are shell commands to be executed afte OVS_VSWITCHD_STOP() is
> +# invoked. They can be used to perform additional cleanups such as name space
> +# removal.
> +m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
> + [OVS_VSWITCHD_STOP([dnl
> +$1";/netdev_linux.*obtaining netdev stats via vport failed/d
> +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
> +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
> +/dpif(revalidator.*)|WARN|netdev at ovs-netdev: failed to put/d
> +"])
> + AT_CHECK([:; $2])
> + ])
> +
> +m4_define([ADD_VETH_AFXDP],
> + [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
> + CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
> + AT_CHECK([ip link set $1 netns $2])
> + AT_CHECK([ip link set dev ovs-$1 up])
> + AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
> + set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
> + NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
> + NS_CHECK_EXEC([$2], [ip link set dev $1 up])
> + if test -n "$5"; then
> + NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
> + fi
> + if test -n "$6"; then
> + NS_CHECK_EXEC([$2], [ip route add default via $6])
> + fi
> + on_exit 'ip link del ovs-$1'
> + ]
> +)
> +
> +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
> +#
> +# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
> +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
> + [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
> + AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
> + AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> + ]
> +)
> +
> +# CONFIGURE_VETH_OFFLOADS([VETH])
> +#
> +# Disable TX offloads for veths. The userspace datapath uses the AF_PACKET
> +# socket to receive packets for veths. Unfortunately, the AF_PACKET socket
> +# doesn't play well with offloads:
> +# 1. GSO packets are received without segmentation and therefore discarded.
> +# 2. Packets with offloaded partial checksum are received with the wrong
> +# checksum, therefore discarded by the receiver.
> +#
> +# By disabling tx offloads in the non-OVS side of the veth peer we make sure
> +# that the AF_PACKET socket will not receive bad packets.
> +#
> +# This is a workaround, and should be removed when offloads are properly
> +# supported in netdev-linux.
> +m4_define([CONFIGURE_VETH_OFFLOADS],
> + [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
> +)
> +
> +# CHECK_CONNTRACK()
> +#
> +# Perform requirements checks for running conntrack tests.
> +#
> +m4_define([CHECK_CONNTRACK],
> + [AT_SKIP_IF([test $HAVE_PYTHON = no])]
> +)
> +
> +# CHECK_CONNTRACK_ALG()
> +#
> +# Perform requirements checks for running conntrack ALG tests. The userspace
> +# supports FTP and TFTP.
> +#
> +m4_define([CHECK_CONNTRACK_ALG])
> +
> +# CHECK_CONNTRACK_FRAG()
> +#
> +# Perform requirements checks for running conntrack fragmentations tests.
> +# The userspace doesn't support fragmentation yet, so skip the tests.
> +m4_define([CHECK_CONNTRACK_FRAG],
> +[
> + AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# While the kernel connection tracker automatically passes all the connection
> +# tracking state from an internal port to the OpenvSwitch kernel module, there
> +# is simply no way of doing that with the userspace, so skip the tests.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> +[
> + AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CONNTRACK_NAT()
> +#
> +# Perform requirements checks for running conntrack NAT tests. The userspace
> +# datapath supports NAT.
> +#
> +m4_define([CHECK_CONNTRACK_NAT])
> +
> +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
> +#
> +# Perform requirements checks for running ovs-dpctl flush-conntrack by
> +# conntrack 5-tuple test. The userspace datapath does not support
> +# this feature yet.
> +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
> +[
> + AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CT_DPIF_SET_GET_MAXCONNS()
> +#
> +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
> +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
> +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
> +
> +# CHECK_CT_DPIF_GET_NCONNS()
> +#
> +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
> +# userspace datapath does support this feature.
> +m4_define([CHECK_CT_DPIF_GET_NCONNS])
> diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
> new file mode 100644
> index 000000000000..538c0d15d556
> --- /dev/null
> +++ b/tests/system-afxdp-testsuite.at
> @@ -0,0 +1,26 @@
> +AT_INIT
> +
> +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
> +
> +Licensed under the Apache License, Version 2.0 (the "License");
> +you may not use this file except in compliance with the License.
> +You may obtain a copy of the License at:
> +
> + http://www.apache.org/licenses/LICENSE-2.0
> +
> +Unless required by applicable law or agreed to in writing, software
> +distributed under the License is distributed on an "AS IS" BASIS,
> +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +See the License for the specific language governing permissions and
> +limitations under the License.])
> +
> +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
> +
> +m4_include([tests/ovs-macros.at])
> +m4_include([tests/ovsdb-macros.at])
> +m4_include([tests/ofproto-macros.at])
> +m4_include([tests/system-afxdp-macros.at])
> +m4_include([tests/system-common-macros.at])
> +
> +m4_include([tests/system-afxdp-traffic.at])
> +m4_include([tests/system-ovn.at])
> diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
> new file mode 100644
> index 000000000000..26f72acf48ef
> --- /dev/null
> +++ b/tests/system-afxdp-traffic.at
Why not use the common 'tests/system-traffic.at'?
If 'ADD_VETH_AFXDP' is the only macro you need to replace,
you could move 'ADD_VETH' out into each of the tests/system-*-macros.at
files instead. That would mean much less code duplication.
Another option is to rename 'ADD_VETH' to '_ADD_VETH' with some
additional arguments and implement wrappers in each of the relevant
tests/system-*-macros.at files.
> @@ -0,0 +1,978 @@
> +AT_BANNER([AF_XDP netdev datapath-sanity])
> +
> +AT_SETUP([datapath - ping between two ports])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ulimit -l unlimited
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping between two ports on vlan])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
> +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping6 between two ports])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping6 between two ports on vlan])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
> +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over vxlan tunnel])
> +OVS_CHECK_VXLAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> + [id 0 dstport 4789])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over vxlan6 tunnel])
> +OVS_CHECK_VXLAN_UDP6ZEROCSUM()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
> + [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over gre tunnel])
> +OVS_CHECK_GRE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over erspan v1 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
> +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over erspan v2 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7])
> +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
> + [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7])
> +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> + [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
> + [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7])
> +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> + [10.1.1.1/24],
> + [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over geneve tunnel])
> +OVS_CHECK_GENEVE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> + [vni 0])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over geneve6 tunnel])
> +OVS_CHECK_GENEVE_UDP6ZEROCSUM()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24],
> + [vni 0 udp6zerocsumtx udp6zerocsumrx])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - clone action])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
> + -- set interface ovs-p1 ofport_request=2])
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1 actions=NORMAL
> +priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2
> +priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - basic truncate action])
> +AT_SKIP_IF([test $HAVE_NC = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-ofctl del-flows br0])
> +
> +dnl Create p0 and ovs-p0(1)
> +ADD_NAMESPACES(at_ns0)
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
> +
> +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
> +AT_CHECK([ip link add p1 type veth peer name ovs-p1])
> +on_exit 'ip link del ovs-p1'
> +AT_CHECK([ip link set dev ovs-p1 up])
> +AT_CHECK([ip link set dev p1 up])
> +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2])
> +dnl Use p1 to check the truncated packet
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3])
> +
> +dnl Create p2(5) and ovs-p2(4)
> +AT_CHECK([ip link add p2 type veth peer name ovs-p2])
> +on_exit 'ip link del ovs-p2'
> +AT_CHECK([ip link set dev ovs-p2 up])
> +AT_CHECK([ip link set dev p2 up])
> +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4])
> +dnl Use p2 to check the truncated packet
> +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5])
> +
> +dnl basic test
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_DATA([flows.txt], [dnl
> +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +dnl use this file as payload file for ncat
> +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null])
> +on_exit 'rm -f payload200.bin'
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl packet with truncated size
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=100
> +])
> +dnl packet with original size
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=242
> +])
> +
> +dnl more complicated output actions
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_DATA([flows.txt], [dnl
> +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl 100 + 100 + 242 + min(65535,242) = 684
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=684
> +])
> +dnl 242 + 100 + min(242,200) = 542
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=542
> +])
> +
> +dnl SLOW_ACTION: disable kernel datapath truncate support
> +dnl Repeat the test above, but exercise the SLOW_ACTION code path
> +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
> +
> +dnl SLOW_ACTION test1: check datapath actions
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout])
> +AT_CHECK([tail -3 stdout], [0],
> +[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
> +This flow is handled by the userspace slow path because it:
> + - Uses action(s) not supported by datapath.
> +])
> +
> +dnl SLOW_ACTION test2: check actual packet truncate
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl 100 + 100 + 242 + min(65535,242) = 684
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=684
> +])
> +
> +dnl 242 + 100 + min(242,200) = 542
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=542
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
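
For anyone cross-checking the n_bytes values the truncate test expects, here is a rough Python sketch of the arithmetic spelled out in the 'dnl' comments. It assumes the 200-byte payload travels in an untagged Ethernet frame (14 bytes) with plain IPv4 (20) and UDP (8) headers, which gives the 242 bytes the test counts. The names below are only for illustration and are not part of the patch.

```python
# Sketch only: reproduces the byte counts from the "dnl" comments in the
# truncate test. Header sizes are assumptions (untagged Ethernet, IPv4
# without options, UDP).
ETH_HDR, IPV4_HDR, UDP_HDR = 14, 20, 8
PAYLOAD = 200
FRAME_LEN = ETH_HDR + IPV4_HDR + UDP_HDR + PAYLOAD  # 242 bytes on the wire

# The "more complicated output actions" flow, as (port, max_len) pairs.
# None means a plain output:N with no truncation.
ACTIONS = [
    (2, 100), (4, None), (2, 100), (4, 100),
    (2, None), (4, 200), (2, 65535),
]


def bytes_per_port(actions, frame_len):
    """Sum the bytes sent to each port; max_len caps each copy."""
    totals = {}
    for port, max_len in actions:
        sent = frame_len if max_len is None else min(frame_len, max_len)
        totals[port] = totals.get(port, 0) + sent
    return totals


totals = bytes_per_port(ACTIONS, FRAME_LEN)
# Port 2 (ovs-p1) feeds p1, which is matched as in_port=3.
# Port 4 (ovs-p2) feeds p2, which is matched as in_port=5.
print("in_port=3 n_bytes =", totals[2])
print("in_port=5 n_bytes =", totals[4])
```

The two totals correspond to the 684 and 542 that the test greps out of 'ovs-ofctl dump-flows'.
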
> +
> +
> +AT_BANNER([conntrack])
> +
> +AT_SETUP([conntrack - controller])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(commit),controller
> +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +AT_CAPTURE_FILE([ofctl_monitor.log])
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +dnl Send an unsolicited reply from port 2. This should be dropped.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> +
> +dnl OK, now start a new connection from port 1.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
> +
> +dnl Now try a reply from port 2.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> +
> +dnl Check this output. We only see the latter two packets, not the first.
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
> +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
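
The hex strings passed to packet-out in this test are hard to read at a glance, so here is a small Python sketch that unpacks the one sent from port 1. It uses only the standard library and assumes an untagged Ethernet frame carrying IPv4 without options and UDP. The variable names are mine and not part of the patch.

```python
import socket
import struct

# Packet injected from port 1 in the "conntrack - controller" test.
PKT = bytes.fromhex(
    "50540000000a505400000009080045"
    "00001c000000000011a4cd0a010101"
    "0a0101020001000200080000"
)

# Ethernet header: destination MAC, source MAC, EtherType.
dl_dst, dl_src, eth_type = struct.unpack("!6s6sH", PKT[:14])

# IPv4 header (20 bytes, no options assumed).
(ver_ihl, tos, total_len, ident, frag_off,
 ttl, proto, ip_csum, nw_src, nw_dst) = struct.unpack("!BBHHHBBH4s4s", PKT[14:34])

# UDP header: source port, destination port, length, checksum.
tp_src, tp_dst, udp_len, udp_csum = struct.unpack("!HHHH", PKT[34:42])


def mac(raw):
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


print("dl_src=%s dl_dst=%s" % (mac(dl_src), mac(dl_dst)))
print("nw_src=%s nw_dst=%s nw_ttl=%d nw_proto=%d"
      % (socket.inet_ntoa(nw_src), socket.inet_ntoa(nw_dst), ttl, proto))
print("tp_src=%d tp_dst=%d udp_csum=%d" % (tp_src, tp_dst, udp_csum))
```

The decoded fields line up with the first NXT_PACKET_IN2 entry the test expects: 42 bytes total, nw_ttl=0, tp_src=1, tp_dst=2 and a zero UDP checksum.
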
> +
> +AT_SETUP([conntrack - force commit])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(force,commit),controller
> +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
> +table=1,in_port=2,ct_state=+trk,udp,action=controller
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +AT_CAPTURE_FILE([ofctl_monitor.log])
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +dnl Send an unsolicited reply from port 2. This should be dropped.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +dnl OK, now start a new connection from port 1.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +
> +dnl Now try a reply from port 2.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +
> +dnl Check this output. We only see the latter two packets, not the first.
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
> +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
> +])
> +
> +dnl
> +dnl Check that the directionality has been changed by force commit.
> +dnl
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
> +])
> +
> +dnl OK, now send another packet from port 1 and see that it switches again
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - ct flush by 5-tuple])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(commit),2
> +priority=100,in_port=2,udp,action=ct(zone=5,commit),1
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Test UDP from port 1
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl
> +])
> +
> +dnl Test UDP from port 2
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +])
> +
> +dnl Test ICMP traffic
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout])
> +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
> +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
> +])
> +
> +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
> +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - IPv4 ping])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +dnl Pings from ns1->ns0 should fail.
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
> +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - get_nconns and get/set_maxconns])
> +CHECK_CONNTRACK()
> +CHECK_CT_DPIF_SET_GET_MAXCONNS()
> +CHECK_CT_DPIF_GET_NCONNS()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
> +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> +1
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +3000000
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
> +setting maxconns successful
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +10
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> +0
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +10
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - IPv6 ping])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +AT_DATA([flows.txt], [dnl
> +
> +dnl ICMPv6 echo request and reply go to table 1. The rest of the traffic goes
> +dnl through normal action.
> +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
> +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
> +table=0,priority=1,action=normal
> +
> +dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
> +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
> +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
> +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
> +table=1,priority=1,action=drop
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +dnl The above ping creates state in the connection tracker. We're not
> +dnl interested in that state.
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +dnl Pings from ns1->ns0 should fail.
> +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
> +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> +])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
> +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
>