[ovs-dev] [PATCHv3 RFC 0/3] AF_XDP netdev support for OVS

William Tu u9012063 at gmail.com
Wed Nov 28 21:22:19 UTC 2018


This patch series introduces AF_XDP support for OVS netdev.
AF_XDP is a new address family that works together with eBPF.
In short, an AF_XDP socket receives the packets that an
eBPF/XDP program attached to the netdev redirects to it, and
can also send packets out through that netdev.
For more details about AF_XDP, please see the Linux kernel's
Documentation/networking/af_xdp.rst.

OVS has several netdev types, e.g., system, tap, and
internal.  The patch first adds a new netdev type called
"afxdp" and implements its configuration, packet reception,
and transmit functions.  Since the AF_XDP socket (xsk)
operates in userspace, once ovs-vswitchd receives packets
from the xsk, the proposed architecture reuses the existing
userspace dpif-netdev datapath.  As a result, most of the
packet processing happens in userspace instead of in the
Linux kernel.

Architecture
============
               _
              |   +-------------------+
              |   |    ovs-vswitchd   |<-->ovsdb-server
              |   +-------------------+
              |   |      ofproto      |<-->OpenFlow controllers
              |   +--------+-+--------+ 
              |   | netdev | |ofproto-|
    userspace |   +--------+ |  dpif  |
              |   | netdev | +--------+
              |   |provider| |  dpif  |
              |   +---||---+ +--------+
              |       ||     |  dpif- |
              |       ||     | netdev |
              |_      ||     +--------+  
                      ||         
               _  +---||-----+--------+
              |   | af_xdp prog +     |
       kernel |   |   xsk_map         |
              |_  +--------||---------+
                           ||
                        physical
                           NIC

To start, create an OVS userspace bridge that uses dpif-netdev
by setting its datapath_type to netdev:
# ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

Then attach a Linux netdev with type afxdp:
# ovs-vsctl add-port br0 afxdp-p0 -- \
    set interface afxdp-p0 type="afxdp"

Documentation
=============
Most of the design details are described in section 4 of the paper
presented at Linux Plumbers Conference 2018, "Bringing the Power of
eBPF to Open vSwitch"[1], and in the accompanying slides[2].
This patch uses a not-yet-upstreamed feature called XDP_ATTACH[3]
(described in section 3.1), which provides a built-in XDP program
for AF_XDP.  This greatly simplifies the management of XDP/eBPF
programs.

[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf

Test Cases
==========
Test cases are created using network namespaces and veth pairs, with
the AF_XDP socket attached to a veth device (hence SKB_MODE).  Running
"make check-afxdp" with the patch gives the following results:

AF_XDP netdev datapath-sanity

  1: datapath - ping between two ports               ok
  2: datapath - http between two ports               ok
  3: datapath - ping between two ports on vlan       ok
  4: datapath - ping6 between two ports              ok
  5: datapath - ping6 between two ports on vlan      ok
  6: datapath - ping over vxlan tunnel               ok
  7: datapath - ping over vxlan6 tunnel              ok
  8: datapath - ping over gre tunnel                 ok
  9: datapath - ping over erspan v1 tunnel           ok
 10: datapath - ping over erspan v2 tunnel           ok
 11: datapath - ping over ip6erspan v1 tunnel        ok
 12: datapath - ping over ip6erspan v2 tunnel        ok
 13: datapath - ping over geneve tunnel              ok
 14: datapath - ping over geneve6 tunnel             ok
 15: datapath - clone action                         ok
 16: datapath - mpls actions                         ok
 17: datapath - basic truncate action                ok

conntrack

 18: conntrack - controller                          ok
 19: conntrack - force commit                        ok
 20: conntrack - ct flush by 5-tuple                 ok
 21: conntrack - IPv4 ping                           ok
 22: conntrack - get_nconns and get/set_maxconns     ok
 23: conntrack - IPv6 ping                           ok
 24: conntrack - preserve registers                  ok
 25: conntrack - invalid                             ok
 26: conntrack - zones                               ok
 27: conntrack - zones from field                    ok
 28: conntrack - multiple bridges                    ok
 29: conntrack - multiple zones                      ok
 30: conntrack - multiple namespaces, internal ports skipped (system-afxdp-traffic.at:1298)
 31: conntrack - ct_mark                             ok
 32: conntrack - ct_mark bit-fiddling                ok

system-ovn

 36: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
 37: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
 38: ovn -- multiple gateway routers, SNAT and DNAT  ok
 39: ovn -- load-balancing                           ok
 40: ovn -- load-balancing - same subnet.            ok
 41: ovn -- load balancing in gateway router         ok
 42: ovn -- multiple gateway routers, load-balancing ok
 43: ovn -- load balancing in router with gateway router port ok
 44: ovn -- DNAT and SNAT on distributed router - N/S ok
 45: ovn -- DNAT and SNAT on distributed router - E/W ok

---
v1->v2:
- add a list to maintain unused umem elements
- remove the copy from the rx umem to the OVS internal buffer
- use hugetlb to reduce TLB misses (not much difference)
- use pmd mode netdev in OVS (huge performance improvement)
- stop malloc'ing dp_packet; instead, place dp_packet in the umem

v2->v3:
- rebase on the OVS master, 7ab4b0653784
  ("configure: Check for more specific function to pull in pthread library.")
- remove the dependency on libbpf and dpif-bpf;
  instead, use the built-in XDP_ATTACH feature.
- data structure optimizations for better performance, see [1]
- support for more test cases

William Tu (3):
  netdev-afxdp: add new netdev type for AF_XDP
  tests: add AF_XDP netdev test cases.
  FIXME: work around the failed cases.

 acinclude.m4                    |   13 +
 configure.ac                    |    1 +
 lib/automake.mk                 |    6 +-
 lib/dp-packet.c                 |   20 +
 lib/dp-packet.h                 |   29 +-
 lib/dpif-netdev.c               |    2 +-
 lib/netdev-afxdp.c              |  703 ++++++++++++++++++
 lib/netdev-afxdp.h              |   41 ++
 lib/netdev-linux.c              |   72 +-
 lib/netdev-provider.h           |    1 +
 lib/netdev.c                    |    1 +
 lib/xdpsock.c                   |  171 +++++
 lib/xdpsock.h                   |  144 ++++
 tests/automake.mk               |   17 +
 tests/system-afxdp-macros.at    |  155 ++++
 tests/system-afxdp-testsuite.at |   26 +
 tests/system-afxdp-traffic.at   | 1541 +++++++++++++++++++++++++++++++++++++++
 17 files changed, 2937 insertions(+), 6 deletions(-)
 create mode 100644 lib/netdev-afxdp.c
 create mode 100644 lib/netdev-afxdp.h
 create mode 100644 lib/xdpsock.c
 create mode 100644 lib/xdpsock.h
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
 create mode 100644 tests/system-afxdp-traffic.at

-- 
2.7.4


