[ovs-dev] [PATCH v2] netdev-afxdp: Best-effort configuration of XDP mode.
William Tu
u9012063 at gmail.com
Fri Nov 8 14:22:44 UTC 2019
On Thu, Nov 07, 2019 at 12:36:33PM +0100, Ilya Maximets wrote:
> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>
> Devices like 'veth' interfaces in Linux supports native XDP, but
> doesn't support zero-copy mode. This case can not be covered by
> existing API and we have to use slower generic XDP for such devices.
> There are few more issues, e.g. TCP is not supported in generic XDP
> mode for veth interfaces due to kernel limitations, however it is
> supported in native mode.
>
> This change introduces ability to use native XDP without zero-copy
> along with best-effort configuration option that enabled by default.
> In best-effort case OVS will sequentially try different modes starting
> from the fastest one and will choose the first acceptable for current
> interface. This will guarantee the best possible performance.
>
> If user will want to choose specific mode, it's still possible by
> setting the 'options:xdp-mode'.
>
> This change additionally changes the API by renaming the configuration
> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> themselves to be more user-friendly.
>
> The full list of currently supported modes:
> * native-with-zerocopy - former DRV
> * native - new one, DRV without zero-copy
> * generic - former SKB
> * best-effort - new one, chooses the best available from
> 3 above modes
Since we are renaming the mode, in doc, should we tell user the mapping
of these mode to kernel AF_XDP's mode?
So let user know generic mode in OVS = generic SKB in kernel,
native mode in OVS = native mode without zc...
>
> Since 'best-effort' is a default mode, users will not need to
> explicitely set 'xdp-mode' in most cases.
>
> TCP related tests enabled back in system afxdp testsuite, because
> 'best-effort' will choose 'native' mode for veth interfaces
> and this mode has no issues with TCP.
>
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> ---
LGTM.
Acked-by: William Tu <u9012063 at gmail.com>
I look through the patch and everything looks good.
Thanks for fixing the documentation.
'make check-afxdp' also pass the previous failed cases due to TCP.
It would also be good if this 'best-effort' mode is supported in libbpf.
I think other users of af_xdp also have the same configuration headache.
Regards,
William
>
> With this patch I modified the user-visible API, but I think it's OK
> since it's still an experimental netdev. Comments are welcome.
>
> Version 2:
> * Rebased on current master.
>
> Documentation/intro/install/afxdp.rst | 54 ++++---
> NEWS | 12 +-
> lib/netdev-afxdp.c | 223 ++++++++++++++++----------
> lib/netdev-afxdp.h | 9 ++
> lib/netdev-linux-private.h | 8 +-
> tests/system-afxdp-macros.at | 7 -
> vswitchd/vswitch.xml | 38 +++--
> 7 files changed, 227 insertions(+), 124 deletions(-)
>
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> index a136db0c9..937770ad0 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
> make check-afxdp TESTSUITEFLAGS='1'
>
> .. note::
> - Not all test cases pass at this time. Currenly all TCP related
> - tests, ex: using wget or http, are skipped due to XDP limitations
> - on veth. cvlan test is also skipped.
> + Not all test cases pass at this time. Currenly all cvlan tests are skipped
> + due to kernel issues.
>
> If a test case fails, check the log at::
>
> @@ -177,33 +176,35 @@ in :doc:`general`::
> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>
> Make sure your device driver support AF_XDP, netdev-afxdp supports
> -the following additional options (see man ovs-vswitchd.conf.db for
> +the following additional options (see ``man ovs-vswitchd.conf.db`` for
> more details):
>
> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
> + ``native`` or ``generic``. Defaults to ``best-effort``, i.e. best of
> + supported modes, so in most cases you don't need to change it.
>
> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
> + otherwise ``false``.
>
> For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
> -The **xdpmode** can be "drv" or "skb"::
> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
> +``n_rxq``::
>
> ethtool -L enp2s0 combined 1
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> - options:n_rxq=1 options:xdpmode=drv \
> - other_config:pmd-rxq-affinity="0:4"
> + other_config:pmd-rxq-affinity="0:4"
>
> Or, use 4 pmds/cores and 4 queues by doing::
>
> ethtool -L enp2s0 combined 4
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> - options:n_rxq=4 options:xdpmode=drv \
> - other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> + options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>
> .. note::
> - pmd-rxq-affinity is optional. If not specified, system will auto-assign.
> + ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
> + ``n_rxq`` equals ``1`` by default.
>
> To validate that the bridge has successfully instantiated, you can use the::
>
> @@ -214,12 +215,21 @@ Should show something like::
> Port "ens802f0"
> Interface "ens802f0"
> type: afxdp
> - options: {n_rxq="1", xdpmode=drv}
> + options: {n_rxq="1"}
>
> Otherwise, enable debugging by::
>
> ovs-appctl vlog/set netdev_afxdp::dbg
>
> +To check which XDP mode was chosen by ``best-effort``, you can look for
> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
> +
> + # ovs-appctl dpctl/show
> + netdev at ovs-netdev:
> + <...>
> + port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
> + xdp-mode=best-effort,
> + xdp-mode-in-use=native-with-zerocopy)
>
> References
> ----------
> @@ -323,8 +333,11 @@ Limitations/Known Issues
> #. Most of the tests are done using i40e single port. Multiple ports and
> also ixgbe driver also needs to be tested.
> #. No latency test result (TODO items)
> -#. Due to limitations of current upstream kernel, TCP and various offloading
> +#. Due to limitations of current upstream kernel, various offloading
> (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
> + Also, TCP is not working over virtual interfaces in generic XDP mode.
> + Some more information and possible workaround available `here
> + <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
>
>
> PVP using tap device
> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
> ethtool -L enp2s0 combined 1
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> - options:n_rxq=1 options:xdpmode=drv \
> - other_config:pmd-rxq-affinity="0:4"
> + options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>
> Start a VM with virtio and tap device::
>
> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>
> Attach the veth port to br0 (linux kernel mode)::
>
> - ovs-vsctl add-port br0 afxdp-p0 -- \
> - set interface afxdp-p0 options:n_rxq=1
> + ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>
> -Or, use AF_XDP with skb mode::
> +Or, use AF_XDP::
>
> - ovs-vsctl add-port br0 afxdp-p0 -- \
> - set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
> + ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
>
> Setup the OpenFlow rules::
>
> diff --git a/NEWS b/NEWS
> index 88b818948..d5f476d6e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,11 +5,19 @@ Post-v2.12.0
> separate project. You can find it at
> https://github.com/ovn-org/ovn.git
> - Userspace datapath:
> + * Add option to enable, disable and query TCP sequence checking in
> + conntrack.
> + - AF_XDP:
> * New option 'use-need-wakeup' for netdev-afxdp to control enabling
> of corresponding 'need_wakeup' flag in AF_XDP rings. Enabled by default
> if supported by libbpf.
> - * Add option to enable, disable and query TCP sequence checking in
> - conntrack.
> + * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
> + Modes also updated. New values:
> + native-with-zerocopy - former DRV
> + native - new one, DRV without zero-copy
> + generic - former SKB
> + best-effort [default] - new one, chooses the best available from
> + 3 above modes
>
> v2.12.0 - 03 Sep 2019
> ---------------------
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index af654d498..74dde219d 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
> #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
>
> static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
> - int mode, bool use_need_wakeup);
> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> + enum afxdp_mode mode,
> + bool use_need_wakeup,
> + bool report_socket_failures);
> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
> static void xsk_destroy(struct xsk_socket_info *xsk);
> static int xsk_configure_all(struct netdev *netdev);
> static void xsk_destroy_all(struct netdev *netdev);
>
> +static struct {
> + const char *name;
> + uint32_t bind_flags;
> + uint32_t xdp_flags;
> +} xdp_modes[] = {
> + [OVS_AF_XDP_MODE_UNSPEC] = {
> + .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
> + },
> + [OVS_AF_XDP_MODE_BEST_EFFORT] = {
> + .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
> + },
> + [OVS_AF_XDP_MODE_NATIVE_ZC] = {
> + .name = "native-with-zerocopy",
> + .bind_flags = XDP_ZEROCOPY,
> + .xdp_flags = XDP_FLAGS_DRV_MODE,
> + },
> + [OVS_AF_XDP_MODE_NATIVE] = {
> + .name = "native",
> + .bind_flags = XDP_COPY,
> + .xdp_flags = XDP_FLAGS_DRV_MODE,
> + },
> + [OVS_AF_XDP_MODE_GENERIC] = {
> + .name = "generic",
> + .bind_flags = XDP_COPY,
> + .xdp_flags = XDP_FLAGS_SKB_MODE,
> + },
> +};
> +
> struct unused_pool {
> struct xsk_umem_info *umem_info;
> int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
> }
>
> static struct xsk_umem_info *
> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> +xsk_configure_umem(void *buffer, uint64_t size)
> {
> struct xsk_umem_config uconfig;
> struct xsk_umem_info *umem;
> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> &uconfig);
> if (ret) {
> - VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
> - ovs_strerror(errno),
> - xdpmode == XDP_COPY ? "SKB": "DRV");
> + VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
> free(umem);
> return NULL;
> }
> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>
> static struct xsk_socket_info *
> xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> - uint32_t queue_id, int xdpmode, bool use_need_wakeup)
> + uint32_t queue_id, enum afxdp_mode mode,
> + bool use_need_wakeup, bool report_socket_failures)
> {
> struct xsk_socket_config cfg;
> struct xsk_socket_info *xsk;
> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> cfg.rx_size = CONS_NUM_DESCS;
> cfg.tx_size = PROD_NUM_DESCS;
> cfg.libbpf_flags = 0;
> -
> - if (xdpmode == XDP_ZEROCOPY) {
> - cfg.bind_flags = XDP_ZEROCOPY;
> - cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> - } else {
> - cfg.bind_flags = XDP_COPY;
> - cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> - }
> + cfg.bind_flags = xdp_modes[mode].bind_flags;
> + cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>
> #ifdef HAVE_XDP_NEED_WAKEUP
> if (use_need_wakeup) {
> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> &xsk->rx, &xsk->tx, &cfg);
> if (ret) {
> - VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
> - "use-need-wakeup: %s qid: %d",
> - ovs_strerror(errno),
> - xdpmode == XDP_COPY ? "SKB": "DRV",
> - use_need_wakeup ? "true" : "false",
> - queue_id);
> + VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> + "xsk_socket__create failed (%s) mode: %s, "
> + "use-need-wakeup: %s, qid: %d",
> + ovs_strerror(errno), xdp_modes[mode].name,
> + use_need_wakeup ? "true" : "false", queue_id);
> free(xsk);
> return NULL;
> }
> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> }
>
> static struct xsk_socket_info *
> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> - bool use_need_wakeup)
> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
> + bool use_need_wakeup, bool report_socket_failures)
> {
> struct xsk_socket_info *xsk;
> struct xsk_umem_info *umem;
> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>
> /* Create AF_XDP socket. */
> - umem = xsk_configure_umem(bufs,
> - NUM_FRAMES * FRAME_SIZE,
> - xdpmode);
> + umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
> if (!umem) {
> free_pagealign(bufs);
> return NULL;
> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>
> VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>
> - xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> - use_need_wakeup);
> + xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
> + use_need_wakeup, report_socket_failures);
> if (!xsk) {
> /* Clean up umem and xpacket pool. */
> if (xsk_umem__delete(umem->umem)) {
> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> return xsk;
> }
>
> +static int
> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
> + enum afxdp_mode mode, bool report_socket_failures)
> +{
> + struct xsk_socket_info *xsk_info;
> +
> + VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
> + netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
> + dev->use_need_wakeup ? "true" : "false");
> + xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
> + report_socket_failures);
> + if (!xsk_info) {
> + VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> + "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
> + netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
> + dev->xsks[queue_id] = NULL;
> + return -1;
> + }
> + dev->xsks[queue_id] = xsk_info;
> + atomic_init(&xsk_info->tx_dropped, 0);
> + xsk_info->outstanding_tx = 0;
> + xsk_info->available_rx = PROD_NUM_DESCS;
> + return 0;
> +}
> +
> +
> static int
> xsk_configure_all(struct netdev *netdev)
> {
> struct netdev_linux *dev = netdev_linux_cast(netdev);
> - struct xsk_socket_info *xsk_info;
> int i, ifindex, n_rxq, n_txq;
> + int qid = 0;
>
> ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
> n_rxq = netdev_n_rxq(netdev);
> dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>
> - /* Configure each queue. */
> - for (i = 0; i < n_rxq; i++) {
> - VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
> - netdev_get_name(netdev), i,
> - dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> - dev->use_need_wakeup ? "true" : "false");
> - xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> - dev->use_need_wakeup);
> - if (!xsk_info) {
> - VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
> - dev->xsks[i] = NULL;
> + if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
> + /* Trying to configure first queue with different modes to
> + * find the most suitable. */
> + for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
> + if (!xsk_configure_queue(dev, ifindex, qid, i,
> + i == OVS_AF_XDP_MODE_MAX - 1)) {
> + dev->xdp_mode_in_use = i;
> + VLOG_INFO("%s: %s XDP mode will be in use.",
> + netdev_get_name(netdev), xdp_modes[i].name);
> + break;
> + }
> + }
> + if (i == OVS_AF_XDP_MODE_MAX) {
> + VLOG_ERR("%s: Failed to detect suitable XDP mode.",
> + netdev_get_name(netdev));
> + goto err;
> + }
> + qid++;
> + } else {
> + dev->xdp_mode_in_use = dev->xdp_mode;
> + }
> +
> + /* Configure remaining queues. */
> + for (; qid < n_rxq; qid++) {
> + if (xsk_configure_queue(dev, ifindex, qid,
> + dev->xdp_mode_in_use, true)) {
> + VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
> + netdev_get_name(netdev), qid);
> goto err;
> }
> - dev->xsks[i] = xsk_info;
> - atomic_init(&xsk_info->tx_dropped, 0);
> - xsk_info->outstanding_tx = 0;
> - xsk_info->available_rx = PROD_NUM_DESCS;
> }
>
> n_txq = netdev_n_txq(netdev);
> @@ -500,7 +559,7 @@ xsk_destroy_all(struct netdev *netdev)
> if (dev->xsks[i]) {
> xsk_destroy(dev->xsks[i]);
> dev->xsks[i] = NULL;
> - VLOG_INFO("Destroyed xsk[%d].", i);
> + VLOG_DBG("%s: Destroyed xsk[%d].", netdev_get_name(netdev), i);
> }
> }
>
> @@ -510,7 +569,7 @@ xsk_destroy_all(struct netdev *netdev)
>
> VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
> ifindex = linux_get_ifindex(netdev_get_name(netdev));
> - xsk_remove_xdp_program(ifindex, dev->xdpmode);
> + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
>
> if (dev->tx_locks) {
> for (i = 0; i < netdev_n_txq(netdev); i++) {
> @@ -526,9 +585,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> char **errp OVS_UNUSED)
> {
> struct netdev_linux *dev = netdev_linux_cast(netdev);
> - const char *str_xdpmode;
> - int xdpmode, new_n_rxq;
> + const char *str_xdp_mode;
> + enum afxdp_mode xdp_mode;
> bool need_wakeup;
> + int new_n_rxq;
>
> ovs_mutex_lock(&dev->mutex);
> new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> @@ -539,14 +599,17 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> return EINVAL;
> }
>
> - str_xdpmode = smap_get_def(args, "xdpmode", "skb");
> - if (!strcasecmp(str_xdpmode, "drv")) {
> - xdpmode = XDP_ZEROCOPY;
> - } else if (!strcasecmp(str_xdpmode, "skb")) {
> - xdpmode = XDP_COPY;
> - } else {
> - VLOG_ERR("%s: Incorrect xdpmode (%s).",
> - netdev_get_name(netdev), str_xdpmode);
> + str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
> + for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
> + xdp_mode < OVS_AF_XDP_MODE_MAX;
> + xdp_mode++) {
> + if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
> + break;
> + }
> + }
> + if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
> + VLOG_ERR("%s: Incorrect xdp-mode (%s).",
> + netdev_get_name(netdev), str_xdp_mode);
> ovs_mutex_unlock(&dev->mutex);
> return EINVAL;
> }
> @@ -560,10 +623,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> #endif
>
> if (dev->requested_n_rxq != new_n_rxq
> - || dev->requested_xdpmode != xdpmode
> + || dev->requested_xdp_mode != xdp_mode
> || dev->requested_need_wakeup != need_wakeup) {
> dev->requested_n_rxq = new_n_rxq;
> - dev->requested_xdpmode = xdpmode;
> + dev->requested_xdp_mode = xdp_mode;
> dev->requested_need_wakeup = need_wakeup;
> netdev_request_reconfigure(netdev);
> }
> @@ -578,8 +641,9 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
>
> ovs_mutex_lock(&dev->mutex);
> smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> - smap_add_format(args, "xdpmode", "%s",
> - dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
> + smap_add_format(args, "xdp-mode", "%s", xdp_modes[dev->xdp_mode].name);
> + smap_add_format(args, "xdp-mode-in-use", "%s",
> + xdp_modes[dev->xdp_mode_in_use].name);
> smap_add_format(args, "use-need-wakeup", "%s",
> dev->use_need_wakeup ? "true" : "false");
> ovs_mutex_unlock(&dev->mutex);
> @@ -596,7 +660,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
> ovs_mutex_lock(&dev->mutex);
>
> if (netdev->n_rxq == dev->requested_n_rxq
> - && dev->xdpmode == dev->requested_xdpmode
> + && dev->xdp_mode == dev->requested_xdp_mode
> && dev->use_need_wakeup == dev->requested_need_wakeup
> && dev->xsks) {
> goto out;
> @@ -607,9 +671,9 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
> netdev->n_rxq = dev->requested_n_rxq;
> netdev->n_txq = netdev->n_rxq;
>
> - dev->xdpmode = dev->requested_xdpmode;
> + dev->xdp_mode = dev->requested_xdp_mode;
> VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
> - dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB");
> + xdp_modes[dev->xdp_mode].name);
>
> if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
> @@ -618,7 +682,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>
> err = xsk_configure_all(netdev);
> if (err) {
> - VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev));
> + VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
> + netdev_get_name(netdev));
> }
> netdev_change_seq_changed(netdev);
> out:
> @@ -638,17 +703,9 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
> }
>
> static void
> -xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
> +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
> {
> - uint32_t flags;
> -
> - flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> -
> - if (xdpmode == XDP_COPY) {
> - flags |= XDP_FLAGS_SKB_MODE;
> - } else if (xdpmode == XDP_ZEROCOPY) {
> - flags |= XDP_FLAGS_DRV_MODE;
> - }
> + uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>
> bpf_set_link_xdp_fd(ifindex, -1, flags);
> }
> @@ -662,7 +719,7 @@ signal_remove_xdp(struct netdev *netdev)
> ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
> VLOG_WARN("Force removing xdp program.");
> - xsk_remove_xdp_program(ifindex, dev->xdpmode);
> + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
> }
>
> static struct dp_packet_afxdp *
> @@ -782,7 +839,8 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
> }
>
> static inline int
> -kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
> +kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
> + bool use_need_wakeup)
> {
> int ret, retries;
> static const int KERNEL_TX_BATCH_SIZE = 16;
> @@ -791,11 +849,11 @@ kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
> return 0;
> }
>
> - /* In SKB_MODE packet transmission is synchronous, and the kernel xmits
> + /* In generic mode packet transmission is synchronous, and the kernel xmits
> * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
> * So, we have to kick the kernel (n_packets / 16) times to be sure that
> * all packets are transmitted. */
> - retries = (xdpmode == XDP_COPY)
> + retries = (mode == OVS_AF_XDP_MODE_GENERIC)
> ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
> : 0;
> kick_retry:
> @@ -962,7 +1020,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
> &orig);
> COVERAGE_INC(afxdp_tx_full);
> afxdp_complete_tx(xsk_info);
> - kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> + kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
> error = ENOMEM;
> goto out;
> }
> @@ -986,7 +1044,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
> xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch));
> xsk_info->outstanding_tx += dp_packet_batch_size(batch);
>
> - ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> + ret = kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
> if (OVS_UNLIKELY(ret)) {
> VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
> netdev_get_name(netdev), ovs_strerror(ret));
> @@ -1052,10 +1110,11 @@ netdev_afxdp_construct(struct netdev *netdev)
> /* Queues should not be used before the first reconfiguration. Clearing. */
> netdev->n_rxq = 0;
> netdev->n_txq = 0;
> - dev->xdpmode = 0;
> + dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
> + dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
>
> dev->requested_n_rxq = NR_QUEUE;
> - dev->requested_xdpmode = XDP_COPY;
> + dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
> dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
>
> dev->xsks = NULL;
> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> index e2f400b72..4fe861d2d 100644
> --- a/lib/netdev-afxdp.h
> +++ b/lib/netdev-afxdp.h
> @@ -25,6 +25,15 @@
> /* These functions are Linux AF_XDP specific, so they should be used directly
> * only by Linux-specific code. */
>
> +enum afxdp_mode {
> + OVS_AF_XDP_MODE_UNSPEC,
> + OVS_AF_XDP_MODE_BEST_EFFORT,
> + OVS_AF_XDP_MODE_NATIVE_ZC,
> + OVS_AF_XDP_MODE_NATIVE,
> + OVS_AF_XDP_MODE_GENERIC,
> + OVS_AF_XDP_MODE_MAX,
> +};
> +
> struct netdev;
> struct xsk_socket_info;
> struct xdp_umem;
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index c14f2fb81..8873caa9d 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -100,10 +100,14 @@ struct netdev_linux {
> /* AF_XDP information. */
> struct xsk_socket_info **xsks;
> int requested_n_rxq;
> - int xdpmode; /* AF_XDP running mode: driver or skb. */
> - int requested_xdpmode;
> +
> + enum afxdp_mode xdp_mode; /* Configured AF_XDP mode. */
> + enum afxdp_mode requested_xdp_mode; /* Requested AF_XDP mode. */
> + enum afxdp_mode xdp_mode_in_use; /* Effective AF_XDP mode. */
> +
> bool use_need_wakeup;
> bool requested_need_wakeup;
> +
> struct ovs_spin *tx_locks; /* spin lock array for TX queues. */
> #endif
> };
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> index f0683c0a9..5ee2ceb1a 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,10 +30,3 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
> AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> ]
> )
> -
> -# OVS_START_L7([namespace], [protocol])
> -#
> -# AF_XDP doesn't work with TCP over virtual interfaces for now.
> -#
> -m4_define([OVS_START_L7],
> - [AT_SKIP_IF([:])])
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index efdfb83bb..02a68deb1 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3107,18 +3107,38 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
> </p>
> </column>
>
> - <column name="options" key="xdpmode"
> + <column name="options" key="xdp-mode"
> type='{"type": "string",
> - "enum": ["set", ["skb", "drv"]]}'>
> + "enum": ["set", ["best-effort", "native-with-zerocopy",
> + "native", "generic"]]}'>
> <p>
> Specifies the operational mode of the XDP program.
> - If "drv", the XDP program is loaded into the device driver with
> - zero-copy RX and TX enabled. This mode requires device driver with
> - AF_XDP support and has the best performance.
> - If "skb", the XDP program is using generic XDP mode in kernel with
> - extra data copying between userspace and kernel. No device driver
> - support is needed. Note that this is afxdp netdev type only.
> - Defaults to "skb" mode.
> + <p>
> + In <code>native-with-zerocopy</code> mode the XDP program is loaded
> + into the device driver with zero-copy RX and TX enabled. This mode
> + requires device driver support and has the best performance because
> + there should be no copying of packets.
> + </p>
> + <p>
> + <code>native</code> is the same as
> + <code>native-with-zerocopy</code>, but without zero-copy
> + capability. This requires at least one copy between kernel and the
> + userspace. This mode also requires support from device driver.
> + </p>
> + <p>
> + In <code>generic</code> case the XDP program in kernel works after
> + skb allocation on early stages of packet processing inside the
> + network stack. This mode doesn't require driver support, but has
> + much lower performance.
> + </p>
> + <p>
> + <code>best-effort</code> tries to detect and choose the best
> + (fastest) from the available modes for current interface.
> + </p>
> + <p>
> + Note that this option is specific to netdev-afxdp.
> + Defaults to <code>best-effort</code> mode.
> + </p>
> </p>
> </column>
>
> --
> 2.17.1
>
More information about the dev
mailing list