[ovs-dev] [PATCH v2] netdev-afxdp: Best-effort configuration of XDP mode.

Ilya Maximets i.maximets at ovn.org
Tue Nov 19 16:52:22 UTC 2019


On 19.11.2019 17:16, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
>>
>> Since 'best-effort' is a default mode, users will not need to
>> explicitely set 'xdp-mode' in most cases.
>>
>> TCP related tests enabled back in system afxdp testsuite, because
>> 'best-effort' will choose 'native' mode for veth interfaces
>> and this mode has no issues with TCP.
> Patch in general looks good, two small comments inline.

Thanks for review.

> 
> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?

Could you check if TCP works over tap interfaces in generic mode?
For me the point is that correctness is better than performance.
I also hope that native implementation for tap will be improved
over time.

> 
>> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
>> ---
>>
>> With this patch I modified the user-visible API, but I think it's OK
>> since it's still an experimental netdev.  Comments are welcome.
>>
>> Version 2:
>>   * Rebased on current master.
>>
>>  Documentation/intro/install/afxdp.rst |  54 ++++---
>>  NEWS                                  |  12 +-
>>  lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
>>  lib/netdev-afxdp.h                    |   9 ++
>>  lib/netdev-linux-private.h            |   8 +-
>>  tests/system-afxdp-macros.at          |   7 -
>>  vswitchd/vswitch.xml                  |  38 +++--
>>  7 files changed, 227 insertions(+), 124 deletions(-)
>>
>> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
>> index a136db0c9..937770ad0 100644
>> --- a/Documentation/intro/install/afxdp.rst
>> +++ b/Documentation/intro/install/afxdp.rst
>> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>>    make check-afxdp TESTSUITEFLAGS='1'
>>
>>  .. note::
>> -   Not all test cases pass at this time. Currenly all TCP related
>> -   tests, ex: using wget or http, are skipped due to XDP limitations
>> -   on veth. cvlan test is also skipped.
>> +   Not all test cases pass at this time. Currenly all cvlan tests are skipped
>> +   due to kernel issues.
>>
>>  If a test case fails, check the log at::
>>
>> @@ -177,33 +176,35 @@ in :doc:`general`::
>>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>>
>>  Make sure your device driver support AF_XDP, netdev-afxdp supports
>> -the following additional options (see man ovs-vswitchd.conf.db for
>> +the following additional options (see ``man ovs-vswitchd.conf.db`` for
>>  more details):
>>
>> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
>> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
>> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
>> +   supported modes, so in most cases you don't need to change it.
>>
>> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
>> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
>> +   otherwise ``false``.
>>
>>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
>> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
>> -The **xdpmode** can be "drv" or "skb"::
>> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
>> +``n_rxq``::
>>
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +                                   other_config:pmd-rxq-affinity="0:4"
>>
>>  Or, use 4 pmds/cores and 4 queues by doing::
>>
>>    ethtool -L enp2s0 combined 4
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=4 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>> +    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>>
>>  .. note::
>> -   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
>> +   ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
>> +   ``n_rxq`` equals ``1`` by default.
>>
>>  To validate that the bridge has successfully instantiated, you can use the::
>>
>> @@ -214,12 +215,21 @@ Should show something like::
>>    Port "ens802f0"
>>     Interface "ens802f0"
>>        type: afxdp
>> -      options: {n_rxq="1", xdpmode=drv}
>> +      options: {n_rxq="1"}
>>
>>  Otherwise, enable debugging by::
>>
>>    ovs-appctl vlog/set netdev_afxdp::dbg
>>
>> +To check which XDP mode was chosen by ``best-effort``, you can look for
>> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
>> +
>> +  # ovs-appctl dpctl/show
>>netdev at ovs-netdev:
>> +    <...>
>> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
>> +                      xdp-mode=best-effort,
>> +                      xdp-mode-in-use=native-with-zerocopy)
>>
>>  References
>>  ----------
>> @@ -323,8 +333,11 @@ Limitations/Known Issues
>>  #. Most of the tests are done using i40e single port. Multiple ports and
>>     also ixgbe driver also needs to be tested.
>>  #. No latency test result (TODO items)
>> -#. Due to limitations of current upstream kernel, TCP and various offloading
>> +#. Due to limitations of current upstream kernel, various offloading
>>     (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
>> +   Also, TCP is not working over virtual interfaces in generic XDP mode.
>> +   Some more information and possible workaround available `here
>> +   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
>>
>>
>>  PVP using tap device
>> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>>
>>  Start a VM with virtio and tap device::
>>
>> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>>
>>  Attach the veth port to br0 (linux kernel mode)::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 options:n_rxq=1
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>>
>> -Or, use AF_XDP with skb mode::
>> +Or, use AF_XDP::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
>>
>>  Setup the OpenFlow rules::
>>
>> diff --git a/NEWS b/NEWS
>> index 88b818948..d5f476d6e 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -5,11 +5,19 @@ Post-v2.12.0
>>         separate project. You can find it at
>>         https://github.com/ovn-org/ovn.git
>>     - Userspace datapath:
>> +     * Add option to enable, disable and query TCP sequence checking in
>> +       conntrack.
>> +   - AF_XDP:
>>       * New option 'use-need-wakeup' for netdev-afxdp to control enabling
>>         of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
>>         if supported by libbpf.
>> -     * Add option to enable, disable and query TCP sequence checking in
>> -       conntrack.
>> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
>> +       Modes also updated.  New values:
>> +         native-with-zerocopy  - former DRV
>> +         native                - new one, DRV without zero-copy
>> +         generic               - former SKB
>> +         best-effort [default] - new one, chooses the best available from
>> +                                 3 above modes
>>
>>  v2.12.0 - 03 Sep 2019
>>  ---------------------
>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>> index af654d498..74dde219d 100644
>> --- a/lib/netdev-afxdp.c
>> +++ b/lib/netdev-afxdp.c
>> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
>>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
>>
>>  static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
>> -                                             int mode, bool use_need_wakeup);
>> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
>> +                                             enum afxdp_mode mode,
>> +                                             bool use_need_wakeup,
>> +                                             bool report_socket_failures);
>> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
>>  static void xsk_destroy(struct xsk_socket_info *xsk);
>>  static int xsk_configure_all(struct netdev *netdev);
>>  static void xsk_destroy_all(struct netdev *netdev);
>>
>> +static struct {
>> +    const char *name;
>> +    uint32_t bind_flags;
>> +    uint32_t xdp_flags;
>> +} xdp_modes[] = {
>> +    [OVS_AF_XDP_MODE_UNSPEC] = {
>> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
>> +    },
>> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
>> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
>> +    },
> 
> <nitpicking> Maybe format the two entries above the same as below, as it’s more pleasing to my OCD brain </nitpicking>

Not a big deal for me.  I just wanted to save some space.
Could be changed before applying.

> 
>> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
>> +        .name = "native-with-zerocopy",
>> +        .bind_flags = XDP_ZEROCOPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_NATIVE] = {
>> +        .name = "native",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_GENERIC] = {
>> +        .name = "generic",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
>> +    },
>> +};
>> +
>>  struct unused_pool {
>>      struct xsk_umem_info *umem_info;
>>      int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
>> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>>  }
>>
>>  static struct xsk_umem_info *
>> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>> +xsk_configure_umem(void *buffer, uint64_t size)
>>  {
>>      struct xsk_umem_config uconfig;
>>      struct xsk_umem_info *umem;
>> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>>      ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
>>                             &uconfig);
>>      if (ret) {
>> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV");
>> +        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
>>          free(umem);
>>          return NULL;
>>      }
>> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>>
>>  static struct xsk_socket_info *
>>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>> -                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
>> +                     uint32_t queue_id, enum afxdp_mode mode,
>> +                     bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_config cfg;
>>      struct xsk_socket_info *xsk;
>> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>      cfg.rx_size = CONS_NUM_DESCS;
>>      cfg.tx_size = PROD_NUM_DESCS;
>>      cfg.libbpf_flags = 0;
>> -
>> -    if (xdpmode == XDP_ZEROCOPY) {
>> -        cfg.bind_flags = XDP_ZEROCOPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
>> -    } else {
>> -        cfg.bind_flags = XDP_COPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
>> -    }
>> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
>> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>>
>>  #ifdef HAVE_XDP_NEED_WAKEUP
>>      if (use_need_wakeup) {
>> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
>>                               &xsk->rx, &xsk->tx, &cfg);
>>      if (ret) {
>> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
>> -                 "use-need-wakeup: %s qid: %d",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV",
>> -                 use_need_wakeup ? "true" : "false",
>> -                 queue_id);
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "xsk_socket__create failed (%s) mode: %s, "
>> +             "use-need-wakeup: %s, qid: %d",
>> +             ovs_strerror(errno), xdp_modes[mode].name,
>> +             use_need_wakeup ? "true" : "false", queue_id);
>>          free(xsk);
>>          return NULL;
>>      }
>> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>  }
>>
>>  static struct xsk_socket_info *
>> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>> -              bool use_need_wakeup)
>> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
>> +              bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_info *xsk;
>>      struct xsk_umem_info *umem;
>> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>>
>>      /* Create AF_XDP socket. */
>> -    umem = xsk_configure_umem(bufs,
>> -                              NUM_FRAMES * FRAME_SIZE,
>> -                              xdpmode);
>> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>>      if (!umem) {
>>          free_pagealign(bufs);
>>          return NULL;
>> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>
>>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>>
>> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
>> -                               use_need_wakeup);
>> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
>> +                               use_need_wakeup, report_socket_failures);
>>      if (!xsk) {
>>          /* Clean up umem and xpacket pool. */
>>          if (xsk_umem__delete(umem->umem)) {
>> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>      return xsk;
>>  }
>>
>> +static int
>> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
>> +                    enum afxdp_mode mode, bool report_socket_failures)
>> +{
>> +    struct xsk_socket_info *xsk_info;
>> +
>> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
>> +             dev->use_need_wakeup ? "true" : "false");
>> +    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
>> +                             report_socket_failures);
>> +    if (!xsk_info) {
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
>> +        dev->xsks[queue_id] = NULL;
>> +        return -1;
>> +    }
>> +    dev->xsks[queue_id] = xsk_info;
>> +    atomic_init(&xsk_info->tx_dropped, 0);
>> +    xsk_info->outstanding_tx = 0;
>> +    xsk_info->available_rx = PROD_NUM_DESCS;
>> +    return 0;
>> +}
>> +
>> +
>>  static int
>>  xsk_configure_all(struct netdev *netdev)
>>  {
>>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>> -    struct xsk_socket_info *xsk_info;
>>      int i, ifindex, n_rxq, n_txq;
>> +    int qid = 0;
>>
>>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>>
>> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>>      n_rxq = netdev_n_rxq(netdev);
>>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>>
>> -    /* Configure each queue. */
>> -    for (i = 0; i < n_rxq; i++) {
>> -        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
>> -                 netdev_get_name(netdev), i,
>> -                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
>> -                 dev->use_need_wakeup ? "true" : "false");
>> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
>> -                                 dev->use_need_wakeup);
>> -        if (!xsk_info) {
>> -            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
>> -            dev->xsks[i] = NULL;
>> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
>> +        /* Trying to configure first queue with different modes to
>> +         * find the most suitable. */
>> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
>> +            if (!xsk_configure_queue(dev, ifindex, qid, i,
>> +                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
>> +                dev->xdp_mode_in_use = i;
>> +                VLOG_INFO("%s: %s XDP mode will be in use.",
>> +                          netdev_get_name(netdev), xdp_modes[i].name);
>> +                break;
>> +            }
>> +        }
> 
> What if all modes fail? Guess it will use empty flags in xsk_configure_socket(). I would suggest to do a check in xsk_configure_socket() and bailout with a specific error?

What about the code on the next line?
It is here exactly for that purpose.  Am I missing something?

> 
>> +        if (i == OVS_AF_XDP_MODE_MAX) {
>> +            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
>> +                     netdev_get_name(netdev));
>> +            goto err;
>> +        }
>> +        qid++;
>> +    } else {
>> +        dev->xdp_mode_in_use = dev->xdp_mode;
>> +    }
>> +
>> +    /* Configure remaining queues. */
>> +    for (; qid < n_rxq; qid++) {
>> +        if (xsk_configure_queue(dev, ifindex, qid,
>> +                                dev->xdp_mode_in_use, true)) {
>> +            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
>> +                     netdev_get_name(netdev), qid);
>>              goto err;
>>          }
>> -        dev->xsks[i] = xsk_info;
>> -        atomic_init(&xsk_info->tx_dropped, 0);
>> -        xsk_info->outstanding_tx = 0;
>> -        xsk_info->available_rx = PROD_NUM_DESCS;
>>      }

<snip>


More information about the dev mailing list