[ovs-dev] [PATCH v4 3/5] netdev-offload: Add xdp flow api provider
Toshiaki Makita
toshiaki.makita1 at gmail.com
Tue Feb 9 10:01:42 UTC 2021
On 2021/02/05 12:03, William Tu wrote:
> On Thu, Feb 4, 2021 at 5:15 PM William Tu <u9012063 at gmail.com> wrote:
>>
>> Hi Toshiaki,
>>
>> Thanks for the patch. I have some questions inline.
>>
>> On Thu, Jul 30, 2020 at 7:55 PM Toshiaki Makita
>> <toshiaki.makita1 at gmail.com> wrote:
>>>
>>> This provider offloads classifier to software XDP.
>>>
>>> It works only when a custom XDP object is loaded by afxdp netdev.
>>> The BPF program needs to implement classifier with array-of-maps for
>>> subtable hashmaps and arraymap for subtable masks. The flow api
>>> provider detects classifier support in the custom XDP program when
>>> loading it.
>>>
>>> The requirements for the BPF program is as below:
>>>
>>> - A struct named "meta_info" in ".ovs_meta" section
>>> inform vswitchd of supported keys, actions, and the max number of
>>> actions.
>>>
>>> - A map named "subtbl_masks"
>>> An array which implements a list. The entry contains struct
>>> xdp_subtables_mask provided by lib/netdev-offload-xdp.h followed by
>>> miniflow buf of any length for the mask.
>>>
>>> - A map named "subtbl_masks_hd"
>>> A one entry int array which contains the first entry index of the list
>>> implemented by subtbl_masks.
>>>
>>> - A map named "flow_table"
>>> An array-of-maps. Each entry points to subtable hash-map instantiated
>>> with the following "subtbl_template".
>>> The entry of each index of subtbl_masks has miniflow mask of subtable
>>> in the corresponding index of flow_table.
>>>
>>> - A map named "subtbl_template"
>>> A template for subtable, used for entries of "flow_table".
>>> The key must be miniflow buf (without leading map).
>>>
>>> - A map named "output_map"
>>> A devmap. Each entry represents odp port.
>>>
>>> - A map named "xsks_map"
>>> A xskmap. Used for upcall.
>>>
>>> For more details, refer to the reference BPF program in next commit.
>>>
>>> In the future it may be possible to offload classifier to SmartNIC
>>> through XDP, but currently it's not because map-in-map is not supported
>>> by XDP hw-offload.
>>>
>>> Signed-off-by: Toshiaki Makita <toshiaki.makita1 at gmail.com>
>>> ---
>>> acinclude.m4 | 25 +
>>> configure.ac | 1 +
>>> lib/automake.mk | 8 +
>>> lib/bpf-util.c | 38 ++
>>> lib/bpf-util.h | 22 +
>>> lib/netdev-afxdp.c | 218 +++++-
>>> lib/netdev-afxdp.h | 3 +
>>> lib/netdev-linux-private.h | 2 +
>>> lib/netdev-offload-provider.h | 8 +-
>>> lib/netdev-offload-xdp.c | 1213 +++++++++++++++++++++++++++++++++
>>> lib/netdev-offload-xdp.h | 49 ++
>>> lib/netdev-offload.c | 3 +
>>> 12 files changed, 1588 insertions(+), 2 deletions(-)
>>> create mode 100644 lib/bpf-util.c
>>> create mode 100644 lib/bpf-util.h
>>> create mode 100644 lib/netdev-offload-xdp.c
>>> create mode 100644 lib/netdev-offload-xdp.h
>>>
>>> diff --git a/acinclude.m4 b/acinclude.m4
>>> index 4bac9dbdd..31ff0c013 100644
>>> --- a/acinclude.m4
>>> +++ b/acinclude.m4
>>> @@ -329,6 +329,31 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
>>> AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
>>> ])
>>>
>>> +dnl OVS_CHECK_LINUX_XDP_OFFLOAD
>>> +dnl
>>> +dnl Check both llvm and libbpf support
>>> +AC_DEFUN([OVS_CHECK_LINUX_XDP_OFFLOAD], [
>>> + AC_ARG_ENABLE([xdp_offload],
>>> + [AC_HELP_STRING([--enable-xdp-offload],
>>> + [Compile XDP offload])],
>>> + [], [enable_xdp_offload=no])
>>> + AC_MSG_CHECKING([whether XDP offload is enabled])
>>> + if test "$enable_xdp_offload" != yes; then
>>> + AC_MSG_RESULT([no])
>>> + XDP_OFFLOAD_ENABLE=false
>>> + else
>>> + if test "$enable_afxdp" == no; then
>>> + AC_MSG_ERROR([XDP offload depends on afxdp. Please add --enable-afxdp.])
>>> + fi
>>> + AC_MSG_RESULT([yes])
>>> + XDP_OFFLOAD_ENABLE=true
>>> +
>>> + AC_DEFINE([HAVE_XDP_OFFLOAD], [1],
>>> + [Define to 1 if XDP offload compilation is available and enabled.])
>>> + fi
>>> + AM_CONDITIONAL([HAVE_XDP_OFFLOAD], test "$XDP_OFFLOAD_ENABLE" = true)
>>> +])
>>> +
>>> dnl OVS_CHECK_DPDK
>>> dnl
>>> dnl Configure DPDK source tree
>>> diff --git a/configure.ac b/configure.ac
>>> index 8d37af9db..530112c49 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -99,6 +99,7 @@ OVS_CHECK_DOT
>>> OVS_CHECK_IF_DL
>>> OVS_CHECK_STRTOK_R
>>> OVS_CHECK_LINUX_AF_XDP
>>> +OVS_CHECK_LINUX_XDP_OFFLOAD
>>> AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
>>> AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
>>> [], [], [[#include <sys/stat.h>]])
>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>> index 920c958e3..9486b32e7 100644
>>> --- a/lib/automake.mk
>>> +++ b/lib/automake.mk
>>> @@ -451,6 +451,14 @@ lib_libopenvswitch_la_SOURCES += \
>>> lib/netdev-afxdp.h
>>> endif
>>>
>>> +if HAVE_XDP_OFFLOAD
>>> +lib_libopenvswitch_la_SOURCES += \
>>> + lib/bpf-util.c \
>>> + lib/bpf-util.h \
>>> + lib/netdev-offload-xdp.c \
>>> + lib/netdev-offload-xdp.h
>>> +endif
>>> +
>>> if DPDK_NETDEV
>>> lib_libopenvswitch_la_SOURCES += \
>>> lib/dpdk.c \
>>> diff --git a/lib/bpf-util.c b/lib/bpf-util.c
>>> new file mode 100644
>>> index 000000000..324cfbe1d
>>> --- /dev/null
>>> +++ b/lib/bpf-util.c
>>> @@ -0,0 +1,38 @@
>>> +/*
>>> + * Copyright (c) 2020 NTT Corp.
>>> + *
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + * http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#include <config.h>
>>> +
>>> +#include "bpf-util.h"
>>> +
>>> +#include <bpf/libbpf.h>
>>> +
>>> +#include "ovs-thread.h"
>>> +
>>> +DEFINE_STATIC_PER_THREAD_DATA(struct { char s[128]; },
>>> + libbpf_strerror_buffer,
>>> + { "" });
>>> +
>>> +const char *
>>> +ovs_libbpf_strerror(int err)
>>> +{
>>> + enum { BUFSIZE = sizeof libbpf_strerror_buffer_get()->s };
>>> + char *buf = libbpf_strerror_buffer_get()->s;
>>> +
>>> + libbpf_strerror(err, buf, BUFSIZE);
>>> +
>>> + return buf;
>>> +}
>>> diff --git a/lib/bpf-util.h b/lib/bpf-util.h
>>> new file mode 100644
>>> index 000000000..6346935b3
>>> --- /dev/null
>>> +++ b/lib/bpf-util.h
>>> @@ -0,0 +1,22 @@
>>> +/*
>>> + * Copyright (c) 2020 NTT Corp.
>>> + *
>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>> + * you may not use this file except in compliance with the License.
>>> + * You may obtain a copy of the License at:
>>> + *
>>> + * http://www.apache.org/licenses/LICENSE-2.0
>>> + *
>>> + * Unless required by applicable law or agreed to in writing, software
>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>> + * See the License for the specific language governing permissions and
>>> + * limitations under the License.
>>> + */
>>> +
>>> +#ifndef BPF_UTIL_H
>>> +#define BPF_UTIL_H 1
>>> +
>>> +const char *ovs_libbpf_strerror(int err);
>>> +
>>> +#endif /* bpf-util.h */
>>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>>> index e1374ccbb..00204f299 100644
>>> --- a/lib/netdev-afxdp.c
>>> +++ b/lib/netdev-afxdp.c
>>> @@ -22,6 +22,7 @@
>>> #include "netdev-afxdp-pool.h"
>>>
>>> #include <bpf/bpf.h>
>>> +#include <bpf/btf.h>
>>> #include <errno.h>
>>> #include <inttypes.h>
>>> #include <linux/rtnetlink.h>
>>> @@ -37,10 +38,13 @@
>>> #include <sys/types.h>
>>> #include <unistd.h>
>>>
>>> +#include "bpf-util.h"
>>> #include "coverage.h"
>>> #include "dp-packet.h"
>>> #include "dpif-netdev.h"
>>> #include "fatal-signal.h"
>>> +#include "netdev-offload-provider.h"
>>> +#include "netdev-offload-xdp.h"
>>> #include "openvswitch/compiler.h"
>>> #include "openvswitch/dynamic-string.h"
>>> #include "openvswitch/list.h"
>>> @@ -261,10 +265,204 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>>> ovs_mutex_unlock(&unused_pools_mutex);
>>> }
>>>
>>> +bool
>>> +has_xdp_flowtable(struct netdev *netdev)
>>> +{
>>> + struct netdev_linux *dev = netdev_linux_cast(netdev);
>>> +
>>> + return dev->has_xdp_flowtable;
>>> +}
>>> +
>>> +struct bpf_object *
>>> +get_xdp_object(struct netdev *netdev)
>>> +{
>>> + struct netdev_linux *dev = netdev_linux_cast(netdev);
>>> +
>>> + return dev->xdp_obj;
>>> +}
>>> +
>>> +#ifdef HAVE_XDP_OFFLOAD
>>> +static int
>>> +xdp_preload(struct netdev *netdev, struct bpf_object *obj)
>>> +{
>>> + static struct ovsthread_once output_map_once = OVSTHREAD_ONCE_INITIALIZER;
>>> + struct btf *btf;
>>> + struct bpf_map *flow_table, *subtbl_template, *subtbl_masks_hd, *xsks_map,
>>> + *output_map;
>>> + const struct bpf_map_def *flow_table_def, *subtbl_def;
>>> + int flow_table_fd, subtbl_meta_fd, xsks_map_fd;
>>> + static int output_map_fd = -1;
>>> + int n_rxq, err;
>>> +
>>> + btf = bpf_object__btf(obj);
>>> + if (!btf) {
>>> + VLOG_DBG("BPF object for netdev \"%s\" does not contain BTF",
>>> + netdev_get_name(netdev));
>>> + return EOPNOTSUPP;
>>> + }
>>> + if (btf__find_by_name_kind(btf, ".ovs_meta", BTF_KIND_DATASEC) < 0) {
>>> + VLOG_DBG("\".ovs_meta\" datasec not found in BTF");
>>> + return EOPNOTSUPP;
>>> + }
>>> +
>>> + flow_table = bpf_object__find_map_by_name(obj, "flow_table");
>>> + if (!flow_table) {
>>> + VLOG_DBG("%s: \"flow_table\" map does not exist",
>>> + netdev_get_name(netdev));
>>> + return EOPNOTSUPP;
>>> + }
>>> +
>>> + subtbl_template = bpf_object__find_map_by_name(obj, "subtbl_template");
>>> + if (!subtbl_template) {
>>> + VLOG_DBG("%s: \"subtbl_template\" map does not exist",
>>> + netdev_get_name(netdev));
>>> + return EOPNOTSUPP;
>>> + }
>>> +
>>> + output_map = bpf_object__find_map_by_name(obj, "output_map");
>>> + if (!output_map) {
>>> + VLOG_DBG("%s: \"output_map\" map does not exist",
>>> + netdev_get_name(netdev));
>>> + return EOPNOTSUPP;
>>> + }
>>> +
>>> + if (ovsthread_once_start(&output_map_once)) {
>>> + output_map_fd = bpf_create_map_name(BPF_MAP_TYPE_DEVMAP, "output_map",
>>> + sizeof(int), sizeof(int),
>>> + XDP_MAX_PORTS, 0);
>>> + if (output_map_fd < 0) {
>>> + err = errno;
>>> + VLOG_WARN("%s: Map creation for output_map failed: %s",
>>> + netdev_get_name(netdev), ovs_strerror(errno));
>>> + ovsthread_once_done(&output_map_once);
>>> + return err;
>>> + }
>>> + ovsthread_once_done(&output_map_once);
>>> + }
>>> +
>>> + flow_table_def = bpf_map__def(flow_table);
>>> + if (flow_table_def->type != BPF_MAP_TYPE_ARRAY_OF_MAPS) {
>>> + VLOG_WARN("%s: \"flow_table\" map type is not map-in-map array.",
>>> + netdev_get_name(netdev));
>>> + return EINVAL;
>>> + }
>>
>> IIUC, for example flow_table, here we tried to read its definition from the
>> object file (without loading it), and create a map, get its fd.
>>
>>> +
>>> + subtbl_def = bpf_map__def(subtbl_template);
>>> + if (subtbl_def->type != BPF_MAP_TYPE_HASH) {
>>> + VLOG_WARN("%s: \"subtbl_templates\" map type is not hash-table",
>>> + netdev_get_name(netdev));
>>> + return EINVAL;
>>> + }
>>> +
>>> + subtbl_meta_fd = bpf_create_map(BPF_MAP_TYPE_HASH, subtbl_def->key_size,
>>> + subtbl_def->value_size,
>>> + subtbl_def->max_entries, 0);
>>> + if (subtbl_meta_fd < 0) {
>>> + err = errno;
>>> + VLOG_WARN("%s: Map creation for flow_table meta table failed: %s",
>>> + netdev_get_name(netdev), ovs_strerror(errno));
>>> + return err;
>>> + }
>>> + flow_table_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS,
>>> + "flow_table", sizeof(uint32_t),
>>> + subtbl_meta_fd,
>>> + flow_table_def->max_entries,
>>> + 0);
>>> + if (flow_table_fd < 0) {
>>> + err = errno;
>>> + VLOG_WARN("%s: Map creation for flow_table failed: %s",
>>> + netdev_get_name(netdev), ovs_strerror(errno));
>>> + close(subtbl_meta_fd);
>>> + return err;
>>> + }
>>> + close(subtbl_meta_fd);
>>> +
>>> + err = bpf_map__reuse_fd(flow_table, flow_table_fd);
>>> + if (err) {
>>> + VLOG_WARN("%s: Failed to reuse flow_table fd: %s",
>>> + netdev_get_name(netdev), ovs_libbpf_strerror(err));
>>> + close(flow_table_fd);
>>> + return EINVAL;
>>> + }
>>> + close(flow_table_fd);
>>
>> Question:
>> What's the purpose of bpf_map__reuse_fd and why do we close the flow_table_fd.
>> I though later on when we call "bpf_object__load(obj)", all the maps
>> will be loaded?
Map-in-map could not be created by bpf_object__load() at that point.
Let me check if this has been changed or not.
>>> +
>>> + subtbl_masks_hd = bpf_object__find_map_by_name(obj, "subtbl_masks_hd");
>>> + if (!subtbl_masks_hd) {
>>> + /* Head index can be a global variable. In that case libbpf will
>>> + * initialize it. */
>>> + VLOG_DBG("%s: \"subtbl_masks_hd\" map does not exist",
>>> + netdev_get_name(netdev));
>>> + } else {
>>> + int head_fd, head = XDP_SUBTABLES_TAIL, zero = 0;
>>> +
>>> + head_fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "subtbl_masks_hd",
>>> + sizeof(uint32_t), sizeof(int), 1, 0);
>>> + if (head_fd < 0) {
>>> + err = errno;
>>> + VLOG_ERR("%s: Map creation of \"subtbl_masks_hd\" failed: %s",
>>> + netdev_get_name(netdev), ovs_strerror(errno));
>>> + return err;
>>> + }
>>> +
>>> + if (bpf_map_update_elem(head_fd, &zero, &head, 0)) {
>>> + err = errno;
>>> + VLOG_ERR("Cannot update subtbl_masks_hd: %s",
>>> + ovs_strerror(errno));
>>> + return err;
>>> + }
>>> +
>>> + err = bpf_map__reuse_fd(subtbl_masks_hd, head_fd);
>>> + if (err) {
>>> + VLOG_ERR("%s: Failed to reuse subtbl_masks_hd fd: %s",
>>> + netdev_get_name(netdev), ovs_libbpf_strerror(err));
>>> + close(head_fd);
>>> + return EINVAL;
>>> + }
>>> + close(head_fd);
>>> + }
>>> +
>>> + xsks_map = bpf_object__find_map_by_name(obj, "xsks_map");
>>> + if (!xsks_map) {
>>> + VLOG_ERR("%s: BUG: \"xsks_map\" map does not exist",
>>> + netdev_get_name(netdev));
>>> + return EOPNOTSUPP;
>>> + }
>>> +
>>> + n_rxq = netdev_n_rxq(netdev);
>>> + xsks_map_fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
>>> + sizeof(int), sizeof(int), n_rxq, 0);
>>> + if (xsks_map_fd < 0) {
>>> + err = errno;
>>> + VLOG_WARN("%s: Map creation for xsks_map failed: %s",
>>> + netdev_get_name(netdev), ovs_strerror(errno));
>>> + return err;
>>> + }
>>> +
>>> + err = bpf_map__reuse_fd(xsks_map, xsks_map_fd);
>>> + if (err) {
>>> + VLOG_WARN("%s: Failed to reuse xsks_map fd: %s",
>>> + netdev_get_name(netdev), ovs_libbpf_strerror(err));
>>> + close(xsks_map_fd);
>>> + return EINVAL;
>>> + }
>>> + close(xsks_map_fd);
>>> +
>>> + err = bpf_map__reuse_fd(output_map, output_map_fd);
>>> + if (err) {
>>> + VLOG_WARN("%s: Failed to reuse output_map fd: %s",
>>> + netdev_get_name(netdev), ovs_libbpf_strerror(err));
>>> + return EINVAL;
>>> + }
>>
>> Do you need to close(output_map_fd)?
That's a global resource so we need to close it when shutting down the process.
I left it to automatic closing by kernel.
>>> +
>>> + return 0;
>>> +}
>>> +#endif
>>> +
>>> static int
>>> xsk_load_prog(struct netdev *netdev, const char *path,
>>> struct bpf_object **pobj, int *prog_fd)
>>> {
>>> + struct netdev_linux *dev OVS_UNUSED = netdev_linux_cast(netdev);
>>> struct bpf_object_open_attr attr = {
>>> .prog_type = BPF_PROG_TYPE_XDP,
>>> .file = path,
>>> @@ -298,6 +496,14 @@ xsk_load_prog(struct netdev *netdev, const char *path,
>>> goto err;
>>> }
>>>
>>> +#ifdef HAVE_XDP_OFFLOAD
>>> + if (!xdp_preload(netdev, obj)) {
>>
>> I forgot what's the purpose of doing xdp_preload, before we call
>> bpf_object__load below.
>> Does it just to check whether the flowtable_afxdp.o has every map we want?
>> Or it does initialize something else.
- Create map-in-maps as they cannot be automatically created.
- Create xsks_map with the proper n_rxq parameter.
- Use global output_map table instead of creating new one.
All of them need to be done before bpf_object__load.
>>> + VLOG_INFO("%s: Detected flowtable support in XDP program",
>>> + netdev_get_name(netdev));
>>> + dev->has_xdp_flowtable = true;
>>> + }
>>> +#endif
>>> +
>>> if (bpf_object__load(obj)) {
>>> VLOG_ERR("%s: Can't load XDP program at '%s'",
>>> netdev_get_name(netdev), path);
>>> @@ -1297,7 +1503,17 @@ libbpf_print(enum libbpf_print_level level,
>>>
>>> int netdev_afxdp_init(void)
>>> {
>>> - libbpf_set_print(libbpf_print);
>>> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>> +
>>> + if (ovsthread_once_start(&once)) {
>>> + libbpf_set_print(libbpf_print);
>>> +#ifdef HAVE_XDP_OFFLOAD
>>> + if (netdev_register_flow_api_provider(&netdev_offload_xdp)) {
>>> + VLOG_WARN("Failed to register XDP flow api provider");
>>> + }
>>> +#endif
>>> + ovsthread_once_done(&once);
>>> + }
>>> return 0;
>>> }
>>>
>>> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
>>> index e91cd102d..324152e8f 100644
>>> --- a/lib/netdev-afxdp.h
>>> +++ b/lib/netdev-afxdp.h
>>> @@ -44,6 +44,7 @@ struct netdev_stats;
>>> struct smap;
>>> struct xdp_umem;
>>> struct xsk_socket_info;
>>> +struct bpf_object;
>>>
>>> int netdev_afxdp_rxq_construct(struct netdev_rxq *rxq_);
>>> void netdev_afxdp_rxq_destruct(struct netdev_rxq *rxq_);
>>> @@ -70,6 +71,8 @@ int netdev_afxdp_get_custom_stats(const struct netdev *netdev,
>>> void free_afxdp_buf(struct dp_packet *p);
>>> int netdev_afxdp_reconfigure(struct netdev *netdev);
>>> void signal_remove_xdp(struct netdev *netdev);
>>> +bool has_xdp_flowtable(struct netdev *netdev);
>>> +struct bpf_object *get_xdp_object(struct netdev *netdev);
>>>
>>> #else /* !HAVE_AF_XDP */
>>>
>> Thanks
>> William
>
> btw, looking at lib/netdev-offload-xdp.c
> the netdev_xdp_flow_get is not implemented.
> Does that mean we will not detect flow that alread exists and always
> replace with the new flow?
IIUC flows are inserted when upcall happens. It does not use flow_get API and
just insert flows.
Without flow_get, we cannot use dpctl/get-flow or we cannot get flows stats.
But implementing stats is not a trivial work as we need to prepare per-cpu counters
for flows so we don't slow down the XDP program. So I left it as a future work.
> also, when I tried to dump-flow, is there a way to know which flow is
> processed in XDP,
> and which flows are not offloaded?
Let me check if this is possible.
> looking at man ovs-vswitchd(8), DATAPATH FLOW TABLE DEBUGGING COMMANDS,
> dpctl/dump-flows, should we add "xdp - displays flows handled in the xdp" there?
Will check if this is doable.
Thanks,
Toshiaki Makita
More information about the dev
mailing list