[ovs-dev] [PATCH] netdev: Dynamic per-port Flow API.

Roni Bar Yanai roniba at mellanox.com
Wed Apr 24 21:58:42 UTC 2019


Hi Ilya,

Please see comment/clarification inline.
Thanks,
Roni

>-----Original Message-----
>From: Ilya Maximets <i.maximets at samsung.com>
>Sent: Wednesday, April 24, 2019 6:12 PM
>To: Ophir Munk <ophirmu at mellanox.com>; ovs-dev at openvswitch.org
>Cc: Ian Stokes <ian.stokes at intel.com>; Flavio Leitner <fbl at sysclose.org>; Kevin
>Traynor <ktraynor at redhat.com>; Roni Bar Yanai <roniba at mellanox.com>; Finn
>Christensen <fc at napatech.com>; Ben Pfaff <blp at ovn.org>; Roi Dayan
><roid at mellanox.com>; Simon Horman <simon.horman at netronome.com>; Olga
>Shern <olgas at mellanox.com>; Asaf Penso <asafp at mellanox.com>; Oz Shlomo
><ozsh at mellanox.com>; Paul Blakey <paulb at mellanox.com>
>Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>
>On 24.04.2019 17:26, Ophir Munk wrote:
>> Hi Ilya,
>> Please find comments inline and at the end of this mail.
>
>Hi. Thanks for review.
>Comments inline.
>
>Best regards, Ilya Maximets.
>
>>
>> Regards,
>> Ophir
>>
>>> -----Original Message-----
>>> From: Ilya Maximets <i.maximets at samsung.com>
>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>> To: ovs-dev at openvswitch.org
>>> Cc: Ian Stokes <ian.stokes at intel.com>; Flavio Leitner <fbl at sysclose.org>;
>>> Ophir Munk <ophirmu at mellanox.com>; Kevin Traynor
>>> <ktraynor at redhat.com>; Roni Bar Yanai <roniba at mellanox.com>; Finn
>>> Christensen <fc at napatech.com>; Ben Pfaff <blp at ovn.org>; Roi Dayan
>>> <roid at mellanox.com>; Simon Horman <simon.horman at netronome.com>;
>>> Ilya Maximets <i.maximets at samsung.com>
>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>
>>> Current issues with Flow API:
>>>
>>> * OVS calls offloading functions regardless of successful
>>>   flow API initialization. (ex. on init_flow_api failure)
>>> * Static initilaization of Flow API for a netdev_class forbids
>>>   having different offloading types for different instances
>>>   of netdev with the same netdev_class. (ex. different vports in
>>>   'system' and 'netdev' datapaths at the same time)
>>>
>>> Solution:
>>>
>>> * Move Flow API from the netdev_class to netdev instance.
>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>   suitable one.
>>>
>>
>> I suggest distinguishing between initialization and probing.
>> The probing you mention is implemented by testing device initialization:
>init_flow_api().
>> I suggest adding a new probe() API just for the sake of probing. It will check the
>netdev struct.
>> I suggest  leaving the init_flow_api() API for HW/drivers initialization.
>> The HW/driver may fail at initialization, but it does not mean we need to probe
>for a new API in that case.
>
>I also thought about separate 'probe()' api, but it seems that probing
>will be equal to 'init()' in most cases. Checking the 'netdev struct'
>fro probing is a bad solution as instances of same netdev class could
>require different offloading apis. Probing a new api on init failure
>could be useful exactly for this case. Anyway we'll be able to improve
>the logic of assigning in the future if it will be needed. This is
>the single point in code where we have information about all the available
>offloading providers.
>
>>
>>> Side effects:
>>>
>>> * Flow API providers localized as possible in their modules.
>>> * Now we have an ability to make runtime checks. For example,
>>>   we could check if particular device supports features we
>>>   need, like if dpdk device supports RSS+MARK action.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>> ---
>>>
>>> Since RFC:
>>>   * Fixed sparce build.
>>>   * Some logs turned from INFO to DBG.
>>>   * Enabled HW offloading on non-Linux systems
>>>     (For testing with dummy provider).
>>>
>>>  lib/automake.mk               |   2 +-
>>>  lib/dpdk.c                    |   2 +
>>>  lib/netdev-dpdk.c             |  24 +++-
>>>  lib/netdev-dpdk.h             |   3 +
>>>  lib/netdev-dummy.c            |  24 +++-
>>>  lib/netdev-linux.c            |   3 -
>>>  lib/netdev-linux.h            |  10 --
>>>  lib/netdev-offload-provider.h |  99 +++++++++++++++
>>>  lib/netdev-provider.h         |  67 +----------
>>>  lib/netdev-rte-offloads.c     |  40 +++++-
>>>  lib/netdev-rte-offloads.h     |  41 +------
>>>  lib/netdev-tc-offloads.c      |  39 ++++--
>>>  lib/netdev-tc-offloads.h      |  44 -------
>>>  lib/netdev-vport.c            |   6 +-
>>>  lib/netdev.c                  | 221 +++++++++++++++++++++++++++-------
>>>  tests/dpif-netdev.at          |   4 +-
>>>  tests/ofproto-macros.at       |   1 -
>>>  17 files changed, 398 insertions(+), 232 deletions(-)  create mode 100644
>>> lib/netdev-offload-provider.h  delete mode 100644 lib/netdev-tc-offloads.h
>>>
>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>> 100644
>>> --- a/lib/automake.mk
>>> +++ b/lib/automake.mk
>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>  	lib/netdev-dpdk.h \
>>>  	lib/netdev-dummy.c \
>>>  	lib/netdev-provider.h \
>>> +	lib/netdev-offload-provider.h \
>>
>> Please keep alphabetical order (switch between the last 2 lines)
>>
>>>  	lib/netdev-rte-offloads.h \
>>>  	lib/netdev-vport.c \
>>>  	lib/netdev-vport.h \
>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \
>>>  	lib/netdev-linux.c \
>>>  	lib/netdev-linux.h \
>>>  	lib/netdev-tc-offloads.c \
>>> -	lib/netdev-tc-offloads.h \
>>>  	lib/netlink-conntrack.c \
>>>  	lib/netlink-conntrack.h \
>>>  	lib/netlink-notifier.c \
>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>> index dc6171546..6c6298635 100644
>>> --- a/lib/dpdk.c
>>> +++ b/lib/dpdk.c
>>> @@ -34,6 +34,7 @@
>>>  #include "dirs.h"
>>>  #include "fatal-signal.h"
>>>  #include "netdev-dpdk.h"
>>> +#include "netdev-rte-offloads.h"
>>>  #include "openvswitch/dynamic-string.h"
>>>  #include "openvswitch/vlog.h"
>>>  #include "ovs-numa.h"
>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>
>>>      /* Finally, register the dpdk classes */
>>>      netdev_dpdk_register();
>>> +    netdev_dpdk_flow_api_register();
>>>      return true;
>>>  }
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>> 47153dc60..c06c9ef81 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -4204,6 +4204,27 @@ unlock:
>>>      return err;
>>>  }
>>>
>>> +bool
>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) {
>>> +    struct netdev_dpdk *dev;
>>> +    bool ret = false;
>>> +
>>> +    if (!is_dpdk_class(netdev->netdev_class)) {
>>> +        goto out;
>>> +    }
>>> +
>>> +    dev = netdev_dpdk_cast(netdev);
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    if (dev->type == DPDK_DEV_ETH) {
>>> +        /* TODO: Check if we able to offload some minimal flow. */
>>> +        ret = true;
>>> +    }
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +out:
>>> +    return ret;
>>> +}
>>> +
>>>  int
>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>                               struct rte_flow *rte_flow, @@ -4268,8 +4289,7 @@
>>> netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>      .get_features = netdev_dpdk_get_features,           \
>>>      .get_status = netdev_dpdk_get_status,               \
>>>      .reconfigure = netdev_dpdk_reconfigure,             \
>>> -    .rxq_recv = netdev_dpdk_rxq_recv,                   \
>>> -    DPDK_FLOW_OFFLOAD_API
>>> +    .rxq_recv = netdev_dpdk_rxq_recv
>>>
>>>  static const struct netdev_class dpdk_class = {
>>>      .type = "dpdk",
>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index
>>> 9bbb8d8d6..60631c4f0 100644
>>> --- a/lib/netdev-dpdk.h
>>> +++ b/lib/netdev-dpdk.h
>>> @@ -34,6 +34,9 @@ struct rte_flow_action;
>>>
>>>  void netdev_dpdk_register(void);
>>>  void free_dpdk_buf(struct dp_packet *);
>>> +
>>> +bool netdev_dpdk_flow_api_supported(struct netdev *);
>>> +
>>>  int
>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>                               struct rte_flow *rte_flow, diff --git a/lib/netdev-dummy.c
>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644
>>> --- a/lib/netdev-dummy.c
>>> +++ b/lib/netdev-dummy.c
>>> @@ -24,6 +24,7 @@
>>>  #include "dp-packet.h"
>>>  #include "dpif-netdev.h"
>>>  #include "flow.h"
>>> +#include "netdev-offload-provider.h"
>>>  #include "netdev-provider.h"
>>>  #include "netdev-vport.h"
>>>  #include "odp-util.h"
>>> @@ -1523,10 +1524,6 @@ exit:
>>>      return error ? -1 : 0;
>>>  }
>>>
>>> -#define DUMMY_FLOW_OFFLOAD_API                          \
>>> -    .flow_put = netdev_dummy_flow_put,                  \
>>> -    .flow_del = netdev_dummy_flow_del
>>> -
>>>  #define NETDEV_DUMMY_CLASS_COMMON                       \
>>>      .run = netdev_dummy_run,                            \
>>>      .wait = netdev_dummy_wait,                          \
>>> @@ -1559,8 +1556,7 @@ exit:
>>>      .rxq_dealloc = netdev_dummy_rxq_dealloc,            \
>>>      .rxq_recv = netdev_dummy_rxq_recv,                  \
>>>      .rxq_wait = netdev_dummy_rxq_wait,                  \
>>> -    .rxq_drain = netdev_dummy_rxq_drain,                \
>>> -    DUMMY_FLOW_OFFLOAD_API
>>> +    .rxq_drain = netdev_dummy_rxq_drain
>>>
>>>  static const struct netdev_class dummy_class = {
>>>      NETDEV_DUMMY_CLASS_COMMON,
>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class
>>> dummy_pmd_class = {
>>>      .is_pmd = true,
>>>      .reconfigure = netdev_dummy_reconfigure  };
>>> +
>>> +static int
>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) {
>>> +    return is_dummy_class(netdev->netdev_class) ? 0 : -1; }
>>> +
>>> +static const struct netdev_flow_api netdev_dummy_offloads = {
>>> +    .type = "dummy",
>>> +    .flow_put = netdev_dummy_flow_put,
>>> +    .flow_del = netdev_dummy_flow_del,
>>> +    .init_flow_api = netdev_dummy_offloads_init_flow_api,
>>> +};
>>> +
>>>
>>
>>>  /* Helper functions. */
>>>
>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level
>>> level)
>>>      netdev_register_provider(&dummy_internal_class);
>>>      netdev_register_provider(&dummy_pmd_class);
>>>
>>> +    netdev_register_flow_api_provider(&netdev_dummy_offloads);
>>> +
>>>      netdev_vport_tunnel_register();
>>>  }
>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index f75d73fd3..e4ea94cf9
>>> 100644
>>> --- a/lib/netdev-linux.c
>>> +++ b/lib/netdev-linux.c
>>> @@ -55,7 +55,6 @@
>>>  #include "hash.h"
>>>  #include "openvswitch/hmap.h"
>>>  #include "netdev-provider.h"
>>> -#include "netdev-tc-offloads.h"
>>>  #include "netdev-vport.h"
>>>  #include "netlink-notifier.h"
>>>  #include "netlink-socket.h"
>>> @@ -3321,7 +3320,6 @@ exit:
>>>
>>>  const struct netdev_class netdev_linux_class = {
>>>      NETDEV_LINUX_CLASS_COMMON,
>>> -    LINUX_FLOW_OFFLOAD_API,
>>>      .type = "system",
>>
>> Please note that before this patch type="system" and type="internal" had
>LINUX_FLOW_OFFLOAD_API, but type="tap" did not.
>> With this patch type="tap" is going to be tested with init_flow_api(). If the test
>passes - type="tap" is going to have flow_api which it did not have before.
>> Was is verified that type="tap" will fail with init_flow_api()?
>
>If 'init_flow_api' will be able to create tc qdisc than offloading
>is supported. If not - init will fail. It's simple. When linux
>will support tc offloading for tap devices we'll have offloading
>for free without changing the OVS code.
>
>>
>>>      .construct = netdev_linux_construct,
>>>      .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const struct
>>> netdev_class netdev_tap_class = {
>>>
>>>  const struct netdev_class netdev_internal_class = {
>>>      NETDEV_LINUX_CLASS_COMMON,
>>> -    LINUX_FLOW_OFFLOAD_API,
>>>      .type = "internal",
>>>      .construct = netdev_linux_construct,
>>>      .get_stats = netdev_internal_get_stats, diff --git a/lib/netdev-linux.h
>> ...
>>> +static int
>>> +netdev_assign_flow_api(struct netdev *netdev) {
>>> +    struct netdev_registered_flow_api *rfa;
>>> +
>>> +    CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>> +        if (!rfa->flow_api->init_flow_api(netdev)) {
>>> +            ovs_refcount_ref(&rfa->refcnt);
>>> +            ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>> +            VLOG_INFO("%s: Assigned flow API '%s'.",
>>> +                      netdev_get_name(netdev), rfa->flow_api->type);
>>> +            return 0;
>>> +        }
>>
>> By this logic the first API which successfully passes init_flow_api() becomes the
>flow_api for this netdev.
>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev
>datapath vport API (planned to be added soon).
>> Assuming both vport APIs successfully pass init_flow_api(), then the first one to
>be tested will always be chosen and the second one will never be used.
>> Are we sure that the system datapath vport will not always be chosen with the
>current netdev_tc_init_flow_api() implementation.
>> How can we distinguish between a vport created under different datapaths?
>> This issue must be resolved for this patch.
>
>netdev-vport that works in userspace datapath has no backing linux
>interface --> get_ifindex failure --> no offloading API configured.
>When the planned offloading in dpdk flow api provider will be
>implemented it will be chosen.
>
>Right now we don't have and don't plan to have natdevs that suitable
>for more than one flow api provider. If they appear in the future, we
>could solve this by choosing most suitable provider instead of first
>one. But this is over-engineering for now and even for a near future.
>
This is more a request for a clarification. 
If I understand correctly,  you expect that there will be one flow api provider for dpdk. 
When the api is called we must test the netdev class and then do the needed tests (maybe HW capabilities) so we can tell if we should return true or false. 
For vport we have multiple subtypes (vxlan,gre..etc), so if we only support sub group we can test the sub type by its name. 
What happens if we have both system and netdev datapath (corner case, but still)? If the system is called first and there is no backing linux port then we are OK. 
What will happen if the netdev offload provider will be called first? do you expect that we do the same test (linux backing port?) in the dpdk implementation? Is this type of test being correct for all vport types? That is, if we don't have a backing linux port (ifindex) it means that the port is part of a netdev bridge?
Basically, there is nothing in the init flow parameters that can tell us if the port should be netdev or system (ifindex is implementation related) because we don't have the context of the caller.

>>
>>> +        VLOG_DBG("%s: flow API '%s' is not suitable.",
>>> +                 netdev_get_name(netdev), rfa->flow_api->type);
>>> +    }
>>> +    VLOG_INFO("%s: No suitable flow API found.",
>>> + netdev_get_name(netdev));
>>> +
>>> +    return -1;
>>
>> Please return an enumerated positive error.
>
>It's an internal helper function. Not the public API.
>All the functions that visible outside this module returns proper
>positive error codes.
>
>>
>>> +}
>>> +
>>>  int
>>>  netdev_flow_flush(struct netdev *netdev)  {
>>> -    const struct netdev_class *class = netdev->netdev_class;
>>
>> A few more comments.
>> 1. This patch changes code related to OVS offload implementation. OVS offload
>must be confirmed with this patch before it is accepted.
>
>Sure. And I'll appreciate any help with testing as I'm limited with
>offloading capable hardware.
>
>>
>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under
>netdev datapath)  port type.
>> Please confirm the following understanding of the required steps.
>> 2.1 Need to register two new flow_apis for the two ports types.
>> 2.2 Need to write the corresponding init_flow_api() to test the relevant netdev
>instances.
>
>No, you don't need to implement new flow api provider. I assume that
>offloading for these netdevs will be implemented via DPDK. So, you
>need to add the functionality to netdev_rte_offload provider and
>teach it to return success for these netdevs from the init_flow_api().
>
>> 2.3 The question how to distinguish between a vport under system datapath
>versus vport under netdev datapath is still an open question (would be happy to
>learn that it is already resolved).
>
>Solved. See above.
>
>>
>> 3. Could you please add inline documentation for the newly added code? It
>seems the logic is spread is several places so any comments will be helpful.
>
>Provider registering API has comments. 'netdev_assign_flow_api' is
>quiet simple and mostly documented by the log messages inside.
>I'd like to add some comments if you'll point me functions that needs them.


More information about the dev mailing list