[ovs-dev] [PATCH v8 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev
Eelco Chaudron
echaudro at redhat.com
Fri Oct 26 07:58:01 UTC 2018
On 25 Oct 2018, at 16:00, Sriharsha Basavapatna wrote:
> Hi Eelco,
>
> Thanks for your comments, please see my response below.
> On Fri, Oct 19, 2018 at 7:52 PM Eelco Chaudron <echaudro at redhat.com>
> wrote:
>>
>> On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:
>>
>>> This is the first patch in the patch-set to support dynamic
>>> rebalancing
>>> of offloaded flows.
>>>
>>> The patch detects OOR condition on a netdev port when ENOSPC error
>>> is
>>> returned by TC-Flower while adding a flow rule. A new structure is
>>> added
>>> to the netdev called "netdev_hw_info", to store OOR related
>>> information
>>> required to perform dynamic offload-rebalancing.
>>>
>>> Signed-off-by: Sriharsha Basavapatna
>>> <sriharsha.basavapatna at broadcom.com>
>>> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru at broadcom.com>
>>> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru at broadcom.com>
>>> Reviewed-by: Sathya Perla <sathya.perla at broadcom.com>
>>> Reviewed-by: Simon Horman <simon.horman at netronome.com>
>>> Reviewed-by: Ben Pfaff <blp at ovn.org>
>>> ---
>>> lib/dpif-netlink.c | 18 +++++++++++++++++-
>>> lib/flow.c | 25 +++++++++++++++++++++++++
>>> lib/flow.h | 1 +
>>> lib/netdev-provider.h | 11 +++++++++++
>>> lib/netdev.c | 34 ++++++++++++++++++++++++++++++++++
>>> lib/netdev.h | 3 +++
>>> 6 files changed, 91 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index e6d5a6ec5..b9ce9cbe2 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -2178,7 +2178,23 @@ parse_flow_put(struct dpif_netlink *dpif,
>>> struct dpif_flow_put *put)
>>>
>>> VLOG_DBG("added flow");
>>> } else if (err != EEXIST) {
>>> - VLOG_ERR_RL(&rl, "failed to offload flow: %s",
>>> ovs_strerror(err));
>>> + struct netdev *oor_netdev = NULL;
>>> + if (err == ENOSPC &&
>>> netdev_is_offload_rebalance_policy_enabled()) {
>>> + /*
>>> + * We need to set OOR on the input netdev (i.e, 'dev')
>>> for the
>>> + * flow. But if the flow has a tunnel attribute (i.e,
>>> decap action,
>>> + * with a virtual device like a VxLAN interface as its
>>> in-port),
>>> + * then lookup and set OOR on the underlying tunnel
>>> (real) netdev.
>>> + */
>>> + oor_netdev =
>>> flow_get_tunnel_netdev(&match.flow.tunnel);
>>> + if (!oor_netdev) {
>>> + /* Not a 'tunnel' flow */
>>> + oor_netdev = dev;
>>> + }
>>> + netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);
>>
>> Why not just oor_netdev->hw_info.oor = true, see also below.
>
> The original code was directly accessing netdev members. It was
> changed
> based on a review comment to avoid direct access and add an interface.
>
>>
>> I have a general comment, don't know where to put it, so I put it
>> here.
>> Some hardware might have multiple tables. If one type of table is
>> full
>> the ENOSPC might be returned, but it does not mean all type of flows
>> can
>> no longer be offloaded. This might be a situation to think about.
>
> Ok, thanks for bringing it up. Currently from OvS daemon's perspective
> a
> request to add/delete a flow is issued on a netdev and the failure
> indicates
> that the particular netdev is out of resources. If we need to handle
> the
> condition where HW has different tables, we need to further extend
> this
> design and the tc interfaces to propagate this fine grained
> information.
Would be good if other hardware vendors can comment here?
>>
>>> + }
>>> + VLOG_ERR_RL(&rl, "failed to offload flow: %s: %s",
>>> ovs_strerror(err),
>>> + (oor_netdev ? oor_netdev->name : dev->name));
>>> }
>>>
>>> out:
>>> diff --git a/lib/flow.c b/lib/flow.c
>>> index 77ed3d9df..a39807908 100644
>>> --- a/lib/flow.c
>>> +++ b/lib/flow.c
>>> @@ -19,6 +19,7 @@
>>> #include <errno.h>
>>> #include <inttypes.h>
>>> #include <limits.h>
>>> +#include <net/if.h>
>>> #include <netinet/in.h>
>>> #include <netinet/icmp6.h>
>>> #include <netinet/ip6.h>
>>> @@ -41,6 +42,8 @@
>>> #include "unaligned.h"
>>> #include "util.h"
>>> #include "openvswitch/nsh.h"
>>> +#include "ovs-router.h"
>>> +#include "lib/netdev-provider.h"
>>>
>>> COVERAGE_DEFINE(flow_extract);
>>> COVERAGE_DEFINE(miniflow_malloc);
>>> @@ -3403,3 +3406,25 @@ flow_limit_vlans(int vlan_limit)
>>> flow_vlan_limit = MIN(vlan_limit, FLOW_MAX_VLAN_HEADERS);
>>> }
>>> }
>>> +
>>> +struct netdev *
>>> +flow_get_tunnel_netdev(struct flow_tnl *tunnel)
>>> +{
>>> + char iface[IFNAMSIZ];
>>> + struct in6_addr ip6;
>>> + struct in6_addr gw;
>>> +
>>> + if (tunnel->ip_src) {
>>> + in6_addr_set_mapped_ipv4(&ip6, tunnel->ip_src);
>>> + } else if (ipv6_addr_is_set(&tunnel->ipv6_src)) {
>>> + ip6 = tunnel->ipv6_src;
>>> + } else {
>>> + return NULL;
>>> + }
>>> +
>>> + if (!ovs_router_lookup(0, &ip6, iface, NULL, &gw)) {
>>> + return NULL;
>>> + }
>>> +
>>> + return netdev_from_name(iface);
>>> +}
>>> diff --git a/lib/flow.h b/lib/flow.h
>>> index d03f1ba9c..aca60c41a 100644
>>> --- a/lib/flow.h
>>> +++ b/lib/flow.h
>>> @@ -73,6 +73,7 @@ void flow_extract(struct dp_packet *, struct flow
>>> *);
>>> void flow_zero_wildcards(struct flow *, const struct flow_wildcards
>>> *);
>>> void flow_unwildcard_tp_ports(const struct flow *, struct
>>> flow_wildcards *);
>>> void flow_get_metadata(const struct flow *, struct match
>>> *flow_metadata);
>>> +struct netdev *flow_get_tunnel_netdev(struct flow_tnl *tunnel);
>>>
>>> const char *ct_state_to_string(uint32_t state);
>>> uint32_t ct_state_from_string(const char *);
>>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
>>> index 5a7947351..e320dad61 100644
>>> --- a/lib/netdev-provider.h
>>> +++ b/lib/netdev-provider.h
>>> @@ -35,6 +35,15 @@ extern "C" {
>>> struct netdev_tnl_build_header_params;
>>> #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
>>>
>>> +/* Offload-capable (HW) netdev information */
>>> +struct netdev_hw_info {
>>> + bool oor; /* Out of Offload Resources ? */
>>> +};
>>> +
>>> +enum hw_info_type {
>>> + HW_INFO_TYPE_OOR = 1 /* OOR state */
>>> +};
>>> +
>>> /* A network device (e.g. an Ethernet device).
>>> *
>>> * Network device implementations may read these members but should
>>> not modify
>>> @@ -80,6 +89,8 @@ struct netdev {
>>> int n_rxq;
>>> struct shash_node *node; /* Pointer to element in
>>> global map. */
>>> struct ovs_list saved_flags_list; /* Contains "struct
>>> netdev_saved_flags". */
>>> +
>>> + struct netdev_hw_info hw_info; /* offload-capable netdev info
>>> */
>>> };
>>>
>>> static inline void
>>> diff --git a/lib/netdev.c b/lib/netdev.c
>>> index 690d28409..f3fa08ca3 100644
>>> --- a/lib/netdev.c
>>> +++ b/lib/netdev.c
>>> @@ -415,6 +415,7 @@ netdev_open(const char *name, const char *type,
>>> struct netdev **netdevp)
>>> netdev->reconfigure_seq = seq_create();
>>> netdev->last_reconfigure_seq =
>>> seq_read(netdev->reconfigure_seq);
>>> + netdev->hw_info.oor = false;
>>> netdev->node = shash_add(&netdev_shash, name,
>>> netdev);
>>>
>>> /* By default enable one tx and rx queue per
>>> netdev.
>> */
>>> @@ -2252,6 +2253,31 @@ netdev_get_block_id(struct netdev *netdev)
>>> : 0);
>>> }
>>>
>>> +/*
>>> + * Get the value of the hw info parameter specified by type.
>>> + * Returns the value on success (>= 0). Returns -1 on failure.
>>> + */
>>> +int
>>> +netdev_get_hw_info(struct netdev *netdev, int type)
>>> +{
>>> + if (type == HW_INFO_TYPE_OOR) {
>>> + return netdev->hw_info.oor;
>>> + }
>>> +
>>> + return -1;
>>> +}
>>> +
>>> +/*
>>> + * Set the value of the hw info parameter specified by type.
>>> + */
>>> +void
>>> +netdev_set_hw_info(struct netdev *netdev, int type, int val)
>>> +{
>>> + if (type == HW_INFO_TYPE_OOR) {
>>> + netdev->hw_info.oor = val;
>>> + }
>>> +}
>>> +
>>> bool
>>> netdev_is_flow_api_enabled(void)
>>> {
>>> @@ -2488,6 +2514,14 @@ netdev_free_custom_stats_counters(struct
>>> netdev_custom_stats *custom_stats)
>>> }
>>> }
>>
>> I do not like these type of functions as they are easy to cause
>> problems. For example, you are using a general parameter to set a
>> bool,
>> what if your int input needs to set a uint32_t or uint64_t? What on
>> return values, you might need to explicitly cast them.
>>
>> I would suggest that if this structure values need to be used outside
>> of
>> the netdev framework define the specific function. Inside just
>> reference/change the values.
>>
>> So something like bool netdev_get(set)_hw_offload_oor().
>
> I didn't want to proliferate hw-info interfaces in netdev; we can
> change
> it in the future if we need to process other types like you mentioned.
>>
>>>
>>> +static bool netdev_offload_rebalance_policy = false;
>>> +
>>> +bool
>>> +netdev_is_offload_rebalance_policy_enabled(void)
>>> +{
>>> + return netdev_offload_rebalance_policy;
>>> +}
>>> +
>>
>> See general comment on the cover letter, about a more granular
>> configuration option.
>
> I responded earlier to this comment on the cover letter.
>
> Thanks,
> -Harsha
>>
>>> #ifdef __linux__
>>> static void
>>> netdev_ports_flow_init(void)
>>> diff --git a/lib/netdev.h b/lib/netdev.h
>>> index 556676046..b0e5c5b72 100644
>>> --- a/lib/netdev.h
>>> +++ b/lib/netdev.h
>>> @@ -227,8 +227,11 @@ int netdev_flow_del(struct netdev *, const
>>> ovs_u128 *,
>>> struct dpif_flow_stats *);
>>> int netdev_init_flow_api(struct netdev *);
>>> uint32_t netdev_get_block_id(struct netdev *);
>>> +int netdev_get_hw_info(struct netdev *, int);
>>> +void netdev_set_hw_info(struct netdev *, int, int);
>>> bool netdev_is_flow_api_enabled(void);
>>> void netdev_set_flow_api_enabled(const struct smap
>>> *ovs_other_config);
>>> +bool netdev_is_offload_rebalance_policy_enabled(void);
>>>
>>> struct dpif_port;
>>> int netdev_ports_insert(struct netdev *, const struct dpif_class *,
>>> --
>>> 2.18.0.rc1.1.g6f333ff
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
More information about the dev
mailing list