[ovs-dev] [PATCH v8 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev

Simon Horman simon.horman at netronome.com
Fri Oct 26 09:53:08 UTC 2018


On Fri, 26 Oct 2018 at 08:58, Eelco Chaudron <echaudro at redhat.com> wrote:

>
>
> On 25 Oct 2018, at 16:00, Sriharsha Basavapatna wrote:
>
> > Hi Eelco,
> >
> > Thanks for your comments, please see my response below.
> > On Fri, Oct 19, 2018 at 7:52 PM Eelco Chaudron <echaudro at redhat.com>
> > wrote:
> >>
> >> On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:
> >>
> >>> This is the first patch in the patch-set to support dynamic
> >>> rebalancing
> >>> of offloaded flows.
> >>>
> >>> The patch detects OOR condition on a netdev port when ENOSPC error
> >>> is
> >>> returned by TC-Flower while adding a flow rule. A new structure is
> >>> added
> >>> to the netdev called "netdev_hw_info", to store OOR related
> >>> information
> >>> required to perform dynamic offload-rebalancing.
> >>>
> >>> Signed-off-by: Sriharsha Basavapatna
> >>> <sriharsha.basavapatna at broadcom.com>
> >>> Co-authored-by: Venkat Duvvuru <venkatkumar.duvvuru at broadcom.com>
> >>> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru at broadcom.com>
> >>> Reviewed-by: Sathya Perla <sathya.perla at broadcom.com>
> >>> Reviewed-by: Simon Horman <simon.horman at netronome.com>
> >>> Reviewed-by: Ben Pfaff <blp at ovn.org>
> >>> ---
> >>>  lib/dpif-netlink.c    | 18 +++++++++++++++++-
> >>>  lib/flow.c            | 25 +++++++++++++++++++++++++
> >>>  lib/flow.h            |  1 +
> >>>  lib/netdev-provider.h | 11 +++++++++++
> >>>  lib/netdev.c          | 34 ++++++++++++++++++++++++++++++++++
> >>>  lib/netdev.h          |  3 +++
> >>>  6 files changed, 91 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> >>> index e6d5a6ec5..b9ce9cbe2 100644
> >>> --- a/lib/dpif-netlink.c
> >>> +++ b/lib/dpif-netlink.c
> >>> @@ -2178,7 +2178,23 @@ parse_flow_put(struct dpif_netlink *dpif,
> >>> struct dpif_flow_put *put)
> >>>
> >>>          VLOG_DBG("added flow");
> >>>      } else if (err != EEXIST) {
> >>> -        VLOG_ERR_RL(&rl, "failed to offload flow: %s",
> >>> ovs_strerror(err));
> >>> +        struct netdev *oor_netdev = NULL;
> >>> +        if (err == ENOSPC &&
> >>> netdev_is_offload_rebalance_policy_enabled()) {
> >>> +            /*
> >>> +             * We need to set OOR on the input netdev (i.e, 'dev')
> >>> for the
> >>> +             * flow. But if the flow has a tunnel attribute (i.e,
> >>> decap action,
> >>> +             * with a virtual device like a VxLAN interface as its
> >>> in-port),
> >>> +             * then lookup and set OOR on the underlying tunnel
> >>> (real) netdev.
> >>> +             */
> >>> +            oor_netdev =
> >>> flow_get_tunnel_netdev(&match.flow.tunnel);
> >>> +            if (!oor_netdev) {
> >>> +                /* Not a 'tunnel' flow */
> >>> +                oor_netdev = dev;
> >>> +            }
> >>> +            netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);
> >>
> >> Why not just oor_netdev->hw_info.oor = true, see also below.
> >
> > The original code was directly accessing netdev members. It was
> > changed
> > based on a review comment to avoid direct access and add an interface.
> >
> >>
> >> I have a general comment, don't know where to put it, so I put it
> >> here.
> >> Some hardware might have multiple tables. If one type of table is
> >> full
> >> the ENOSPC might be returned, but it does not mean all type of flows
> >> can
> >> no longer be offloaded. This might be a situation to think about.
> >
> > Ok, thanks for bringing it up. Currently from OvS daemon's perspective
> > a
> > request to add/delete a flow is issued on a netdev and the failure
> > indicates
> > that the particular netdev is out of resources. If we need to handle
> > the
> > condition where HW has different tables, we need to further extend
> > this
> > design and the tc interfaces to propagate this fine grained
> > information.
>
> Would be good if other hardware vendors can comment here?
>

There was a discussion in another forum involving at least Mellanox,
Broadcom and Netronome.
>From a Netronome point of view this scheme is satisfactory and my
recollection is that
was the agreement of those involved in the discussion.


>
> >>
> >>> +        }
> >>> +        VLOG_ERR_RL(&rl, "failed to offload flow: %s: %s",
> >>> ovs_strerror(err),
> >>> +                    (oor_netdev ? oor_netdev->name : dev->name));
> >>>      }
> >>>
> >>>  out:
> >>> diff --git a/lib/flow.c b/lib/flow.c
> >>> index 77ed3d9df..a39807908 100644
> >>> --- a/lib/flow.c
> >>> +++ b/lib/flow.c
> >>> @@ -19,6 +19,7 @@
> >>>  #include <errno.h>
> >>>  #include <inttypes.h>
> >>>  #include <limits.h>
> >>> +#include <net/if.h>
> >>>  #include <netinet/in.h>
> >>>  #include <netinet/icmp6.h>
> >>>  #include <netinet/ip6.h>
> >>> @@ -41,6 +42,8 @@
> >>>  #include "unaligned.h"
> >>>  #include "util.h"
> >>>  #include "openvswitch/nsh.h"
> >>> +#include "ovs-router.h"
> >>> +#include "lib/netdev-provider.h"
> >>>
> >>>  COVERAGE_DEFINE(flow_extract);
> >>>  COVERAGE_DEFINE(miniflow_malloc);
> >>> @@ -3403,3 +3406,25 @@ flow_limit_vlans(int vlan_limit)
> >>>          flow_vlan_limit = MIN(vlan_limit, FLOW_MAX_VLAN_HEADERS);
> >>>      }
> >>>  }
> >>> +
> >>> +struct netdev *
> >>> +flow_get_tunnel_netdev(struct flow_tnl *tunnel)
> >>> +{
> >>> +    char iface[IFNAMSIZ];
> >>> +    struct in6_addr ip6;
> >>> +    struct in6_addr gw;
> >>> +
> >>> +    if (tunnel->ip_src) {
> >>> +        in6_addr_set_mapped_ipv4(&ip6, tunnel->ip_src);
> >>> +    } else if (ipv6_addr_is_set(&tunnel->ipv6_src)) {
> >>> +        ip6 = tunnel->ipv6_src;
> >>> +    } else {
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    if (!ovs_router_lookup(0, &ip6, iface, NULL, &gw)) {
> >>> +        return NULL;
> >>> +    }
> >>> +
> >>> +    return netdev_from_name(iface);
> >>> +}
> >>> diff --git a/lib/flow.h b/lib/flow.h
> >>> index d03f1ba9c..aca60c41a 100644
> >>> --- a/lib/flow.h
> >>> +++ b/lib/flow.h
> >>> @@ -73,6 +73,7 @@ void flow_extract(struct dp_packet *, struct flow
> >>> *);
> >>>  void flow_zero_wildcards(struct flow *, const struct flow_wildcards
> >>> *);
> >>>  void flow_unwildcard_tp_ports(const struct flow *, struct
> >>> flow_wildcards *);
> >>>  void flow_get_metadata(const struct flow *, struct match
> >>> *flow_metadata);
> >>> +struct netdev *flow_get_tunnel_netdev(struct flow_tnl *tunnel);
> >>>
> >>>  const char *ct_state_to_string(uint32_t state);
> >>>  uint32_t ct_state_from_string(const char *);
> >>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> >>> index 5a7947351..e320dad61 100644
> >>> --- a/lib/netdev-provider.h
> >>> +++ b/lib/netdev-provider.h
> >>> @@ -35,6 +35,15 @@ extern "C" {
> >>>  struct netdev_tnl_build_header_params;
> >>>  #define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC
> >>>
> >>> +/* Offload-capable (HW) netdev information */
> >>> +struct netdev_hw_info {
> >>> +    bool oor;                /* Out of Offload Resources ? */
> >>> +};
> >>> +
> >>> +enum hw_info_type {
> >>> +    HW_INFO_TYPE_OOR = 1     /* OOR state */
> >>> +};
> >>> +
> >>>  /* A network device (e.g. an Ethernet device).
> >>>   *
> >>>   * Network device implementations may read these members but should
> >>> not modify
> >>> @@ -80,6 +89,8 @@ struct netdev {
> >>>      int n_rxq;
> >>>      struct shash_node *node;            /* Pointer to element in
> >>> global map. */
> >>>      struct ovs_list saved_flags_list; /* Contains "struct
> >>> netdev_saved_flags". */
> >>> +
> >>> +    struct netdev_hw_info hw_info;   /* offload-capable netdev info
> >>> */
> >>>  };
> >>>
> >>>  static inline void
> >>> diff --git a/lib/netdev.c b/lib/netdev.c
> >>> index 690d28409..f3fa08ca3 100644
> >>> --- a/lib/netdev.c
> >>> +++ b/lib/netdev.c
> >>> @@ -415,6 +415,7 @@ netdev_open(const char *name, const char *type,
> >>> struct netdev **netdevp)
> >>>                  netdev->reconfigure_seq = seq_create();
> >>>                  netdev->last_reconfigure_seq =
> >>>                      seq_read(netdev->reconfigure_seq);
> >>> +                netdev->hw_info.oor = false;
> >>>                  netdev->node = shash_add(&netdev_shash, name,
> >>> netdev);
> >>>
> >>>                  /* By default enable one tx and rx queue per
> >>> netdev.
> >> */
> >>> @@ -2252,6 +2253,31 @@ netdev_get_block_id(struct netdev *netdev)
> >>>              : 0);
> >>>  }
> >>>
> >>> +/*
> >>> + * Get the value of the hw info parameter specified by type.
> >>> + * Returns the value on success (>= 0). Returns -1 on failure.
> >>> + */
> >>> +int
> >>> +netdev_get_hw_info(struct netdev *netdev, int type)
> >>> +{
> >>> +    if (type == HW_INFO_TYPE_OOR) {
> >>> +        return netdev->hw_info.oor;
> >>> +    }
> >>> +
> >>> +    return -1;
> >>> +}
> >>> +
> >>> +/*
> >>> + * Set the value of the hw info parameter specified by type.
> >>> + */
> >>> +void
> >>> +netdev_set_hw_info(struct netdev *netdev, int type, int val)
> >>> +{
> >>> +    if (type == HW_INFO_TYPE_OOR) {
> >>> +        netdev->hw_info.oor = val;
> >>> +    }
> >>> +}
> >>> +
> >>>  bool
> >>>  netdev_is_flow_api_enabled(void)
> >>>  {
> >>> @@ -2488,6 +2514,14 @@ netdev_free_custom_stats_counters(struct
> >>> netdev_custom_stats *custom_stats)
> >>>      }
> >>>  }
> >>
> >> I do not like these type of functions as they are easy to cause
> >> problems. For example, you are using a general parameter to set a
> >> bool,
> >> what if your int input needs to set a uint32_t or uint64_t? What on
> >> return values, you might need to explicitly cast them.
> >>
> >> I would suggest that if this structure values need to be used outside
> >> of
> >> the netdev framework define the specific function. Inside just
> >> reference/change the values.
> >>
> >> So something like bool netdev_get(set)_hw_offload_oor().
> >
> > I didn't want to proliferate hw-info interfaces in netdev; we can
> > change
> > it in the future if we need to process other types like you mentioned.
> >>
> >>>
> >>> +static bool netdev_offload_rebalance_policy = false;
> >>> +
> >>> +bool
> >>> +netdev_is_offload_rebalance_policy_enabled(void)
> >>> +{
> >>> +    return netdev_offload_rebalance_policy;
> >>> +}
> >>> +
> >>
> >> See general comment on the cover letter, about a more granular
> >> configuration option.
> >
> > I responded earlier to this comment on the cover letter.
> >
> > Thanks,
> > -Harsha
> >>
> >>>  #ifdef __linux__
> >>>  static void
> >>>  netdev_ports_flow_init(void)
> >>> diff --git a/lib/netdev.h b/lib/netdev.h
> >>> index 556676046..b0e5c5b72 100644
> >>> --- a/lib/netdev.h
> >>> +++ b/lib/netdev.h
> >>> @@ -227,8 +227,11 @@ int netdev_flow_del(struct netdev *, const
> >>> ovs_u128 *,
> >>>                      struct dpif_flow_stats *);
> >>>  int netdev_init_flow_api(struct netdev *);
> >>>  uint32_t netdev_get_block_id(struct netdev *);
> >>> +int netdev_get_hw_info(struct netdev *, int);
> >>> +void netdev_set_hw_info(struct netdev *, int, int);
> >>>  bool netdev_is_flow_api_enabled(void);
> >>>  void netdev_set_flow_api_enabled(const struct smap
> >>> *ovs_other_config);
> >>> +bool netdev_is_offload_rebalance_policy_enabled(void);
> >>>
> >>>  struct dpif_port;
> >>>  int netdev_ports_insert(struct netdev *, const struct dpif_class *,
> >>> --
> >>> 2.18.0.rc1.1.g6f333ff
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev at openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


More information about the dev mailing list