[ovs-dev] [PATCH v7 3/6] netdev-dpdk: implement flow offload with rte flow

Stokes, Ian ian.stokes at intel.com
Thu Mar 22 09:47:22 UTC 2018


> >-----Original Message-----
> >From: Stokes, Ian <ian.stokes at intel.com>
> >Sent: 21. marts 2018 14:38
> >To: Finn Christensen <fc at napatech.com>; 'Yuanhan Liu'
> ><yliu at fridaylinux.org>; dev at openvswitch.org; Shahaf Shuler
> ><shahafs at mellanox.com>
> >Cc: Darrell Ball <dball at vmware.com>; Chandran, Sugesh
> ><sugesh.chandran at intel.com>; Simon Horman <simon.horman at netronome.com>
> >Subject: RE: [PATCH v7 3/6] netdev-dpdk: implement flow offload with
> >rte flow
> >
> >> -----Original Message-----
> >> From: Finn Christensen [mailto:fc at napatech.com]
> >> Sent: Friday, March 16, 2018 12:44 PM
> >> To: Stokes, Ian <ian.stokes at intel.com>; 'Yuanhan Liu'
> >> <yliu at fridaylinux.org>; dev at openvswitch.org; Shahaf Shuler
> >> <shahafs at mellanox.com>
> >> Cc: Darrell Ball <dball at vmware.com>; Chandran, Sugesh
> >> <sugesh.chandran at intel.com>; Simon Horman
> ><simon.horman at netronome.com>
> >> Subject: RE: [PATCH v7 3/6] netdev-dpdk: implement flow offload with
> >> rte flow
> >>
> >> Hi Ian,
> >>
> >> Thanks for the review.
> >> I'll fix the compilation issues and the comments you have below.
> >> Please see my answers below.
> >>
> >> Thanks
> >> Finn
> >>
> >> >-----Original Message-----
> >> >From: Stokes, Ian <ian.stokes at intel.com>
> >> >Sent: 12. marts 2018 16:00
> >> >To: 'Yuanhan Liu' <yliu at fridaylinux.org>; dev at openvswitch.org
> >> >Cc: Finn Christensen <fc at napatech.com>; Darrell Ball
> >> ><dball at vmware.com>; Chandran, Sugesh <sugesh.chandran at intel.com>;
> >> >Simon Horman <simon.horman at netronome.com>
> >> >Subject: RE: [PATCH v7 3/6] netdev-dpdk: implement flow offload with
> >> >rte flow
> >> >
> >> >> -----Original Message-----
> >> >> From: Yuanhan Liu [mailto:yliu at fridaylinux.org]
> >> >> Sent: Monday, January 29, 2018 7:00 AM
> >> >> To: dev at openvswitch.org
> >> >> Cc: Stokes, Ian <ian.stokes at intel.com>; Finn Christensen
> >> >> <fc at napatech.com>; Darrell Ball <dball at vmware.com>; Chandran,
> >> >> Sugesh <sugesh.chandran at intel.com>; Simon Horman
> >> >> <simon.horman at netronome.com>; Yuanhan Liu <yliu at fridaylinux.org>
> >> >> Subject: [PATCH v7 3/6] netdev-dpdk: implement flow offload with
> >> >> rte flow
> >> >>
> >> >> From: Finn Christensen <fc at napatech.com>
> >> >>
> >> >> The basic yet the major part of this patch is to translate the
> "match"
> >> >> to rte flow patterns. And then, we create a rte flow with MARK +
> >> >> RSS actions. Afterwards, all packets match the flow will have the
> >> >> mark id in the mbuf.
> >> >>
> >> >> The reason RSS is needed is, for most NICs, a MARK only action is
> >> >> not allowed. It has to be used together with some other actions,
> >> >> such as QUEUE, RSS, etc. However, QUEUE action can specify one
> >> >> queue only, which may break the rss. Likely, RSS action is
> >> >> currently the best we could
> >> >now.
> >> >> Thus, RSS action is choosen.
> >> >>
> >> >> For any unsupported flows, such as MPLS, -1 is returned, meaning
> >> >> the flow offload is failed and then skipped.
> >> >>
> >> >> Co-authored-by: Yuanhan Liu <yliu at fridaylinux.org>
> >> >> Signed-off-by: Finn Christensen <fc at napatech.com>
> >> >> Signed-off-by: Yuanhan Liu <yliu at fridaylinux.org>
> >> >
> >> >Hi Finn,
> >> >
> >> >Thanks for working on this. There are compilation errors introduced
> >> >by this commit, details below along with other comments.
> >> >
> >> >For completeness here is the link to the OVS Travis build with
> >> >associated failures
> >> >
> >> >https://travis-ci.org/istokes/ovs/builds/352283122
> >> >
> >> >
> >> >Thanks
> >> >Ian
> >> >
> >> >> ---
> >> >>
> >> >> v7: - set the rss_conf for rss action to NULL, to workaround a
> >> >> mlx5
> >> change
> >> >>       in DPDK v17.11. Note that it will obey the rss settings
> >> >> OVS-DPDK
> >> has
> >> >>       set in the beginning. Thus, nothing should be effected.
> >> >>
> >> >> ---
> >> >>  lib/netdev-dpdk.c | 559
> >> >> +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> >>  1 file changed, 558 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >> >> ac2e38e..4bd0503
> >> >> 100644
> >> >> --- a/lib/netdev-dpdk.c
> >> >> +++ b/lib/netdev-dpdk.c
> >> >> @@ -38,7 +38,9 @@
> >> >>  #include <rte_pci.h>
> >> >>  #include <rte_vhost.h>
> >> >>  #include <rte_version.h>
> >> >> +#include <rte_flow.h>
> >> >>
> >> >> +#include "cmap.h"
> >> >>  #include "dirs.h"
> >> >>  #include "dp-packet.h"
> >> >>  #include "dpdk.h"
> >> >> @@ -60,6 +62,7 @@
> >> >>  #include "sset.h"
> >> >>  #include "unaligned.h"
> >> >>  #include "timeval.h"
> >> >> +#include "uuid.h"
> >> >>  #include "unixctl.h"
> >> >>
> >> >>  enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM}; @@ -3633,6
> >+3636,560
> >> >@@
> >> >> unlock:
> >> >>      return err;
> >> >>  }
> >> >>
> >> >> +/*
> >> >> + * A mapping from ufid to dpdk rte_flow.
> >> >> + */
> >> >> +static struct cmap ufid_to_rte_flow = CMAP_INITIALIZER;
> >> >> +
> >> >> +struct ufid_to_rte_flow_data {
> >> >> +    struct cmap_node node;
> >> >> +    ovs_u128 ufid;
> >> >> +    struct rte_flow *rte_flow;
> >> >> +};
> >> >
> >> >Might be cleaner to declare above along with the other structs/maps
> >> >specific to netdev-dpdk.c towards the beginning of this file.
> >>
> >> Ok done.
> >>
> >> >
> >> >> +
> >> >> +/* Find rte_flow with @ufid */
> >> >> +static struct rte_flow *
> >> >> +ufid_to_rte_flow_find(const ovs_u128 *ufid) {
> >> >> +    size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
> >> >> +    struct ufid_to_rte_flow_data *data;
> >> >> +
> >> >> +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> >&ufid_to_rte_flow) {
> >> >> +        if (ovs_u128_equals(*ufid, data->ufid)) {
> >> >> +            return data->rte_flow;
> >> >> +        }
> >> >> +    }
> >> >> +
> >> >> +    return NULL;
> >> >> +}
> >> >> +
> >> >> +static inline void
> >> >> +ufid_to_rte_flow_associate(const ovs_u128 *ufid, struct rte_flow
> >> >> +*rte_flow) {
> >> >> +    size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
> >> >> +    struct ufid_to_rte_flow_data *data = xzalloc(sizeof(*data));
> >> >> +
> >> >> +    /*
> >> >> +     * We should not simply overwrite an existing rte flow.
> >> >> +     * We should have deleted it first before re-adding it.
> >> >> +     * Thus, if following assert triggers, something is wrong:
> >> >> +     * the rte_flow is not destroyed.
> >> >> +     */
> >> >> +    ovs_assert(ufid_to_rte_flow_find(ufid) == NULL);
> >> >> +
> >> >> +    data->ufid = *ufid;
> >> >> +    data->rte_flow = rte_flow;
> >> >> +
> >> >> +    cmap_insert(&ufid_to_rte_flow,
> >> >> +                CONST_CAST(struct cmap_node *, &data->node),
> >> >> + hash); }
> >> >> +
> >> >> +static inline void
> >> >> +ufid_to_rte_flow_disassociate(const ovs_u128 *ufid) {
> >> >> +    size_t hash = hash_bytes(ufid, sizeof(*ufid), 0);
> >> >> +    struct ufid_to_rte_flow_data *data;
> >> >> +
> >> >> +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> >&ufid_to_rte_flow) {
> >> >> +        if (ovs_u128_equals(*ufid, data->ufid)) {
> >> >> +            cmap_remove(&ufid_to_rte_flow,
> >> >> +                        CONST_CAST(struct cmap_node *,
> >> >> + &data->node),
> >> >> hash);
> >> >> +            free(data);
> >> >> +            return;
> >> >> +        }
> >> >> +    }
> >> >> +
> >> >> +    VLOG_WARN("ufid "UUID_FMT" is not associated with an rte
> >flow\n",
> >> >> +              UUID_ARGS((struct uuid *)ufid)); }
> >> >> +
> >> >> +struct flow_patterns {
> >> >> +    struct rte_flow_item *items;
> >> >> +    int cnt;
> >> >> +    int max;
> >> >> +};
> >> >> +
> >> >> +struct flow_actions {
> >> >> +    struct rte_flow_action *actions;
> >> >> +    int cnt;
> >> >> +    int max;
> >> >> +};
> >> >> +
> >> >> +static void
> >> >> +add_flow_pattern(struct flow_patterns *patterns, enum
> >> >> +rte_flow_item_type
> >> >> type,
> >> >> +                 const void *spec, const void *mask) {
> >> >> +    int cnt = patterns->cnt;
> >> >> +
> >> >> +    if (cnt == 0) {
> >> >> +        patterns->max = 8;
> >> >> +        patterns->items = xcalloc(patterns->max, sizeof(struct
> >> >> rte_flow_item));
> >> >> +    } else if (cnt == patterns->max) {
> >> >> +        patterns->max *= 2;
> >> >> +        patterns->items = xrealloc(patterns->items, patterns->max *
> >> >> +                                   sizeof(struct rte_flow_item));
> >> >> +    }
> >> >
> >> >Just a general query about max value above, so if cnt == 0 you set
> >> >the max to 8. However if cnt is == max you then double the value of
> max.
> >> >
> >> >What is the purpose of max and what is it's limit? Maybe it becomes
> >> >clearer later in the patch but at the moment it seems 8 is the
> >> >default max however it can be higher, I just find the behavior a
> >> >little
> >> misleading.
> >>
> >> The rte flow pattern item part is allocated in one allocation and
> >> then grows with xrealloc if needed. It initializes with 8 elements
> >> and then grows by doubling the number of elements. This is done to
> >> avoid a maximum on the number of items and also not allocate for each
> >> single item element individually, basically to simplify the code.
> >> There is no limit other than when xalloc fails and then ovs_abort is
> >> called.
> >> If the "max" wording is a bit misleading, we can change it. Maybe
> >> "current_max" is better?
> >
> >Ah ok, that makes a bit more sense. 'current_max' would be fine with a
> >small comment explaining that we want to avoid individual allocations.
> 
> OK, I'll do that.
> 
> >
> >In the case where the xrealoc fails I can see why allocating on an
> >individual babsis would be poor here.
> >
> >In the case where no more items can be allocated can we fall back to
> >the previous number allocated and flag an error?
> 
> It uses xrealloc. And when it fails, it calls out_of_memory and finally
> abort.
> These allocations are really small, and we will probably have a limited
> number of items, even usually below 8. Furthermore, if it fails that small
> an allocation I think it is OK to have xrealloc flag an out_of_memory. It
> makes the code simpler.
> If you still think it is worth implementing code for failing a flow
> offload caused by an xrealloc failure, let me know.
> 

And once we've exhausted memory there's not much can be done at that point form a user's perspective. OK this approach as is is fine. We can always revisit down the road if it does become an issue.

Ian



More information about the dev mailing list