[ovs-dev] [PATCH 01/12] dpcls: Use 32 packet batches for lookups.

Fischetti, Antonio antonio.fischetti at intel.com
Tue Oct 11 15:33:17 UTC 2016


Comments inline.

Thanks,
Antonio

> -----Original Message-----
> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Jarno
> Rajahalme
> Sent: Friday, October 7, 2016 10:08 PM
> To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 01/12] dpcls: Use 32 packet batches for
> lookups.
> 
> 
> > On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy
> <bhanuprakash.bodireddy at intel.com> wrote:
> >
> > This patch increases the number of packets processed in a batch during a
> > lookup from 16 to 32. Processing batches of 32 packets improves
> > performance and also one of the internal loops can be avoided here.
> >
> 
> Can you provide some qualification of the performance test conditions? Do
> you believe the performance difference applies universally?
> 
[Antonio F] We saw a performance improvement in EMC disabled PHY2PHY loopback testcase with 2 physical ports and few tens of active IXIA streams. With 1 subtable - flow: from port1 to port2 - we got a throughput increment of +150 kpps. With 4 subtables the increment was +80 kpps. 
Also we spent some time profiling OVS with VTune. When comparing stock ovs with the patched one, it’s been observed that for dpcls_lookup(), there is significant reduction in the overall retired instructions (reduced by 729,800,000), CPI rate improved from 0.471 to 0.427 and Front-end bound cycles reduced from 19.1% to 8.5%.


> > Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> > ---
> > lib/dpif-netdev.c | 109 +++++++++++++++++++++++-------------------------
> ------
> > 1 file changed, 46 insertions(+), 63 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 6e09e44..c002dd3 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4975,23 +4975,20 @@ dpcls_lookup(struct dpcls *cls, const struct
> netdev_flow_key keys[],
> >              int *num_lookups_p)
> > {
> >     /* The received 'cnt' miniflows are the search-keys that will be
> processed
> > -     * in batches of 16 elements.  N_MAPS will contain the number of
> these
> > -     * 16-elements batches.  i.e. for 'cnt' = 32, N_MAPS will be 2.
> The batch
> > -     * size 16 was experimentally found faster than 8 or 32. */
> > -    typedef uint16_t map_type;
> > +     * to find a matching entry into the available subtables.
> > +     * The number of bits in map_type is equal to NETDEV_MAX_BURST. */
> 
> This needs a build time assertion to catch any future change in
> NETDEV_MAX_BURST.
> 
> Preferably, if you can verify that the compiler is able to remove the
> unnecessary loop in this case, you should consider not removing the extra
> loop that would kick in only if NETDEV_MAX_BURST becomes larger than 32.
> 
[Antonio F] Is that ok if we add the following line after the MAP_BITS define?
BUILD_ASSERT_DECL(MAP_BITS >= NETDEV_MAX_BURST);

It seems NETDEV_MAX_BURST was reduced to 32 from 256 a while ago, and one of the reasons 
being the slow VMs and packet drops at VMs. Can we safely assume that NETDEV_MAX_BURST is
unlikely to be increased as it has wider consequence on the system stability?


> > +    typedef uint32_t map_type;
> > #define MAP_BITS (sizeof(map_type) * CHAR_BIT)
> >
> > -#if !defined(__CHECKER__) && !defined(_WIN32)
> > -    const int N_MAPS = DIV_ROUND_UP(cnt, MAP_BITS);
> > -#else
> > -    enum { N_MAPS = DIV_ROUND_UP(NETDEV_MAX_BURST, MAP_BITS) };
> > -#endif
> > -    map_type maps[N_MAPS];
> >     struct dpcls_subtable *subtable;
> >
> > -    memset(maps, 0xff, sizeof maps);
> > -    if (cnt % MAP_BITS) {
> > -        maps[N_MAPS - 1] >>= MAP_BITS - cnt % MAP_BITS; /* Clear extra
> bits. */
> > +    map_type keys_map = 0xffffffff;
> > +    map_type found_map;
> > +    uint32_t hashes[MAP_BITS];
> > +    const struct cmap_node *nodes[MAP_BITS];
> > +
> > +    if (OVS_UNLIKELY(cnt != NETDEV_MAX_BURST)) {
> > +        keys_map >>= NETDEV_MAX_BURST - cnt; /* Clear extra bits. */
> >     }
> >     memset(rules, 0, cnt * sizeof *rules);
> >
> > @@ -5007,59 +5004,45 @@ dpcls_lookup(struct dpcls *cls, const struct
> netdev_flow_key keys[],
> >     PVECTOR_FOR_EACH (subtable, &cls->subtables) {
> >         const struct netdev_flow_key *mkeys = keys;
> >         struct dpcls_rule **mrules = rules;
> > -        map_type remains = 0;
> > -        int m;
> > -
> > -        BUILD_ASSERT_DECL(sizeof remains == sizeof *maps);
> > -
> > -        /* Loops on each batch of 16 search-keys. */
> > -        for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules +=
> MAP_BITS) {
> > -            uint32_t hashes[MAP_BITS];
> > -            const struct cmap_node *nodes[MAP_BITS];
> > -            unsigned long map = maps[m];
> > -            int i;
> > -
> > -            if (!map) {
> > -                continue; /* Skip empty maps. */
> > -            }
> > -
> > -            /* Compute hashes for the remaining keys.  Each search-key
> is
> > -             * masked with the subtable's mask to avoid hashing the
> wildcarded
> > -             * bits. */
> > -            ULLONG_FOR_EACH_1(i, map) {
> > -                hashes[i] = netdev_flow_key_hash_in_mask(&mkeys[i],
> > -                                                         &subtable-
> >mask);
> > -            }
> > -            /* Lookup. */
> > -            map = cmap_find_batch(&subtable->rules, map, hashes,
> nodes);
> > -            /* Check results.  When the i-th bit of map is set, it
> means that a
> > -             * set of nodes with a matching hash value was found for
> the i-th
> > -             * search-key.  Due to possible hash collisions we need to
> check
> > -             * which of the found rules, if any, really matches our
> masked
> > -             * search-key. */
> > -            ULLONG_FOR_EACH_1(i, map) {
> > -                struct dpcls_rule *rule;
> > -
> > -                CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
> > -                    if (OVS_LIKELY(dpcls_rule_matches_key(rule,
> &mkeys[i]))) {
> > -                        mrules[i] = rule;
> > -                        /* Even at 20 Mpps the 32-bit hit_cnt cannot
> wrap
> > -                         * within one second optimization interval  */
> > -                        subtable->hit_cnt++;
> > -                        lookups_match += subtable_pos;
> > -                        goto next;
> > -                    }
> > +        int i;
> > +        found_map = keys_map;
> > +
> > +        /* Compute hashes for the remaining keys.  Each search-key is
> > +         * masked with the subtable's mask to avoid hashing the
> wildcarded
> > +         * bits. */
> > +        ULLONG_FOR_EACH_1(i, keys_map) {
> > +            hashes[i] = netdev_flow_key_hash_in_mask(&mkeys[i],
> > +                                                     &subtable->mask);
> > +        }
> > +        /* Lookup. */
> > +        found_map = cmap_find_batch(&subtable->rules, found_map,
> hashes,
> > +                                    nodes);
> > +        /* Check results.  When the i-th bit of found_map is set, it
> means
> > +         * that a set of nodes with a matching hash value was found for
> the
> > +         * i-th search-key.  Due to possible hash collisions we need to
> check
> > +         * which of the found rules, if any, really matches our masked
> > +         * search-key. */
> > +        ULLONG_FOR_EACH_1(i, found_map) {
> > +            struct dpcls_rule *rule;
> > +
> > +            CMAP_NODE_FOR_EACH (rule, cmap_node, nodes[i]) {
> > +                if (OVS_LIKELY(dpcls_rule_matches_key(rule,
> &mkeys[i]))) {
> > +                    mrules[i] = rule;
> > +                    /* Even at 20 Mpps the 32-bit hit_cnt cannot wrap
> > +                     * within one second optimization interval. */
> > +                    subtable->hit_cnt++;
> > +                    lookups_match += subtable_pos;
> > +                    goto next;
> >                 }
> > -                /* None of the found rules was a match. Reset the i-th
> bit to
> > -                 * keep searching in the next subtable. */
> > -                ULLONG_SET0(map, i);  /* Did not match. */
> > -            next:
> > -                ;                     /* Keep Sparse happy. */
> >             }
> > -            maps[m] &= ~map;          /* Clear the found rules. */
> > -            remains |= maps[m];
> > +            /* None of the found rules was a match.  Reset the i-th bit
> to
> > +             * keep searching this key in the next subtable. */
> > +            ULLONG_SET0(found_map, i);  /* Did not match. */
> > +        next:
> > +            ;                     /* Keep Sparse happy. */
> >         }
> > -        if (!remains) {
> > +        keys_map &= ~found_map;             /* Clear the found rules.
> */
> > +        if (!keys_map) {
> >             if (num_lookups_p) {
> >                 *num_lookups_p = lookups_match;
> >             }
> > --
> > 2.4.11
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev


More information about the dev mailing list