[ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

Van Haaren, Harry harry.van.haaren at intel.com
Tue Jun 29 16:32:05 UTC 2021


> -----Original Message-----
> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Eelco Chaudron
> Sent: Tuesday, June 29, 2021 1:38 PM
> To: Amber, Kumar <kumar.amber at intel.com>
> Cc: dev at openvswitch.org; i.maximets at ovn.org
> Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best
> mfex function
> 
> More comments below. FYI I’m only reviewing right now, no testing.

Sure, thanks for reviews.

> On 17 Jun 2021, at 18:27, Kumar Amber wrote:

<snip patch commit and some code>

> > +/* Allocate per thread PMD pointer space for study_stats. */
> > +static inline struct study_stats *
> > +get_study_stats(void)
> > +{
> > +    struct study_stats *stats = study_stats_get();
> > +    if (OVS_UNLIKELY(!stats)) {
> > +       stats = xzalloc(sizeof *stats);
> > +       study_stats_set_unsafe(stats);
> > +    }
> > +    return stats;
> > +}
> > +
> 
> Just got a mind-meld with the code, and realized that the function might be different
> per PMD thread due to this auto mode (and autovalidator mode in the previous
> patch).
> 
> This makes it only stronger that we need a way to see the currently selected mode,
> and not per datapath, but per PMD per datapath!

Study depends on the traffic pattern, so yes you're correct that it depends.
The study command was added after community suggested user-experience
would improve if the user doesn't have to provide an exact miniflow profile name.

Study studies the traffic running on that PMD, compares all MFEX impls, and prints out
hits. It selects the _first_ implementation that surpasses the threshold of packets.

Users are free to use the more specific names of MFEX impls instead of "study"
for fine-grained control over the MFEX impl in use, e.g.

ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp

> Do we also need a way to set this per PMD?

I don't feel there is real value here, but we could investigate adding an
optional parameter to the command indicating a PMD thread IDX to set?
We have access to "pmd->core_id" in our set() function, so limiting changes
to a specific PMD thread can be done ~ easily... but is it really required?

Perfect is the enemy of good... I'd prefer focus on getting existing code changes merged,
and add additional (optional) parameters in future if deemed useful in real world testing?


> > +uint32_t
> > +mfex_study_traffic(struct dp_packet_batch *packets,
> > +                   struct netdev_flow_key *keys,
> > +                   uint32_t keys_size, odp_port_t in_port,
> > +                   void *pmd_handle)
> > +{
> > +    uint32_t hitmask = 0;
> > +    uint32_t mask = 0;
> > +    struct dp_netdev_pmd_thread *pmd = pmd_handle;
> > +    struct dpif_miniflow_extract_impl *miniflow_funcs;
> > +    uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
> > +    struct study_stats *stats = get_study_stats();
> > +
> > +    /* Run traffic optimized miniflow_extract to collect the hitmask
> > +     * to be compared after certain packets have been hit to choose
> > +     * the best miniflow_extract version for that traffic. */
> > +    for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> > +        if (miniflow_funcs[i].available) {
> > +            hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
> > +                                                     in_port, pmd_handle);
> > +            stats->impl_hitcount[i] += count_1bits(hitmask);
> > +
> > +            /* If traffic is not classified than we dont overwrite the keys
> > +             * array in minfiflow implementations so its safe to create a
> > +             * mask for all those packets whose miniflow have been created. */
> > +            mask |= hitmask;
> > +        }
> > +    }
> > +    stats->pkt_count += dp_packet_batch_size(packets);
> > +
> > +    /* Choose the best implementation after a minimum packets have been
> > +     * processed. */
> > +    if (stats->pkt_count >= MFEX_MAX_COUNT) {
> > +        uint32_t best_func_index = MFEX_IMPL_START_IDX;
> > +        uint32_t max_hits = 0;
> > +        for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
> > +            if (stats->impl_hitcount[i] > max_hits) {
> > +                max_hits = stats->impl_hitcount[i];
> > +                best_func_index = i;
> > +            }
> > +        }
> > +
> > +        if (max_hits >= MFEX_MIN_HIT_COUNT_FOR_USE) {
> > +            /* Set the implementation to index with max_hits. */
> > +            pmd->miniflow_extract_opt =
> > +                        miniflow_funcs[best_func_index].extract_func;
> > +            VLOG_INFO("MFEX study chose impl %s: (hits %d/%d pkts)\n",
> > +                      miniflow_funcs[best_func_index].name, max_hits,
> > +                      stats->pkt_count);
> 
> We have no idea which PMD the mode is selected for guess we might need to add
> this?
> 
> Maybe we should report the numbers/hits for the other methods, as they might be
> equal, and some might be faster in execution time?

As above, the implementations are sorted in performance order. Performance
here can be known by micro-benchmarks, and developers of such SIMD optimized
code can be expected to know which impl is fastest.

In our current code, the avx512_vbmi_* impls are always before the avx512_*
impls, as the VBMI instruction set allows a faster runtime.

If really desired, we could dump the whole results of the MFEX table for that
PMD thread, however I would expect the results to be noise, and not signal.

I'm happy to discuss, but a bit fearful of adding all sorts of fancy features
that in reality are not going to be useful.

<snip code changes till end of patch>

Regards, -Harry


More information about the dev mailing list