[ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best mfex function

Eelco Chaudron echaudro at redhat.com
Wed Jun 30 09:18:09 UTC 2021



On 29 Jun 2021, at 18:32, Van Haaren, Harry wrote:

>> -----Original Message-----
>> From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Eelco Chaudron
>> Sent: Tuesday, June 29, 2021 1:38 PM
>> To: Amber, Kumar <kumar.amber at intel.com>
>> Cc: dev at openvswitch.org; i.maximets at ovn.org
>> Subject: Re: [ovs-dev] [v4 03/12] dpif-netdev: Add study function to select the best
>> mfex function
>>
>> More comments below. FYI I’m only reviewing right now, no testing.
>
> Sure, thanks for reviews.
>
>> On 17 Jun 2021, at 18:27, Kumar Amber wrote:
>
> <snip patch commit and some code>
>
>>> +/* Allocate per thread PMD pointer space for study_stats. */
>>> +static inline struct study_stats *
>>> +get_study_stats(void)
>>> +{
>>> +    struct study_stats *stats = study_stats_get();
>>> +    if (OVS_UNLIKELY(!stats)) {
>>> +       stats = xzalloc(sizeof *stats);
>>> +       study_stats_set_unsafe(stats);
>>> +    }
>>> +    return stats;
>>> +}
>>> +
>>
>> Just got a mind-meld with the code, and realized that the function might be different
>> per PMD thread due to this auto mode (and autovalidator mode in the previous
>> patch).
>>
>> This makes it only stronger that we need a way to see the currently selected mode,
>> and not per datapath, but per PMD per datapath!
>
> Study depends on the traffic pattern, so yes you're correct that it depends.
> The study command was added after community suggested user-experience
> would improve if the user doesn't have to provide an exact miniflow profile name.
>
> Study studies the traffic running on that PMD, compares all MFEX impls, and prints out
> hits. It selects the _first_ implementation that surpasses the threshold of packets.
>
> Users are free to use the more specific names of MFEX impls instead of "study"
> for fine-grained control over the MFEX impl in use, e.g.
>
> ovs-appctl dpif-netdev/miniflow-parser-set avx512_vbmi_ipv4_udp
>
>> Do we also need a way to set this per PMD?
>
> I don't feel there is real value here, but we could investigate adding an
> optional parameter to the command indicating a PMD thread IDX to set?
> We have access to "pmd->core_id" in our set() function, so limiting changes
> to a specific PMD thread can be done ~ easily... but is it really required?
>
> Perfect is the enemy of good... I'd prefer focus on getting existing code changes merged,
> and add additional (optional) parameters in future if deemed useful in real world testing?

See Flavio’s reply, as those were the concerns same concerns I thought of.

>>> +uint32_t
>>> +mfex_study_traffic(struct dp_packet_batch *packets,
>>> +                   struct netdev_flow_key *keys,
>>> +                   uint32_t keys_size, odp_port_t in_port,
>>> +                   void *pmd_handle)
>>> +{
>>> +    uint32_t hitmask = 0;
>>> +    uint32_t mask = 0;
>>> +    struct dp_netdev_pmd_thread *pmd = pmd_handle;
>>> +    struct dpif_miniflow_extract_impl *miniflow_funcs;
>>> +    uint32_t impl_count = dpif_miniflow_extract_info_get(&miniflow_funcs);
>>> +    struct study_stats *stats = get_study_stats();
>>> +
>>> +    /* Run traffic optimized miniflow_extract to collect the hitmask
>>> +     * to be compared after certain packets have been hit to choose
>>> +     * the best miniflow_extract version for that traffic. */
>>> +    for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
>>> +        if (miniflow_funcs[i].available) {
>>> +            hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
>>> +                                                     in_port, pmd_handle);
>>> +            stats->impl_hitcount[i] += count_1bits(hitmask);
>>> +
>>> +            /* If traffic is not classified than we dont overwrite the keys
>>> +             * array in minfiflow implementations so its safe to create a
>>> +             * mask for all those packets whose miniflow have been created. */
>>> +            mask |= hitmask;
>>> +        }
>>> +    }
>>> +    stats->pkt_count += dp_packet_batch_size(packets);
>>> +
>>> +    /* Choose the best implementation after a minimum packets have been
>>> +     * processed. */
>>> +    if (stats->pkt_count >= MFEX_MAX_COUNT) {
>>> +        uint32_t best_func_index = MFEX_IMPL_START_IDX;
>>> +        uint32_t max_hits = 0;
>>> +        for (int i = MFEX_IMPL_START_IDX; i < impl_count; i++) {
>>> +            if (stats->impl_hitcount[i] > max_hits) {
>>> +                max_hits = stats->impl_hitcount[i];
>>> +                best_func_index = i;
>>> +            }
>>> +        }
>>> +
>>> +        if (max_hits >= MFEX_MIN_HIT_COUNT_FOR_USE) {
>>> +            /* Set the implementation to index with max_hits. */
>>> +            pmd->miniflow_extract_opt =
>>> +                        miniflow_funcs[best_func_index].extract_func;
>>> +            VLOG_INFO("MFEX study chose impl %s: (hits %d/%d pkts)\n",
>>> +                      miniflow_funcs[best_func_index].name, max_hits,
>>> +                      stats->pkt_count);
>>
>> We have no idea which PMD the mode is selected for guess we might need to add
>> this?
>>
>> Maybe we should report the numbers/hits for the other methods, as they might be
>> equal, and some might be faster in execution time?
>
> As above, the implementations are sorted in performance order. Performance
> here can be known by micro-benchmarks, and developers of such SIMD optimized
> code can be expected to know which impl is fastest.

Don’t think we can, as it’s not documented in the code, and some one can just add his own, and has no clue about the existing ones.

> In our current code, the avx512_vbmi_* impls are always before the avx512_*
> impls, as the VBMI instruction set allows a faster runtime.

Guess we need some documentation in the developer's section on how to add processor optimized functions, and how to benchmark them (and maybe some benchmark data for the current implementations).
Also, someone can write a sloppy avx512_vbmi* function that might be slower than an avx512_*, right?

> If really desired, we could dump the whole results of the MFEX table for that
> PMD thread, however I would expect the results to be noise, and not signal.

What about dumping it as debug messages, so in case we would like to see it (either for development, or production case), we can still enable it?


> I'm happy to discuss, but a bit fearful of adding all sorts of fancy features
> that in reality are not going to be useful.
>
> <snip code changes till end of patch>
>
> Regards, -Harry



More information about the dev mailing list