[ovs-dev] [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

Eelco Chaudron echaudro at redhat.com
Fri Jul 9 09:54:40 UTC 2021



On 8 Jul 2021, at 17:08, Van Haaren, Harry wrote:

>> -----Original Message-----
>> From: Eelco Chaudron <echaudro at redhat.com>
>> Sent: Thursday, July 8, 2021 2:42 PM
>> To: Ferriter, Cian <cian.ferriter at intel.com>
>> Cc: ovs-dev at openvswitch.org; fbl at sysclose.org; i.maximets at ovn.org; 
>> Van
>> Haaren, Harry <harry.van.haaren at intel.com>; Amber, Kumar
>> <kumar.amber at intel.com>; Stokes, Ian <ian.stokes at intel.com>
>> Subject: Re: [v6 10/11] dpif-netdev/mfex: Add AVX512 based optimized 
>> miniflow
>> extract
>>
>> On 6 Jul 2021, at 15:11, Cian Ferriter wrote:
>>
>>> From: Harry van Haaren <harry.van.haaren at intel.com>
>>>
>>> This commit adds AVX512 implementations of miniflow extract.
>>> By using the 64 bytes available in an AVX512 register, it is
>>> possible to convert a packet to a miniflow data-structure in
>>> a small quantity instructions.
>>>
>>> The implementation here probes for Ether()/IP()/UDP() traffic,
>>> and builds the appropriate miniflow data-structure for packets
>>> that match the probe.
>>>
>>> The implementation here is auto-validated by the miniflow
>>> extract autovalidator, hence its correctness can be easily
>>> tested and verified.
>>>
>>> Note that this commit is designed to easily allow addition of new
>>> traffic profiles in a scalable way, without code duplication for
>>> each traffic profile.
>>>
>>> Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
>>>
>>> ---
>>>
>>> v5:
>>
>> <SNIP>
>>
>>> +
>>> +BUILD_ASSERT_DECL((OFFSETOFEND(struct dp_packet, l4_ofs)
>>> +                  - offsetof(struct dp_packet, l2_pad_size)) ==
>>> +                  MEMBER_SIZEOF(struct mfex_profile, dp_pkt_offs));
>>> +
>>> +/* Any change in flow abi should be inluded here otherwise build 
>>> should
>>
>> included
>>
>>> + * fail.
>>> + */
>>> +BUILD_ASSERT_DECL(FLOW_WC_SEQ == 42);
>>
>> I was discussing this assert offline, and some issues were raised as 
>> a lot of people
>> build on top of OVS, and add new protocols, hence changing the 
>> FLOW_WC_SEQ.
>>
>> It would be good to add more details on what to change/check before 
>> you
>> increase the value here. For example, which fields in the in 
>> mfex_profile
>> structure, or what changes to the flow structure require changes in 
>> the AVX
>> code.
>>
>> On top of this, a solution could also be to allow for compile-time 
>> disabling of
>> MFEX for people that do their own hacking of OVS?
>>
>> Thoughts?
>
> I think of the FLOW_WC_SEQ as an "ABI for miniflow" counter.
> If we break the miniflow ABI for a given packet, then all mappings 
> from
> packet data to miniflow need to be updated, or there is inconsistency.
> All mappings includes scalar miniflow_extract(), and potentially 
> optimized
> SIMD implementations, it depends on the change/breakage.
>
> Simply put, if the MFEX autovalidator fails after changes, then the 
> MFEX
> code needs to be updated.

The problem is that not everybody has an AVX512 machine, so it might be 
impossible to run. Even the OVS GitHub hooks will not run the AVX512 
part :)

> If a change is made to the miniflow ABI, the miniflow bits are likely 
> to
> change position (if items are added in the middle of "struct flow").
> Hence, the bit that previously represented IPv4 might now represent
> the newly hacked-in protocol. That's a break, and requires a counter 
> bump.
>
> This is why upstream collaboration is favored across the industry,
> there is overhead/breakage in keeping features/code out-of-tree.

While this is a nice goal, we all know this is not true in real life :( 
There are a lot of people out there that use OVS in the proprietary 
environment, and their changes are not sharable (or the community is not 
willing to include them).

> I feel that we the OVS community should not take steps towards making
> it easier to disable parts of code-bases internal to a project, just 
> so others maintaining
> out-of-tree code might have an easier time hacking in features. 
> Instead I encourage
> all to collaborate and contribute changes to OVS, and avoid the 
> overhead of maintaining
> out-of-tree work. A similar stance is also taken by Linux kernel and 
> DPDK on out-of-tree work.

They want to, but it’s not how it’s in practice :(

> MFEX opts are not enabled by default, so nothing will *break* due to 
> the inclusion of
> this code. If somebody first hacks in out-of-tree changes, breaks 
> things, and then
> blindly enables a specific MFEX impl without testing, then things can 
> break yes.
> But note that running MFEX autovalidation will flag breakage: so even 
> in this corner-case
> nothing will break silently.


I think we are wasting time discussing stuff here, where we should try 
to be inclusive, as not many OVS users/developers might have access to 
AVZ512 hardware.
I would suggest adding some documentation around this. Maybe something 
like this:

/* If the above build assert happens, this means that you might need to 
make
  * some modifications to the AVX512 miniflow extractor code. In 
general, the
  * AVX512 flow extractor code uses hardcoded miniflow->map->bits which 
are
  * defined in the mfex_profile structure as mf_bits. In addition to the
  * hardcoded bits, it also has hardcoded offsets/masks that tell the 
AVX512
  * code how to translate packet data in the required miniflow values. 
These
  * are stored in the mfex_profile structure as store_shuf and 
store_kmsk.
  * See the respective documentation on their usage.
  *
  * If you have made changes to the flow structure, but only additions, 
no
  * re-arranging of the actual members, you might be good to go. To be 
100%
  * sure, if possible, run the AVX512 MFEX autovalidator tests on an 
AVX512
  * enabled machine.
  *
  * If you did make changes to the order, you have to run the 
autovalidator
  * tests on an AVX512 machine, and the errors will help you determine 
what to
  * change.
  *
  * In case your change increased the maximum size of the map, i.e.,
  * UFLOWMAP_UNITS, you need to study the code as it will need some 
rewriting.
  *
  * If you are not using the AVX512 MFEX implementation at all, i.e. 
keeping it
  * to the default scalar implementation, see "ovs-appctl
  * dpif-netdev/miniflow-parser-get", you could ignore this assert, and 
just
  * increase the number.
  */

In addition I think the following assert should also be added to the 
code:

   BUILD_ASSERT_DECL(UFLOWMAP_UNITS == 2);

And the following should use the variable, so it’s also clear where 
it’s used:

   /* Constant data to set in mf.bits and dp_packet data on hit. */
   uint64_t mf_bits[UFLOWMAP_UNITS];

//Eelco


More information about the dev mailing list