[ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."
Bodireddy, Bhanuprakash
bhanuprakash.bodireddy at intel.com
Tue Nov 28 15:42:51 UTC 2017
>
>> Analyzing the memory layout with gdb for large structures is time
>> consuming and not usually recommended. I would suggest using
>> poke-a-hole (pahole), which helps to understand and fix the structures
>> in no time. With pahole it's going to be a lot easier to work with
>> large structures especially.
>
>Thanks for the pointer. I'll have a look at pahole.
>It doesn't affect my reasoning against optimizing the compactification of
>struct dp_netdev_pmd_thread, though.
>
>> >Finally, even for x86 there is not even a performance improvement. I
>> >re-ran our standard L3VPN over VXLAN performance PVP test on master
>> >and with Ilya's revert patch:
>> >
>> >Flows     master   reverted
>> >8         4.46     4.48
>> >100       4.27     4.29
>> >1000      4.07     4.07
>> >2000      3.68     3.68
>> >5000      3.03     3.03
>> >10000     2.76     2.77
>> >20000     2.64     2.65
>> >50000     2.60     2.61
>> >100000    2.60     2.61
>> >500000    2.60     2.61
>>
>> What are the CFLAGS in this case, as they seem to make a difference?
>> I have added my findings here for a different patch targeted at performance:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341270.html
>
>I'm compiling with "-O3 -msse4.2" to be in line with production deployments
>of OVS-DPDK that need to run on a wider family of Xeon generations.
Thanks for this. AFAIK, specifying '-msse4.2' alone doesn't let you use __builtin_popcount().
One way to enable it is to add '-mpopcnt' to CFLAGS, or to build with '-march=native'.
(This is slightly out of context for this thread and JFYI; ignore it if you only want to use intrinsics and not the builtin popcount.)
>
>>
>> Patches to consider when testing your use case:
>> xzalloc_cacheline: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341231.html
>> (If using output batching) https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
>
>I didn't use these. Tx batching is not relevant here. And I understand the
>xzalloc_cacheline patch alone does not guarantee that the allocated memory
>is indeed cache line-aligned.
At least with posix_memalign(), the address will be aligned on 64 bytes and start at a CACHE_LINE_SIZE boundary.
I have yet to check and test Ben's new patch.
- Bhanuprakash.
>
>Thx, Jan