[ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."
Bodireddy, Bhanuprakash
bhanuprakash.bodireddy at intel.com
Tue Nov 28 15:42:51 UTC 2017
>
>> Analyzing the memory layout with gdb for large structures is time
>> consuming and not usually recommended. I would suggest using
>> poke-a-hole (pahole), which helps to understand and fix the structures
>> in no time. With pahole it's going to be a lot easier to work with
>> large structures especially.
>
>Thanks for the pointer. I'll have a look at pahole.
>It doesn't affect my reasoning against optimizing the compactification of
>struct dp_netdev_pmd_thread, though.
>
>> >Finally, even for x86 there is not even a performance improvement. I
>> >re-ran our standard L3VPN over VXLAN performance PVP test on master
>> >and with Ilya's revert patch:
>> >
>> >Flows     master   reverted
>> >8         4.46     4.48
>> >100       4.27     4.29
>> >1000      4.07     4.07
>> >2000      3.68     3.68
>> >5000      3.03     3.03
>> >10000     2.76     2.77
>> >20000     2.64     2.65
>> >50000     2.60     2.61
>> >100000    2.60     2.61
>> >500000    2.60     2.61
>>
>> What are the CFLAGS in this case, as they seem to make a difference?
>> I have added my findings here for a different patch targeted at performance:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341270.html
>
>I'm compiling with "-O3 -msse4.2" to be in line with production deployments
>of OVS-DPDK that need to run on a wider family of Xeon generations.
Thanks for this. AFAIK, specifying '-msse4.2' alone doesn't let you use __builtin_popcount().
One way to enable it is to add '-mpopcnt' to CFLAGS, or to build with '-march=native'.
(This is slightly out of context for this thread and JFYI; ignore it if you only want to use intrinsics and not the builtin popcount.)
>
>>
>> Patches to consider when testing your use case:
>> xzalloc_cacheline: https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341231.html
>> (If using output batching) https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
>
>I didn't use these. Tx batching is not relevant here. And I understand the
>xzalloc_cacheline patch alone does not guarantee that the allocated memory
>is indeed cache line-aligned.
At least with posix_memalign(), the address will be aligned on 64 bytes and start at a CACHE_LINE_SIZE boundary.
I have yet to check and test Ben's new patch.
- Bhanuprakash.
>
>Thx, Jan