[ovs-dev] [PATCH] packets: Prefetch the packet metadata in cacheline1.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Tue Nov 28 14:24:12 UTC 2017



>-----Original Message-----
>From: Chandran, Sugesh
>Sent: Monday, November 27, 2017 5:58 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>; 'Aaron
>Conole' <aconole at redhat.com>
>Cc: 'dev at openvswitch.org' <dev at openvswitch.org>; Ben Pfaff
><blp at ovn.org>
>Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata in
>cacheline1.
>
>Hi Bhanu,
>
>Regards
>_Sugesh
>
>> -----Original Message-----
>> From: Bodireddy, Bhanuprakash
>> Sent: Monday, November 27, 2017 4:35 PM
>> To: 'Aaron Conole' <aconole at redhat.com>
>> Cc: 'dev at openvswitch.org' <dev at openvswitch.org>; Ben Pfaff
>> <blp at ovn.org>; Chandran, Sugesh <sugesh.chandran at intel.com>
>> Subject: RE: [ovs-dev] [PATCH] packets: Prefetch the packet metadata
>> in cacheline1.
>>
>> >>Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> writes:
>> >>
>> >>> pkt_metadata_prefetch_init() is used to prefetch the packet
>> >>> metadata before initializing the metadata in pkt_metadata_init().
>> >>> This is done for every packet in the userspace datapath and is
>> >>> performance critical.
>> >>>
>> >>> Commit 99fc16c0 prefetches only cacheline0 and cacheline2, as the
>> >>> metadata part of the respective cachelines will be initialized by
>> >>> pkt_metadata_init().
>> >>>
>> >>> However, in the VXLAN case, when popping the VXLAN header,
>> >>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl(), which
>> >>> zeroes out the metadata part of cacheline1. That cacheline wasn't
>> >>> prefetched earlier, and this causes performance degradation.
>> >>>
>> >>> By prefetching cacheline1, a 9% performance improvement is observed.
>> >>
>> >>Do we see a degradation in the non-vxlan case?  If not, then I don't
>> >>see any reason not to apply this patch.
>> >
>> >This patch doesn't impact the performance of non-vxlan cases and only
>> >has a positive impact in the vxlan case.
>>
>> The commit message claims that the performance improvement was 9% with
>> this patch, but when Sugesh checked, he wasn't getting that
>> improvement on his Haswell.
>>
>> I was chatting with Sugesh this afternoon about this patch and we found
>> some interesting details. Much of this boils down to how OvS is
>> built (apart from HW and BIOS settings - TB disabled).
>>
>> The test case here measures VXLAN decapsulation performance alone
>> for a packet size of 118 bytes.
>> The OvS CFLAGS and throughput numbers are as below.
>>
>> CFLAGS="-O2"
>>     Master        4.667 Mpps
>>     With Patch    5.045 Mpps
>>
>> CFLAGS="-O2 -msse4.2"
>>     Master        4.710 Mpps
>>     With Patch    5.097 Mpps
>>
>> CFLAGS="-O2 -march=native"
>>     Master        5.072 Mpps
>>     With Patch    5.193 Mpps
>>
>> CFLAGS="-Ofast -march=native"
>>     Master        5.349 Mpps
>>     With Patch    5.378 Mpps
>>
>> This means the performance measurements/claims are difficult to assess.
>> As one can see above, with "-Ofast -march=native" the improvement is
>> insignificant, but this is very platform dependent due to the
>> "-march=native" flag. The optimization flags also seem to make a
>> significant difference.
>[Sugesh] I also tested on my board with the same set of configurations and
>am getting the same results as you.
>So the improvement this patch offers depends on the compiler options.
>I am not sure what the most preferred/used compiler options are out there.
>I always build OVS with CFLAGS="-Ofast -march=native", and the patch
>doesn't bring a great improvement with those flags.
>
>I don't mind Acking the patch if you could re-send it with these
>results and options in the commit message.
>At least it will offer a performance improvement for other build options.

Thanks, Sugesh, for testing this out. I will send out a v2 of this patch with the
information from my earlier mail included in the commit message.
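
For reference, the relative gains implied by the throughput numbers quoted
above work out to roughly 8.1% (-O2), 8.2% (-O2 -msse4.2), 2.4%
(-O2 -march=native) and 0.5% (-Ofast -march=native). A small helper along
these lines reproduces them. The struct and function names are only
illustrative and are not part of OvS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the results quoted above; throughput is in Mpps. */
struct result {
    const char *cflags;
    double master_mpps;
    double patched_mpps;
};

/* Numbers copied from the VXLAN decapsulation test (118-byte packets). */
static const struct result vxlan_decap_results[] = {
    { "-O2",                   4.667, 5.045 },
    { "-O2 -msse4.2",          4.710, 5.097 },
    { "-O2 -march=native",     5.072, 5.193 },
    { "-Ofast -march=native",  5.349, 5.378 },
};

/* Relative gain of the patched build over master, in percent. */
static double
gain_percent(double master_mpps, double patched_mpps)
{
    assert(master_mpps > 0.0);
    return (patched_mpps - master_mpps) / master_mpps * 100.0;
}

/* Print one line per build configuration with its relative gain. */
static void
print_gains(const struct result *results, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        printf("%-24s %.1f%%\n", results[i].cflags,
               gain_percent(results[i].master_mpps,
                            results[i].patched_mpps));
    }
}
```

Calling print_gains(vxlan_decap_results, 4) from a main() prints the four
percentages, which makes it easy to redo the comparison with numbers from
another platform or compiler.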

Bhanuprakash.
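
For anyone following the thread, the idea behind the patch can be sketched
roughly as below. This is not the OvS code: struct md_sketch is a
hypothetical stand-in for the packet metadata, the 64-byte cache line size
is an assumption, and the helper names are made up. It only shows the
pattern of issuing a write prefetch for each cacheline that will be zeroed
shortly afterwards, using the GCC/Clang __builtin_prefetch() builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* Assumed cache line size in bytes. */

/* Hypothetical stand-in for the packet metadata: three cachelines. */
struct md_sketch {
    uint8_t cacheline0[CACHE_LINE_SIZE];
    uint8_t cacheline1[CACHE_LINE_SIZE];  /* Zeroed on the tunnel pop path. */
    uint8_t cacheline2[CACHE_LINE_SIZE];
};

/* Hint to the CPU that 'addr' is about to be written (rw = 1). */
static inline void
prefetch_for_write(const void *addr)
{
    __builtin_prefetch(addr, 1);
}

/* Prefetch every cacheline that later initialization may touch. */
static inline void
md_sketch_prefetch_init(struct md_sketch *md)
{
    assert(md != NULL);
    prefetch_for_write(md->cacheline0);
    prefetch_for_write(md->cacheline1);   /* The additional prefetch. */
    prefetch_for_write(md->cacheline2);
}
```

The prefetch is only a hint, so it changes timing and not behavior, which
is consistent with the gain varying with compiler flags and platform.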