[ovs-dev] [PATCH] packets: Prefetch the packet metadata in cacheline1.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Mon Nov 27 16:35:24 UTC 2017


>>Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> writes:
>>
>>> pkt_metadata_prefetch_init() is used to prefetch the packet metadata
>>> before initializing the metadata in pkt_metadata_init(). This is done
>>> for every packet in userspace datapath and is performance critical.
>>>
>>> Commit 99fc16c0 prefetches only cacheline0 and cacheline2 as the
>>> metadata part of respective cachelines will be initialized by
>>> pkt_metadata_init().
>>>
>>> However in VXLAN case when popping the vxlan header,
>>> netdev_vxlan_pop_header() invokes pkt_metadata_init_tnl() which
>>> zeroes out metadata part of cacheline1 that wasn't prefetched
>>> earlier and causes performance degradation.
>>>
>>> By prefetching cacheline1, 9% performance improvement is observed.
>>
>>Do we see a degradation in the non-vxlan case?  If not, then I don't
>>see any reason not to apply this patch.
>
>This patch doesn't affect the performance of non-vxlan cases and only has a
>positive impact in the vxlan case.

The commit message claims a 9% performance improvement with this patch,
but when Sugesh checked, he did not see that improvement on his Haswell.

I was chatting with Sugesh this afternoon about this patch and we found some interesting details. Much
of this boils down to how OvS is built (apart from HW and BIOS settings; TB was disabled).

The test case here measures VXLAN decapsulation performance alone, for a packet size of 118 bytes.
The OvS CFLAGS and throughput numbers are as below.

CFLAGS="-O2"
    Master        4.667 Mpps
    With Patch    5.045 Mpps

CFLAGS="-O2 -msse4.2"
    Master        4.710 Mpps
    With Patch    5.097 Mpps

CFLAGS="-O2 -march=native"
    Master        5.072 Mpps
    With Patch    5.193 Mpps

CFLAGS="-Ofast -march=native"
    Master        5.349 Mpps
    With Patch    5.378 Mpps

This means the performance measurements/claims are difficult to assess. As one can see above, with "-Ofast -march=native"
the improvement is insignificant (about 0.5%, versus about 8% with plain "-O2"), but this is very platform dependent due to
the "-march=native" flag. The optimization flags also seem to make a significant difference.

- Bhanuprakash.

