[ovs-dev] [PATCH RFC 5/5] dpif-netdev: Prefetch the cacheline having the cycle stats.
Ilya Maximets
i.maximets at samsung.com
Thu Dec 7 14:04:10 UTC 2017
On 05.12.2017 18:11, Bodireddy, Bhanuprakash wrote:
>>
>>> Prefetch the cacheline having the cycle stats so that we can speed up
>>> the cycles_count_start() and cycles_count_intermediate().
>>
>> Do you have any performance results?
>
> I don't have numbers for this patch alone. I was testing the overall throughput along with other patches (that were *not* part of this RFC series) to verify performance improvements. I will include per-patch numbers in the commit log once I have measured the patches individually.
>
> BTW, I usually look at the percentage of total instructions retired and the cycles spent in the front end and back end for the affected functions to see whether prefetching improves or degrades performance.
>
> - Bhanuprakash.
>
>>
>>>
>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>>> ---
>>> lib/dpif-netdev.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index b74b5d7..ab13d83 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -576,7 +576,7 @@ struct dp_netdev_pmd_thread {
>>> struct ovs_mutex flow_mutex;
>>> /* 8 pad bytes. */
>>> );
>>> - PADDED_MEMBERS(CACHE_LINE_SIZE,
>>> + PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cachelineC,
>>> struct cmap flow_table OVS_GUARDED; /* Flow table. */
>>>
>>> /* One classifier per in_port polled by the pmd */
>>> @@ -4082,6 +4082,7 @@ reload:
>>> lc = UINT_MAX;
>>> }
>>>
>>> + OVS_PREFETCH_CACHE(&pmd->cachelineC, OPCH_HTW);
How is a prefetch issued just before the infinite loop supposed to improve performance?
I haven't tested it, but IMHO this should have zero impact.
>>> cycles_count_start(pmd);
>>> for (;;) {
>>> for (i = 0; i < poll_cnt; i++) {
>>> --
>>> 2.4.11
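
To make the question above concrete, here is a minimal sketch; it is not OVS code. It assumes that OVS_PREFETCH_CACHE() boils down to the compiler's prefetch builtin, and it uses a hypothetical struct fake_pmd with made-up function names in place of the real pmd thread structure and poll loop. The counter exists only to show how often a prefetch is issued for each placement: once in total when it sits before the loop, as in the patch, versus once per iteration when it sits inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* Assumption: 64-byte lines, typical on x86. */

/* Hypothetical stand-in for the pmd thread structure: the cycle stats
 * start on a cache-line boundary, similar in spirit to a cacheline marker. */
struct fake_pmd {
    _Alignas(CACHE_LINE_SIZE) uint64_t last_cycles;
    uint64_t cycles[2];
};

static_assert(offsetof(struct fake_pmd, last_cycles) % CACHE_LINE_SIZE == 0,
              "cycle stats must start on a cache-line boundary");

/* Illustration only: counts how many prefetch hints were issued. */
static unsigned int prefetches_issued;

/* GCC/Clang builtin: rw = 1 (expect a write), locality = 3 (high temporal
 * locality).  Whether this matches OPCH_HTW exactly is an assumption. */
static inline void
prefetch_for_write(const void *addr)
{
    __builtin_prefetch(addr, 1, 3);
    prefetches_issued++;
}

/* Placement as in the patch: a single prefetch before the loop. */
static unsigned int
poll_prefetch_once(struct fake_pmd *pmd, unsigned int iterations)
{
    prefetches_issued = 0;
    prefetch_for_write(&pmd->last_cycles);
    for (unsigned int i = 0; i < iterations; i++) {
        pmd->cycles[i & 1]++;   /* Stand-in for the cycle accounting. */
    }
    return prefetches_issued;
}

/* Alternative placement: one prefetch per loop iteration. */
static unsigned int
poll_prefetch_each(struct fake_pmd *pmd, unsigned int iterations)
{
    prefetches_issued = 0;
    for (unsigned int i = 0; i < iterations; i++) {
        prefetch_for_write(&pmd->last_cycles);
        pmd->cycles[i & 1]++;   /* Stand-in for the cycle accounting. */
    }
    return prefetches_issued;
}
```

The sketch only shows how many hints are issued; it says nothing about whether either placement is faster. That would need measurements on the real datapath.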