[ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK classifier.

Tue Nov 15 13:50:35 UTC 2016

>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiettod at ovn.org]
>Sent: Monday, November 14, 2016 10:45 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>
>Cc: dev at openvswitch.org; Jarno Rajahalme <jarno at ovn.org>
>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>classifier.
>
>
>
>2016-11-14 4:10 GMT-08:00 Bodireddy, Bhanuprakash
><bhanuprakash.bodireddy at intel.com>:
>Hello Daniele,
>
>Did you get a chance to review v4 of the remaining 4 patches in this
>series?  Also I have sent v5 of patch "dpcls: Use 32 packet batches for
>lookups"  separately based on your comments.
>
>Bhanu Prakash.
>
>Hi Bhanu,
>I merged almost everything to master, with minor style fixes (I had to convert
>some tabs to spaces).
>Thanks for your work!
>The only patch I left out is "cmap: Remove prefetching in cmap_find_batch()."
>With the patch applied (and emc disabled), compared to current master I see:
>* A very small improvement with a single megaflow and a single stream of
>64-byte UDP packets
>* A more significant drop with 1000 megaflows and 1000 streams of 64-byte UDP
>packets
>I just remembered that we also have a benchmark for the cmap:
>
>tests/ovstest test-cmap benchmark 2000000 1 0.1 32
>I ran it with and without the patch, and the patch seems to make the
>benchmark slower (in particular, I'm talking about the "batch search:" row)

Thanks, Daniele, for merging the patches. We ran some benchmarks with tens of flows and found no drop in performance.
However, it's fine to drop this patch if you see different behavior in your benchmarks. I didn't know that we had a benchmark test for cmap;
that's really helpful.

Regards,
Bhanu Prakash. 

>
>As expected, it appears that prefetching introduces overhead for small cmaps
>(1 or 2 flows) but it makes the performance better with bigger cmaps.
>What do you think?  Have you tried this with bigger flow tables?
>Thanks,
>Daniele
>
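To make the trade-off Daniele describes concrete, here is a minimal sketch of a batch lookup with an optional prefetch pass. It is not the cmap code: the toy_* names, the table size and the direct-mapped layout (no collision handling) are all assumptions made for illustration, and __builtin_prefetch is the GCC/Clang builtin. With a table small enough to stay in cache, the extra pass is pure overhead; with a large table, the hints can overlap memory latency across the batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 1024u   /* Hypothetical table size, power of two. */
#define TOY_BATCH   32      /* Hypothetical maximum batch size. */

/* Toy direct-mapped table entry; not the real cmap layout. */
struct toy_entry {
    uint32_t hash;
    int value;
    int used;
};

static struct toy_entry toy_table[TOY_BUCKETS];

/* Stores 'value' under 'hash', overwriting any entry in the same bucket. */
static void
toy_insert(uint32_t hash, int value)
{
    struct toy_entry *e = &toy_table[hash & (TOY_BUCKETS - 1)];

    e->hash = hash;
    e->value = value;
    e->used = 1;
}

/* Looks up 'n' hashes.  Sets results[i] to the stored value, or to -1 on a
 * miss, and returns the number of hits.  When 'do_prefetch' is nonzero, a
 * first pass issues prefetch hints for every bucket in the batch before any
 * bucket is read. */
static size_t
toy_find_batch(const uint32_t hashes[], size_t n, int results[],
               int do_prefetch)
{
    size_t hits = 0;
    size_t i;

    assert(n <= TOY_BATCH);

    if (do_prefetch) {
        for (i = 0; i < n; i++) {
            __builtin_prefetch(&toy_table[hashes[i] & (TOY_BUCKETS - 1)],
                               0, 3);
        }
    }

    for (i = 0; i < n; i++) {
        const struct toy_entry *e = &toy_table[hashes[i] & (TOY_BUCKETS - 1)];

        if (e->used && e->hash == hashes[i]) {
            results[i] = e->value;
            hits++;
        } else {
            results[i] = -1;
        }
    }
    return hits;
}
```

Both settings of do_prefetch return the same results; only the memory access pattern differs, which is why the effect shows up only in benchmarks such as the "batch search:" row mentioned above.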
>
>>-----Original Message-----
>>From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Bodireddy,
>>Bhanuprakash
>>Sent: Tuesday, October 18, 2016 5:24 PM
>>To: Daniele Di Proietto <diproiettod at ovn.org>
>>Cc: dev at openvswitch.org
>>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>>classifier.
>>
>>Thanks, Daniele. I will send the remaining patches with the appropriate tags.
>>
>>Regards,
>>Bhanu Prakash.
>>
>>>-----Original Message-----
>>>From: Daniele Di Proietto [mailto:diproiettod at ovn.org]
>>>Sent: Tuesday, October 18, 2016 4:04 AM
>>>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>
>>>Cc: dev at openvswitch.org
>>>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>>>classifier.
>>>
>>>Thanks for the series, I applied most of it to master.
>>>I sent some comments on the few remaining patches.
>>>Thanks again,
>>>Daniele
>>>
>>>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy
>>><bhanuprakash.bodireddy at intel.com>:
>>>This patch series is aimed at improving the performance of OVS-DPDK
>>>dpcls.
>>>
>>>With a few thousand flows installed, the EMC becomes inefficient due to
>>>thrashing and the bottleneck moves to the dpcls. In the EMC-disabled case,
>>>VTune showed that the significant performance degradation is due to LLC
>>>thrashing, memory latency, machine clears and expensive hash computation.
>>>
>>>This first patch set improves the dpcls performance by 15% (+1 Mpps)
>>>when EMC is disabled and OVS-DPDK is built with CFLAGS="-O2 -g".
>>>
>>>Bhanuprakash Bodireddy (12):
>>>  dpcls: Use 32 packet batches for lookups.
>>>        Comment: ~120k performance throughput improvement.
>>>
>>>  flow: Add comments to mf_get_next_in_map().
>>>        Comment: Add comments to the function.
>>>
>>>  flow: Skip invoking expensive count_1bits() with zero input.
>>>        Comment: ~630k performance throughput improvement.
>>>
>>>  hash: Skip invoking mhash_add__() with zero input.
>>>        Comment: ~150k performance throughput improvement.
>>>
>>>  dpif-netdev: Add comments to dp_netdev_input__().
>>>        Comment: Add comments to the function.
>>>
>>>  cmap: Remove prefetching in cmap_find_batch().
>>>        Comment: ~39k performance throughput improvement.
>>>
>>>  dpif-netdev: Cache align netdev_flow_keys.
>>>        Comment: ~170k performance throughput improvement in the
>>>                 EMC-enabled case.
>>>
>>>  dpif-netdev: Reorder elements in dp_netdev_port structure.
>>>  dpif: Reorder elements in dpif_upcall structure.
>>>  ovsdb: Reorder elements in ovsdb_table_schema structure.
>>>  netlink-socket: Reorder elements in nl_dump structure.
>>>  timeval: Reorder elements in clock structure.
>>>        Comment: Reorder member variables of the structures to reduce
>>>                 pad bytes and thereby the memory footprint.
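As an illustration of the structure-reordering patches in the list above, the sketch below compares two layouts of the same members. The toy_port_* structures are hypothetical and are not the actual dp_netdev_port, dpif_upcall or other OVS definitions; exact sizes depend on the ABI, so the code computes the pad bytes rather than hard-coding them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members interleaved between larger ones.
 * The compiler has to insert pad bytes before 'counter' and before 'id'. */
struct toy_port_before {
    uint8_t  flag_a;
    uint64_t counter;
    uint8_t  flag_b;
    uint32_t id;
};

/* Same members, ordered from largest to smallest alignment requirement, so
 * that only tail padding (if any) remains. */
struct toy_port_after {
    uint64_t counter;
    uint32_t id;
    uint8_t  flag_a;
    uint8_t  flag_b;
};

/* Bytes actually occupied by the members, excluding padding. */
#define TOY_PAYLOAD (sizeof(uint64_t) + sizeof(uint32_t) + 2 * sizeof(uint8_t))

/* Pad bytes in the interleaved layout. */
static size_t
toy_pad_before(void)
{
    return sizeof(struct toy_port_before) - TOY_PAYLOAD;
}

/* Pad bytes in the reordered layout. */
static size_t
toy_pad_after(void)
{
    return sizeof(struct toy_port_after) - TOY_PAYLOAD;
}
```

On a typical x86-64 build one would expect the first layout to be larger than the second, but that is an assumption about the ABI; a tool such as pahole can report the real holes in a given structure.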
>>>
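The two "Skip invoking ... with zero input" entries in the patch list above share one idea: test for zero before paying for the expensive call. A minimal sketch of that pattern follows. The toy_* functions are hypothetical stand-ins, not the OVS count_1bits() or mhash_add__() implementations, and the call counter exists only so the effect can be observed.

```c
#include <stdint.h>

/* Counts how often the slow path runs; for illustration only. */
static unsigned int toy_slow_calls;

/* Stand-in for an expensive bit-counting routine.  Clears the lowest set
 * bit on each iteration until no bits remain. */
static unsigned int
toy_count_bits_slow(uint64_t x)
{
    unsigned int n = 0;

    toy_slow_calls++;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Fast-path wrapper: a zero input has zero bits set, so the slow routine is
 * skipped entirely in that case. */
static inline unsigned int
toy_count_bits(uint64_t x)
{
    return x ? toy_count_bits_slow(x) : 0;
}
```

The check only pays off when zero inputs are common enough that the saved calls outweigh the added branch, which is what the throughput comments in the list suggest for these code paths.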
>>> lib/cmap.c           |   8 +---
>>> lib/dpif-netdev.c    | 123 +++++++++++++++++++++++----------------------------
>>> lib/dpif.h           |   5 ++-
>>> lib/flow.h           |  47 +++++++++++++++-----
>>> lib/hash.h           |   5 +++
>>> lib/netlink-socket.h |   6 +--
>>> lib/timeval.c        |   4 +-
>>> ovsdb/table.h        |   4 +-
>>> 8 files changed, 111 insertions(+), 91 deletions(-)
>>>
>>>--
>>>2.4.11
>>>
>>>_______________________________________________
>>>dev mailing list
>>>dev at openvswitch.org
>>>http://openvswitch.org/mailman/listinfo/dev