[ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for datapath refactor
O Mahony, Billy
billy.o.mahony at intel.com
Thu Mar 8 10:31:28 UTC 2018
Hi All,
I have run some tests using a more realistic distribution of flows - see below
for details - than the one we normally test with.
This is a phy-to-phy test with just port forwarding, but I think that is the
best way to test the EMC as it avoids noise from other mechanisms, e.g. vhost
and dpcls lookup. It uses 64B packets and 1 PMD.
I also ran the tests with the EMC disabled.
Baseline: head of master (951cbaf, 6/3/18)

                     || emc-insert-prob 1/100    | emc disabled
 offered    flows    || rxd    emc    cycles/    | rxd    emc    cycles/
 kpps                || kpps   hits   pkt        | kpps   hits   pkt
 --------------------++--------------------------+------------------------
 14,000         8    || 10830  100%   212        | 7360   0%     311
 14,000       100    ||  9730  100%   236        | 7370   0%     311
 14,000     10000    ||  6545   69%   345        | 7370   0%     311
 14,000   1000000    ||  6202    3%   370        | 7370   0%     311
Combine CD and DFC patch
                     || emc-insert-prob 1/100          | emc disabled
 offered    flows    || rxd    emc   dfc   cycles/     | rxd    emc   dfc   cycles/
 kpps                || kpps   hits  hits  pkt         | kpps   hits  hits  pkt
 --------------------++--------------------------------+----------------------------
 14,000         8    || 10930  100%    0%   210        | 8570    0%   100%   268
 14,000       100    || 10220  100%    0%   224        | 7800    0%   100%   294
 14,000     10000    ||  8000   84%   16%   287        | 6770    0%   100%   339
 14,000   1000000    ||  5921    7%   65%   387        | 6060    0%    72%   378
In these scenarios the patch gives an advantage at lower numbers of flows, but
this advantage is reversed at very high flow numbers - presumably as the DFC
itself approaches capacity.
Another interesting scenario to test would be the case of many short-lived
flows - for example, 1M flows and 200k new flows/sec. This real use case was
presented at the last OvS Con:
https://www.slideshare.net/LF_OpenvSwitch/lfovs17ovsdpdk-for-nfv-go-live-feedback.
I'd hope to implement that test in due course.
Below are some details on how and why the flow distribution was chosen for
these tests.
Regards,
Billy
All caches are designed on the assumption that, in the real world, access
requests are not uniformly distributed. By design they improve performance
only in situations where some items are accessed more frequently than others.
If this assumption does not hold, then the use of a cache actually degrades
performance.
Currently we test the EMC with one of two scenarios, both of which break the
above assumption:
1) Round Robin 'Distribution':
The TG sends a packet from flow#1 then flow#2 up to flow#N then back to
flow#1 again.
Testing with this case gives results that under-state the benefit of the
EMC to the maximum extent possible. By sending a packet from every other flow
in between every two packets from any given flow, the chances of the flow
having been evicted in the interim between the two packets are maximized. If a
tester were to intentionally design a test to understate the benefits of the
EMC, it would be a round-robin flow distribution.
2) Uniform Distribution:
The TG randomly selects the next packet to send from all the configured flows.
Testing with this case gives results that under-state the benefit of the
EMC. As each flow is equally likely to occur, then unless the number of
flows is less than the number of EMC entries, there is likely no benefit from
using the EMC.
By testing only with flow distributions that break the fundamental assumption
justifying the use of an EMC, we are consistently under-stating the benefits
of flow-caching.
A More Realistic Distribution:
A more realistic distribution is almost certainly some kind of power-law or
Pareto distribution. In this kind of distribution the majority of packets
belong to a small number of flows. Think of the classic '80/20' rule. Given a
power-law distribution, if we rank all the flows by their packets-per-second,
we see that the flow with the Nth most packets-per-second has a rate that is
some fraction of that of the (N-1)th flow, for all N. This kind of distribution is what is
seen in the natural world regarding the distribution of ranking of word
occurrence in a language, the population ranks of cities in various countries,
corporation sizes, income rankings, ranks of number of people watching the same
TV channel, and so on. [https://en.wikipedia.org/wiki/Zipf%27s_law].
For example, using a Zipfian distribution with k=1 as the model for per-flow
pps, the flow distribution would look like this (y-axis is not linear):
1000| *
500| * *
333| * * *
PPS 250| * * * *
..
10| * * * * *
+------------...-------------
1 2 3 4 ... 100
Flow (PPS Ranking)
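As a rough sketch (a hypothetical helper, not part of the actual test
scripts), the per-flow rates under this Zipf(k=1) model could be computed
like this:

```python
def zipf_pps(num_flows, total_pps):
    """Split total_pps across flows so the flow ranked n gets a share
    proportional to 1/n (Zipf's law with exponent k=1)."""
    weights = [1.0 / n for n in range(1, num_flows + 1)]
    norm = sum(weights)
    return [total_pps * w / norm for w in weights]

# 100 flows sharing 14,000 kpps: flow 2 carries half the rate of
# flow 1, flow 3 carries a third, and so on down the ranking.
rates = zipf_pps(100, 14000.0)
```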
The TRex software-based traffic generator, with a small amount of additional
scripting, does allow an approximation of such a distribution (the specific
limitation is that, instead of a pps rate for each individual flow, a bucket
of flows must share the same pps value - for example, below all the flows are
distributed across 8 streams to approximate this kind of power-law
distribution).
For example:
# start 14000kpps 10000flows
Each stream has 14000/8 kpps but has a different number of flows. Each stream
has about half the number of flows as the previous one.
# flows per stream: [5019, 2509, 1254, 627, 313, 156, 78, 39] total: 9995
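The halving split can be sketched as below (a hypothetical helper; the values
differ slightly from the script output above because the real script adjusts
the per-stream counts to fit prime tuple spans, as explained next):

```python
def flows_per_stream(total_flows, num_streams=8):
    """Split total_flows across streams geometrically, so each stream
    gets roughly half the flows of the previous one."""
    weights = [2 ** (num_streams - 1 - i) for i in range(num_streams)]
    norm = sum(weights)  # 2**num_streams - 1
    return [round(total_flows * w / norm) for w in weights]

# For 10000 flows across 8 streams the split is approximately
# [5020, 2510, 1255, 627, 314, 157, 78, 39].
print(flows_per_stream(10000))
```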
The actual number of flows per stream is a close approximation of the precise
distribution - usually it's out by just one or two. This is done to allow use
of prime numbers for the ranges of the elements of the 5-tuple members. For
example, incrementing over 29 IP addresses and 173 port numbers means
29*173=5017 unique flows, as 29 and 173 are prime. Use of non-primes would
result in aliasing of the combinations, resulting in far fewer unique flows.
# stream#: 0 req. #flows: 5019 tuple_spans: [29, 173] diff: 2
# stream#: 1 req. #flows: 2509 tuple_spans: [13, 193] diff: 0
# stream#: 2 req. #flows: 1254 tuple_spans: [5, 251] diff: 1
# stream#: 3 req. #flows: 627 tuple_spans: [313, 2] diff: 1
# stream#: 4 req. #flows: 313 tuple_spans: [1, 313] diff: 0
# stream#: 5 req. #flows: 156 tuple_spans: [1, 157] diff: 1
# stream#: 6 req. #flows: 78 tuple_spans: [1, 79] diff: 1
# stream#: 7 req. #flows: 39 tuple_spans: [3, 13] diff: 0
# stream#: 0 src: 16.0.0.0 - 16.0.0.28 : 0 - 172 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 1 src: 16.0.0.29 - 16.0.0.41 : 173 - 365 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 2 src: 16.0.0.42 - 16.0.0.46 : 366 - 616 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 3 src: 16.0.0.47 - 16.0.1.103 : 617 - 618 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 4 src: 16.0.1.104 - 16.0.1.104 : 619 - 931 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 5 src: 16.0.1.105 - 16.0.1.105 : 932 - 1088 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 6 src: 16.0.1.106 - 16.0.1.106 : 1089 - 1167 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 7 src: 16.0.1.107 - 16.0.1.109 : 1168 - 1180 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of yipeng wang
> Sent: Tuesday, March 6, 2018 7:23 PM
> To: dev at openvswitch.org; Wang, Yipeng1 <yipeng1.wang at intel.com>;
> jan.scheurich at ericsson.com; Bodireddy, Bhanuprakash
> <bhanuprakash.bodireddy at intel.com>; u9012063 at gmail.com
> Cc: Tai, Charlie <charlie.tai at intel.com>
> Subject: [ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for
> datapath refactor
>
> This patch set is the V1 implementation to combine the CD and DFC design.
> Both patches intend to refactor datapath to avoid costly sequential subtable
> search.
>
> CD and DFC patch sets:
> CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340305.html
>
> DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341066.html
>
> The first commit is a rebase of Jan Scheurich's patch of [PATCH] dpif-netdev:
> Refactor datapath flow cache
>
> The second commit is to incorporate CD's way-associative design into DFC to
> improve the hit rate.
>
> The third commit is to change the distributor to cache an index of flow_table
> entry to improve memory efficiency.
>
>
> RFC of this patch set:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343411.html
>
>
>
> RFC->V1:
> 1. rebase to master head.
> 2. The last commit is totally rewritten to use the flow_table as indirect table.
> The CD/DFC distributor will cache the index of flow_table entry.
> 3. Incorporate commit 2 into commit 1. (Bhanu's comment)
> 4. Change DFC to be always on in commit 1. (Bhanu's comment)
>
>
> Yipeng Wang (2):
> dpif-netdev: Use way-associative cache
> use flow_table as indirect table
>
> Jan Scheurich (1):
> dpif-netdev: Refactor datapath flow cache
>
> lib/cmap.c | 62 +++++++++
> lib/cmap.h | 5 +
> lib/dpif-netdev-perf.h | 1 +
> lib/dpif-netdev.c | 359 +++++++++++++++++++++++++++++++++----------------
> 4 files changed, 310 insertions(+), 117 deletions(-)
>
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev