[ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for datapath refactor
O Mahony, Billy
billy.o.mahony at intel.com
Thu Mar 8 10:31:28 UTC 2018
Hi All,
I have run some tests using a more realistic distribution of flows - see below
for details - than the one we normally test with.
This is a phy-to-phy test with just port forwarding, but I think that is the
best way to test the EMC as it avoids noise from other mechanisms, e.g. vhost
and dpcls lookup. It uses 64B packets and 1 PMD.
I also ran the tests with the EMC disabled.
Baseline: head of master (951cbaf, 6/3/18)

                     || emc-insert-prob 1/100    | emc disabled
 offered    flows    || rxd    emc    cycles/    | rxd    emc    cycles/
 kpps                || kpps   hits   pkt        | kpps   hits   pkt
 --------------------++--------------------------+------------------------
 14,000         8    || 10830  100%   212        | 7360   0%     311
 14,000       100    ||  9730  100%   236        | 7370   0%     311
 14,000     10000    ||  6545   69%   345        | 7370   0%     311
 14,000   1000000    ||  6202    3%   370        | 7370   0%     311
Combine CD and DFC patch
                     || emc-insert-prob 1/100          | emc disabled
 offered    flows    || rxd    emc   dfc   cycles/     | rxd    emc   dfc   cycles/
 kpps                || kpps   hits  hits  pkt         | kpps   hits  hits  pkt
 --------------------++--------------------------------+----------------------------
 14,000         8    || 10930  100%    0%   210        | 8570    0%   100%   268
 14,000       100    || 10220  100%    0%   224        | 7800    0%   100%   294
 14,000     10000    ||  8000   84%   16%   287        | 6770    0%   100%   339
 14,000   1000000    ||  5921    7%   65%   387        | 6060    0%    72%   378
In these scenarios the patch gives an advantage at lower numbers of flows, but
this advantage is reversed at very high flow numbers - presumably as the DFC
itself approaches capacity.
Another interesting scenario to test would be the case of many short-lived
flows - for example, 1M flows and 200k new flows/sec. This real use case was
presented at the last OvS Con:
https://www.slideshare.net/LF_OpenvSwitch/lfovs17ovsdpdk-for-nfv-go-live-feedback.
I'd hope to implement that test in due course.
Below are some details on how and why the flow distribution was chosen for
these tests.
Regards,
Billy
All caches are designed on the assumption that, in the real world, access
requests are not uniformly distributed. By design they improve performance
only in situations where some items are accessed more frequently than others.
If this assumption does not hold, then the use of a cache actually degrades
performance.
Currently we test the EMC with one of two scenarios, both of which break the
above assumption:
1) Round Robin 'Distribution':
The TG sends a packet from flow#1 then flow#2 up to flow#N then back to
flow#1 again.
Testing with this case gives results that under-state the benefit of the
EMC to the maximum extent possible. By sending a packet from every other flow
in between every two packets from any given flow, the chances of the flow
having been evicted in the interim between the two packets are maximized. If a
tester were to intentionally design a test to understate the benefits of the
EMC, it would be a round-robin flow distribution.
2) Uniform Distribution:
The TG randomly selects the next packet to send from all the configured flows.
Testing with this case gives results that under-state the benefit of the
EMC. As each flow is equally likely to occur, then unless the number of
flows is less than the number of EMC entries, there is likely no benefit from
using the EMC.
By testing only with flow distributions that break the fundamental assumption
justifying the use of an EMC, we are consistently under-stating the benefits
of flow-caching.
A More Realistic Distribution:
A more realistic distribution is almost certainly some kind of power-law or
Pareto distribution. In this kind of distribution the majority of packets
belong to a small number of flows. Think of the classic '80/20' rule. Given a
power-law distribution, if we rank all the flows by their packets-per-second,
we see that the flow with the Nth most packets-per-second has a rate that is
some fraction of that of the (N-1)th flow, for all N. This kind of distribution is what is
seen in the natural world regarding the distribution of ranking of word
occurrence in a language, the population ranks of cities in various countries,
corporation sizes, income rankings, ranks of number of people watching the same
TV channel, and so on. [https://en.wikipedia.org/wiki/Zipf%27s_law].
For example, using a Zipfian distribution with k=1 as the model for per-flow
pps, the flow distribution would look like this (y-axis is not linear):
1000| *
500| * *
333| * * *
PPS 250| * * * *
..
10| * * * * *
+------------...-------------
1 2 3 4 ... 100
Flow (PPS Ranking)
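As a rough sketch (a hypothetical helper, not part of the actual test
scripts), the per-flow rates under this Zipf(k=1) model could be computed
like this:

```python
def zipf_pps(num_flows, total_pps):
    """Split total_pps across flows so the flow ranked n gets a share
    proportional to 1/n (Zipf's law with exponent k=1)."""
    weights = [1.0 / n for n in range(1, num_flows + 1)]
    norm = sum(weights)
    return [total_pps * w / norm for w in weights]

# 100 flows sharing 14,000 kpps: flow 2 carries half the rate of
# flow 1, flow 3 carries a third, and so on down the ranking.
rates = zipf_pps(100, 14000.0)
```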
The TRex software-based traffic generator, with a small amount of additional
scripting, does allow an approximation of such a distribution (the specific
limitation is that, instead of a pps rate for each individual flow, a bucket
of flows must share the same pps value - for example, below all the flows are
distributed across 8 streams to approximate this kind of power-law
distribution).
For example:
# start 14000kpps 10000flows
Each stream has 14000/8 kpps but has a different number of flows. Each stream
has about half the number of flows as the previous one.
# flows per stream: [5019, 2509, 1254, 627, 313, 156, 78, 39] total: 9995
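The halving split can be sketched as below (a hypothetical helper; the values
differ slightly from the script output above because the real script adjusts
the per-stream counts to fit prime tuple spans, as explained next):

```python
def flows_per_stream(total_flows, num_streams=8):
    """Split total_flows across streams geometrically, so each stream
    gets roughly half the flows of the previous one."""
    weights = [2 ** (num_streams - 1 - i) for i in range(num_streams)]
    norm = sum(weights)  # 2**num_streams - 1
    return [round(total_flows * w / norm) for w in weights]

# For 10000 flows across 8 streams the split is approximately
# [5020, 2510, 1255, 627, 314, 157, 78, 39].
print(flows_per_stream(10000))
```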
The actual number of flows per stream is a close approximation of the precise
distribution - usually it's out by just one or two. This is done to allow use
of prime numbers for the ranges of the elements of the 5-tuple members. For
example, incrementing over 29 IP addresses and 173 port numbers means
29*173=5017 unique flows, as 29 and 173 are prime. Use of non-primes would
result in aliasing of the combinations, resulting in far fewer unique flows.
# stream#: 0 req. #flows: 5019 tuple_spans: [29, 173] diff: 2
# stream#: 1 req. #flows: 2509 tuple_spans: [13, 193] diff: 0
# stream#: 2 req. #flows: 1254 tuple_spans: [5, 251] diff: 1
# stream#: 3 req. #flows: 627 tuple_spans: [313, 2] diff: 1
# stream#: 4 req. #flows: 313 tuple_spans: [1, 313] diff: 0
# stream#: 5 req. #flows: 156 tuple_spans: [1, 157] diff: 1
# stream#: 6 req. #flows: 78 tuple_spans: [1, 79] diff: 1
# stream#: 7 req. #flows: 39 tuple_spans: [3, 13] diff: 0
# stream#: 0 src: 16.0.0.0 - 16.0.0.28 : 0 - 172 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 1 src: 16.0.0.29 - 16.0.0.41 : 173 - 365 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 2 src: 16.0.0.42 - 16.0.0.46 : 366 - 616 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 3 src: 16.0.0.47 - 16.0.1.103 : 617 - 618 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 4 src: 16.0.1.104 - 16.0.1.104 : 619 - 931 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 5 src: 16.0.1.105 - 16.0.1.105 : 932 - 1088 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 6 src: 16.0.1.106 - 16.0.1.106 : 1089 - 1167 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
# stream#: 7 src: 16.0.1.107 - 16.0.1.109 : 1168 - 1180 dst 48.0.0.0 - 48.0.0.1 : 0 - 1
> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-
> bounces at openvswitch.org] On Behalf Of yipeng wang
> Sent: Tuesday, March 6, 2018 7:23 PM
> To: dev at openvswitch.org; Wang, Yipeng1 <yipeng1.wang at intel.com>;
> jan.scheurich at ericsson.com; Bodireddy, Bhanuprakash
> <bhanuprakash.bodireddy at intel.com>; u9012063 at gmail.com
> Cc: Tai, Charlie <charlie.tai at intel.com>
> Subject: [ovs-dev] [PATCH 0/3] dpif-netdev: Combine CD and DFC patch for
> datapath refactor
>
> This patch set is the V1 implementation to combine the CD and DFC design.
> Both patches intend to refactor datapath to avoid costly sequential subtable
> search.
>
> CD and DFC patch sets:
> CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-October/340305.html
>
> DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341066.html
>
> The first commit is a rebase of Jan Scheurich's patch of [PATCH] dpif-netdev:
> Refactor datapath flow cache
>
> The second commit is to incorporate CD's way-associative design into DFC to
> improve the hit rate.
>
> The third commit is to change the distributor to cache an index of flow_table
> entry to improve memory efficiency.
>
>
> RFC of this patch set:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343411.html
>
>
>
> RFC->V1:
> 1. rebase to master head.
> 2. The last commit is totally rewritten to use the flow_table as indirect table.
> The CD/DFC distributor will cache the index of flow_table entry.
> 3. Incorporate commit 2 into commit 1. (Bhanu's comment)
> 4. Change DFC to be always on in commit 1. (Bhanu's comment)
>
>
> Yipeng Wang (2):
> dpif-netdev: Use way-associative cache
> use flow_table as indirect table
>
> Jan Scheurich (1):
> dpif-netdev: Refactor datapath flow cache
>
> lib/cmap.c | 62 +++++++++
> lib/cmap.h | 5 +
> lib/dpif-netdev-perf.h | 1 +
> lib/dpif-netdev.c | 359 +++++++++++++++++++++++++++++++++----------------
> 4 files changed, 310 insertions(+), 117 deletions(-)
>
> --
> 2.7.4
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev