[ovs-dev] [PATCH v3 0/6] dpif-netdev: Combine CD and DFC patch for datapath refactor
O Mahony, Billy
billy.o.mahony at intel.com
Fri Jun 22 08:38:51 UTC 2018
I have replicated some of the test scenarios described below and can confirm the performance improvements.
I hope to get some time to review the code itself in the next week.
> -----Original Message-----
> From: Wang, Yipeng1
> Sent: Tuesday, May 15, 2018 5:13 PM
> To: dev at openvswitch.org
> Cc: blp at ovn.org; jan.scheurich at ericsson.com; u9012063 at gmail.com; Stokes,
> Ian <ian.stokes at intel.com>; O Mahony, Billy <billy.o.mahony at intel.com>;
> Wang, Yipeng1 <yipeng1.wang at intel.com>; Gobriel, Sameh
> <sameh.gobriel at intel.com>; Tai, Charlie <charlie.tai at intel.com>
> Subject: [PATCH v3 0/6] dpif-netdev: Combine CD and DFC patch for datapath
> This patch set is the V3 implementation to combine the CD and DFC design.
> Both patches intend to refactor datapath to avoid costly sequential subtable
> CD and DFC patch sets:
> CD: [PATCH v2 0/5] dpif-netdev: Cuckoo-Distributor implementation
> DFC: [PATCH] dpif-netdev: Refactor datapath flow cache
> 1. The first commit is a rebase of Jan Scheurich's patch "[PATCH] dpif-netdev:
> Refactor datapath flow cache" with a couple of bug fixes. The patch includes EMC
> improvements together with the new DFC structure.
> 2. The second commit is to incorporate CD's way-associative design into DFC to
> improve the hit rate.
> 3. The third commit is to change the distributor to cache an index of flow_table
> entry to improve memory efficiency.
> 4. The fourth commit is to split DFC into EMC and SMC for better organization.
> The lookup function is also rewritten to do batch processing.
> 5. The fifth commit is to automatically turn off DFC/CD when there is a very
> large number of megaflows.
> 6. The sixth commit modifies a unit test to avoid failure.
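A minimal sketch of the idea behind commits 2 and 3, for readers who want a concrete picture: a way-associative cache whose entries hold a short signature plus a 16-bit index into a flow table, rather than a full key or pointer. This is not the code from the patch set. The class name, the 4-way default, the choice of signature bits, and the round-robin replacement are all assumptions made for illustration.

```python
class WayAssociativeCache:
    """Illustrative N-way cache mapping a packet hash to a flow-table index.

    Hypothetical structure: each bucket holds `ways` slots of
    (signature, flow_index). Only the 16-bit index width comes from the
    cover letter; everything else here is an assumption.
    """

    INDEX_BITS = 16

    def __init__(self, num_buckets, ways=4):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.mask = num_buckets - 1
        self.ways = ways
        self.buckets = [[None] * ways for _ in range(num_buckets)]
        self.next_victim = [0] * num_buckets  # round-robin eviction cursor

    def _locate(self, pkt_hash):
        # Low bits select the bucket, bits 16..31 serve as the signature.
        return pkt_hash & self.mask, (pkt_hash >> 16) & 0xFFFF

    def insert(self, pkt_hash, flow_index):
        if not 0 <= flow_index < (1 << self.INDEX_BITS):
            raise ValueError("flow index does not fit in 16 bits")
        bucket_id, sig = self._locate(pkt_hash)
        bucket = self.buckets[bucket_id]
        # Reuse a slot that is empty or already holds this signature.
        for way, entry in enumerate(bucket):
            if entry is None or entry[0] == sig:
                bucket[way] = (sig, flow_index)
                return
        # Bucket is full: evict in round-robin order.
        victim = self.next_victim[bucket_id]
        bucket[victim] = (sig, flow_index)
        self.next_victim[bucket_id] = (victim + 1) % self.ways

    def lookup(self, pkt_hash):
        """Return the cached flow index, or None on a miss."""
        bucket_id, sig = self._locate(pkt_hash)
        for entry in self.buckets[bucket_id]:
            if entry is not None and entry[0] == sig:
                return entry[1]
        return None
```

Because two different flows can share a signature, a hit only gives a candidate: the caller would still have to compare the packet against the flow found at that index and fall back to the full classifier lookup on a mismatch.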
> We did a phy-2-phy test to evaluate the performance improvement with this
> patch set. The traffic pattern we use is based on Billy's original TREX script:
> We augment the script to generate a power-law distribution of flows so that
> flows have different bandwidths and access different subtables.
> For example, there are n flows each with bandwidth w, n/4 flows each with
> bandwidth 2w, n/9 flows each with bandwidth 3w, and so on
> (power-law distribution, y = Cx^-2). For subtables, the second most accessed
> subtable has 1/2 the accesses of the most accessed subtable, the third most
> accessed subtable has 1/3 the accesses of the most accessed subtable, and so
> on (Zipf's law).
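The traffic mix described above can be written down in a few lines. The sketch below is not the TREX script itself, only an illustration of the two distributions; the function names and the `levels` parameter are hypothetical.

```python
def flow_classes(n, levels):
    """Return (flow_count, bandwidth_multiple) pairs.

    Class k has about n / k^2 flows, each with bandwidth k * w, which
    follows y = C * x^-2 with C = n. Counts are floored to integers.
    """
    return [(max(1, n // (k * k)), k) for k in range(1, levels + 1)]


def zipf_subtable_shares(num_subtables):
    """Return the fraction of accesses going to each subtable.

    The i-th most accessed subtable (1-based) gets weight 1/i, so the
    second gets half the accesses of the first, the third a third, etc.
    """
    weights = [1.0 / i for i in range(1, num_subtables + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(flow_classes(3600, 3))
    print(zipf_subtable_shares(5))
```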
> The CD/DFC size is 1 million entries. The speedup results are listed below:
> #flow     #subtable   speedup
> 1000      1           1.015523746
> 1000      5           1.032199838
> 1000      10          1.050814738
> 1000      20          1.081794454
> 10000     1           1.201704118
> 10000     5           1.31634144
> 10000     10          1.402493331
> 10000     20          1.531133279
> 100000    1           1.11088487
> 100000    5           1.458748559
> 100000    10          1.683044348
> 100000    20          2.034441401
> 1000000   1           1.004339563
> 1000000   5           1.256745291
> 1000000   10          1.444329892
> 1000000   20          1.666275853
> Both flow traffic and subtable accesses are skewed. The table shows the total
> The largest performance improvement occurs when all flows hit DFC/CD and
> thus bypass the megaflow cache, and when there are multiple subtables.
> When all flows hit EMC, or the flow count is larger than the CD/DFC size,
> the performance improvement is smaller.
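A rough back-of-envelope model of why the gain grows with the subtable count. It assumes, purely for illustration, that a megaflow lookup probes subtables one by one from most to least accessed, that every lookup matches exactly one subtable, and that accesses follow the 1/i shares of the test traffic. None of this is measured data; the function names are hypothetical.

```python
def zipf_shares(num_subtables):
    """Access share per subtable, with weight 1/rank normalised to sum to 1."""
    weights = [1.0 / rank for rank in range(1, num_subtables + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_probes(num_subtables):
    """Average number of subtables probed per lookup under the model above.

    A flow living in the subtable of rank r costs r probes, so the
    expectation is the share-weighted sum of the ranks.
    """
    shares = zipf_shares(num_subtables)
    return sum(rank * share for rank, share in enumerate(shares, start=1))


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, round(expected_probes(k), 2))
```

Under these assumptions the average probe count rises with the number of subtables, while a cache hit costs the same regardless of how many subtables exist, which is consistent with the direction of the speedup numbers in the table.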
> 1. Add the 5th commit: it automatically turns off DFC/CD when the number
> of megaflows is larger than 2^16, since we use 16 bits in the distributor to index
> 2. Add the 6th commit: since the pmd stats now print out the DFC/CD statistics,
> one of the unit tests has mismatched output. This commit fixes the issue.
> 3. In the first commit, the char key array is changed to uint64_t key
> because of the OSX compilation warning that a char array is 1-byte aligned while
> 8-byte alignment is required during type conversion.
> 1. Add comments and follow code style for cmap code (Ben's comment)
> 2. Fix a bug in the first commit that fails multiple unit tests. Since DFC is
> per PMD not per port, the port mask should be included in the rule.
> 3. Added commit 4. This commit separates DFC into EMC and SMC (signature
> match cache) for easier optimization and readability.
> 4. In commit 4, DFC lookup is refactored to do batch lookup.
> 5. Rebase and other minor changes.
> 1. Rebase to master head.
> 2. The last commit is totally rewritten to use the flow_table as indirect table.
> The CD/DFC distributor will cache the index of flow_table entry.
> 3. Incorporate commit 2 into commit 1. (Bhanu's comment)
> 4. Change DFC to be always on in commit 1. (Bhanu's comment)
> RFC of this patch set:
> Yipeng Wang (5):
> dpif-netdev: Use way-associative cache
> dpif-netdev: use flow_table as indirect table
> dpif-netdev: Split DFC cache and code optimization
> dpif-netdev: Adaptive turn on/off SMC
> tests: Fix unit test case caused by SMC cache.
> Jan Scheurich (1):
> dpif-netdev: Refactor datapath flow cache
> lib/cmap.c | 73 ++++++++
> lib/cmap.h | 5 +
> lib/dpif-netdev-perf.h | 1 +
> lib/dpif-netdev.c | 449 +++++++++++++++++++++++++++++++++++-------------
> tests/pmd.at | 10 +-
> 5 files changed, 405 insertions(+), 133 deletions(-)