[ovs-dev] Discontiguous bit mask in Megaflow

Peng He mailist at yeah.net
Mon Mar 21 01:11:51 UTC 2016


Hi, Ben, 


>We're always excited to improve the performance of OVS, so I hope you
>will pass along your results.
     We did some evaluations on DPDK-based OVS. We used ClassBench[1] to generate 
rule sets of 1K and 10K rules, and also generated synthetic traffic for these rules. We chose 
to generate traffic with low locality, because we think the cache already works well when locality is high. 


    We used the OVS 2.4 release with DPDK 2.0. The network card is a 10G Intel 82599. We ran 
OVS on an Intel Xeon processor (2.2GHz, 4 cores); all evaluations were performed on a single core. 
Every rule is associated with an action that forwards packets from the input port to 
a fixed output port. The test traffic is one-way. 


    The results are as follows:
    
    ruleset   | tx rate/port (Mbps) | rx rate/port (Mbps)
    ----------+---------------------+--------------------
    fw_1k     | 9990                | 49
    fw_10k    | 9991                | 16
    acl_1k    | 9995                | 207
    acl_10k   | 9994                | 94
    ipc_1k    | 9991                | 81
    ipc_10k   | 9995                | 18


    These results show that under low-locality traffic the performance of OVS is low. We further checked the cache miss rate: 
about 50% of the packets miss the first layer of cache and are matched against the second layer; however, very few 
packets are sent to upcalls. We also checked the number of megaflow tuples, and it is quite large (around 100 to 1000 tuples). 
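
    To illustrate why the tuple count matters: in a tuple space search, each distinct 
mask (a "tuple") has its own hash table, and a lookup that misses the first-layer cache may 
have to probe many of them before it finds a match. The following is only a minimal sketch 
of that idea, assuming a single 32-bit key; the names TupleSpace, insert and lookup are 
hypothetical and this is not the OVS code.

```python
from typing import Dict, Optional, Tuple


class TupleSpace:
    """Toy tuple space search over 32-bit keys (hypothetical, not OVS code).

    Each distinct mask (a "tuple") owns one hash table of masked values.
    """

    def __init__(self) -> None:
        self.tables: Dict[int, Dict[int, str]] = {}

    def insert(self, mask: int, value: int, action: str) -> None:
        # Store the value with its wildcarded bits cleared.
        self.tables.setdefault(mask, {})[value & mask] = action

    def lookup(self, key: int) -> Tuple[Optional[str], int]:
        """Return (action, number of hash tables probed)."""
        probes = 0
        for mask, table in self.tables.items():
            probes += 1
            action = table.get(key & mask)
            if action is not None:
                return action, probes
        return None, probes


ts = TupleSpace()
# 99 tuples that cannot match the test packet (they all hold 10.0.0.0).
for i in range(1, 100):
    ts.insert(0xFFFF0000 | i, 0x0A000000, "drop")
# One tuple that does match 192.168.0.0/16, inserted last.
ts.insert(0xFFFF0000, 0xC0A80000, "output:2")

action, probes = ts.lookup(0xC0A80000)
print(action, probes)  # output:2 100
```

    In this toy case the packet is matched only after all 100 tables have been probed, 
which is the per-packet cost that pruning the tuple list tries to avoid.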


    So we decided to use a trie to prune these tuples and speed up the lookup. That is where we found that the bit mask could be 
discontiguous. We fill in these discontiguous bits and use a fast trie algorithm (Tree Bitmap) on destination IP addresses to prune the tuples. 
The results are below:


    ruleset   | rx rate/port (Mbps)     | speedup
              | native_ovs | ovs_trie   |
    ----------+------------+------------+--------
    acl_1k    | 207        | 582        | 2.81
    acl_10k   | 94         | 245        | 2.61
    fw_1k     | 49         | 196        | 4.00
    fw_10k    | 16         | 113        | 7.06
    ipc_1k    | 81         | 510        | 6.30
    ipc_10k   | 18         | 256        | 14.22
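
    One possible reading of "filling" the discontiguous bits is sketched here, purely as 
an assumption: every bit above the lowest set bit of a 32-bit destination IP mask is set, 
so that the mask becomes an ordinary prefix that a trie can index. The filled mask matches 
fewer packets than the original one, so a cached entry built with it is narrower but should 
still be correct. The helper names (is_prefix_mask, fill_mask, prefix_len) are hypothetical, 
and the sketch does not implement Tree Bitmap itself.

```python
MASK32 = 0xFFFFFFFF


def is_prefix_mask(mask: int) -> bool:
    """True if the 32-bit mask is a run of 1 bits followed only by 0 bits."""
    inv = ~mask & MASK32
    # The inverted mask must look like 0...01...1, i.e. inv + 1 is a power of two.
    return (inv & (inv + 1)) == 0


def fill_mask(mask: int) -> int:
    """Set every bit above the lowest set bit (assumed meaning of 'filling')."""
    if mask == 0:
        return 0
    lowest = mask & -mask          # isolate the lowest set bit
    return ~(lowest - 1) & MASK32  # all bits from that position upwards


def prefix_len(mask: int) -> int:
    """Prefix length of a contiguous mask (only meaningful for prefix masks)."""
    return bin(mask).count("1")


mask = 0xFF00FF00                  # 255.0.255.0, a discontiguous mask
filled = fill_mask(mask)
print(is_prefix_mask(mask), hex(filled), prefix_len(filled))
# False 0xffffff00 24
```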


    We also checked the effect of enlarging the first-layer cache. We enlarged the cache to 32K entries; the results show 
that the performance improvement is limited, around 30%. 


    Any feedback is welcome. Thank you.

[1] Taylor, David E., and Jonathan S. Turner. "ClassBench: a packet classification benchmark." Proceedings of IEEE INFOCOM 2005, 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Vol. 3. IEEE, 2005.

