[ovs-discuss] max mega flow 64k per pmd or per dpcls?

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Mon Jul 3 09:00:05 UTC 2017


It's a long weekend in the US, so I will try to answer some of your questions in Darrell's absence.

>Why do think having more than 64k per PMD would be optimal ?
>I originally thought that the bottleneck in classifier because it is saturated full
>so that look up has to be going to flow table, so I think why not just increase
>the dpcls flows per PMD, but seems I am wrong based on your explanation.

For a few use cases, much of the bottleneck moves to the classifier when the EMC is saturated. You may have
to add more PMD threads (again, this depends on the availability of cores in your case).
Since your initial investigation showed that the classifier is the bottleneck, I am curious about a few things:
     -  In the 'dpif-netdev/pmd-stats-show' output, what does 'avg. subtable lookups per hit:' look like?
     -  In steady state, does 'dpcls_lookup()' top the list of functions in 'perf top'?
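As a quick way to compare that figure across PMD threads, here is a minimal Python sketch that pulls 'avg. subtable lookups per hit:' out of saved 'dpif-netdev/pmd-stats-show' output. The sample text, its numbers, and the helper name are hypothetical; the sketch assumes each PMD section starts with a "pmd thread ..." header line, which may differ between OvS versions.

```python
# Hypothetical sample of saved pmd-stats-show output; the values are made up
# and real output contains more fields.
SAMPLE = """\
pmd thread numa_id 0 core_id 1:
    emc hits:1000
    megaflow hits:500
    avg. subtable lookups per hit:3.25
    miss:10
pmd thread numa_id 0 core_id 2:
    emc hits:2000
    megaflow hits:100
    avg. subtable lookups per hit:1.00
    miss:0
"""

LABEL = "avg. subtable lookups per hit:"


def subtable_lookups_per_hit(text):
    """Map each 'pmd thread ...' header to its avg. subtable lookups per hit."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("pmd thread"):
            # Assumed section header; drop the trailing colon.
            current = line.rstrip(":")
        elif line.startswith(LABEL) and current is not None:
            result[current] = float(line[len(LABEL):])
    return result


if __name__ == "__main__":
    for thread, avg in subtable_lookups_per_hit(SAMPLE).items():
        print(f"{thread}: {avg:.2f}")
```

A value close to 1 suggests most megaflow hits are found in the first subtable probed; larger values mean several subtables are searched per packet.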

>What is your use case(s) ?
>My usecase might be setup a VBRAS VNF with OVS-DPDK as an NFV normal
>case, and it requires a good performance, however, OVS-DPDK seems still not
>reach its needs compared with  hardware offloading, we are evaluating VPP as
>well, 
As you mentioned VPP here, it's worth looking at the benchmarks comparing OvS and VPP for the L3-VPN
use case, which were carried out by Intel and Ericsson and presented at the OvS Fall conference.
The slides can be found at http://openvswitch.org/support/ovscon2016/8/1400-gray.pdf.

>basically I am looking to find out what's the bottleneck so far in OVS-
>DPDK (seems in flow look up), and if there are some solutions being discussed
>or working in progress.

I personally did some investigation in this area. One of the bottlenecks in the classifier is the subtable lookup.
Murmur hash is used in OvS, and it is recommended to enable intrinsics by building with -march=native or CFLAGS="-msse4.2", if not already done.
The more subtables you have, the more cycles the lookups may take. I presume you are using OvS 2.7. Some optimizations
were done to improve classifier performance (subtable ranking, hash optimizations).
If emc_lookup()/emc_insert() show up among the top 5 functions taking significant cycles, it is worth disabling EMC insertion as below.
          'ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0'
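To illustrate why the number of subtables matters and what subtable ranking buys, here is a simplified Python model of a tuple-space-search classifier. It is only a sketch, not the OvS dpcls code: the class names, the two-field key, and the sort-by-hit-count ranking are assumptions made for illustration.

```python
class Subtable:
    """All flows that share one wildcard mask live in one hash table."""

    def __init__(self, mask):
        self.mask = mask      # tuple of per-field bit masks
        self.flows = {}       # masked key -> action
        self.hits = 0

    def masked(self, key):
        return tuple(k & m for k, m in zip(key, self.mask))


class Classifier:
    def __init__(self):
        self.subtables = []
        self.probes = 0       # total subtable lookups performed

    def insert(self, mask, key, action):
        for st in self.subtables:
            if st.mask == mask:
                break
        else:
            st = Subtable(mask)
            self.subtables.append(st)
        st.flows[st.masked(key)] = action

    def lookup(self, key):
        # Probe subtables in order; each probe costs one mask-and-hash.
        for st in self.subtables:
            self.probes += 1
            action = st.flows.get(st.masked(key))
            if action is not None:
                st.hits += 1
                return action
        return None

    def rank(self):
        # Put the most frequently hit subtables first.
        self.subtables.sort(key=lambda st: st.hits, reverse=True)


def avg_probes_per_hit(cls, key, n):
    """Look up the same key n times and return subtable probes per hit."""
    cls.probes = 0
    hits = sum(cls.lookup(key) is not None for _ in range(n))
    return cls.probes / hits


if __name__ == "__main__":
    cls = Classifier()
    # Fields are (ip_dst, tcp_dst); three masks give three subtables.
    cls.insert((0xFF000000, 0), (0x0A000000, 0), "port1")
    cls.insert((0xFFFF0000, 0), (0xC0A80000, 0), "port2")
    cls.insert((0xFFFFFFFF, 0xFFFF), (0xAC100001, 80), "port3")

    pkt = (0xAC100001, 80)
    print(avg_probes_per_hit(cls, pkt, 100))  # 3.0: match is in the last subtable
    cls.rank()
    print(avg_probes_per_hit(cls, pkt, 100))  # 1.0: hot subtable is probed first
```

In this model the per-packet cost grows with the number of subtables probed before a match, which is the quantity that 'avg. subtable lookups per hit' reports.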

>Are you wanting for this number to be larger by default ?
>I am not sure, I need to understand whether it is good or bad to set it larger.
>Are you wanting for this number to be configurable ?
>Probably good.
>
>BTW, after reading part of DPDK document, it strengthens to decrease to copy
>between cache and memory and get cache hit as much as possible to get
>fewer cpu cycles to fetch data, but now I am totally lost on how does OVS-
>DPDK emc and classifier map to the LLC.

I didn't get your question here. A PMD is like any other thread and has an EMC and a classifier per ingress port.
The EMC, classifier subtables, and other data structures will make it into the LLC when accessed.

As already mentioned, using RDT Cache Allocation Technology (CAT), one can assign cache ways to high-priority threads:
https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology

- Bhanuprakash.


