[ovs-discuss] max mega flow 64k per pmd or per dpcls?

Hui Xiang xianghuir at gmail.com
Mon Jul 3 14:41:29 UTC 2017

Thanks again, Bodireddy! Comments inline.

On Mon, Jul 3, 2017 at 5:00 PM, Bodireddy, Bhanuprakash <
bhanuprakash.bodireddy at intel.com> wrote:

> It's a long weekend in the US, so I will try answering some of your
> questions in Darrell's absence.
> >Why do you think having more than 64k per PMD would be optimal?
> >I originally thought the bottleneck was in the classifier because it was
> >saturated, so lookups had to go to the flow table; that is why I thought
> >of increasing the dpcls flows per PMD, but based on your explanation it
> >seems I was wrong.
> For a few use cases, much of the bottleneck moves to the classifier when
> the EMC is saturated. You may have to add more PMD threads (this depends
> on the availability of cores in your case).
> As your initial investigation suggests the classifier is the bottleneck,
> I am just curious about a few things:
>      -  In the 'dpif-netdev/pmd-stats-show' output, what does 'avg.
> subtable lookups per hit' look like?
>      -  In steady state, does 'dpcls_lookup()' top the list of functions
> in 'perf top'?
That is great advice; I'll check further.
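To make that counter concrete, here is a minimal Python sketch (my own toy model, not OVS code; the subtables, masks, and traffic pattern are invented for illustration) of how 'avg. subtable lookups per hit' arises when dpcls probes subtables in ranked order:

```python
# Toy model of a dpcls lookup: probe subtables in priority order until one
# matches. The counter reported by 'dpif-netdev/pmd-stats-show' is the mean
# number of probes needed per successful lookup, so it grows as traffic
# spreads across lower-ranked subtables.

def dpcls_lookup(subtables, key):
    """Probe subtables in order; return (result, number of probes used)."""
    for probes, table in enumerate(subtables, start=1):
        if key in table:          # real OVS masks the key per subtable first
            return table[key], probes
    return None, len(subtables)

# Three hypothetical subtables; flows matching the last one cost 3 probes.
subtables = [{"flow_a": 1}, {"flow_b": 2}, {"flow_c": 3}]

hits, total_probes = 0, 0
for key in ["flow_a", "flow_c", "flow_c", "flow_b"]:
    result, probes = dpcls_lookup(subtables, key)
    if result is not None:
        hits += 1
        total_probes += probes

print(f"avg. subtable lookups per hit: {total_probes / hits:.2f}")  # -> 2.25
```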

> >What is your use case(s)?
> >My use case might be setting up a vBRAS VNF with OVS-DPDK as a typical
> >NFV deployment, and it requires good performance; however, OVS-DPDK still
> >does not seem to meet those needs compared with hardware offloading. We
> >are evaluating VPP as well.
> As you mentioned VPP here, it is worth looking at the benchmarks comparing
> OvS and VPP for the L3-VPN use case, carried out by Intel and Ericsson and
> presented at the OvS Fall conference.
> The slides can be found at
> http://openvswitch.org/support/ovscon2016/8/1400-gray.pdf
On page 12 of the above PDF, why does the classifier show constant throughput
with an increasing number of concurrent L4 flows? Shouldn't performance
degrade with more subtable lookups, as you mentioned?

> >Basically, I am trying to find out what the bottleneck in OVS-DPDK is so
> >far (it seems to be in flow lookup), and whether there are solutions
> >being discussed or in progress.
> I personally did some investigation in this area. One of the bottlenecks
> in the classifier is subtable lookup.
> Murmur hash is used in OvS, and it is recommended to enable intrinsics
> with -march=native/CFLAGS="-msse4.2" if not already done.
> If you have many subtables, the lookups may be taking significant cycles.
> I presume you are using OvS 2.7; some optimizations were made there to
> improve classifier performance (subtable ranking, hash optimizations).
> If emc_lookup()/emc_insert() show up in the top 5 functions taking
> significant cycles, it is worth disabling the EMC as below:
>      'ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=0'
Thanks much for your advice.
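As a toy model of what that knob does (my own sketch, not OVS source; the function name and structure are illustrative): a flow is inserted into the EMC with probability 1/N, and N = 0 disables insertion entirely, which is what the command above sets:

```python
# Toy model (assumption, not OVS code) of 'emc-insert-inv-prob' behavior:
# insert a flow into the EMC with probability 1/inv_prob; inv_prob == 0
# means never insert, so a thrashing EMC stops churning entirely.

import random

def emc_maybe_insert(emc, key, value, inv_prob, rng=random.random):
    if inv_prob == 0:              # emc-insert-inv-prob=0: never insert
        return False
    if inv_prob == 1 or rng() < 1.0 / inv_prob:
        emc[key] = value
        return True
    return False

emc = {}
# With inv_prob=0, no entries are ever added regardless of traffic.
for i in range(1000):
    emc_maybe_insert(emc, f"flow{i}", i, inv_prob=0)
print(len(emc))  # -> 0
```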

> >Are you wanting this number to be larger by default?
> >I am not sure; I need to understand whether it is good or bad to set it
> >larger.
> >Are you wanting this number to be configurable?
> >Probably, yes.
> >
> >BTW, after reading part of the DPDK documentation, it stresses reducing
> >copies between cache and memory and getting cache hits as much as
> >possible, so that fewer CPU cycles are spent fetching data; but now I am
> >totally lost on how the OVS-DPDK EMC and classifier map to the LLC.
> I didn't quite get your question here. A PMD is like any other thread and
> has an EMC and a classifier per ingress port.
> The EMC, classifier subtables, and other data structures will make it
> into the LLC when accessed.
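A back-of-envelope sketch of why these structures compete for LLC space (the 8192-entry EMC size matches the OVS source of this era; the per-entry size and per-core LLC share are assumptions used purely for illustration):

```python
# Rough arithmetic only: estimate the EMC's cache footprint per PMD thread
# and compare it with an assumed share of LLC. Not measured data.

EMC_ENTRIES = 8192          # EMC flow entries per PMD (EM_FLOW_HASH_ENTRIES)
ENTRY_BYTES = 200           # assumed: netdev_flow_key + miniflow + metadata
LLC_BYTES   = 2_560_000     # assumed: ~2.5 MB of LLC available to one core

emc_bytes = EMC_ENTRIES * ENTRY_BYTES
print(f"EMC footprint per PMD: {emc_bytes / 1024:.0f} KiB "
      f"({100 * emc_bytes / LLC_BYTES:.0f}% of the assumed LLC share)")
# -> EMC footprint per PMD: 1600 KiB (64% of the assumed LLC share)
```

Under these assumptions a single PMD's EMC alone consumes a large fraction of its LLC share, before any subtables are touched, which is why access patterns and cache partitioning matter.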

> As already mentioned, using RDT Cache Allocation Technology (CAT), one
> can assign cache ways to high-priority threads:
> https://software.intel.com/en-us/articles/introduction-to-cache-allocation-technology
> - Bhanuprakash.
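For reference, a CAT capacity bitmask dedicates one LLC way per set bit, and Intel's CAT requires the set bits to be contiguous; a tiny illustrative helper (my own hypothetical sketch, not part of pqos or OVS):

```python
# Hypothetical helper showing how a CAT capacity bitmask is formed: each set
# bit dedicates one LLC way to a class of service, and the bits must be a
# contiguous run. total_ways varies by CPU model; 11 is just an example.

def cat_mask(n_ways_reserved, total_ways=11):
    assert 0 < n_ways_reserved <= total_ways
    return (1 << n_ways_reserved) - 1  # contiguous low ways, as CAT requires

print(hex(cat_mask(4)))  # -> 0xf
```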