[ovs-dev] criteria/benchmarks for an improved userspace datapath classifier?

Ilya Maximets i.maximets at samsung.com
Thu Jul 4 12:43:02 UTC 2019

On 03.07.2019 0:03, Michal Orsák wrote:
> Hello everyone,

Hi Michal.

> On 02/07/2019 19:43, Ian Stokes wrote:
>> On 7/2/2019 6:00 PM, Ben Pfaff wrote:
>>> Hi Ilya and Ian.  Please allow me to introduce Michal Orsak, a grad
>>> student currently looking at packet classifiers.  He's implemented a
>>> novel classifier that is faster than the one already in OVS in the
>>> benchmarks that he's run.  His classifier is tree-based, like most
>>> high-performance classifiers, but also incremental so that flows can be
>>> inserted and deleted individually without undue delay. Ultimately, it
>>> might make sense to replace the OVS userspace datapath classifier by one
>>> based on the concepts that he's come up with.
>> Thanks for the introduction Ben. I wasn't aware of Michals work. The timing is quite apt as currently I've been looking at Harrys (CCd) approach to improving performance for the current DPCLS.
>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358790.html
>> I was hoping to have this approach in place for OVS 2.12 as there's been ongoing testing/reviews across the community with general improvements of ~ 15% or more for both x86 and ARM architectures which looks promising.
>> I'd be interested in seeing Michals proposals if there are patches available that would be great.
> Now I am implementing the DPCLS API so we can test the thing. There are also some differences and problems like:
> * support for priority of the rules
> * only single matching rule
> * c++
> * my limited knowledge of DPCLS

One important thing about datapath flow rules you need to consider
is that datapath flows, unlike OF rules, has no priorities, i.e. only
one flow in dpcls could match with your packet. So the order in which
you will check flows doesn't matter for correctness, however it matters
for performance. Sooner you'll find the right flow, faster your
classifier will be.

Current implementation of DPCLS in dpif-netdev looks like this:

  * Each thread has its own set of dpcls classifiers (pmd->classifiers),
    one classifier per in_port.

  * Each dpcls classifier is a set of subtables (cls->subtables).
    One subtable per key mask (bitmask of a fields that matches the flow).

  * Each subtable is a set of rules (subtable->rules). All rules in a
    subtable matches different values of the same fields (from the key mask).

Lookup method:

  1. Get the dpcls from pmd->classifiers by in_port.
  2. For each subtable in dpcls:
     3. Try to find the matching rule in a subtable->rules.
     4. If found --> break.

Important optimizations:

  * pmd->classifiers is a hash map that provides a fast lookup.
  * subtable->rules is a hash map too.
  * Lookup in subtable->rules is batched. i.e. we're trying to
    find rules for up to 32 packets at the same time.
  * dpcls collects hit counts for subtables and sorts them, so
    the 'hottest' subtables always checked first. This speeds up
    the lookup process.

>>> A difficulty with classifiers, however, is coming up with an appropriate
>>> set of benchmarks to compare them fairly.  The userspace datapath
>>> focuses on performance, so do you have a set of benchmarks that you
>>> recommend for comparison?  Are there other criteria that would be
>>> important (besides correctness)?
>> Agreed, that's something we've been looking for feedback on from the community to date. Most testing we have seen have been performance focused.
> There are so many corner cases... Any test is appreciated.
>> The areas I think that would be good to measure would be flow addition/flow deletion overhead in CPU cycles. Also memory and time complexity comparisons for existing and proposed techniques might be nice to see.
> Currently I do have comparison with OVS TSS, DPDK librte_acl, PartitionSort, HyperSplit, HyperCuts and few others. (C based benchmarks. Some of them not optimized implementations. Originally for devel. of hardware based classifier.). All previously mentioned except TSS do have worst case complexities from the same class. However it is more complicated as it depends on size of the matching field and other factors which have different meanings between algs. I do not have understandable and short definitions yet.

Scenarios for performance testing is always a hard topic as we
still didn't found the common set of cases that will work for

Similar to Flavio, I'm mostly working with OVS in OpenStack setups.
So, the cases are similar. Not very big number of flows, mostly
NORMAL actions. What I'm trying to test while evaluating performance
related changes (beside the typical PHY-PHY and PVP forwarding) is
the PHY-VM-PHY forwarding while PHY is the balanced-tcp bonding port.
In this case all the packets that goes from VM to PHY are going through
recirculation and matches twice in datapath. Here is the speed of
classifier matters a lot even if we have not much rules installed.
I'm usually testing the matrix of packet sizes (64 - 1518) and number
of IPv4 traffic flows (1 - 1M).
However, OF rules differs from datapath rules in dpcls. They usually
more simple due to flow translation algorithms. In case of prevailing
of NORMAL actions, datapath flow tables looks more like a learning
table of L2 switch, where you mostly need to just match source and
destination MACs. But there are still cases where dpcls rules could
be complex anyway.

Best regards, Ilya Maximets.

>> Thanks
>> Ian
>>> (I'd take answers from anyone, not just Ian and Ilya!)
>>> Thanks,
>>> Ben.
> Thanks,
> Michal

More information about the dev mailing list