[ovs-dev] Wildcard Matching optimization idea

Fischetti, Antonio antonio.fischetti at intel.com
Tue Jan 12 10:37:53 UTC 2016


Thanks Zoltan for your questions.

Anyone else has any new feedback?

> -----Original Message-----
> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Fischetti,
> Antonio
> Sent: Friday, December 18, 2015 3:44 PM
> To: Zoltan Kiss; dev at openvswitch.org
> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> 
> 
> > -----Original Message-----
> > From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> > Sent: Friday, December 18, 2015 12:38 PM
> > To: Fischetti, Antonio; dev at openvswitch.org
> > Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> >
> >
> >
> > On 17/12/15 16:23, Fischetti, Antonio wrote:
> > > Hi Zoltan, thanks for your questions.
> > > Please find below my answers inline.
> > >
> > >> -----Original Message-----
> > >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> > >> Sent: Thursday, December 17, 2015 2:33 PM
> > >> To: Fischetti, Antonio; dev at openvswitch.org
> > >> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> > >>
> > >>
> > >>
> > >> On 17/12/15 10:41, Fischetti, Antonio wrote:
> > >>> Hi All,
> > >>> Here's an optimization idea for the datapath classifier table.
> > >>> I'd like to get some feedback.
> > >>>
> > >>> I used the DPDK ACL tables. They can perform a wildcarded matching
> > and
> > >> each
> > >>> lookup requires less CPU cycles than the Classifier.
> > >>> Anyway there's a negative aspect with ACLs. They take a very long time
> > to
> > >>> insert a new Rule.
> > >>> It can be 50 times greater than an insertion into the Classifier. See
> Note
> > >> below
> > >>> for further details.
> > >>>
> > >>> So a simple 1:1 replacement of the Classifier with an ACL table is not a
> > >> viable
> > >>> solution.
> > >>>
> > >>> The idea described below is instead to replace the Classifier with 2 ACL
> > >>> tables. One is the 'Operating', while the other is a 'Shadow' table.
> > >>>
> > >>> Any lookup will be performed on the Operating table.
> > >>>
> > >>> Instead any new insertion will be executed on the Shadow table by
> > means
> > >> of a
> > >>> separate thread.
> > >>> After the insertion is done, the 2 tables will be swapped.
> >
> > Are you swapping after each insertion, or in batches?
> 
> In batches.
> 
> > The new shadow table needs to get updated first to be sync with
> Operating, does it take
> > a similar amount of time?
> 
> No, the shadow table acts as a 'mirror' of the operating. So the 2
> tables are supposed to contain exactly the same entries.
> An exception is during the transient insertion procedure. But after it
> is completed the 2 tables will contain again the same entries.
> 
> > Instead of having this 2 table, how about have one, and make it possible
> > that you can look up while an insertion is on place? Something in an RCU
> > fashion?
> 
> Unluckly not. While an insertion is taking place it is not possible to access the
> ACL
> to read its entries.
> More precisely, an ACL insertion means 2 actions: add + rebuild. The rebuild
> takes
> the 95% cpu cycles of all the insertion.
> You could read while the 'add' is in progress. Instead you can't read while the
> 'rebuild' is still happening.
> That's why I'm using 2 ACL tables.
> 
> >
> > >>
> > >> So while this insertion happens, you still look up in the actual
> > >> Operating table.
> > >
> > > Yes, while insertion is in progress any lookup will still be carried out on
> the
> > > Operating table.
> >
> > I don't know how the classifier works exactly, but is the following
> > scenario possible?:
> >
> > Rule A matches a flow and specifies an action. A new insertion would
> > essentially remove Rule A and add B which matches the same flow but
> > specifies a different action. While that happens, packets would still
> > match A, while the expectation probably would be to match B.
> >
> 
> The same issue can happen with the Classifier, in this case it will be worse.
> The solution with 2 ACLs have an insertion latency much longer.
> That is because an ACL insertion can be about 50 times greater than an
> insertion into the Classifier.
> 
> > >
> > >> What happens if you have a new insertion in the meantime?
> > >
> > > The new Rule gets buffered into a 'wait' queue.
> > >
> > >> Especially, what happens if your lookup yields the same rule
> > >> which is inserted at the moment?
> > >
> > > That's a good point.  At the current stage it is simply added into the wait
> > queue.
> > > So I could potentially have duplications where different rules into the ACL
> > are
> > > referring to the same netdev-flow.
> > > To avoid these duplications there could be 2 approaches.
> > > One option would be to check that in the wait queue that rule is not
> > present.
> > > Another option would be to store it into the wait queue anyway and then
> > check
> > > that the ACL does not already contain that rule.
> >
> >
> >
> >
> > >
> > >>
> > >>> Thus the Shadow table will now become the Operating one, and
> > viceversa.
> > >>>
> > >>>
> > >>> Is the following ok with real use cases?
> > >>> ========================================
> > >>> An Assumption was made: new sets of Rules arrive with a frequency
> > lower
> > >>> than 1 (Rule Sets)/sec.
> > >>> Would this be ok with real use cases?
> > >>>
> > >>>
> > >>> Performance Figures
> > >>> ===================
> > >>> The table below refers to a mono-directional test where the
> > performance
> > >> is
> > >>> compared between the 2 implementations.
> > >>> Some Flows were installed so that the Classifier was using 7 SubTables.
> > >>> The ACL Rule format was {Protocol, IPdest, MACsrc, UdpPortDest, ToS,
> > >> VlanTci}.
> > >>> The performance figures are expressed in Mpps.
> > >>>
> > >>>                    +------------+------------+
> > >>>                    | Classifier |   2 ACLs   |
> > >>> +----------------+------------+------------+
> > >>> | Max Throughput |    2.2     |    5.4     |
> > >>> |     [Mpps]     |            |            |
> > >>> +----------------+------------+------------+
> > >>>
> > >>>
> > >>> Conclusions
> > >>> ===========
> > >>> At this stage it would really be helpful to have an initial feedback from
> > the
> > >>> Community. Any comment or suggestion will be useful to drive further
> > >>> developments.
> > >>>
> > >>>
> > >>> References
> > >>> ==========
> > >>> DPDK ACL Rules, how to:
> > >>>
> > http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
> > >>>
> > >>>
> > >>> Notes
> > >>> =====
> > >>> When an ACL table contains about 2000 Rules with a structure like
> > >>> {Protocol, IPsource, IPdest, PortSource, PortDest}
> > >>> a new insertion costs about 69000 CPUcycles/Rule.
> > >>> Instead under similar operating conditions the Classifier would require
> > >> about
> > >>> 1300 CPUcycles/Rule.
> > >>>
> > >>>
> > >>> Thanks,
> > >>> Antonio
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> dev mailing list
> > >>> dev at openvswitch.org
> > >>> http://openvswitch.org/mailman/listinfo/dev
> > >>>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev


More information about the dev mailing list