[ovs-dev] Wildcard Matching optimization idea

Wei Li liw at dtdream.com
Thu Jan 14 06:37:34 UTC 2016


insert a rule to "mirror" and swap "mirror" and "real"

then real does not contain the new inserted rule

how to sync?  take the "long" time as inserting to "mirror"?

在 2016/1/12 18:37, Fischetti, Antonio 写道:
> Thanks Zoltan for your questions.
>
> Anyone else has any new feedback?
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Fischetti,
>> Antonio
>> Sent: Friday, December 18, 2015 3:44 PM
>> To: Zoltan Kiss; dev at openvswitch.org
>> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
>>
>>
>>> -----Original Message-----
>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>>> Sent: Friday, December 18, 2015 12:38 PM
>>> To: Fischetti, Antonio; dev at openvswitch.org
>>> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
>>>
>>>
>>>
>>> On 17/12/15 16:23, Fischetti, Antonio wrote:
>>>> Hi Zoltan, thanks for your questions.
>>>> Please find below my answers inline.
>>>>
>>>>> -----Original Message-----
>>>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
>>>>> Sent: Thursday, December 17, 2015 2:33 PM
>>>>> To: Fischetti, Antonio; dev at openvswitch.org
>>>>> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
>>>>>
>>>>>
>>>>>
>>>>> On 17/12/15 10:41, Fischetti, Antonio wrote:
>>>>>> Hi All,
>>>>>> Here's an optimization idea for the datapath classifier table.
>>>>>> I'd like to get some feedback.
>>>>>>
>>>>>> I used the DPDK ACL tables. They can perform a wildcarded matching
>>> and
>>>>> each
>>>>>> lookup requires less CPU cycles than the Classifier.
>>>>>> Anyway there's a negative aspect with ACLs. They take a very long time
>>> to
>>>>>> insert a new Rule.
>>>>>> It can be 50 times greater than an insertion into the Classifier. See
>> Note
>>>>> below
>>>>>> for further details.
>>>>>>
>>>>>> So a simple 1:1 replacement of the Classifier with an ACL table is not a
>>>>> viable
>>>>>> solution.
>>>>>>
>>>>>> The idea described below is instead to replace the Classifier with 2 ACL
>>>>>> tables. One is the 'Operating', while the other is a 'Shadow' table.
>>>>>>
>>>>>> Any lookup will be performed on the Operating table.
>>>>>>
>>>>>> Instead any new insertion will be executed on the Shadow table by
>>> means
>>>>> of a
>>>>>> separate thread.
>>>>>> After the insertion is done, the 2 tables will be swapped.
>>> Are you swapping after each insertion, or in batches?
>> In batches.
>>
>>> The new shadow table needs to get updated first to be sync with
>> Operating, does it take
>>> a similar amount of time?
>> No, the shadow table acts as a 'mirror' of the operating. So the 2
>> tables are supposed to contain exactly the same entries.
>> An exception is during the transient insertion procedure. But after it
>> is completed the 2 tables will contain again the same entries.
>>
>>> Instead of having this 2 table, how about have one, and make it possible
>>> that you can look up while an insertion is on place? Something in an RCU
>>> fashion?
>> Unluckly not. While an insertion is taking place it is not possible to access the
>> ACL
>> to read its entries.
>> More precisely, an ACL insertion means 2 actions: add + rebuild. The rebuild
>> takes
>> the 95% cpu cycles of all the insertion.
>> You could read while the 'add' is in progress. Instead you can't read while the
>> 'rebuild' is still happening.
>> That's why I'm using 2 ACL tables.
>>
>>>>> So while this insertion happens, you still look up in the actual
>>>>> Operating table.
>>>> Yes, while insertion is in progress any lookup will still be carried out on
>> the
>>>> Operating table.
>>> I don't know how the classifier works exactly, but is the following
>>> scenario possible?:
>>>
>>> Rule A matches a flow and specifies an action. A new insertion would
>>> essentially remove Rule A and add B which matches the same flow but
>>> specifies a different action. While that happens, packets would still
>>> match A, while the expectation probably would be to match B.
>>>
>> The same issue can happen with the Classifier, in this case it will be worse.
>> The solution with 2 ACLs have an insertion latency much longer.
>> That is because an ACL insertion can be about 50 times greater than an
>> insertion into the Classifier.
>>
>>>>> What happens if you have a new insertion in the meantime?
>>>> The new Rule gets buffered into a 'wait' queue.
>>>>
>>>>> Especially, what happens if your lookup yields the same rule
>>>>> which is inserted at the moment?
>>>> That's a good point.  At the current stage it is simply added into the wait
>>> queue.
>>>> So I could potentially have duplications where different rules into the ACL
>>> are
>>>> referring to the same netdev-flow.
>>>> To avoid these duplications there could be 2 approaches.
>>>> One option would be to check that in the wait queue that rule is not
>>> present.
>>>> Another option would be to store it into the wait queue anyway and then
>>> check
>>>> that the ACL does not already contain that rule.
>>>
>>>
>>>
>>>>>> Thus the Shadow table will now become the Operating one, and
>>> viceversa.
>>>>>>
>>>>>> Is the following ok with real use cases?
>>>>>> ========================================
>>>>>> An Assumption was made: new sets of Rules arrive with a frequency
>>> lower
>>>>>> than 1 (Rule Sets)/sec.
>>>>>> Would this be ok with real use cases?
>>>>>>
>>>>>>
>>>>>> Performance Figures
>>>>>> ===================
>>>>>> The table below refers to a mono-directional test where the
>>> performance
>>>>> is
>>>>>> compared between the 2 implementations.
>>>>>> Some Flows were installed so that the Classifier was using 7 SubTables.
>>>>>> The ACL Rule format was {Protocol, IPdest, MACsrc, UdpPortDest, ToS,
>>>>> VlanTci}.
>>>>>> The performance figures are expressed in Mpps.
>>>>>>
>>>>>>                     +------------+------------+
>>>>>>                     | Classifier |   2 ACLs   |
>>>>>> +----------------+------------+------------+
>>>>>> | Max Throughput |    2.2     |    5.4     |
>>>>>> |     [Mpps]     |            |            |
>>>>>> +----------------+------------+------------+
>>>>>>
>>>>>>
>>>>>> Conclusions
>>>>>> ===========
>>>>>> At this stage it would really be helpful to have an initial feedback from
>>> the
>>>>>> Community. Any comment or suggestion will be useful to drive further
>>>>>> developments.
>>>>>>
>>>>>>
>>>>>> References
>>>>>> ==========
>>>>>> DPDK ACL Rules, how to:
>>>>>>
>>> http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
>>>>>>
>>>>>> Notes
>>>>>> =====
>>>>>> When an ACL table contains about 2000 Rules with a structure like
>>>>>> {Protocol, IPsource, IPdest, PortSource, PortDest}
>>>>>> a new insertion costs about 69000 CPUcycles/Rule.
>>>>>> Instead under similar operating conditions the Classifier would require
>>>>> about
>>>>>> 1300 CPUcycles/Rule.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Antonio
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev at openvswitch.org
>>>>>> http://openvswitch.org/mailman/listinfo/dev
>>>>>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev




More information about the dev mailing list