[ovs-dev] Tunable flow eviction threshold

Jesse Gross jesse at nicira.com
Thu Jul 28 19:11:46 UTC 2011


On Thu, Jul 28, 2011 at 10:16 AM, Pravin Shelar <pshelar at nicira.com> wrote:
> On Wed, Jul 27, 2011 at 4:13 PM, Jesse Gross <jesse at nicira.com> wrote:
>> On Wed, Jul 27, 2011 at 2:24 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>> On Wed, Jul 27, 2011 at 12:17 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>> On Wed, Jul 27, 2011 at 11:14 AM, Ethan Jackson <ethan at nicira.com> wrote:
>>>>>> One strategy that I have considered is to be able to ask only for
>>>>>> flows that have a non-zero packet count.  That would help in the
>>>>>> common case where a large number of flows is caused by a port scan
>>>>>> or some other activity that generates 1-packet flows.  It wouldn't
>>>>>> help at all in your case.
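
As a rough illustration of that strategy (hypothetical structures, not
the actual OVS datapath code): a stats pass could simply skip any flow
whose packet count since the last pass is zero, so the long tail of
idle flows left behind by a port scan never reaches userspace.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct flow_entry {
        int id;
        uint64_t packet_count;   /* packets matched since the last dump */
        uint64_t byte_count;
    };

    /* Dump only flows that have seen traffic since the last pass. */
    static void dump_used_flows(struct flow_entry *flows, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (flows[i].packet_count == 0) {
                continue;                /* idle: nothing new to report */
            }
            printf("flow %d: %llu packets, %llu bytes\n", flows[i].id,
                   (unsigned long long) flows[i].packet_count,
                   (unsigned long long) flows[i].byte_count);
            flows[i].packet_count = 0;  /* reset for the next interval */
            flows[i].byte_count = 0;
        }
    }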
>>>>>
>>>>> You could also have the kernel pass down to userspace what logically
>>>>> amounts to a list of the flows which have had their statistics change
>>>>> in the past 10 seconds.  A Bloom filter would be a sensible approach.
>>>>> Again, it probably won't help at all in Simon's case, and may or may
>>>>> not be a useful optimization beyond simply not pushing down
>>>>> statistics for flows which have a zero packet count.
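
For concreteness, a minimal user-space sketch of that idea (structure
and hash functions are made up, not OVS code): the datapath hashes a
flow's key into a small bitmap whenever its statistics change,
userspace revalidates only flows that test positive, and the filter is
reset each interval.  False positives cost an extra stats read; false
negatives cannot occur.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOOM_BITS 4096                  /* must be a power of two */

    struct bloom {
        uint8_t bits[BLOOM_BITS / 8];
    };

    /* Two cheap multiplicative hashes of a 32-bit flow key. */
    static uint32_t hash1(uint32_t key) { return key * 2654435761u; }
    static uint32_t hash2(uint32_t key) { return (key ^ 0xdeadbeef) * 40503u; }

    static void bloom_set(struct bloom *b, uint32_t bit)
    {
        bit &= BLOOM_BITS - 1;
        b->bits[bit / 8] |= 1u << (bit % 8);
    }

    static bool bloom_test(const struct bloom *b, uint32_t bit)
    {
        bit &= BLOOM_BITS - 1;
        return b->bits[bit / 8] & (1u << (bit % 8));
    }

    /* Datapath side: called when a packet updates a flow's counters. */
    void bloom_mark_flow(struct bloom *b, uint32_t flow_key)
    {
        bloom_set(b, hash1(flow_key));
        bloom_set(b, hash2(flow_key));
    }

    /* Userspace side: might this flow have changed recently? */
    bool bloom_flow_changed(const struct bloom *b, uint32_t flow_key)
    {
        return bloom_test(b, hash1(flow_key))
               && bloom_test(b, hash2(flow_key));
    }

    /* Reset the filter at the start of each interval (e.g. 10 s). */
    void bloom_clear(struct bloom *b)
    {
        memset(b->bits, 0, sizeof b->bits);
    }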
>>>>
>>>> I don't think that you could implement a Bloom filter like this in a
>>>> manner that wouldn't cause cache contention.  You would probably still
>>>> need to iterate over every flow in the kernel; you would just be
>>>> comparing the last-used time to (current time - 10) instead of
>>>> checking for a packet count not equal to zero.
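
In other words, under either scheme the revalidation pass still visits
every kernel flow; only the predicate changes.  A sketch, with
hypothetical fields:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>

    struct kflow {
        uint64_t packet_count;   /* packets since last read */
        time_t last_used;        /* time of last packet hit */
    };

    /* Predicate variant 1: the flow has seen any traffic at all. */
    static bool flow_active_by_count(const struct kflow *f)
    {
        return f->packet_count != 0;
    }

    /* Predicate variant 2: the flow was hit within the last 10 seconds. */
    static bool flow_active_by_time(const struct kflow *f, time_t now)
    {
        return f->last_used >= now - 10;
    }

    /* Either way, one pass still walks the whole flow table. */
    static size_t count_active(const struct kflow *flows, size_t n,
                               time_t now)
    {
        size_t active = 0;
        for (size_t i = 0; i < n; i++) {
            if (flow_active_by_time(&flows[i], now)
                || flow_active_by_count(&flows[i])) {
                active++;
            }
        }
        return active;
    }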
>>>>
>>> CPU cache contention can be fixed by partitioning all flows by
>>> something (e.g. port number) and assigning the cache replacement
>>> processing for each partition to a CPU.  The replacement algorithm
>>> could be as simple as active and inactive LRU lists; this is what
>>> kernel page cache replacement looks like from a high level.
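
A rough user-space sketch of that two-list scheme (hypothetical types;
the real page cache logic is far more involved): a flow is promoted to
the active list on use, and eviction drains the inactive list, aging an
active entry down when it runs dry.

    #include <stddef.h>

    struct lru_node {
        struct lru_node *prev, *next;
        int flow_id;
        int active;              /* 1 = on active list, 0 = inactive */
    };

    struct lru_lists {
        struct lru_node active;  /* circular sentinel heads */
        struct lru_node inactive;
    };

    static void list_init(struct lru_node *head)
    {
        head->prev = head->next = head;
    }

    static void list_del(struct lru_node *n)
    {
        n->prev->next = n->next;
        n->next->prev = n->prev;
    }

    static void list_add_front(struct lru_node *head, struct lru_node *n)
    {
        n->next = head->next;
        n->prev = head;
        head->next->prev = n;
        head->next = n;
    }

    /* On a flow hit: promote to (or refresh within) the active list. */
    void lru_touch(struct lru_lists *l, struct lru_node *n)
    {
        list_del(n);
        n->active = 1;
        list_add_front(&l->active, n);
    }

    /* On eviction: take the oldest inactive flow; if none, first age
     * the oldest active flow down to the inactive list. */
    struct lru_node *lru_evict(struct lru_lists *l)
    {
        if (l->inactive.prev == &l->inactive) {  /* inactive empty */
            struct lru_node *old = l->active.prev;
            if (old == &l->active) {
                return NULL;                     /* nothing to evict */
            }
            list_del(old);
            old->active = 0;
            list_add_front(&l->inactive, old);
        }
        struct lru_node *victim = l->inactive.prev;
        list_del(victim);
        return victim;
    }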
>>
>> This isn't really a cache replacement problem, though.  Maybe that's
>> the high-level goal being solved, but I wouldn't want to bake that
>> assumption into the kernel, as it would likely impose too many
>> restrictions on what userspace can do if it wants to implement
>> something completely different in the future.  Anything the kernel
>> provides should just be a simple primitive, potentially analogous to
>> the referenced bit that you would find in a page table.
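
That primitive could be as small as a single test-and-clear bit per
flow.  A sketch with hypothetical names (not a proposed OVS API): the
datapath only sets the bit, and userspace reads and clears it, layering
whatever replacement policy it likes on top, just as VM systems build
LRU approximations on the page-table accessed bit.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct flow_ref {
        atomic_bool referenced;  /* set by the datapath on every hit */
    };

    /* Datapath side: cheap, branch-free marking per packet. */
    static inline void flow_mark_referenced(struct flow_ref *f)
    {
        atomic_store_explicit(&f->referenced, true, memory_order_relaxed);
    }

    /* Userspace scan: atomically read and clear the bit. */
    static inline bool flow_test_and_clear_referenced(struct flow_ref *f)
    {
        return atomic_exchange_explicit(&f->referenced, false,
                                        memory_order_relaxed);
    }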
>>
>> You also can't impose a CPU partitioning scheme on flows because we
>> don't control the CPU that packets are processed on.  That's
>> determined by the originator of the packet (such as RSS on the NIC),
>> and then we just handle it on the same CPU.  However, you can use
>> per-CPU data structures to store information regardless of flow and
>> then merge them later.  This actually works well enough for something
>> like a Bloom filter because you can superimpose the results on top of
>> each other without a problem.
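
A sketch of that per-CPU arrangement (sizes and names are made up):
each CPU sets bits only in its own bitmap, so there is no cross-CPU
cache-line bouncing, and a reader merges the filters with bitwise OR,
under which Bloom filters compose exactly.

    #include <stdint.h>
    #include <string.h>

    #define NR_CPUS     4
    #define BLOOM_WORDS 64                   /* 64 * 64 = 4096 bits */

    static uint64_t percpu_bloom[NR_CPUS][BLOOM_WORDS];

    /* Datapath on CPU 'cpu': touch only that CPU's private bitmap. */
    void bloom_set_local(int cpu, uint32_t bit)
    {
        bit %= BLOOM_WORDS * 64;
        percpu_bloom[cpu][bit / 64] |= UINT64_C(1) << (bit % 64);
    }

    /* Reader: superimpose all per-CPU filters into one merged filter. */
    void bloom_merge(uint64_t merged[BLOOM_WORDS])
    {
        memset(merged, 0, BLOOM_WORDS * sizeof merged[0]);
        for (int cpu = 0; cpu < NR_CPUS; cpu++) {
            for (int w = 0; w < BLOOM_WORDS; w++) {
                merged[w] |= percpu_bloom[cpu][w];
            }
        }
    }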
>
> I am not sure why the packet CPU cannot be controlled using interrupt
> affinity/RSS.
>
> I think partitioning on the basis of CPU or port number is good for
> scalability.

If you're using a NIC with RSS, you'll get partitioning on a per-flow
basis, which will provide better load balancing than doing it on the
basis of port.

If you're receiving packets from a VM's virtual interface, the CPU
depends on what the interface wants to do.  It could be pretty much
anything, though Xen, for example, does it on a per-port basis.

If you're sending from an application on the local port, the CPU will
be whichever one is executing the application.

So I'm not saying that it can't be controlled or isn't already
partitioned, just that you can't rely on any particular scheme within
OVS unless you want to use an IPI, which certainly isn't going to help
performance.


