[ovs-dev] Tunable flow eviction threshold

Thu Jul 28 17:16:30 UTC 2011

On Wed, Jul 27, 2011 at 4:13 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Wed, Jul 27, 2011 at 2:24 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>> On Wed, Jul 27, 2011 at 12:17 PM, Jesse Gross <jesse at nicira.com> wrote:
>>> On Wed, Jul 27, 2011 at 11:14 AM, Ethan Jackson <ethan at nicira.com> wrote:
>>>>> One strategy that I have considered is to be able to ask only for flows
>>>>> that have a non-zero packet count.  That would help with the common case
>>>>> where, when there is a large number of flows, they are caused by a port
>>>>> scan or some other activity with 1-packet flows.  It wouldn't help at
>>>>> all in your case.
>>>>
>>>> You could also have the kernel pass down to userspace what logically
>>>> amounts to a list of the flows which have had their statistics change
>>>> in the past 10 seconds.  A Bloom filter would be a sensible approach.
>>>> Again, it probably won't help at all in Simon's case, and may or may not
>>>> be a useful optimization over simply not pushing down statistics for
>>>> flows which have a zero packet count.
>>>
>>> I don't think that you could implement a Bloom filter like this in a
>>> manner that wouldn't cause cache contention.  Probably you would still
>>> need to iterate over every flow in the kernel, you would just be
>>> comparing last used time to current time - 10 instead of packet count
>>> not equal to zero.
>>>
>> CPU cache contention can be fixed by partitioning all flows by
>> something (e.g. port number) and assigning the cache replacement
>> processing for each partition to a CPU.  The replacement algorithm
>> could be as simple as active and inactive LRU lists.  This is what
>> kernel page cache replacement looks like at a high level.
>
> This isn't really a cache replacement problem though.  Maybe that's
> the high level goal that's being solved but I wouldn't want to make
> that assumption in the kernel as it would likely impose too many
> restrictions on what userspace can do if it wants to implement
> something completely different in the future.  Anything the kernel
> provides should just be a simple primitive, potentially analogous to
> the referenced bit that you would find in a page table.
>
> You also can't impose a CPU partitioning scheme on flows because we
> don't control the CPU that packets are being processed on.  That's
> determined by the originator of the packet (such as RSS on the NIC)
> and then we just handle it on the same CPU.  However, you can use a
> per-CPU data structure to store information regardless of flow and
> then merge them later.  This actually works well enough for something
> like a Bloom filter because you can superimpose the results on top of
> each other without a problem.
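
To make the per-CPU idea quoted above concrete, here is a minimal
user-space sketch in Python, not kernel code.  It assumes each CPU writes
only to its own Bloom filter and that a reader later superimposes them
with a bitwise OR.  The names (PerCpuBloom, mark_used, maybe_used), the
filter size, and the hash choice are all hypothetical and chosen only for
illustration.

```python
import hashlib

NUM_BITS = 1024   # filter size in bits (assumed value)
NUM_HASHES = 3    # bit positions set per flow key (assumed value)


def bit_positions(flow_key: bytes):
    """Derive NUM_HASHES bit positions for a flow key."""
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(bytes([i]) + flow_key).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


class PerCpuBloom:
    """One Bloom filter per CPU, each stored as an integer bitmap."""

    def __init__(self, num_cpus: int):
        self.filters = [0] * num_cpus

    def mark_used(self, cpu: int, flow_key: bytes) -> None:
        # Only the filter owned by the processing CPU is written, so
        # CPUs never touch each other's bitmap.
        for pos in bit_positions(flow_key):
            self.filters[cpu] |= 1 << pos

    def merge(self) -> int:
        # Superimpose the per-CPU filters: a bitwise OR of all bitmaps.
        merged = 0
        for bitmap in self.filters:
            merged |= bitmap
        return merged


def maybe_used(merged: int, flow_key: bytes) -> bool:
    """False means definitely not marked; True may be a false positive."""
    return all((merged >> pos) & 1 for pos in bit_positions(flow_key))
```

The merge works because setting a bit is idempotent and order-independent,
so the OR of the per-CPU filters equals the filter a single shared
structure would have produced.  The false-positive rate of the merged
filter is that of one filter holding all marked flows.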

I am not sure why the CPU that processes a packet cannot be controlled
by using interrupt affinity/RSS.

I think partitioning on the basis of CPU or port number is good for scalability.
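
As a rough illustration of the partitioning idea, the following Python
sketch assigns flows to a partition by port number and keeps active and
inactive lists per partition.  It is a loose user-space model under
stated assumptions, not Open vSwitch code and not the kernel's actual
page reclaim logic.  FlowPartition, partition_for, and the
promote-on-second-touch rule are hypothetical choices made for the
example.

```python
from collections import OrderedDict


def partition_for(port_no: int, num_partitions: int) -> int:
    # Assumed mapping: a simple modulo on the port number.
    return port_no % num_partitions


class FlowPartition:
    """Flows owned by one worker, on an active and an inactive list."""

    def __init__(self):
        self.active = OrderedDict()    # recently re-referenced flows
        self.inactive = OrderedDict()  # eviction candidates, oldest first

    def touch(self, flow) -> None:
        # A new flow starts on the inactive list; a second reference
        # promotes it to the active list.
        if flow in self.active:
            self.active.move_to_end(flow)
        elif flow in self.inactive:
            del self.inactive[flow]
            self.active[flow] = True
        else:
            self.inactive[flow] = True

    def age(self) -> None:
        # Demote the least recently used active flow, if there is one.
        if self.active:
            flow, _ = self.active.popitem(last=False)
            self.inactive[flow] = True

    def evict(self):
        # Remove and return the oldest inactive flow, or None if empty.
        if self.inactive:
            flow, _ = self.inactive.popitem(last=False)
            return flow
        return None
```

Because each partition has its own lists, a worker that owns a partition
never contends with other workers on list updates.  That only holds if
packets for a given port are actually handled on the owning CPU.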

Thanks,
Pravin.