[ovs-dev] [flow-compat 6/7] ofproto-dpif: Factor NetFlow active timeouts out of flow expiration.

Jesse Gross jesse at nicira.com
Mon Nov 7 23:54:40 UTC 2011


On Mon, Nov 7, 2011 at 1:41 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Mon, Nov 07, 2011 at 11:18:04AM -0800, Jesse Gross wrote:
>> On Mon, Nov 7, 2011 at 9:24 AM, Ben Pfaff <blp at nicira.com> wrote:
>> > On Sun, Nov 06, 2011 at 09:56:10PM -0800, Jesse Gross wrote:
>> >> On Fri, Nov 4, 2011 at 4:43 PM, Ben Pfaff <blp at nicira.com> wrote:
>> >> > NetFlow active timeouts were only mixed in with flow expiration for
>> >> > convenience: both processes need to iterate all the facets.  But
>> >> > an upcoming commit will change flow expiration to work in terms of
>> >> > a new "subfacet" entity, so they will no longer fit together well.
>> >> >
>> >> > This change could be seen as an optimization, since NetFlow active
>> >> > timeouts don't ordinarily have to run as often as flow expiration,
>> >> > especially when the flow expiration rate is stepped up due to a
>> >> > large volume of flows.
>> >>
>> >> This has a pretty significant effect on the accuracy of the timeouts
>> >> that I'm not sure is intended.  Currently, active timeouts are done on
>> >> a per-flow basis starting from time of first use.  However, this
>> >> essentially starts a per-bridge timer on first configuration that must
>> >> first expire in order to check the per-flow timer.  So with the
>> >> default timeout of 10 minutes, the first active timeout will occur
>> >> somewhere between 10 and 20 minutes after first use.  This only
>> >> happens for the first one though since they will tend to synchronize.
>> >> However, I think that there is a potential for the two timers to
>> >> desynchronize, resulting in apparently random doubling of intervals.
>> >> For example, netflow_run() is also called from gen_netflow_rec() when
>> >> it fills up a packet but does not check the return code, skipping the
>> >> active timeout if a timer tick occurred in that window.  Finally, the
>> >> current active timeout code distributes reporting over a large span of
>> >> time but this concentrates all of them at once, which could cause a
>> >> load spike in the collector if a number of switches are brought up at
>> >> the same time.
>> >
>> > Hmm.
>> >
>> > Maybe I should just do NetFlow reporting once a second (as it was
>> > before).  What do you think?
>>
>> I think either that or actually tracking when the next timeout will
>> occur are the only real solutions.  However, I think the only
>> efficient way to do correct timeouts is to again combine this with the
>> flow expiration code, which gets us back to where we were before.
>
> I don't understand.  NetFlow active timeouts are essentially
> independent of flow expiration, except to the extent that if a flow
> expires then it doesn't need active timeouts.

Sorry, when I said correct timeouts I meant calculating the timeout
for the next flow.  I think this needs to be integrated with flow
expiration because if a flow expires from inactivity then you have to
check whether it was the cause of the next active timeout interval and
if so calculate a new one.
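
To make that concrete, here is a rough sketch of what tracking the next
active timeout might look like.  Everything in it is hypothetical: struct
toy_flow, next_active_deadline(), and expire_flow() are illustrative names,
not anything in ofproto-dpif, and it uses a plain linear scan where a real
implementation would probably want something smarter.  The point is only
that idle expiration has to know whether the flow it removes owned the
cached deadline.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a facet; not the OVS structure. */
struct toy_flow {
    long long used_since;   /* Start of the current active-timeout interval. */
    int live;               /* 0 once the flow has expired from inactivity. */
};

/* Returns the earliest active-timeout deadline among live flows, or
 * LLONG_MAX if no live flow remains. */
long long
next_active_deadline(const struct toy_flow *flows, size_t n,
                     long long active_timeout)
{
    long long next = LLONG_MAX;
    size_t i;

    for (i = 0; i < n; i++) {
        if (flows[i].live) {
            long long deadline = flows[i].used_since + active_timeout;
            if (deadline < next) {
                next = deadline;
            }
        }
    }
    return next;
}

/* Expires flow 'idx' from inactivity.  If that flow was the one that
 * determined 'cached_next', the cached value is stale and is recomputed;
 * otherwise it is returned unchanged. */
long long
expire_flow(struct toy_flow *flows, size_t n, size_t idx,
            long long active_timeout, long long cached_next)
{
    long long deadline;

    assert(idx < n);
    deadline = flows[idx].used_since + active_timeout;
    flows[idx].live = 0;
    if (deadline == cached_next) {
        return next_active_deadline(flows, n, active_timeout);
    }
    return cached_next;
}
```

The recomputation in expire_flow() is the coupling I mean: the inactivity
path cannot simply drop the flow without consulting the active-timeout
bookkeeping.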

>> When you say do reporting once a second do you mean essentially the
>> same as in this patch but use 1 second instead of the active timeout
>> interval or go back to the original version?
>
> The same as in this patch but go back to 1 second, which is the
> minimum rate at which we call the main "expire()" function in
> ofproto-dpif.c that actually runs the loop above.

So that doubles the number of times that we are iterating over the
facets.  Do you think that will be a problem for large numbers of
flows?
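
For comparing the two scan rates, a small model of the timing may help.
It assumes (my simplification, not the actual code) that the periodic scan
fires at exact multiples of its period starting from time 0, and that a
flow is reported at the first scan at or after first use plus the active
timeout.  The function names are made up for illustration, and times are
in whole seconds.

```c
#include <assert.h>

/* Returns the time of the first periodic scan at or after 'deadline',
 * assuming scans fire at period, 2 * period, 3 * period, ... */
long long
first_scan_at_or_after(long long deadline, long long period)
{
    long long k;

    assert(period > 0 && deadline >= 0);
    k = (deadline + period - 1) / period;   /* Ceiling division. */
    if (k == 0) {
        k = 1;                              /* The first scan is at 'period'. */
    }
    return k * period;
}

/* Returns how long after 'first_use' the first active-timeout report is
 * sent under this model. */
long long
first_report_delay(long long first_use, long long active_timeout,
                   long long period)
{
    return first_scan_at_or_after(first_use + active_timeout, period)
           - first_use;
}
```

Under this model, with a 600 second active timeout and a 600 second scan
period, a flow first used at t = 1 is reported 1199 seconds later, which is
the 10 to 20 minute window mentioned earlier in the thread.  With a 1 second
scan period the same flow is reported after 600 seconds, at the cost of
running the scan every second.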


