[ovs-dev] [flow-compat 6/7] ofproto-dpif: Factor NetFlow active timeouts out of flow expiration.

Ben Pfaff blp at nicira.com
Tue Nov 8 00:08:31 UTC 2011


On Mon, Nov 07, 2011 at 03:54:40PM -0800, Jesse Gross wrote:
> On Mon, Nov 7, 2011 at 1:41 PM, Ben Pfaff <blp at nicira.com> wrote:
> > On Mon, Nov 07, 2011 at 11:18:04AM -0800, Jesse Gross wrote:
> >> On Mon, Nov 7, 2011 at 9:24 AM, Ben Pfaff <blp at nicira.com> wrote:
> >> > On Sun, Nov 06, 2011 at 09:56:10PM -0800, Jesse Gross wrote:
> >> >> On Fri, Nov 4, 2011 at 4:43 PM, Ben Pfaff <blp at nicira.com> wrote:
> >> >> > NetFlow active timeouts were only mixed in with flow expiration for
> >> >> > convenience: both processes need to iterate all the facets. ??But
> >> >> > an upcoming commit will change flow expiration to work in terms of
> >> >> > a new "subfacet" entity, so they will no longer fit together well.
> >> >> >
> >> >> > This change could be seen as an optimization, since NetFlow active
> >> >> > timeouts don't ordinarily have to run as often as flow expiration,
> >> >> > especially when the flow expiration rate is stepped up due to a
> >> >> > large volume of flows.
> >> >>
> >> >> This has a pretty significant effect on the accuracy of the timeouts
> >> >> that I'm not sure is intended. ??Currently, active timeouts are done on
> >> >> a per-flow basis starting from time of first use. ??However, this
> >> >> essentially starts a per-bridge timer on first configuration that must
> >> >> first expire in order to check the per-flow timer. ??So with the
> >> >> default timeout of 10 minutes, the first active timeout will occur
> >> >> somewhere between 10 and 20 minutes after first use. ??This only
> >> >> happens for the first one though since they will tend to synchronize.
> >> >> However, I think that there is a potential for the two timers to
> >> >> desynchronize, resulting in apparently random doubling of intervals.
> >> >> For example, netflow_run() is also called from gen_netflow_rec() when
> >> >> it fills up a packet but does not check the return code, skipping the
> >> >> active timeout if a timer tick occurred in that window. ??Finally, the
> >> >> current active timeout code distributes reporting over a large span of
> >> >> time but this concentrates all of them at once, which could cause a
> >> >> load spike in the collector if a number of switches are brought up at
> >> >> the same time.
> >> >
> >> > Hmm.
> >> >
> >> > Maybe I should just do NetFlow reporting once a second (as it was
> >> > before). ??What do you think?
> >>
> >> I think either that or actually tracking when the next timeout will
> >> occur are the only real solutions. ??However, I think the only
> >> efficient way to do correct timeouts is to again combine this with the
> >> flow expiration code, which gets us back to where we were before.
> >
> > I don't understand. ??NetFlow active timeouts are essentially
> > independent of flow expiration, except to the extent that if a flow
> > expires then it doesn't need active timeouts.
> 
> Sorry, when I said correct timeouts I meant calculating the timeout
> for the next flow.  I think this needs to be integrated with flow
> expiration because if a flow expires from inactivity then you have to
> check whether it was the cause of the next active timeout interval and
> if so calculate a new one.

Off-hand, I don't think it would pay to be *that* precise about the
interval.  I think that would require something like a min-heap or
other priority queue data structure.

> >> When you say do reporting once a second do you mean essentially the
> >> same as in this patch but use 1 second instead of the active timeout
> >> interval or go back to the original version?
> >
> > The same as in this patch but go back to 1 second, which is the
> > minimum rate at which we call the main "expire()" function in
> > ofproto-dpif.c that actually runs the loop above.
> 
> So that doubles the number of times that we are iterating over the
> facets.  Do you think that will be a problem for large numbers of
> flows?

I guess that it is strictly less expensive that what we have now for
large numbers of flows, since in the limit we currently run both flow
expiration and NetFlow active timeouts 10 times/s, whereas if we drop
NetFlow active timeouts back to once a second then we iterate only
1/10th as often.

For small numbers of flows, I don't think it matters.

I guess that the average flow expires long before its first active
timeout with the default settings (10 minutes is a long time).  The
default socket buffer size is about 128 kB.  We can fit about 2,600
NetFlow v5 expirations in that socket buffer by my
back-of-the-envelope calculation.  So I think that we could probably
increase the NetFlow active timeout interval well beyond a second and
still have a hard time losing any of them.




More information about the dev mailing list