[ovs-dev] Tunable flow eviction threshold
Simon Horman
horms at verge.net.au
Thu Jul 28 01:54:57 UTC 2011
On Wed, Jul 27, 2011 at 09:31:19AM -0700, Ben Pfaff wrote:
> On Wed, Jul 27, 2011 at 03:20:48PM +0900, Simon Horman wrote:
> > On Tue, Jul 26, 2011 at 05:02:25PM -0700, Ben Pfaff wrote:
> > > 1. To determine what flows should be retained in the kernel, and
> > > what flows should be discarded, based on the number of packets
> > > that have recently passed through for that flow. We also need to
> > > do this to maintain accurate OpenFlow flow statistics.
> > >
> > > Currently this is implemented by periodically dumping stats for
> > > every kernel flow, clearly O(n). I don't know a better way.
> >
> > I guess this hurts a lot, given that the purpose of that code
> > is to mitigate the effects of large numbers of flows.
> >
> > Off the top of my head, I wonder if an approach that would
> > work would be to only ask the kernel for flows over a certain age.
> > Or otherwise push some of the logic into the kernel where
> > access to the data ought to be cheaper.
> > Or perhaps to have an algorithm that can deal with
> > a partial dump from the kernel - e.g. a dump of up to 10,000 flows.
>
> One strategy that I have considered is to be able to ask only for flows
> that have a non-zero packet count. That would help with the common case
> where, when there is a large number of flows, they are caused by a port
> scan or some other activity with 1-packet flows. It wouldn't help at
> all in your case.
>
> It's already possible to do a partial dump. Just read, e.g., the first
> 10,000 flows from the Netlink socket and leave the rest unread. Then
> later read the next 10,000 flows from the socket, and so on. But I
> don't see how it really solves a problem. It divides the dump into
> pieces but it doesn't make obtaining it any faster.
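[Editorial aside: the point above can be seen with a purely illustrative shell simulation. Nothing here is the OVS Netlink interface; the 25,000-entry "dump" and file names are made up.]

```shell
# Illustrative only: simulate a 25,000-flow dump and consume it in
# 10,000-entry batches, the way one might drain a Netlink dump in pieces.
rm -f /tmp/flow-batch-*
seq 25000 > /tmp/flow-dump.txt
split -l 10000 /tmp/flow-dump.txt /tmp/flow-batch-
# Three batches (10000 + 10000 + 5000): batching changes the pacing,
# but every entry still has to be produced and read once.
wc -l /tmp/flow-batch-*
```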
>
> By "flows over a certain age", do you mean flows that have not been used
> in a certain amount of time? I can see how that helps with eviction,
> although it doesn't help maintaining OpenFlow flow stats.
>
> Pushing some of the logic into the kernel might make sense. I wouldn't
> want to make the kernel implementation less general-purpose in the
> process, though.
Agreed. This conversation has continued elsewhere in this thread.
I'll read those emails more thoroughly before responding in a more
meaningful fashion (hopefully).
> > > > My current test is rather abusive - I'm using the kernel packet generator
> > > > to send packets for many flows at roughly the same per-flow packet rate.
> > > > In that scenario all flows are equal, so I think that classification would
> > > > probably break down. Indeed, it plays havoc with the current eviction algorithm.
> > > > However, I do agree that some kind of prioritisation algorithm could work
> > > > well with less abusive (and more common?) workloads.
> > >
> > > What does your flow table look like?
> >
> > Does this answer the question?
> >
> > sudo ovs-ofctl add-flow br3 "in_port=1 ip nw_dst=10.0.0.0/8 idle_timeout=0 action=output:3"
> >
> > Where 3 is a dummy interface.
> >
> > My test involves sending 128-byte packets at 450,000 pps (~1/2 Gbit/s).
> >
> > By tuning the eviction threshold and with the table resize fix
> > I sent a week or so back I can get up to 128k flows and still
> > have a little of one CPU left over.
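[Editorial aside: for readers following along, the threshold being tuned here is the one added by the patch this thread discusses. Assuming it is exposed through the Open_vSwitch table's other_config column, setting it would look roughly like the line below; the exact key name is an assumption, so check the vswitchd documentation for your release.]

```shell
# Assumed key name -- verify against the ovs-vswitchd documentation
# for the release actually running.
ovs-vsctl set Open_vSwitch . other_config:flow-eviction-threshold=10000
```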
> >
> > Empirically it seems that there is some hard limit of 128k flows
> > that I have hit.
>
> Thanks for the info. I might want to try to replicate your results. I
> have never used the kernel packet generator: would you mind giving me a
> recipe for producing this behavior?
Sure, it's a bit raw, but it's attached and is exactly what I have been
using. In particular, the bit about "eth4" is left over from earlier
testing work.
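[Editorial aside: the attached pktgen.sh was scrubbed by the list archive. For readers who want the general shape of such a test, below is a minimal pktgen sketch; the interface name, address range, and rate are placeholders, not Simon's actual script. It requires root and the pktgen kernel module.]

```shell
#!/bin/sh
# Minimal pktgen sketch (placeholders throughout). Many distinct flows
# are produced by randomizing the destination address across a range.
modprobe pktgen
PGDEV=/proc/net/pktgen

pgset() { echo "$2" > "$PGDEV/$1"; }

pgset kpktgend_0 "rem_device_all"
pgset kpktgend_0 "add_device eth1"    # placeholder interface

pgset eth1 "count 0"                  # 0 = run until stopped
pgset eth1 "pkt_size 128"             # matches the 128-byte test above
pgset eth1 "delay 2222"               # ~2222 ns/pkt, roughly 450 kpps
pgset eth1 "dst_min 10.0.0.1"         # placeholder range giving
pgset eth1 "dst_max 10.0.255.254"     # many distinct flows
pgset eth1 "flag IPDST_RND"

echo "start" > "$PGDEV/pgctrl"        # begins transmission
```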
> Your test case really strikes at all the assumptions I've always made
> about network traffic, which is essentially that when there is heavy
> traffic there are identifiable "heavy hitters", so that one may obtain
> good performance by focusing on keeping the CPU% used to process those
> to a minimum.
>
> The limit should be higher than 128 kflow. Comments in datapath/table.h
> imply that the limit should be 256 kflow with a 32-bit kernel or 1 Mflow
> with a 64-bit kernel. Maybe there's another bug there.
I was also expecting the limit to be 1M flows, and to run
out of CPU at around 200,000 for the particular test I am using.
It is of course possible that I am misreading the results.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pktgen.sh
Type: application/x-sh
Size: 1476 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20110728/f8265f0e/attachment-0004.sh>