[ovs-dev] [PATCH v2] Simplify kernel sFlow implementation

Fri Aug 19 18:36:02 UTC 2011

On Aug 18, 2011, at 6:55 PM, Jesse Gross wrote:

> * Atomic operations are quite slow, which means that enabling sFlow results in a major performance hit.

I was alarmed to read this.  What is the hit?  (I trust your test had the sampling probability set so that it only takes a handful of samples per second?)

Looking at sflow_sample() in actions.c, is it really just the "atomic_inc(&p->sflow_pool);" line that does the damage?

What about net_random()?  I don't know where to look for the details on this one.  I think Ben said it was about 40 cycles.  How does it avoid using a lock or atomic instruction?  Does it maintain separate random-number seeds per thread or per CPU?

Does the compiler tend to inline the sflow_sample() function?

Should we sprinkle some more "unlikely()" branch-prediction hints?

For another project we've been experimenting with an approach that looks like this:

if (atomic_decrement(&countdown) == 0) {    /* returns the new value */
    <take sample>
    for (;;) {
        /* atomic_add() also returns the new value */
        if (atomic_add(&countdown, compute_next_skip()) > 0)
            break;
        drops++;
    }
}

Only one thread will see the countdown transition from 1->0, so it's the same as having a lock.  That means you can use whatever random number generator you want in compute_next_skip().  In the very rare corner case where your next skip doesn't get "countdown" back above 0 (because other threads kept decrementing it in the meantime), you just register a dropped sample and try again.  The only step in the critical path is the atomic_decrement(), but it sounds like we need to rethink this and try to avoid that atomic_decrement any way we can?
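
To make the countdown idea concrete, here is a minimal userspace sketch using C11 <stdatomic.h> rather than the kernel's atomic_t API.  Everything in it is an assumption for illustration: the struct sampler layout, the names sampler_packet() and next_random(), the xorshift32 generator, and the uniform skip distribution are all hypothetical, and none of it is taken from the OVS datapath.  Note that atomic_fetch_sub() and atomic_fetch_add() return the old value, so the comparisons are adjusted accordingly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sampler state; not the OVS datapath structures. */
struct sampler {
    atomic_int countdown;    /* packets left until the next sample */
    unsigned int mean_skip;  /* mean sampling interval (1-in-N), must be >= 1 */
    uint32_t seed;           /* PRNG state, must be nonzero; only touched by
                              * the thread that saw the 1->0 transition */
    unsigned long samples;   /* samples taken */
    unsigned long drops;     /* skips that failed to lift countdown above 0 */
};

/* xorshift32: stand-in for "whatever random number generator you want". */
static uint32_t next_random(uint32_t *seed)
{
    uint32_t x = *seed;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *seed = x;
    return x;
}

/* Uniform skip in [1, 2*mean_skip - 1], so the mean is mean_skip. */
static int compute_next_skip(struct sampler *s)
{
    assert(s->mean_skip >= 1);
    return 1 + (int)(next_random(&s->seed) % (2 * s->mean_skip - 1));
}

/* Call once per packet.  Returns 1 if this packet was sampled, else 0. */
static int sampler_packet(struct sampler *s)
{
    /* Old value 1 means this thread performed the 1->0 transition. */
    if (atomic_fetch_sub(&s->countdown, 1) != 1)
        return 0;

    s->samples++;  /* <take sample> would go here */

    for (;;) {
        int skip = compute_next_skip(s);

        /* Old value + skip is the new value; stop once it is positive. */
        if (atomic_fetch_add(&s->countdown, skip) + skip > 0)
            break;
        s->drops++;
    }
    return 1;
}
```

While the sampling thread is inside the loop the countdown is at or below zero, so no other thread can observe an old value of 1; that is what lets the seed and the counters be updated without any further locking.  The fast path for every non-sampled packet is still one atomic read-modify-write, which is exactly the cost in question above.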

Neil
