[ovs-dev] [PATCH 1/1] sFlow: adds sFlow export

Neil McKee neil.mckee at inmon.com
Mon Nov 9 21:20:48 UTC 2009


On Nov 9, 2009, at 12:46 PM, Ben Pfaff wrote:

> Thank you for your response!
>
> Neil McKee <neil.mckee at inmon.com> writes:
>
>> On Nov 9, 2009, at 11:02 AM, Ben Pfaff wrote:
>>
>>> Neil McKee <neil.mckee at inmon.com> writes:
>>> In general, I think that we can do most of what
>>> sflow_take_sample() does in userspace.  Userspace should have the
>>> set of flow actions (since it added them in the first place).  I
>>> guess the only trick, if there is one, is that sFlow packets
>>> might show up in userspace for a flow after it's been removed.
>>> However, that's not very likely (since flows usually get removed
>>> only after they've gone idle) so perhaps we could ignore it on a
>>> first pass.
>>
>> I'll defer to you on that choice,  though it does seem like it might
>> cause an awkward race someday.   sFlow samples should be processed at
>> a low priority (e.g. compared to spanning-tree updates) so they could
>> languish in a queue for some time.  I think the standard allows you to
>> take as long as a second.  If that ever happened then the forwarding
>> details might have changed or disappeared.
>
> I think we can avoid that problem.  It's only an initial, very
> simple implementation that would have it.  Even if we cannot in
> every case, for whatever reason, I am sure that we could detect
> it, and then we would, I guess, just increment the "drops"
> counter in struct flow_sample (would that be reasonable
> behavior?).
>

The purpose of the "drops" counter is to indicate that you couldn't
keep up.  So if you try to queue a sample for processing but the queue
is full (because it hasn't been serviced for a while), then you would
increment it.  This could be done at the user-space end by detecting
any time that you went to read from the queue and discovered it was
full.
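
Something along these lines is what I have in mind -- names and sizes
invented purely for illustration, not actual datapath code (and it
doesn't matter much which end of the queue maintains the counter):

#include <stdint.h>

#define SAMPLE_QUEUE_LEN 64                /* power of two, arbitrary */

struct sample_queue {
    void *samples[SAMPLE_QUEUE_LEN];
    unsigned int head, tail;               /* tail - head = samples queued */
    uint32_t sample_pool;                  /* packets considered for sampling */
    uint32_t drops;                        /* samples lost: queue was full */
};

static void
enqueue_sample(struct sample_queue *q, void *sample)
{
    q->sample_pool++;                      /* counted whether or not we drop */
    if (q->tail - q->head >= SAMPLE_QUEUE_LEN) {
        q->drops++;                        /* couldn't keep up: just count it */
    } else {
        q->samples[q->tail++ % SAMPLE_QUEUE_LEN] = sample;
    }
    /* the consumer advances q->head as it drains samples */
}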

It seems like a lot of trouble to go to, looking up the forwarding
decision again in user-space (and checking that it hasn't changed in
the interim), when you already have it right there at the very moment
the packet is forwarded.  What reference would you pass up, anyway?
The hash key?  I guess I'm not understanding what's wrong with doing
it in the datapath.  Is there another spin-lock problem here, or is it
more a general rule-of-thumb that you always try to minimize changes
there?
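
(For concreteness, the user-space round trip I'm picturing is roughly
this -- invented names and types, just to illustrate:)

#include <stddef.h>
#include <string.h>

struct flow_key { unsigned char bytes[32]; };  /* stand-in for the real key */

struct flow_entry {
    struct flow_key key;
    unsigned int n_actions;
    /* ... the cached forwarding actions would live here ... */
};

/* Re-find the forwarding decision for a sampled packet, given only the
 * flow key that came up with it; NULL means the flow has already been
 * removed -- the race I mentioned. */
static const struct flow_entry *
lookup_sampled_flow(const struct flow_entry *table, size_t n_flows,
                    const struct flow_key *key)
{
    size_t i;

    for (i = 0; i < n_flows; i++) {
        if (!memcmp(&table[i].key, key, sizeof *key)) {
            return &table[i];
        }
    }
    return NULL;
}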

>>> This code can run on any number of CPUs at once, so at a minimum
>>> we'd need to use a spinlock to protect the new "sflow_"
>>> members of 'dp'.  But that will likely introduce unnecessary
>>> cache line bouncing, so it would be better to use per-CPU data.
>>>
>>> But could we just use net_random() and avoid global data
>>> entirely?  I see that this is allowed by the sFlow specification.
>>> net_random() is already implemented in terms of per-CPU data, and
>>> the net_random() PRNG's randomness should be suitable for the
>>> requirements.
>>>
>>> I'm really tempted, in fact, to make sFlow sampling a flow
>>> action.  The ODPAT_CONTROLLER action (struct
>>> odp_action_controller), which sends a packet to userspace, has 16
>>> bits of "reserved" space.  We could use those bits for specifying
>>> a probability: 0xffff means always send, 0x8000 means send half
>>> the time, and so on.  Then userspace could turn sFlow on and off
>>> on a per-flow basis (that wouldn't necessarily be useful in the
>>> general case, but I could see wanting sFlow enabled for certain
>>> switch ports and not others, at the very least).
>>>
>>> Then we could also drop the configuration ioctls that you added.
>>>
>>
>> Yes.  The code we added was assuming that a call to net_random()
>> is much more expensive than a decrement-and-test countdown,  but
>> if it works better to just test every packet against a sampling
>> probability then that is fine.  I think this may restrict you to
>> choosing powers-of-2 for your effective-sampling-rate settings, but
>> that is OK.  If someone asks for 1-in-100 it is fine to choose
>> 1-in-128,
>> and report that in the sFlow samples that you send.
>
> If we were to adopt the strategy I have in mind, then a 1-in-100
> rate would get converted by ovs-vswitchd into a 655-in-65536
> rate, which is <1% error.  Not too bad really.

Sounds good.
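
Just to check I'm following, the conversion and the per-packet test
would be roughly this (illustrative only -- the real thing would live
in ovs-vswitchd and the datapath respectively):

#include <stdint.h>

/* Convert a "1-in-N" sampling rate into the 16-bit probability carried
 * in the action: 0xffff = always sample, 0x8000 = about half the time. */
static uint16_t
rate_to_prob16(unsigned int rate)
{
    if (rate <= 1) {
        return UINT16_MAX;
    }
    return 65536 / rate;                   /* e.g. 1-in-100 -> 655 */
}

/* Per-packet test against a random 32-bit value such as net_random()
 * produces; only the low 16 bits are used. */
static int
should_sample(uint32_t rnd, uint16_t prob16)
{
    return (rnd & 0xffff) <= prob16;
}

(The exact form of the comparison doesn't matter much, since the
collector recomputes the effective rate anyway -- see below.)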

It's an interesting property of sFlow that you don't have to be
particularly close to the intended mean sampling rate.  If the collector
is implemented correctly it will ignore the advertised number and
compute the "effective sampling rate" by looking at the number of
samples actually received and comparing it to the change in the
value of the sample_pool.   That way you are more or less immune
to packet loss in transit,  and that's why sFlow can get away with
using UDP as the transport.   That's also why it's important to
increment the sample_pool right at the "coal-face":  it allows for
samples to be dropped/lost at any point after that,  even on the
queue up to user-space.  Is that going to be a problem for the
per-CPU thing?  How expensive is an atomic increment on a
32-bit number?  Do we need each CPU to increment its own
sample_pool and then always add them together to synthesize
the total at the point where a packet-sample is sent?
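
To make the question concrete, here's the shape of what I'm imagining,
in plain user-space C with invented names (ignoring the kernel's real
per-CPU helpers and any memory-ordering details):

#include <stdint.h>

#define MAX_CPUS 64

/* Pad each counter out to its own (typically 64-byte) cache line so
 * CPUs never contend for the same line on the fast path. */
struct percpu_pool {
    uint32_t sample_pool;
    char pad[60];
};

static struct percpu_pool percpu[MAX_CPUS];

static void
count_packet(unsigned int cpu)
{
    percpu[cpu].sample_pool++;             /* no lock, no shared cache line */
}

/* Synthesize the total only when a packet-sample is actually sent up;
 * collectors work from deltas, so 32-bit wrap-around is harmless. */
static uint32_t
total_sample_pool(void)
{
    uint32_t total = 0;
    unsigned int cpu;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        total += percpu[cpu].sample_pool;
    }
    return total;
}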


>
>>>> +/* sFlow library callback to allocate memory. */
>>>> +static void *
>>>> +sflow_agent_alloc_cb(void *magic, SFLAgent *agent, size_t bytes)
>>>> +{
>>>> +    return calloc(1, bytes);
>>>> +}
>>>> +
>>>
>>> I guess that the sFlow library gracefully handles allocation
>>> failure?  If not, there is also an xcalloc() function that will
>>> abort on allocation failure.
>>>
>>
>> The sFlow library only ever allocates memory using a callback,
>> so we can change it to use xcalloc in:
>> ofproto.c:sflow_agent_alloc_cb()
>
> Well, I guess the question is which is a better choice.  Probably
> it doesn't matter much.

Better to use xcalloc, because I doubt that the sFlow library handles
an allocation failure gracefully.
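
i.e. the callback would just become (same shape as the hunk quoted
above, with xcalloc substituted):

/* sFlow library callback to allocate memory.  xcalloc() aborts the
 * process on allocation failure instead of returning NULL, so the
 * library never sees a failed allocation. */
static void *
sflow_agent_alloc_cb(void *magic, SFLAgent *agent, size_t bytes)
{
    return xcalloc(1, bytes);
}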

Neil




