[ovs-dev] [PATCH v2] Simplify kernel sFlow implementation
Neil McKee
neil.mckee at inmon.com
Wed Aug 17 22:04:29 UTC 2011
On Aug 17, 2011, at 2:30 PM, Ben Pfaff wrote:
> On Wed, Aug 17, 2011 at 02:18:52PM -0700, Neil McKee wrote:
>>
>> On Aug 17, 2011, at 12:25 PM, Ben Pfaff wrote:
>>
>>> On Wed, Aug 17, 2011 at 11:51:05AM -0700, Pravin Shelar wrote:
>>>> On Tue, Aug 16, 2011 at 11:17 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>> The use of the checksum for actions surprised me a little bit, as it
>>>>> is semantically equivalent to what we have today but perhaps not as
>>>>> accurate. ?Ben made a couple of good suggestions in the previous
>>>>> thread about how to do this cleanly and generically. ?Did you run into
>>>>> problems with those?
>>>>
>>>> AFAIU, Ben was suggesting to have cookie set from userspace. that
>>>> cookie can be returned on upcall. so that userspace can validate flow
>>>> used in kernel datapath. Other part was about keeping flows which are
>>>> deleted and updated. so that sflow can lookup those flows if required.
>>>> I thought cookie can be generated in kernel using checksum and return
>>>> it to userspace. In this way we do not need to change user-kernel
>>>> interface when sflow need extra information as it has exact flow that
>>>> kernel used.
>>>> Actually it is mentioned in previous mail thread.
>>>
>>> I was proposing two steps. In the first step, userspace would set a
>>> cookie that directly encodes information needed for sflow output.
>>
>> And you have to do that every time, regardless of whether any
>> packets from that flow end up being sampled?
>
> It's very cheap, so that's hardly worth worrying about.
>
>>> In
>>> the second step, userspace would set a cookie that uniquely identifies
>>> a version of a flow. The second step is harder because userspace has
>>> to keep around old versions of a flow for a while (the hardest part is
>>> figuring out when they can be discarded, I think).
>>>
>>> A checksum ties the set of actions to a flow. That's all we need for
>>> sflow, although it means that we either need to keep around old
>>> versions of a flow, as in the second step above, or just discard
>>> sampled packets for old versions (I think that your patch does the
>>> latter).
>>
>> Better to send the sample out with "unknown" egress port/vlan than
>> to just drop it. Otherwise you are likely to systematically drop
>> samples from flows that are short-lived, and the results will be
>> biased. (You do still know the ingress-port, right?)
>
> Thanks for the advice. I had not realized that sending a sample with
> unknown egress data was an option. Yes, we still know the ingress
> port.
>
> Our flows only expire through timeout (after about 5 seconds of
> inactivity), so I don't think that this would systematically bias
> against such flows.
>
>>> It is not as general a solution as a unique identifier,
>>> because when a flow's actions change from A to B and then back to A
>>> there is no way to distinguish whether a sampled packet corresponds to
>>> the first or second time that A was set. (That doesn't matter for
>>> sflow.) That's a corner case; I don't know if it's important. And,
>>> of course, the IP checksum only probabilistically tells us whether
>>> there was a change.
>>
>> This may be stating the obvious, but it's important that you never
>> send out a sample with the *wrong* egress port/vlan. That could
>> break all kinds of things (topology discovery, end-host location,
>> policy violations etc.)
>
> Yes, I understand.
>
>> Perhaps we could know more about what you are trying to achieve by
>> changing this? Every suggestion seems to involve more complexity,
>> more overhead or less accuracy. What's the upside?
>
> We're trying to minimize the amount of code and complexity in the
> kernel.
In the datapath directory I can see a total of about 30 lines of code that is related to sampling. The code to include the actions in the upcall seems to be just this:
upcall.actions = acts->actions;
upcall.actions_len = acts->actions_len;
how many lines of user-space code are you willing to write to reduce this? I must be missing something. What am I missing?
Neil
More information about the dev
mailing list