[ovs-dev] [PATCH] Simplify kernel sFlow implementation

Ben Pfaff blp at nicira.com
Thu Aug 11 20:58:47 UTC 2011


The 64-bit packed field is just an intermediate step.  I agree that it
is somewhat ugly.  It will be replaced by a unique ID of one kind or
another (an incrementing counter? random number?) either when it gets
to be too small, or when we decide to fix up the stats race window I
described earlier.

On Thu, Aug 11, 2011 at 01:52:12PM -0700, Neil McKee wrote:
> OK.  Makes sense.
> 
> It might be OK from an sFlow perspective if the user-space lookup
> occasionally fails.  Just as long as it never retrieves the wrong
> action-list.  That would be a problem.
> 
> If you can look up the actions in user-space, do you still need to
> generate that 64-bit packed-field?  It seems like it would be better
> if you can avoid doing that.
> 
> Neil
> 
> 
> On Aug 11, 2011, at 11:39 AM, Ben Pfaff wrote:
> 
> > Neil, don't worry!  This really is a step forward and not a step back.
> > The actions passed to the kernel and back to userspace are not really
> > as helpful as you think, because they discard a lot of semantic
> > information.  These actions do say *what* happened but they do not say
> > *why*.  One off-the-cuff example of the difference is that, if packets
> > in some stream or streams are being mirrored ("spanned") to a
> > monitoring port separate from sFlow, then the kernel actions will just
> > show two outputs and we'll just report to sFlow that there were two,
> > whereas it might be a better idea to ignore the mirror output and
> > report the "primary" output port since that will presumably be more
> > useful for analysis at the sflow collector.
> > 
> > All that we were calculating from the actions before were the egress
> > ports, VLAN IDs, and egress priority.  That's only 48 bits total.  The
> > rest we can (and already do) recover from the flow data or the packet,
> > which will still be available.
> > 
> > When (if?) we do need more than 64 bits, the way forward is more work,
> > but perfectly doable:
> > 
> > 	1. Instead of using the 64-bit data to store the sFlow
> > 	   information itself, switch to using it to store a unique
> > 	   ID.  (The unique ID could even be kernel-generated.)  Track
> > 	   this along with the userspace data associated with each
> > 	   flow.
> > 
> > 	2. When an sflow message arrives from the kernel, look up the
> > 	   userspace data for it based on the unique ID and use it to
> > 	   formulate the sflow message.
> > 
> > 	3. When a flow changes or gets deleted, keep a copy of the old
> > 	   version around in userspace as long as there might still be
> > 	   a sflow message coming along for it from the kernel.
> > 
> > Only step #3 is any trouble at all.  We might need to do this kind of
> > thing anyway, for other reasons, if we want to have perfectly accurate
> > flow stats in userspace.  That's because, currently, there is a (very
> > brief) window of time, after the kernel deletes a flow from its flow
> > table and sends the final statistics to userspace, in which packets
> > can still arrive and be processed according to the flow's previous set
> > of actions.  The same construct as #3 above is one possible solution.
> > 
> > On Thu, Aug 11, 2011 at 11:09:36AM -0700, Neil McKee wrote:
> >> I think the current behavior where the whole actions-list is serialized and passed up with the sample is actually really elegant and future-proof.   New actions are being added all the time,  and having access to a sampled-stream of "exactly-what-is-happening-in-there" is always going to be good to have.  Not just for sFlow reporting,  but for any analysis or optimization that you might want to apply.   There is a symmetry to this solution:  what you push down to the kernel is a list of actions,  and what you get back on the sampling feed is a packet and the list of actions that were applied to it.  What's not to like about that?
> >> 
> >> Furthermore,  whereas your current sFlow implementation may only be reporting the ingress and egress ports, VLAN Ids,  and L2 priorities, plus the number of outbound interfaces (can you get all that into 64-bits?),  that doesn't mean you will never want to report more than that.   Perhaps there is already something else that should be exported with the sFlow feed?   If so,  a new sFlow structure can be designed and it can go out with the sampled header as another annotation.  It's open-ended,  and adding new structures here will never require any further change to the kernel module.
> >> 
> >> In particular,  if an open-flow controller is receiving this sFlow feed so it can make network-wide optimization decisions in real time,  then it seems likely that you would want to export details about all the actions.   Restricting the output might result in awkward blind-spots, which could make these optimization algorithms exponentially harder to write.   The HPLabs article referenced in this blog post is a good example of the sort of network-wide optimization I am referring to:
> >> http://blog.sflow.com/2011/05/openflow-and-sflow.html
> >> 
> >> And a first draft to define new structures for reporting OpenFlow-related data over sFlow is here:
> >> http://www.sflow.org/draft-sflow-openflow.txt
> >> but given the rapid pace of OpenFlow development,  this probably needs to be reworked.
> >> 
> >> Neil
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On Aug 9, 2011, at 9:42 PM, Ben Pfaff wrote:
> >> 
> >>> On Tue, Aug 09, 2011 at 10:46:38AM -0700, pravin shelar wrote:
> >>>> This patch is first step toward generic/simple sampling
> >>>> action which will be completely independent of sflow.
> >>>> Following patch cleanup sampling upcall by striping it down to
> >>>> minimum essential data that userspace can not figure out for
> >>>> itself.
> >>>> 
> >>>> Signed-off by: Pravin Shelar <pshelar at nicira.com>
> >>> 
> >>> (That's "-by", not " by".)
> >>> 
> >>> I like the general approach here--the sflow support is a big wart--but I
> >>> think that the userspace interface changes represent a regression.
> >>> Before, by passing the actions down to userspace, there was always a
> >>> consistent view of those actions.  With this change (as comments in the
> >>> code make clear) the actions might have changed or the facet might even
> >>> have been deleted.
> >>> 
> >>> A really good fix would probably require being able to somehow keep
> >>> old copies of flows that have changed around for a while in userspace,
> >>> link them to the sflow messages, and free them when they are known not
> >>> to be needed anymore.  I think that's probably feasible, but in fact I
> >>> think that we can avoid regression with something simpler.
> >>> 
> >>> Suppose we keep the "sample pool" netlink option that this patch
> >>> removes, and add a way for userspace to set a small amount of user data
> >>> on a flow, and pass that along as the "user data" netlink option.  64
> >>> bits of user data should be plenty of space to store what the userspace
> >>> sflow implementation actually needs (16 bits of final VLAN TCI plus 32
> >>> bits for sFlow's idea of output port).  Plus, those same 64 bits would
> >>> be more than enough for the sequence number that the "really good fix"
> >>> mentioned in the previous paragraph would probably need.
> >>> 
> >>> With the current patch, the obvious place to put this user data would be
> >>> in the flow itself.  Another approach (that you have probably already
> >>> considered) is to make sFlow sampling a flow action, in which case the
> >>> user data could be that action's argument.
> >>> 
> >>> Thanks,
> >>> 
> >>> Ben.
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev at openvswitch.org
> >>> http://openvswitch.org/mailman/listinfo/dev
> >> 
> 



More information about the dev mailing list