[ovs-dev] sFlow extension for tunnels / MPLS - question about user-space flow-cache

Neil McKee neil.mckee at inmon.com
Fri Apr 24 04:06:03 UTC 2015


I looked into what it would take to have the kernel pass up the actions
when it samples a packet.   The changes seemed straightforward (see
attached diff) and worked well.   I prefer this approach because:

(1) These really are the actions that the kernel is applying to the sampled
packet.  No assumptions. No circuitous lookup. No race conditions as flows
are added and removed.  It's straight from the horse's mouth.
(2)  This avoids burdening any of the complex userspace data-structures
with new requirements.  This way the revalidator is not under any pressure
to have the compiled actions at his fingertips for any packet that might be
sampled.

Thoughts?

Neil


------
Neil McKee
InMon Corp.
http://www.inmon.com

On Mon, Apr 6, 2015 at 3:23 PM, Jesse Gross <jesse at nicira.com> wrote:

> On Sat, Apr 4, 2015 at 8:45 AM, Neil McKee <neil.mckee at inmon.com> wrote:
> >
> > On Fri, Apr 3, 2015 at 3:14 PM, Jesse Gross <jesse at nicira.com> wrote:
> >>
> >> On Wed, Apr 1, 2015 at 10:14 PM, Neil McKee <neil.mckee at inmon.com>
> wrote:
> >> > I've been looking at filling in the sFlow structures to report on
> tunnel
> >> > encap and other transformations.   sFlow sampling is best done on
> >> > ingress
> >> > only,  so I can't use the egress-sampling action that the IPFIX
> >> > implementation uses to get the tunnel info.   So how should I look up
> >> > the
> >> > list of actions for a flow when an sFlow sample appears in
> >> > ofproto-dpif-upcall.c:process_upcall()?  There is not enough room for
> >> > all
> >> > the fields we need in the (8-byte) userdata-cookie that goes with the
> >> > compiled actions into the kernel and is passed back on the upcall.
>  But
> >> > it
> >> > should be possible to look up the whole list of actions in user-space
> >> > when
> >> > we process the upcall.   Here are some possible ways.   Please
> comment:
> >>
> >> If you're trying to get the outer tunnel information for both ingress
> >> and where a packet will eventually egress, I don't think that having
> >> the list of kernel actions is sufficient. It can tell you some things,
> >> like the tunnel type and destination IP address but not UDP source
> >> port, for example.
> >
> >
> > The UDP source port of the tunnel is not so important for sFlow
> monitoring.
> > The usage-model for sFlow is to run it on all ports of all switches,  so
> if
> > you need to know those details you can always pick them up at the next
> > switch where the encapsulated packets will be sampled at ingress.
> >
> > sFlow is strict about some things (such as the nature of the
> > packet-count-based random sampling or the freshness of the
> pushed-counters)
> > but the annotation structures allow you to encode "unknown" for any
> field,
> > especially if getting that field is more trouble than it's worth.  I
> think
> > the UDP source port of a tunnel is a prime example.  It's just a random
> > number.  It doesn't add much to the story.  It's no problem to leave it
> out.
> >
> > Even if we can only supply the outer ip_dst,  outer ip-proto, and VNI
> (where
> > applicable),  that's most of the value.
> >
> >>
> >> Can you explain why sampling must be done at the beginning of the
> >> pipeline? Is it just trying to avoid other transforms like vlan tags?
> >
> >
> > While there is no strict rule about ingress v. egress sampling in the
> > standard,  the practical considerations are compelling.  The most
> obvious is
> > that ingress-sampling is your one chance to see and count the packet as
> it
> > came from the host,  before it gets mangled in various ways and sent on,
> > possibly going out via multiple interfaces and tunnels.
> >
> > A less obvious consideration is the value of uniformity from hop to hop
> > through the fabric.  I know of only one legacy product family that used
> > egress-sampling (by necessity given the ASIC design - it was egress or
> > nothing).  Every other implementation has ingress-sampling.   When all
> the
> > devices have ingress sampling then the solution tessellates well and you
> end
> > up with good visibility in both directions on every link in the network.
> >
> > Parsing the actions to extract tunnel info is working well for me in
> > prototype.  I can report egress tunnels whether they were tunnel-SET or
> > tunnel-PUSH operations,  and  I can populate some of the sFlow-MPLS
> > structures too.  There are even some sFlow v4/v6 NAT structures that
> might
> > be worth exporting if it looks like actions are rewriting addresses and
> > ports.  Hard to predict all the crazy things OVS will be used for,  so
> it's
> > nice to have a mechanism that is future-proofed.
> >
> > Just need a reliable way to get the actions.
>
> If we have the actions in userspace and it works well enough to get
> them from there, then that seems like the preferred way to go.
> However, at this point, I don't see a huge problem with having the
> kernel pass them up either.
>
-------------- next part --------------
diff --git a/datapath/actions.c b/datapath/actions.c
index 5a1dbe2..ce59ef9 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -620,7 +620,8 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
 }
 
 static int output_userspace(struct datapath *dp, struct sk_buff *skb,
-			    struct sw_flow_key *key, const struct nlattr *attr)
+			    struct sw_flow_key *key, const struct nlattr *attr,
+			    const struct nlattr *actions, int actions_len)
 {
 	struct ovs_tunnel_info info;
 	struct dp_upcall_info upcall;
@@ -631,6 +632,8 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 	upcall.userdata = NULL;
 	upcall.portid = 0;
 	upcall.egress_tun_info = NULL;
+	upcall.actions = actions;
+	upcall.actions_len = actions_len;
 
 	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
 		 a = nla_next(a, &rem)) {
@@ -666,7 +669,8 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 }
 
 static int sample(struct datapath *dp, struct sk_buff *skb,
-		  struct sw_flow_key *key, const struct nlattr *attr)
+		  struct sw_flow_key *key, const struct nlattr *attr,
+		  const struct nlattr *actions, int actions_len)
 {
 	const struct nlattr *acts_list = NULL;
 	const struct nlattr *a;
@@ -700,7 +704,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 	 */
 	if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE &&
 		   nla_is_last(a, rem)))
-		return output_userspace(dp, skb, key, a);
+		return output_userspace(dp, skb, key, a, actions, actions_len);
 
 	skb = skb_clone(skb, GFP_ATOMIC);
 	if (!skb)
@@ -859,7 +863,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_ACTION_ATTR_USERSPACE:
-			output_userspace(dp, skb, key, a);
+			output_userspace(dp, skb, key, a, NULL, 0);
 			break;
 
 		case OVS_ACTION_ATTR_HASH:
@@ -900,7 +904,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = sample(dp, skb, key, a);
+			err = sample(dp, skb, key, a, attr, len);
 			break;
 		}
 
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 3607170..791fc4d 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -398,6 +398,10 @@ static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
 	if (upcall_info->egress_tun_info)
 		size += nla_total_size(ovs_tun_key_attr_size());
 
+	/* OVS_PACKET_ATTR_ACTIONS */
+	if(upcall_info->actions_len) {
+		size += nla_total_size(upcall_info->actions_len);
+	}
 	return size;
 }
 
@@ -485,6 +489,19 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		nla_nest_end(user_skb, nla);
 	}
 
+	if(upcall_info->actions_len) {
+		nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_ACTIONS);
+		err = ovs_nla_put_actions(upcall_info->actions,
+					  upcall_info->actions_len,
+					  user_skb);
+		if(!err) {
+			nla_nest_end(user_skb, nla);
+		}
+		else {
+			nla_nest_cancel(user_skb, nla);
+		}
+	}
+
 	/* Only reserve room for attribute header, packet data is added
 	 * in skb_zerocopy()
 	 */
diff --git a/datapath/datapath.h b/datapath/datapath.h
index fdf35f0..526ddad 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -120,6 +120,8 @@ struct ovs_skb_cb {
 struct dp_upcall_info {
 	const struct ovs_tunnel_info *egress_tun_info;
 	const struct nlattr *userdata;
+	const struct nlattr *actions;
+	int actions_len;
 	u32 portid;
 	u8 cmd;
 };


More information about the dev mailing list