[ovs-dev] [PATCH for comment only] Allow datapath to pass packets back to the kernel for non-OVS handling

Chris Luke chrisy at flirble.org
Mon Dec 23 05:13:31 UTC 2013


Open vSwitch handles the OFPP_NORMAL action by passing packets
into a simple layer 2 learning switch. This commit adds the option to have
packets passed back to the kernel as though Open vSwitch never touched
them. This allows, for instance, bridge member ports to have IP addresses
and for the host to run routing protocols on those ports.

Extend the datapath on kernels since 2.6.39 to pass packets back by
providing a special output port number ('OVSP_NORMAL') which causes
the hook to return RX_HANDLER_PASS. Alter ovs-dpctl to be able to parse
and display these flows.

Add a flag to the userspace bridge so that it is in either OVS mode
or kernel mode. This flag is inspected when a packet action is
OFPP_NORMAL and, if the bridge is in kernel mode, the datapath is
instructed to use the port 'OVSP_NORMAL'.

Extend the OVSDB schema to store this flag and extend ovs-vsctl
to manipulate it:
     ovs-vsctl set-port-normal-mode br0 kernel
vs.
     ovs-vsctl set-port-normal-mode br0 ovs
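
The two mode strings map onto the new enum roughly as in the sketch below.
The enum mirrors the one this patch adds to ofproto/ofproto.h; the two
helper functions are hypothetical names that only condense the checks done
in ovs-vsctl.c and bridge.c, and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the enum this patch adds to ofproto/ofproto.h. */
enum ofproto_port_normal_mode {
    OFPROTO_PORT_NORMAL_MODE_OVS,    /* OVS learning switch (default) */
    OFPROTO_PORT_NORMAL_MODE_KERNEL  /* pass packets back to the kernel */
};

/* Hypothetical helper: true if 'mode' is a value that
 * "ovs-vsctl set-port-normal-mode" should accept. */
static bool port_normal_mode_is_valid(const char *mode)
{
    return mode != NULL && (!strcmp(mode, "ovs") || !strcmp(mode, "kernel"));
}

/* Hypothetical helper: map the (possibly unset) database column to the enum.
 * An unset column, or any value other than "kernel", selects OVS mode. */
static enum ofproto_port_normal_mode
port_normal_mode_from_string(const char *mode)
{
    return (mode != NULL && !strcmp(mode, "kernel"))
           ? OFPROTO_PORT_NORMAL_MODE_KERNEL
           : OFPROTO_PORT_NORMAL_MODE_OVS;
}
```

Defaulting to OVS mode when the column is unset keeps existing bridges
behaving as they do today.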

---
This patch is *not* finished! There are at least a couple of areas it changes
in ugly ways, but I wanted to get it out there for comment. I also have at
least one man page to update, too.

My rationale here is to provide an OpenFlow switch that acts more like 
a router than an Ethernet switch; my primary interest area right now is in
how to incrementally support OpenFlow on a backbone IP network and
I want to model that with OVS. It also relates directly to a 'behaviour I
expect' question in the FAQ.

The code works (for me, anyway), but it has drawbacks. Not least:

-   To be able to have the kernel hook work when we tell it RX_HANDLER_PASS
    the skb has to be cloned on the way in since OVS alters it. This has a
    performance penalty.

    I'm hoping I can unmangle it - assuming the mangling is slight. Otherwise,
    I would need to somehow bubble up whether OVSP_NORMAL is in use in order
    to choose the clone method.

-   I couldn't see a clean way to have ofproto/ofproto-dpif-xlate.c see the
    new flag 'port_normal_mode' in struct ofproto, so I provided a way to
    expose the 'up' member of ofproto-dpif. I'm hoping I have missed a better
    way to do this.

-   Upcalling the first packet in a flow back to the kernel seems to work, but
    I'm not sure why, and that makes me uncomfortable - does the upcall path
    reinsert the packet into the kernel input queue? This would be ideal; even
    though it means the hook sees the packet twice, on the second pass it
    would be returned with RX_HANDLER_PASS and thus carry on as intended.
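
To spell out the two-pass behaviour I'm hoping for in that last point, here
is a toy user-space model. It assumes the missed packet really is presented
to the hook again once the flow has been installed, which is exactly the
part I have not confirmed. Every name in it (toy_datapath, toy_receive,
toy_install_flow, enum hook_result) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OVSP_NORMAL (UINT32_MAX - 1)

/* Hypothetical stand-in for the kernel's rx_handler_result_t. */
enum hook_result { HOOK_CONSUMED, HOOK_PASS };

/* Toy datapath: one flow slot that stays empty until userspace fills it. */
struct toy_datapath {
    bool flow_installed;    /* has the upcall been answered yet? */
    uint32_t output_port;   /* output port of the installed flow */
    unsigned int upcalls;   /* number of misses sent to userspace */
};

/* Model of the receive path: a miss counts an upcall and consumes the
 * packet; a hit on a flow that outputs to OVSP_NORMAL passes it back. */
static enum hook_result toy_receive(struct toy_datapath *dp)
{
    if (!dp->flow_installed) {
        dp->upcalls++;
        return HOOK_CONSUMED;
    }
    return dp->output_port == OVSP_NORMAL ? HOOK_PASS : HOOK_CONSUMED;
}

/* Stand-in for userspace answering the upcall by installing the flow. */
static void toy_install_flow(struct toy_datapath *dp, uint32_t port)
{
    dp->flow_installed = true;
    dp->output_port = port;
}
```

In this model the first call to toy_receive() records one upcall and
consumes the packet; after toy_install_flow(&dp, OVSP_NORMAL) a second call
returns HOOK_PASS without a further upcall.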

And it's not thoroughly tested yet either. :)

---
  FAQ                          |   27 ++++++++++++------
  datapath/actions.c           |   38 +++++++++++++++++-------
  datapath/datapath.c          |   12 +++++---
  datapath/datapath.h          |    4 +--
  datapath/flow_netlink.c      |    4 ++-
  datapath/vport-netdev.c      |   21 ++++++++++----
  datapath/vport.c             |    4 +--
  datapath/vport.h             |    2 +-
  include/linux/openvswitch.h  |    3 ++
  lib/odp-util.c               |   21 ++++++++++++--
  lib/odp-util.h               |    5 ++--
  ofproto/ofproto-dpif-xlate.c |   26 +++++++++++++++++
  ofproto/ofproto-dpif-xlate.h |    5 ++++
  ofproto/ofproto-dpif.c       |   10 +++++++
  ofproto/ofproto-dpif.h       |    2 ++
  ofproto/ofproto-provider.h   |    1 +
  ofproto/ofproto.c            |    9 ++++++
  ofproto/ofproto.h            |    7 +++++
  utilities/ovs-vsctl.c        |   65 +++++++++++++++++++++++++++++++++++++++++-
  vswitchd/bridge.c            |    8 ++++++
  vswitchd/vswitch.ovsschema   |   11 +++++--
  vswitchd/vswitch.xml         |   10 +++++++
  22 files changed, 251 insertions(+), 44 deletions(-)


diff --git a/FAQ b/FAQ
index 1edcd94..7d7075b 100644
--- a/FAQ
+++ b/FAQ
@@ -509,9 +509,9 @@ Q: I created a bridge and added my Ethernet port to it, using commands
     and as soon as I ran the "add-port" command I lost all connectivity
     through eth0.  Help!
  
-A: A physical Ethernet device that is part of an Open vSwitch bridge
-   should not have an IP address.  If one does, then that IP address
-   will not be fully functional.
+A: In the default Open vSwitch model, a physical Ethernet device that
+   is part of an Open vSwitch bridge should not have an IP address.
+   If one does, then that IP address will not be fully functional.
  
     You can restore functionality by moving the IP address to an Open
     vSwitch "internal" device, such as the network device named after
@@ -534,12 +534,21 @@ A: A physical Ethernet device that is part of an Open vSwitch bridge
     (e.g. br0).  You might still need to manually clear the IP address
     from the physical interface (e.g. with "ifconfig eth0 0.0.0.0").
  
-   There is no compelling reason why Open vSwitch must work this way.
-   However, this is the way that the Linux kernel bridge module has
-   always worked, so it's a model that those accustomed to Linux
-   bridging are already used to.  Also, the model that most people
-   expect is not implementable without kernel changes on all the
-   versions of Linux that Open vSwitch supports.
+   This was the way that the Linux kernel bridge module always worked,
+   so it's a model that those accustomed to Linux bridging are already
+   used to.
+
+   Alternatively, Linux kernels since 2.6.39 allow packets that were
+   delivered to a module to be passed back to the kernel as though the
+   datapath never saw them; Open vSwitch allows you to configure a bridge
+   to make use of this by setting the mode used for the OFPP_NORMAL
+   port:
+
+       ovs-vsctl set-port-normal-mode br0 kernel
+
+   You would then install a flow with action=normal and Open vSwitch
+   will, instead of using its learning switch, simply pass the matching
+   packets back to the kernel for normal Linux processing.
  
     By the way, this issue is not specific to physical Ethernet
     devices.  It applies to all network devices except Open vswitch
diff --git a/datapath/actions.c b/datapath/actions.c
index 30ea1d2..8fdc5ea 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -39,7 +39,8 @@
  #include "vport.h"
  
  static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			      const struct nlattr *attr, int len, bool keep_skb);
+			      const struct nlattr *attr, int len, bool keep_skb,
+			      int *normal_action);
  
  static int make_writable(struct sk_buff *skb, int write_len)
  {
@@ -457,7 +458,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
  	}
  
  	return do_execute_actions(dp, skb, nla_data(acts_list),
-				  nla_len(acts_list), true);
+				  nla_len(acts_list), true, NULL);
  }
  
  static int execute_set_action(struct sk_buff *skb,
@@ -508,13 +509,14 @@ static int execute_set_action(struct sk_buff *skb,
  
  /* Execute a list of actions against 'skb'. */
  static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
-			const struct nlattr *attr, int len, bool keep_skb)
+			const struct nlattr *attr, int len, bool keep_skb,
+			int *normal_action)
  {
  	/* Every output action needs a separate clone of 'skb', but the common
  	 * case is just a single output action, so that doing a clone and
  	 * then freeing the original skbuff is wasteful.  So the following code
  	 * is slightly obscure just to avoid that. */
-	int prev_port = -1;
+	u32 prev_port = OVSP_NONE;
  	const struct nlattr *a;
  	int rem;
  
@@ -522,14 +524,27 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
  	     a = nla_next(a, &rem)) {
  		int err = 0;
  
-		if (prev_port != -1) {
-			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
-			prev_port = -1;
+		if (prev_port != OVSP_NONE) {
+			do_output(dp, skb_clone(skb, GFP_ATOMIC), (int)prev_port);
+			prev_port = OVSP_NONE;
  		}
  
  		switch (nla_type(a)) {
  		case OVS_ACTION_ATTR_OUTPUT:
  			prev_port = nla_get_u32(a);
+			if (unlikely(prev_port > DP_MAX_PORTS)) {
+				switch (prev_port) {
+				case OVSP_NORMAL:
+					if (likely(normal_action != NULL))
+						*normal_action = true;
+					break;
+
+				default:
+					err = -EINVAL;
+					break;
+				}
+				prev_port = OVSP_NONE;
+			}
  			break;
  
  		case OVS_ACTION_ATTR_USERSPACE:
@@ -561,11 +576,11 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
  		}
  	}
  
-	if (prev_port != -1) {
+	if (prev_port != OVSP_NONE) {
  		if (keep_skb)
  			skb = skb_clone(skb, GFP_ATOMIC);
  
-		do_output(dp, skb, prev_port);
+		do_output(dp, skb, (int)prev_port);
  	} else if (!keep_skb)
  		consume_skb(skb);
  
@@ -593,7 +608,7 @@ static int loop_suppress(struct datapath *dp, struct sw_flow_actions *actions)
  }
  
  /* Execute a list of actions against 'skb'. */
-int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
+int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, int *normal_action)
  {
  	struct sw_flow_actions *acts = rcu_dereference(OVS_CB(skb)->flow->sf_acts);
  	struct loop_counter *loop;
@@ -611,7 +626,8 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
  
  	OVS_CB(skb)->tun_key = NULL;
  	error = do_execute_actions(dp, skb, acts->actions,
-					 acts->actions_len, false);
+					 acts->actions_len, false,
+					 normal_action);
  
  	/* Check whether sub-actions looped too much. */
  	if (unlikely(loop->looping))
diff --git a/datapath/datapath.c b/datapath/datapath.c
index b42fd8b..37d5281 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -215,7 +215,7 @@ void ovs_dp_detach_port(struct vport *p)
  }
  
  /* Must be called with rcu_read_lock. */
-void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
+int ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
  {
  	struct datapath *dp = p->dp;
  	struct sw_flow *flow;
@@ -224,6 +224,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
  	u64 *stats_counter;
  	u32 n_mask_hit;
  	int error;
+	int normal_action = false;
  
  	stats = this_cpu_ptr(dp->stats_percpu);
  
@@ -231,7 +232,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
  	error = ovs_flow_extract(skb, p->port_no, &key);
  	if (unlikely(error)) {
  		kfree_skb(skb);
-		return;
+		return false;
  	}
  
  	/* Look up flow. */
@@ -251,9 +252,10 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
  
  	OVS_CB(skb)->flow = flow;
  	OVS_CB(skb)->pkt_key = &key;
+	normal_action = false;
  
  	ovs_flow_stats_update(OVS_CB(skb)->flow, skb);
-	ovs_execute_actions(dp, skb);
+	ovs_execute_actions(dp, skb, &normal_action);
  	stats_counter = &stats->n_hit;
  
  out:
@@ -262,6 +264,8 @@ out:
  	(*stats_counter)++;
  	stats->n_mask_hit += n_mask_hit;
  	u64_stats_update_end(&stats->sync);
+
+	return normal_action;
  }
  
  static struct genl_family dp_packet_genl_family = {
@@ -552,7 +556,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
  		goto err_unlock;
  
  	local_bh_disable();
-	err = ovs_execute_actions(dp, packet);
+	err = ovs_execute_actions(dp, packet, NULL);
  	local_bh_enable();
  	rcu_read_unlock();
  
diff --git a/datapath/datapath.h b/datapath/datapath.h
index b3ae7cd..019c24a 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -186,7 +186,7 @@ static inline struct vport *ovs_vport_ovsl(const struct datapath *dp, int port_n
  extern struct notifier_block ovs_dp_device_notifier;
  extern struct genl_multicast_group ovs_dp_vport_multicast_group;
  
-void ovs_dp_process_received_packet(struct vport *, struct sk_buff *);
+int ovs_dp_process_received_packet(struct vport *, struct sk_buff *);
  void ovs_dp_detach_port(struct vport *);
  int ovs_dp_upcall(struct datapath *, struct sk_buff *,
  		  const struct dp_upcall_info *);
@@ -195,7 +195,7 @@ const char *ovs_dp_name(const struct datapath *dp);
  struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 portid, u32 seq,
  					 u8 cmd);
  
-int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb);
+int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, int *normal_action);
  void ovs_dp_notify_wq(struct work_struct *work);
  
  #define OVS_NLERR(fmt, ...) \
diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index 9b26528..d2c2ce0 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -1520,6 +1520,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
  		const struct ovs_action_push_vlan *vlan;
  		int type = nla_type(a);
  		bool skip_copy;
+		u32 port;
  
  		if (type > OVS_ACTION_ATTR_MAX ||
  		    (action_lens[type] != nla_len(a) &&
@@ -1538,7 +1539,8 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
  			break;
  
  		case OVS_ACTION_ATTR_OUTPUT:
-			if (nla_get_u32(a) >= DP_MAX_PORTS)
+			port = nla_get_u32(a);
+			if (port >= DP_MAX_PORTS && port != OVSP_NORMAL)
  				return -EINVAL;
  			break;
  
diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
index c15923b..f4ae265 100644
--- a/datapath/vport-netdev.c
+++ b/datapath/vport-netdev.c
@@ -34,7 +34,7 @@
  #include "vport-internal_dev.h"
  #include "vport-netdev.h"
  
-static void netdev_port_receive(struct vport *vport, struct sk_buff *skb);
+static int netdev_port_receive(struct vport *vport, struct sk_buff *skb);
  
  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,39)
  /* Called with rcu_read_lock and bottom-halves disabled. */
@@ -48,7 +48,8 @@ static rx_handler_result_t netdev_frame_hook(struct sk_buff **pskb)
  
  	vport = ovs_netdev_get_vport(skb->dev);
  
-	netdev_port_receive(vport, skb);
+	if (unlikely(netdev_port_receive(vport, skb)))
+		return RX_HANDLER_PASS;
  
  	return RX_HANDLER_CONSUMED;
  }
@@ -190,7 +191,7 @@ const char *ovs_netdev_get_name(const struct vport *vport)
  }
  
  /* Must be called with rcu_read_lock. */
-static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
+static int netdev_port_receive(struct vport *vport, struct sk_buff *skb)
  {
  	if (unlikely(!vport))
  		goto error;
@@ -198,22 +199,30 @@ static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
  	if (unlikely(skb_warn_if_lro(skb)))
  		goto error;
  
+#if 0 /* chris test */
  	/* Make our own copy of the packet.  Otherwise we will mangle the
  	 * packet for anyone who came before us (e.g. tcpdump via AF_PACKET).
  	 * (No one comes after us, since we tell handle_bridge() that we took
  	 * the packet.) */
  	skb = skb_share_check(skb, GFP_ATOMIC);
+#else
+	/* Make a clone of the skb. Since we can sometimes return control
+	 * of the packet to the kernel, and we clobber the skb, we need our
+	 * own copy.
+	 */
+	skb = skb_clone(skb, GFP_ATOMIC);
+#endif
  	if (unlikely(!skb))
-		return;
+		return false;
  
  	skb_push(skb, ETH_HLEN);
  	ovs_skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
  
-	ovs_vport_receive(vport, skb, NULL);
-	return;
+	return ovs_vport_receive(vport, skb, NULL);
  
  error:
  	kfree_skb(skb);
+	return false;
  }
  
  static unsigned int packet_length(const struct sk_buff *skb)
diff --git a/datapath/vport.c b/datapath/vport.c
index 7f12acc..8dd4e3d 100644
--- a/datapath/vport.c
+++ b/datapath/vport.c
@@ -359,7 +359,7 @@ int ovs_vport_get_options(const struct vport *vport, struct sk_buff *skb)
   * skb->data should point to the Ethernet header.  The caller must have already
   * called compute_ip_summed() to initialize the checksumming fields.
   */
-void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
+int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
  		       struct ovs_key_ipv4_tunnel *tun_key)
  {
  	struct pcpu_tstats *stats;
@@ -371,7 +371,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
  	u64_stats_update_end(&stats->syncp);
  
  	OVS_CB(skb)->tun_key = tun_key;
-	ovs_dp_process_received_packet(vport, skb);
+	return ovs_dp_process_received_packet(vport, skb);
  }
  
  /**
diff --git a/datapath/vport.h b/datapath/vport.h
index 2cf2b18..f24e774 100644
--- a/datapath/vport.h
+++ b/datapath/vport.h
@@ -191,7 +191,7 @@ static inline struct vport *vport_from_priv(const void *priv)
  	return (struct vport *)(priv - ALIGN(sizeof(struct vport), VPORT_ALIGN));
  }
  
-void ovs_vport_receive(struct vport *, struct sk_buff *,
+int ovs_vport_receive(struct vport *, struct sk_buff *,
  		       struct ovs_key_ipv4_tunnel *);
  
  /* List of statically compiled vport implementations.  Don't forget to also
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 5137c2f..179d529 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -139,7 +139,10 @@ struct ovs_vport_stats {
  #define OVS_DP_F_UNALIGNED	(1 << 0)
  
  /* Fixed logical ports. */
+#define OVSP_MAX        (4294967295U)
  #define OVSP_LOCAL      ((__u32)0)
+#define OVSP_NONE       ((__u32)OVSP_MAX)
+#define OVSP_NORMAL     ((__u32)OVSP_MAX-1)
  
  /* Packet transfer. */
  
diff --git a/lib/odp-util.c b/lib/odp-util.c
index f44c7d4..fe12d0b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -385,8 +385,17 @@ format_odp_action(struct ds *ds, const struct nlattr *a)
      }
  
      switch (type) {
-    case OVS_ACTION_ATTR_OUTPUT:
-        ds_put_format(ds, "%"PRIu32, nl_attr_get_u32(a));
+    case OVS_ACTION_ATTR_OUTPUT: {
+            uint32_t p = nl_attr_get_u32(a);
+            switch(p) {
+            case OVSP_NORMAL:
+                ds_put_format(ds, "output(normal)");
+                break;
+            default:
+                ds_put_format(ds, "%"PRIu32, p);
+                break;
+            }
+        }
          break;
      case OVS_ACTION_ATTR_USERSPACE:
          format_odp_userspace_action(ds, a);
@@ -476,6 +485,14 @@ parse_odp_action(const char *s, const struct simap *port_names,
          }
      }
  
+    {
+        size_t len = strcspn(s, delimiters);
+        if (len == strlen("output(normal)") && !strncmp(s, "output(normal)", len)) {
+            nl_msg_put_u32(actions, OVS_ACTION_ATTR_OUTPUT, OVSP_NORMAL);
+            return len;
+        }
+    }
+
      if (port_names) {
          int len = strcspn(s, delimiters);
          struct simap_node *node;
diff --git a/lib/odp-util.h b/lib/odp-util.h
index 821b2c4..79b0327 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -65,8 +65,9 @@ enum slow_path_reason {
  
  const char *slow_path_reason_to_explanation(enum slow_path_reason);
  
-#define ODPP_LOCAL ODP_PORT_C(OVSP_LOCAL)
-#define ODPP_NONE  ODP_PORT_C(UINT32_MAX)
+#define ODPP_LOCAL  ODP_PORT_C(OVSP_LOCAL)
+#define ODPP_NONE   ODP_PORT_C(OVSP_NONE)
+#define ODPP_NORMAL ODP_PORT_C(OVSP_NORMAL)
  
  void format_odp_actions(struct ds *, const struct nlattr *odp_actions,
                          size_t actions_len);
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 848c778..96a3f83 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1425,6 +1425,13 @@ xlate_normal(struct xlate_ctx *ctx)
          return;
      }
  
+    /* Are we using the OVS or the kernel 'normal' mode? */
+    if (ctx->xin->normal_uses_kernel) {
+        ctx->xout->nf_output_iface = NF_OUT_MULTI;
+        compose_output_action(ctx, OFPP_NORMAL);
+        return;
+    }
+
      /* Learn source MAC. */
      if (ctx->xin->may_learn) {
          update_learning_table(ctx->xbridge, flow, wc, vlan, in_xbundle);
@@ -1689,6 +1696,23 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
       * before traversing a patch port. */
      BUILD_ASSERT_DECL(FLOW_WC_SEQ == 23);
  
+    /* If we get this far and the output port is OFPP_NORMAL then
+     * we want to direct the packet back to the kernel. This can
+     * probably be done more gracefully, but for now just send the
+     * message to the datapath.  */
+    if (ofp_port == OFPP_NORMAL) {
+        ctx->xout->slow |= commit_odp_actions(flow, &ctx->base_flow,
+                                              &ctx->xout->odp_actions,
+                                              &ctx->xout->wc,
+                                              &ctx->mpls_depth_delta);
+        nl_msg_put_odp_port(&ctx->xout->odp_actions, OVS_ACTION_ATTR_OUTPUT,
+                           ODPP_NORMAL);
+        ctx->sflow_odp_port = ODPP_NORMAL;
+        ctx->sflow_n_outputs++;
+        ctx->xout->nf_output_iface = ofp_port;
+        return;
+    }
+
      if (!xport) {
          xlate_report(ctx, "Nonexistent output port");
          return;
@@ -2874,6 +2898,7 @@ xlate_in_init(struct xlate_in *xin, struct ofproto_dpif *ofproto,
                const struct flow *flow, struct rule_dpif *rule,
                uint16_t tcp_flags, const struct ofpbuf *packet)
  {
+    struct ofproto *ofproto_ = ofproto_dpif_uncast(ofproto);
      xin->ofproto = ofproto;
      xin->flow = *flow;
      xin->packet = packet;
@@ -2886,6 +2911,7 @@ xlate_in_init(struct xlate_in *xin, struct ofproto_dpif *ofproto,
      xin->report_hook = NULL;
      xin->resubmit_stats = NULL;
      xin->skip_wildcards = false;
+    xin->normal_uses_kernel = (ofproto_->port_normal_mode == OFPROTO_PORT_NORMAL_MODE_KERNEL);
  }
  
  void
diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
index 68076ca..cd7cd97 100644
--- a/ofproto/ofproto-dpif-xlate.h
+++ b/ofproto/ofproto-dpif-xlate.h
@@ -73,6 +73,11 @@ struct xlate_in {
       * not if we are just revalidating. */
      bool may_learn;
  
+    /* How OFPP_NORMAL is handled.  If true, packets are passed back to the
+     * kernel.  If false, the OVS learning switch model is used.
+     */
+    bool normal_uses_kernel;
+
      /* If the caller of xlate_actions() doesn't need the flow_wildcards
       * contained in struct xlate_out.  'skip_wildcards' can be set to true
       * disabling the expensive wildcard computation.  When true, 'wc' in struct
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index befa9f7..81a06f1 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -314,6 +314,16 @@ ofproto_dpif_cast(const struct ofproto *ofproto)
      return CONTAINER_OF(ofproto, struct ofproto_dpif, up);
  }
  
+/* Extract the pointer to an ofproto from an ofproto-dpif.
+ * (I am probably missing a way for dpif-xlate to read a flag
+ * inside ofproto.)  */
+struct ofproto *
+ofproto_dpif_uncast(struct ofproto_dpif *ofproto)
+{
+    ovs_assert(ofproto->up.ofproto_class == &ofproto_dpif_class);
+    return &(ofproto->up);
+}
+
  static struct ofport_dpif *get_ofp_port(const struct ofproto_dpif *ofproto,
                                          ofp_port_t ofp_port);
  static void ofproto_trace(struct ofproto_dpif *, const struct flow *,
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index 51cb38f..8d0da1b 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -62,6 +62,8 @@ struct OVS_LOCKABLE group_dpif;
   *   Ofproto-dpif-xlate is responsible for translating translating OpenFlow
   *   actions into datapath actions. */
  
+struct ofproto *ofproto_dpif_uncast(struct ofproto_dpif *ofproto);
+
  void rule_dpif_lookup(struct ofproto_dpif *, const struct flow *,
                        struct flow_wildcards *, struct rule_dpif **rule);
  
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index cc318ee..c2c505d 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -71,6 +71,7 @@ struct ofproto {
      uint64_t datapath_id;       /* Datapath ID. */
      bool forward_bpdu;          /* Option to allow forwarding of BPDU frames
                                   * when NORMAL action is invoked. */
+    enum ofproto_port_normal_mode port_normal_mode;       /* OVS or Kernel handles OFPP_NORMAL. */
      char *mfr_desc;             /* Manufacturer (NULL for default)b. */
      char *hw_desc;              /* Hardware (NULL for default). */
      char *sw_desc;              /* Software version (NULL for default). */
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 75461e2..9b61ede 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -705,6 +705,15 @@ ofproto_set_flow_miss_model(unsigned model)
      flow_miss_model = model;
  }
  
+/* Chooses between OVS learning switch or native kernel based
+ * behavior for the OFPP_NORMAL action.
+ */
+void
+ofproto_set_port_normal_mode(struct ofproto *p, enum ofproto_port_normal_mode port_normal_mode)
+{
+    p->port_normal_mode = port_normal_mode;
+}
+
  /* If forward_bpdu is true, the NORMAL action will forward frames with
   * reserved (e.g. STP) destination Ethernet addresses. if forward_bpdu is false,
   * the NORMAL action will drop these frames. */
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index 3034d32..677c980 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -134,6 +134,12 @@ enum ofproto_band {
      OFPROTO_OUT_OF_BAND         /* Out-of-band connection to controller. */
  };
  
+/* Behavior for OFPP_NORMAL */
+enum ofproto_port_normal_mode {
+    OFPROTO_PORT_NORMAL_MODE_OVS,   /* OVS' learning switch default */
+    OFPROTO_PORT_NORMAL_MODE_KERNEL /* Pass packet back to the kernel */
+};
+
  struct ofproto_controller {
      char *target;               /* e.g. "tcp:127.0.0.1" */
      int max_backoff;            /* Maximum reconnection backoff, in seconds. */
@@ -244,6 +250,7 @@ void ofproto_set_extra_in_band_remotes(struct ofproto *,
  void ofproto_set_in_band_queue(struct ofproto *, int queue_id);
  void ofproto_set_flow_limit(unsigned limit);
  void ofproto_set_flow_miss_model(unsigned model);
+void ofproto_set_port_normal_mode(struct ofproto *, enum ofproto_port_normal_mode port_normal_mode);
  void ofproto_set_forward_bpdu(struct ofproto *, bool forward_bpdu);
  void ofproto_set_mac_table_config(struct ofproto *, unsigned idle_time,
                                    size_t max_entries);
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 528b40c..4520238 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -653,6 +653,11 @@ Controller commands:\n\
    del-fail-mode BRIDGE       delete the fail-mode for BRIDGE\n\
    set-fail-mode BRIDGE MODE  set the fail-mode for BRIDGE to MODE\n\
  \n\
+Port normal commands:\n\
+  get-port-normal-mode BRIDGE  print the port-normal-mode for BRIDGE\n\
+  del-port-normal-mode BRIDGE  delete the port-normal-mode for BRIDGE\n\
+  set-port-normal-mode BRIDGE MODE  set the port-normal-mode for BRIDGE to MODE\n\
+\n\
  Manager commands:\n\
    get-manager                print the managers\n\
    del-manager                delete the managers\n\
@@ -1002,6 +1007,7 @@ pre_get_info(struct vsctl_context *ctx)
      ovsdb_idl_add_column(ctx->idl, &ovsrec_port_col_interfaces);
  
      ovsdb_idl_add_column(ctx->idl, &ovsrec_interface_col_name);
+    ovsdb_idl_add_column(ctx->idl, &ovsrec_bridge_col_port_normal_mode);
  }
  
  static void
@@ -1265,7 +1271,7 @@ cmd_init(struct vsctl_context *ctx OVS_UNUSED)
  struct cmd_show_table {
      const struct ovsdb_idl_table_class *table;
      const struct ovsdb_idl_column *name_column;
-    const struct ovsdb_idl_column *columns[3];
+    const struct ovsdb_idl_column *columns[4];
      bool recurse;
  };
  
@@ -1281,6 +1287,7 @@ static struct cmd_show_table cmd_show_tables[] = {
       &ovsrec_bridge_col_name,
       {&ovsrec_bridge_col_controller,
        &ovsrec_bridge_col_fail_mode,
+      &ovsrec_bridge_col_port_normal_mode,
        &ovsrec_bridge_col_ports},
       false},
  
@@ -1463,6 +1470,7 @@ pre_cmd_emer_reset(struct vsctl_context *ctx)
                            &ovsrec_interface_col_ingress_policing_rate);
      ovsdb_idl_add_column(ctx->idl,
                            &ovsrec_interface_col_ingress_policing_burst);
+    ovsdb_idl_add_column(ctx->idl, &ovsrec_bridge_col_port_normal_mode);
  }
  
  static void
@@ -1495,6 +1503,7 @@ cmd_emer_reset(struct vsctl_context *ctx)
          ovsrec_bridge_set_sflow(br, NULL);
          ovsrec_bridge_set_ipfix(br, NULL);
          ovsrec_bridge_set_flood_vlans(br, NULL, 0);
+        ovsrec_bridge_set_port_normal_mode(br, NULL);
  
          /* We only want to save the "hwaddr" key from other_config. */
          hwaddr = smap_get(&br->other_config, "hwaddr");
@@ -2301,6 +2310,55 @@ cmd_set_fail_mode(struct vsctl_context *ctx)
  }
  
  static void
+cmd_get_port_normal_mode(struct vsctl_context *ctx)
+{
+    struct vsctl_bridge *br;
+    const char *port_normal_mode;
+
+    vsctl_context_populate_cache(ctx);
+    br = find_bridge(ctx, ctx->argv[1], true);
+
+    if (br->parent) {
+        br = br->parent;
+    }
+    ovsrec_bridge_verify_port_normal_mode(br->br_cfg);
+
+    port_normal_mode = br->br_cfg->port_normal_mode;
+    if (port_normal_mode && strlen(port_normal_mode)) {
+        ds_put_format(&ctx->output, "%s\n", port_normal_mode);
+    }
+}
+
+static void
+cmd_del_port_normal_mode(struct vsctl_context *ctx)
+{
+    struct vsctl_bridge *br;
+
+    vsctl_context_populate_cache(ctx);
+
+    br = find_real_bridge(ctx, ctx->argv[1], true);
+
+    ovsrec_bridge_set_port_normal_mode(br->br_cfg, NULL);
+}
+
+static void
+cmd_set_port_normal_mode(struct vsctl_context *ctx)
+{
+    struct vsctl_bridge *br;
+    const char *port_normal_mode = ctx->argv[2];
+
+    vsctl_context_populate_cache(ctx);
+
+    br = find_real_bridge(ctx, ctx->argv[1], true);
+
+    if (strcmp(port_normal_mode, "ovs") && strcmp(port_normal_mode, "kernel")) {
+        vsctl_fatal("port-normal-mode must be \"ovs\" or \"kernel\"");
+    }
+
+    ovsrec_bridge_set_port_normal_mode(br->br_cfg, port_normal_mode);
+}
+
+static void
  verify_managers(const struct ovsrec_open_vswitch *ovs)
  {
      size_t i;
@@ -4195,6 +4253,11 @@ static const struct vsctl_command_syntax all_commands[] = {
      {"del-fail-mode", 1, 1, pre_get_info, cmd_del_fail_mode, NULL, "", RW},
      {"set-fail-mode", 2, 2, pre_get_info, cmd_set_fail_mode, NULL, "", RW},
  
+    /* Port normal commands. */
+    {"get-port-normal-mode", 1, 1, pre_get_info, cmd_get_port_normal_mode, NULL, "", RO},
+    {"del-port-normal-mode", 1, 1, pre_get_info, cmd_del_port_normal_mode, NULL, "", RW},
+    {"set-port-normal-mode", 2, 2, pre_get_info, cmd_set_port_normal_mode, NULL, "", RW},
+
      /* Manager commands. */
      {"get-manager", 0, 0, pre_manager, cmd_get_manager, NULL, "", RO},
      {"del-manager", 0, 0, pre_manager, cmd_del_manager, NULL, "", RW},
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 6311ff3..b55ae5d 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2918,6 +2918,7 @@ bridge_configure_remotes(struct bridge *br,
      size_t n_controllers;
  
      enum ofproto_fail_mode fail_mode;
+    enum ofproto_port_normal_mode port_normal_mode;
  
      struct ofproto_controller *ocs;
      size_t n_ocs;
@@ -3013,6 +3014,13 @@ bridge_configure_remotes(struct bridge *br,
                      : OFPROTO_FAIL_SECURE;
      ofproto_set_fail_mode(br->ofproto, fail_mode);
  
+    /* Set the port-normal-mode. */
+    port_normal_mode = br->cfg->port_normal_mode
+                && !strcmp(br->cfg->port_normal_mode, "kernel")
+                    ? OFPROTO_PORT_NORMAL_MODE_KERNEL
+                    : OFPROTO_PORT_NORMAL_MODE_OVS;
+    ofproto_set_port_normal_mode(br->ofproto, port_normal_mode);
+
      /* Configure OpenFlow controller connection snooping. */
      if (!ofproto_has_snoops(br->ofproto)) {
          struct sset snoops;
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 9eb21ed..1c01068 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
  {"name": "Open_vSwitch",
- "version": "7.4.0",
- "cksum": "951746691 20389",
+ "version": "7.4.1",
+ "cksum": "1796841126 20564",
   "tables": {
     "Open_vSwitch": {
       "columns": {
@@ -107,7 +107,12 @@
                            "maxInteger": 254},
                    "value": {"type": "uuid",
                              "refTable": "Flow_Table"},
-                  "min": 0, "max": "unlimited"}}},
+                  "min": 0, "max": "unlimited"}},
+       "port_normal_mode": {
+         "type": {"key": {"type": "string",
+           "enum": ["set", ["ovs",
+                            "kernel"]]},
+	   "min": 0, "max": 1}}},
       "indexes": [["name"]]},
     "Port": {
       "columns": {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 5fd82fc..a831135 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -591,6 +591,16 @@
          connection with a controller.  A default value of
          <code>OpenFlow10</code> will be used if this column is empty.
        </column>
+
+      <column name="port_normal_mode">
+        Behavior when handling a flow whose destination action port is
+        OFPP_NORMAL. Default Open vSwitch behavior, or when this value
+        is <code>ovs</code>, is to act like a learning layer 2 switch.
+        When this value is <code>kernel</code> then matching flows are
+        handed back to the kernel as though Open vSwitch never saw them,
+        allowing normal layer 3 processing by the kernel. This latter
+        behavior might be described as Hybrid mode.
+      </column>
      </group>
  
      <group title="Spanning Tree Configuration">




