[ovs-dev] [RFC] Proposal for enhanced select groups

Simon Horman simon.horman at netronome.com
Wed Aug 27 01:26:14 UTC 2014


On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote:
> On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote:
> > I have been working with Netronome on examining the possibilities of
> > providing (richer) load balancing facilities in Open vSwitch.
> > 
> > It seems to us that the current select group provides for some load
> > balancing functionality. And that in particular the way that it is
> > implemented in Open vSwitch provides L2 destination load balancing (it
> > hashes on the destination ethernet address). Our ideas so far are as follows:
> > 
> > 1. Provide a richer and ideally extendible select group in the
> >    form of an OpenFlow extension to groups.
> > 
> >    * Allow the fields used to be selected.
> > 
> >      In the case of a hash this would be the fields that are hashed.
> > 
> >      An implication of this is that the pre-requisites of these
> >      fields would need to be present in the flow's match.
> >      Masking of the fields would be allowed but not
> >      required for fields whose TLVs allow masking.
> > 
> >    * Allow designation of the selection method used.
> > 
> >      For example hash.
> > 
> >    * Allow passing a parameter to the selection method.
> > 
> >      For example an initial value key for hashes.
> 
> There is an outstanding patch on this topic already:
>         http://patchwork.openvswitch.org/patch/5424/

I have no particular objections to that change, though I have
not thought about it deeply. However, I think it's more valuable
in the long run to make select groups configurable rather than
tweaking what would be the default setting.

> It sounds reasonable to make select groups configurable.  The way to do
> that would be to implement the OpenFlow 1.5 (draft) proposal to add
> properties to groups and group buckets, which is filed in the ONF JIRA
> as EXT-350.

I'm happy to look at implementing EXT-350 (which I now have access to :)
however it seems to me that while it makes groups more configurable
it does not address the ability to configure the selection method.

At the end of this email I have included a fleshed-out version of the
enhanced select group proposal that we have been discussing at Netronome.

> > 2. Investigate allowing selection of buckets to occur in the datapath. Or
> >    in other words a megaflow with a select group action.  The current
> >    select group implementation seems to be a good candidate for this
> >    investigation.
> 
> Maybe this could be implemented via recirculation without datapath
> changes, in the same way that bonds are implemented.

I think that would allow a megaflow to handle the actions
before the select group. But it seems that the recirculation
action would result in much more fine-grained post-recirculation flows.

What we would like to do is to provide something generally useful
which may be used as appropriate to:

* Reduce flow-setup overhead by using a megaflow to handle
  many flows and in turn provide something that lends itself to offloads.

  While working on a prototype that adds the current hash-based select
  group behaviour present in ovs-vswitchd to the datapath, we have come
  to realise that there may be situations where the cost of selecting
  the bucket for each packet outweighs the reduced flow-setup cost.

  In the particular case of hash this may be avoidable by using
  the RSS hash which I believe is pre-calculated. But regardless we
  do see that it may be better to use the current user-space
  approach in some cases.

  But we also think there are very likely cases where performing selection
  in the datapath is a win. And we think that things could be arranged such
  that ovs-vswitchd would only use the datapath select action when it is a
  win.

* Allow use of existing kernel infrastructure to implement selection.
  I am particularly thinking in terms of using IPVS (which I maintain)
  to provide stateful connection-based load balancing by using
  its "schedulers" as a selection method.

  That is not to say that we are trying to propose a solution tailored
  to allowing the use of IPVS. Rather, I think it's an example of
  how a datapath select action could be useful.

* Allow the possibility of offloading selection beyond the datapath
  and into hardware via hooks in the datapath. For example offloading to
  a Netronome flow processor (I am sure there are other examples).

  Again, we are not trying to propose something that is only
  useful to Netronome. Rather, this is an example of
  how a datapath select action could be useful.

  In relation to hooks for offloading, I plan to start a public discussion
  on that separately.


----------------------------------------------------------------------
Proposal: Enhanced select groups
Version: 0.0.1


Contents
========

1. Introduction
2. How it works
3. Experimenter ID
4. Experimenter Messages
5. History


1. Introduction
===============

This text describes a Netronome Extension to OpenFlow 1.4 that allows a
controller to provide more information on the selection method for select
groups.  This proposal is in the form of an enhanced select group type.

This may subsequently be proposed as an extension or update to
the OpenFlow specification.


2. How it works
===============

A new Netronome extension group mod message is defined which provides
compatibility with the group mod message defined in OpenFlow 1.4 and
allows extra parameters to be passed by the controller. In particular it
allows controllers to:

* Specify the fields used for bucket selection by the select group.

* Designate the selection method used.

* Provide a non-field parameter to the selection method.


3. Experimenter ID
==================

The Experimenter ID of this extension is:

NMX_VENDOR_ID = 0x00001540


4. Experimenter Messages
========================

The following message subtype is defined by this extension.

enum nmx_group_mod_subtype {
	NMXT_GROUP_MOD = 1
};


Modifications to the group table from the controller may be done with an
NMXT_GROUP_MOD message. The behaviour of this message is analogous to that
of the OFPT_GROUP_MOD message described in OpenFlow 1.4 section 7.3.4.3
Modify Group Entry Message.

The NMXT_GROUP_MOD is intended to cover all configurations covered by
OFPT_GROUP_MOD and to allow new configurations through the new
selection_method, selection_method_param and fields members of
nmx_group_mod.

struct nmx_group_mod {
	struct ofp_header header;
	ovs_be32 vendor;		/* NMX_VENDOR_ID. */
	ovs_be32 subtype;		/* NMXT_GROUP_MOD. */

	ovs_be16 command;		/* One of OFPGC_*. */
	uint8_t type;			/* One of OFPGT_*. */
	uint8_t pad;			/* Pad to 64 bits. */
	ovs_be32 group_id;		/* Group identifier. */
	char selection_method[NXM_MAX_SELECTION_METHOD_LEN];
					/* Null-terminated. */
	ovs_be64 selection_method_param;/* Non-field parameter for
					   bucket selection. */
	struct ofp_match fields;	/* Fields used for bucket selection.
					   Variable size. */
	/* struct ofp_bucket buckets[0]; */
					/* The length of the bucket array is
					   inferred from the length fields in
					   header and fields. */
};
OVS_ASSERT(sizeof(struct nmx_group_mod) == 48);
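
One reading of the size assertion above is that the 48 bytes cover the
fixed-size members that precede fields. The following is a minimal sketch
under that assumption. The names nmx_group_mod_fixed and nmx_buckets_len
are hypothetical, plain integer types stand in for the ovs_be* types, and
the match is assumed to be padded to a multiple of 8 bytes as ofp_match is
in OpenFlow 1.4. It also illustrates how the length of the bucket array
might be inferred from the two length fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NXM_MAX_SELECTION_METHOD_LEN 16

/* Hypothetical view of the members of nmx_group_mod that precede 'fields'.
 * Multi-byte members are big-endian on the wire. */
struct nmx_group_mod_fixed {
    uint8_t header[8];              /* Stand-in for struct ofp_header. */
    uint32_t vendor;                /* NMX_VENDOR_ID. */
    uint32_t subtype;               /* NMXT_GROUP_MOD. */
    uint16_t command;               /* One of OFPGC_*. */
    uint8_t type;                   /* One of OFPGT_*. */
    uint8_t pad;
    uint32_t group_id;
    char selection_method[NXM_MAX_SELECTION_METHOD_LEN];
    uint64_t selection_method_param;
};
static_assert(sizeof(struct nmx_group_mod_fixed) == 48,
              "fixed part of nmx_group_mod is 48 bytes");

/* Returns the number of bytes left for the bucket array, or -1 if the
 * lengths are inconsistent.
 *   msg_len:   length field of the ofp_header (whole message).
 *   match_len: length field of the ofp_match (excludes trailing padding). */
static long nmx_buckets_len(uint16_t msg_len, uint16_t match_len)
{
    size_t fixed = sizeof(struct nmx_group_mod_fixed);
    size_t padded_match = ((size_t) match_len + 7) / 8 * 8;

    /* An ofp_match is at least its 4-byte type/length header. */
    if (match_len < 4 || fixed + padded_match > msg_len) {
        return -1;
    }
    return (long) (msg_len - fixed - padded_match);
}
```

With an empty match (match_len of 4, padded to 8) a 56-byte message would
carry no buckets, and every additional byte belongs to the bucket array.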


The vendor field is the Experimenter ID (see 3).


The subtype field is NMXT_GROUP_MOD.


The command field must be one of the OFPGC_* values defined in
OpenFlow 1.4 section 7.3.4.3 Modify Group Entry Message.


The group type field must be one of the OFPGT_* values defined in
OpenFlow 1.4 section 7.3.4.3 Modify Group Entry Message.


The group selection_method is a null-terminated string which, if of
non-zero length, specifies a selection method known to an underlying layer
of the switch. The value of NXM_MAX_SELECTION_METHOD_LEN is 16.

The group selection_method must be zero-length (i.e. the first byte must be
null) if type is not OFPGT_SELECT. It may be zero-length if type is
OFPGT_SELECT to request compatibility with OpenFlow 1.4.
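
The two rules for selection_method could be checked along the following
lines. This is only a sketch: nmx_selection_method_ok is a hypothetical
helper, and the value 1 for OFPGT_SELECT is taken from the OpenFlow group
type enumeration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NXM_MAX_SELECTION_METHOD_LEN 16
#define OFPGT_SELECT 1  /* From the OpenFlow ofp_group_type enumeration. */

/* Hypothetical check of the selection_method member of nmx_group_mod. */
static bool nmx_selection_method_ok(
    uint8_t type, const char method[NXM_MAX_SELECTION_METHOD_LEN])
{
    /* The string must be null-terminated within the fixed-size buffer. */
    if (memchr(method, '\0', NXM_MAX_SELECTION_METHOD_LEN) == NULL) {
        return false;
    }
    /* Groups other than select groups must carry a zero-length method. */
    if (type != OFPGT_SELECT && method[0] != '\0') {
        return false;
    }
    /* A select group may name a method, or leave it empty to request
     * OpenFlow 1.4 behaviour. */
    return true;
}
```

Whether a non-empty method name is actually known to the switch is a
separate question that this check does not attempt to answer.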


The selection_method_param provides a non-field parameter for
the group selection_method. It must be all-zeros unless the
group selection_method is non-zero length.

The selection_method_param may for example be used as an initial value for
the hash of a hash group selection method.
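
As an illustration of the parameter rule and of the initial-value example,
the sketch below folds selection_method_param into the starting state of a
simple hash. The FNV-1a-style hash is only a stand-in chosen for brevity; it
is not claimed to be the hash used by Open vSwitch, and the names
nmx_param_ok and seeded_hash are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The parameter must be all-zeros unless a selection method is named. */
static bool nmx_param_ok(const char *selection_method, uint64_t param)
{
    return selection_method[0] != '\0' || param == 0;
}

/* FNV-1a-style 64-bit hash with the parameter folded into the initial
 * value, so that different parameters give different hash sequences. */
static uint64_t seeded_hash(uint64_t param, const uint8_t *data, size_t len)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325) ^ param;

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}
```

A controller could then vary the parameter between groups, or between
switches, so that the same flows do not always land in the same relative
bucket.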


The fields field is an ofp_match structure which includes the fields which
should be used as inputs to bucket selection. ofp_match is described in
OpenFlow 1.4 section 7.2.2 Flow Match Structures.

Fields must not be specified unless the group selection_method is of
non-zero length.

The pre-requisites for fields specified must be satisfied in the match for
any flow that uses the group.

Masking is allowed but not required for fields whose TLVs allow masking.

The fields may for example be used as the fields that are hashed
by a hash group selection method.
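
To make the role of fields and masks concrete, here is a rough sketch of how
a hash selection method might combine them. Everything in it is an
assumption made for illustration: struct sel_field is a hypothetical
flattened view of one selected field, the hash is the same FNV-1a-style
stand-in, and the bucket is chosen by a plain modulo without regard to
bucket weights.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one field named in 'fields'. */
struct sel_field {
    const uint8_t *value;   /* Field value taken from the packet. */
    const uint8_t *mask;    /* Mask from 'fields'; NULL means all bits. */
    size_t len;             /* Length of value (and mask) in bytes. */
};

/* Hashes only the masked bits of the selected fields and maps the result
 * to a bucket index.  n_buckets must be greater than zero. */
static size_t select_bucket(const struct sel_field *fields, size_t n_fields,
                            uint64_t param, size_t n_buckets)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325) ^ param;

    for (size_t i = 0; i < n_fields; i++) {
        for (size_t j = 0; j < fields[i].len; j++) {
            uint8_t m = fields[i].mask ? fields[i].mask[j] : 0xff;

            h ^= (uint8_t) (fields[i].value[j] & m);
            h *= UINT64_C(0x100000001b3);
        }
    }
    return (size_t) (h % n_buckets);
}
```

With a mask of 255.255.255.0 on an IPv4 destination, for instance, all
addresses in the same /24 would select the same bucket, since the masked-out
bits never reach the hash.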


The buckets field is an array of buckets, the structure and semantics of
which are described in OpenFlow 1.4 section 7.3.4.3 Modify Group Entry
Message.


5. History
==========

This proposal has been developed independently of any similar work in this
area. No such work is known.



