[ovs-dev] [PATCH v4 6/6] Classifier: Track address prefixes.
Ben Pfaff
blp at nicira.com
Tue Dec 3 22:58:47 UTC 2013
On Tue, Nov 26, 2013 at 11:24:52AM -0800, Jarno Rajahalme wrote:
> Add a prefix tree (trie) structure for tracking the used address
> space, enabling skipping classifier tables containing longer masks
> than necessary for an address field value in a flow being classified.
> This enables less unwildcarding for datapath flows in parts of the
> address space without host routes.
>
> Trie lookup is interwoven to the staged lookup, so that a trie is
> searched only when the configured trie field field becomes relevant
> for the lookup. The trie lookup results are retained so that each
> trie is checked at most once for each classifier lookup.
>
> This implementation tracks the number of rules at each address prefix
> for the whole classifier. More aggressive table skipping would be
> possible by maintaining lists of tables that have prefixes at the
> lengths encountered on tree traversal, or by maintaining separate
> tries for subsets of rules separated by metadata fields.
>
> Prefix tracking is configured via OVSDB. A new column "fieldspec" is
> added to the database table "Flow_Table". "fieldspec" is a string map
> with only the "prefix" key defined so far (other keys can be added in
> future):
>
> "prefix" A comma separated list of field names for which prefix
> lookup should be used.
>
> As of now, the fields for which prefix lookup can be enabled are:
> - tun_id, tun_src, tun_dst
> - metadata
> - nw_src, nw_dst (or aliases ip_src and ip_dst)
> - ipv6_src, ipv6_dst
>
> There is a maximum number of fields that can be enabled for any one
> flow table. Currently this limit is 3.
>
> Examples:
>
> ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
> --id=@N1 create Flow_Table name=table0
> ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
> --id=@N1 create Flow_Table name=table1
>
> ovs-vsctl set Flow_Table table0 fieldspec:prefix=tun_id
> ovs-vsctl set Flow_Table table1 fieldspec:prefix=ip_dst,ip_src
>
> ovs-vsctl remove Flow_Table table1 fieldspec prefix
>
> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
Here are some preliminary comments. I think I'd like a v5, if you're
OK with that.
Bugs (?)
========
I think the intent is that the 'prefix' member of struct trie_node is
stored "left justified", so that the first bit of the prefix is
shifted into the MSB position of the uint32_t and the LSB position of
the uint32_t is only nonzero if 'nbits' is 32. But... I don't
understand this line of code in trie_equal_bits() and
trie_prefix_equal_bits():
uint32_t diff = node->prefix ^ ntohl(value[ofs / 32]) << ofs % 32;
and similarly in trie_get_prefix():
prefix = ntohl(*pr) << ofs % 32;
because both of these seem to shift by the wrong amount for this
interpretation. If I've got 10 bits of prefix with ofs % 32 == 0, I
need to shift left by 22 bits, not by 0 bits. So... explain?
Style
=====
The comment in trie_lookup() is missing a word or two ("Check that can
skip based...").
I think that the check for !mf in trie_lookup() is unnecessary. (Is
there any way to have a trie without a match field?)
The indentation in trie_init() here made this statement really
confusing at first glance:
/* Initialize subtable's prefix length on this field. */
subtable->trie_plen[trie_idx] = field
? minimask_get_prefix_len(&subtable->mask, field)
: 0;
I see why you did it that way but maybe a larger rewrite would make
the whole loop body easier to understand, e.g.:
uint8_t *plen = &subtable->trie_plen[trie_idx];
*plen = field ? minimask_get_prefix_len(&subtable->mask, field) : 0;
if (*plen) {
struct cls_rule *head;
HMAP_FOR_EACH (head, hmap_node, &subtable->rules) {
struct cls_rule *rule;
FOR_EACH_RULE_IN_LIST (rule, head) {
trie_insert(trie, rule, *plen);
}
}
}
but who knows...
The 'tries' parameter to check_tries() is kind of funny. This
parameter is a function of the cls_subtable, that is, it could be
stored in the cls_subtable at trie creation time and never touched
again. But we recalculate it on every call to find_match_wc(). I
came up with various other ways to do it (e.g. move to struct
cls_subtable), but I'm not sure in the end that it's actually a
valuable optimization worth keeping.
I'm also not sure that the be32ofs member of struct trie_ctx is
valuable and worth keeping. I think that it can be trivially
eliminated:
diff --git a/lib/classifier.c b/lib/classifier.c
index 338872b..2c34cf2 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -471,7 +471,6 @@ classifier_remove(struct classifier *cls, struct cls_rule *rule)
struct trie_ctx {
const struct cls_trie *trie;
bool lookup_done; /* Status of the lookup. */
- uint8_t be32ofs; /* U32 offset of the field in question. */
uint8_t match_plen; /* Longest prefix than could possibly match. */
uint8_t maskbits; /* Prefix length needed to avoid false matches. */
};
@@ -480,7 +479,6 @@ static void
trie_ctx_init(struct trie_ctx *ctx, const struct cls_trie *trie)
{
ctx->trie = trie;
- ctx->be32ofs = trie->field->flow_be32ofs;
ctx->lookup_done = false;
}
@@ -989,10 +987,11 @@ check_tries(const struct flow *flow, struct trie_ctx trie_ctx[CLS_MAX_TRIES],
* needed to avoid folding in additional bits to the wildcards mask. */
for (j = tries->start; j < tries->end; j++) {
struct trie_ctx *ctx = &trie_ctx[j];
+ int be32ofs = ctx->trie->field->flow_be32ofs;
/* Is the trie field relevant for this subtable and range of fields? */
if (field_plen[j] &&
- ctx->be32ofs >= ofs->start && ctx->be32ofs < ofs->end) {
+ be32ofs >= ofs->start && be32ofs < ofs->end) {
/* On-demand trie lookup. */
if (!ctx->lookup_done) {
ctx->match_plen = trie_lookup(ctx->trie, flow, &ctx->maskbits);
@@ -1010,14 +1009,14 @@ check_tries(const struct flow *flow, struct trie_ctx trie_ctx[CLS_MAX_TRIES],
* relevant for the current flow. */
/* Can skip if the field is already unwildcarded. */
- if (mask_prefix_bits_set(wc, ctx->be32ofs, ctx->maskbits)) {
+ if (mask_prefix_bits_set(wc, be32ofs, ctx->maskbits)) {
return true;
}
/* Check that the trie result will not unwildcard more bits
* than this stage will. */
if (ctx->maskbits <= field_plen[j]) {
/* Unwildcard the bits and skip the rest. */
- mask_set_prefix_bits(wc, ctx->be32ofs, ctx->maskbits);
+ mask_set_prefix_bits(wc, be32ofs, ctx->maskbits);
/* Note: Prerequisite already unwildcarded, as the only
* prerequisite of the supported trie lookup fields is
* the ethertype, which is currently always unwildcarded.
In the trie_lookup_value() prototype, I'd appreciate it if 'value'
were declared as "const ovs_be32 value[]". Even though that's
semantically the same as "const ovs_be32 *value", it makes it clear
why it's a pointer instead of a single ovs_be32.
Documentation
=============
I wrote some comments for classifier.h:
diff --git a/lib/classifier.h b/lib/classifier.h
index f5fc323..032c573 100644
--- a/lib/classifier.h
+++ b/lib/classifier.h
@@ -24,15 +24,79 @@
* =====
*
* A flow classifier holds any number of "rules", each of which specifies
- * values to match for some fields or subfields and a priority. The primary
- * design goal for the classifier is that, given a packet, it can as quickly as
- * possible find the highest-priority rule that matches the packet.
- *
- * Each OpenFlow table is implemented as a flow classifier.
- *
- *
- * Basic Design
- * ============
+ * values to match for some fields or subfields and a priority. Each OpenFlow
+ * table is implemented as a flow classifier.
+ *
+ * The classifier has two primary design goals. The first is obvious: given a
+ * set of packet headers, as quickly as possible find the highest-priority rule
+ * that matches those headers. The following section describes the second
+ * goal.
+ *
+ *
+ * "Un-wildcarding"
+ * ================
+ *
+ * A primary goal of the flow classifier is to produce, as a side effect of a
+ * packet lookup, a wildcard mask that indicates which bits of the packet
+ * headers were essential to the classification result. Ideally, a 1-bit in
+ * any position of this mask means that, if the corresponding bit in the packet
+ * header were flipped, then the classification result might change. A 0-bit
+ * means that changing the packet header bit would have no effect. Thus, the
+ * wildcarded bits are the ones that played no role in the classification
+ * decision.
+ *
+ * Such a wildcard mask is useful with datapaths that support installing flows
+ * that wildcard fields or subfields. If an OpenFlow lookup for a TCP flow
+ * does not actually look at the TCP source or destination ports, for example,
+ * then the switch may install into the datapath a flow that wildcards the port
+ * numbers, which in turn allows the datapath to handle packets that arrive for
+ * other TCP source or destination ports without additional help from
+ * ovs-vswitchd. This is useful for the Open vSwitch software and,
+ * potentially, for ASIC-based switches as well.
+ *
+ * Some properties of the wildcard mask:
+ *
+ * - "False 1-bits" are acceptable, that is, setting a bit in the wildcard
+ * mask to a 1 will never cause a packet to be forwarded the wrong way.
+ * As a corollary, a wildcard mask composed of all 1-bits will always
+ * yield correct (but often needlessly inefficient) behavior.
+ *
+ * - "False 0-bits" can cause problems, so they must be avoided. In the
+ * extreme case, a mask of all 0-bits is only correct if the classifier
+ * contains only a single flow that matches all packets.
+ *
+ * - 0-bits are desirable because they allow the datapath to act more
+ * autonomously, relying on ovs-vswitchd to process fewer flow setups,
+ * thereby improving performance.
+ *
+ * - We don't know a good way to generate wildcard masks with the maximum
+ * (correct) number of 0-bits. We use various approximations, described
+ * in later sections.
+ *
+ * - Wildcard masks for lookups in a given classifier yield a
+ * non-overlapping set of rules. More specifically:
+ *
+ * Consider an classifier C1 filled with an arbitrary collection of rules
+ * and an empty classifier C2. Now take a set of packet headers H and
+ * look it up in C1, yielding a highest-priority matching rule R1 and
+ * wildcard mask M. Form a new classifier rule R2 out of packet headers
+ * H and mask M, and add R2 to C2 with a fixed priority. If one were to
+ * do this for every possible set of packet headers H, then this
+ * process would not attempt to add any overlapping rules to C2, that is,
+ * any packet lookup using the rules generated by this process matches at
+ * most one rule in C2.
+ *
+ * During the lookup process, the classifier starts out with a wildcard mask
+ * that is all 0-bits, that is, fully wildcarded. As lookup proceeds, each
+ * step tends to add constraints to the wildcard mask, that is, change
+ * wildcarded 0-bits into exact-match 1-bits. We call this "un-wildcarding".
+ * A lookup step that examines a particular field must unwildcard that field.
+ * In general, un-wildcarding is necessary for correctness but undesirable for
+ * performance.
+ *
+ *
+ * Basic Classifier Design
+ * =======================
*
* Suppose that all the rules in a classifier had the same form. For example,
* suppose that they all matched on the source and destination Ethernet address
@@ -51,8 +115,10 @@
* cls_subtable" in the classifier and tracks the highest-priority match that
* it finds. The subtables are kept in a descending priority order according
* to the highest priority rule in each subtable, which allows lookup to skip
- * over subtables that can't possibly have a higher-priority match than
- * already found.
+ * over subtables that can't possibly have a higher-priority match than already
+ * found. Eliminating lookups through priority ordering aids both classifier
+ * primary design goals: skipping lookups saves time and avoids un-wildcarding
+ * fields that those lookups would have examined.
*
* One detail: a classifier can contain multiple rules that are identical other
* than their priority. When this happens, only the highest priority rule out
@@ -61,24 +127,28 @@
* list inside that highest-priority rule.
*
*
- * Staged Lookup
- * =============
+ * Staged Lookup (Wildcard Optimization)
+ * =====================================
*
* Subtable lookup is performed in ranges defined for struct flow, starting
* from metadata (registers, in_port, etc.), then L2 header, L3, and finally
* L4 ports. Whenever it is found that there are no matches in the current
- * subtable, the rest of the subtable can be skipped. The rationale of this
- * logic is that as many fields as possible can remain wildcarded.
+ * subtable, the rest of the subtable can be skipped.
+ *
+ * Staged lookup does not reduce lookup time, and it may increase it, because
+ * it changes a single hash table lookup into multiple hash table lookup. It
+ * reduces un-wildcarding significantly in important use cases.
*
*
- * Prefix Tracking
- * ===============
+ * Prefix Tracking (Wildcard Optimization)
+ * =======================================
*
* Classifier uses prefix trees ("tries") for tracking the used
* address space, enabling skipping classifier tables containing
- * longer masks than necessary for the given address. This enables
- * less unwildcarding for datapath flows in parts of the address space
- * without host routes.
+ * longer masks than necessary for the given address. This reduces
+ * unwildcarding for datapath flows in parts of the address space
+ * without host routes, but consulting extra data structures (the
+ * tries) may slightly increase lookup time.
*
* Trie lookup is interwoven with staged lookup, so that a trie is
* searched only when the configured trie field becomes relevant for
@@ -100,8 +170,8 @@
* flow table. Currently this limit is 3.
*
*
- * Partitioning
- * ============
+ * Partitioning (Lookup Time and Wildcard Optimization)
+ * ====================================================
*
* Suppose that a given classifier is being used to handle multiple stages in a
* pipeline using "resubmit", with metadata (that is, the OpenFlow 1.1+ field
@@ -133,6 +203,9 @@
* any cls_subtable that doesn't match on metadata. We handle that by giving
* any such cls_subtable TAG_ALL as its 'tags' so that it matches any tag.)
*
+ * Partitioning saves lookup time by reducing the number of subtable lookups.
+ * Each eliminated subtable lookup also reduces the amount of un-wildcarding.
+ *
*
* Thread-safety
* =============
More information about the dev
mailing list