[ovs-dev] [PATCH v5 4/4] Classifier: Track address prefixes.

Jarno Rajahalme jrajahalme at nicira.com
Mon Dec 9 23:52:39 UTC 2013


On Dec 7, 2013, at 2:04 PM, Ben Pfaff <blp at nicira.com> wrote:

> On Thu, Dec 05, 2013 at 04:27:26PM -0800, Jarno Rajahalme wrote:
>> Add a prefix tree (trie) structure for tracking the used address
>> space, enabling skipping classifier tables containing longer masks
>> than necessary for an address field value in a packet header being
>> classified.  This enables less unwildcarding for datapath flows in
>> parts of the address space without host routes.
>> 
>> Trie lookup is interwoven to the staged lookup, so that a trie is
>> searched only when the configured trie field becomes relevant
>> for the lookup.  The trie lookup results are retained so that each
>> trie is checked at most once for each classifier lookup.
>> 
>> This implementation tracks the number of rules at each address prefix
>> for the whole classifier.  More aggressive table skipping would be
>> possible by maintaining lists of tables that have prefixes at the
>> lengths encountered on tree traversal, or by maintaining separate
>> tries for subsets of rules separated by metadata fields.
>> 
>> Prefix tracking is configured via OVSDB.  A new column "prefixes" is
>> added to the database table "Flow_Table".  "prefixes" is a set of
>> string values listing the field names for which prefix lookup should
>> be used.
>> 
>> As of now, the fields for which prefix lookup can be enabled are:
>> - tun_id, tun_src, tun_dst
>> - nw_src, nw_dst (or aliases ip_src and ip_dst)
>> - ipv6_src, ipv6_dst
>> 
>> There is a maximum number of fields that can be enabled for any one
>> flow table.  Currently this limit is 3.
>> 
>> Examples:
>> 
>> ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
>> --id=@N1 create Flow_Table name=table0
>> ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
>> --id=@N1 create Flow_Table name=table1
>> 
>> ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
>> ovs-vsctl set Flow_Table table1 prefixes=[]
>> 
>> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
> 
> If trie_lookup() returns UINT8_MAX, then it does not write anything to
> its *checkbits argument.  After looking a little further, I guess that
> this does not matter because it will never be read in that case.  Still,
> I really like functions whose interfaces always write to output
> arguments, because it makes later behavior of the program more
> predictable in corner cases.
> 

OK

> I am not certain that uint8_t is the best possible type for the width of
> a field.  It is more than enough for the longest field that Open vSwitch
> or OpenFlow has now, and it may be enough for the longest field that
> either one will ever support.  However: NXM and OXM both in theory
> support fields up to 1,016 bits wide.  I'll leave it to your judgment
> whether it's worth worrying about that.
> 

I changed the relevant uint8_t types to unsigned int types, even though having a prefix longer than 255 bits is a rather remote possibility.

> Did you consider making be_get_bit_at() and get_bit_at() return bool?
> 

If they return bool, the caller must use !! to get 1 from the non-zero return value to use it as an index. I’d rather have these return the bit itself.

> Every use of be_get_bit_at() is in an expression of the form
> node->edges[be_get_bit_at(prefix, ofs)].  I don't know whether it would
> be better to write a function to do that job.
> 

I added functions trie_next_node() and trie_next_edge() for this purpose.

> The { should be on a separate line:
>    trie_is_leaf(const struct trie_node *trie) {
> and there should not be a ; after the } on the final line of the same
> function.

Oops :-)

> 
> In mask_set_prefix_bits(), I would write:
>        mask[i] = OVS_BE32_MAX;
> in place of:
>        mask[i] = CONSTANT_HTONL(~0u);
> 

Done.

> I don't normally mark by-value parameters as const, as done on "const
> uint8_t nbits" on mask_set_prefix_bits(), because I regard const on
> parameters as mainly a promise to the caller not to modify an object
> visible to the caller, but this kind of "const" does not make such a
> promise.  Similarly for 'ofs' and 'plen' in trie_prefix_equal_bits() and
> 'prefix' in get_bit_at().
> 

Cleaned up.

> I think that the loop condition "while (node)" in trie_insert() can
> never be false.  Can we equivalently write "for (;;)”?
> 

Right.

> There is a typo in a comment in trie_insert(): "it's" should be "its”.
> 

Fixed, thanks!

> Can we eliminate the special case for the root in trie_insert()?  The
> code appears to be written with this in mind but without following
> through.  Something like this:
> 
> static void
> trie_insert(struct cls_trie *trie, const struct cls_rule *rule, int mlen)
> {
>    const ovs_be32 *prefix = minimatch_get_prefix(&rule->match, trie->field);
>    struct trie_node **edge;
>    struct trie_node *node;
>    int ofs;
> 
>    ofs = 0;
>    for (edge = &trie->root;
>         (node = *edge) != NULL;
>         edge = &node->edges[be_get_bit_at(prefix, ofs)]) {
>        unsigned int eqbits = trie_prefix_equal_bits(node, prefix, ofs, mlen);
>        ofs += eqbits;
>        if (eqbits < node->nbits) {
>            /* Mismatch, new node needs to be inserted above. */
>            int old_branch = get_bit_at(node->prefix, eqbits);
> 
>            /* New parent node. */
>            *edge = trie_branch_create(prefix, ofs - eqbits, eqbits,
>                                       ofs == mlen ? 1 : 0);
> 
>            /* Adjust old node for its new position in the tree. */
>            node->prefix <<= eqbits;
>            node->nbits -= eqbits;
>            (*edge)->edges[old_branch] = node;
> 
>            /* Check if need a new branch for the new rule. */
>            if (ofs < mlen) {
>                (*edge)->edges[!old_branch]
>                    = trie_branch_create(prefix, ofs, mlen - ofs, 1);
>            }
>            return;
>        }
>        /* Full match so far. */
> 
>        if (ofs == mlen) {
>            /* Full match at the current node, rule needs to be added here. */
>            node->n_rules++;
>            return;
>        }
>    }
>    *edge = trie_branch_create(prefix, ofs, mlen - ofs, 1);
> }
> 

This is a nice rearrangement, thank you! I did the corresponding change to trie_remove().

> I probably owe you an apology for fussing with your code too much.  This
> is one way I study tricky code: I tweak it until I'm satisfied I
> understand it.
> 

Not at all, I’m all for better code :-)

> In trie_remove(), should we log an error here?
>            /* Mismatch, nothing to be removed.  This should never happen, as
>             * only rules in the classifier are ever removed. */
> and here?
>    /* Cannot go deeper. This should never happen, since only rules
>     * that actually exist in the classifier are ever removed. */


I had ovs_asserts here previously, so it makes sense to log warnings here.
 
> 
> It might be possible to simplify the backtracking logic in
> trie_remove(), by keeping the stack full of the edges that we followed
> rather than the nodes that we visited.  Then "*edge = new_node;" would
> suffice instead of needing to distinguish the root from other cases and
> re-figure which direction to follow.  But I am not sure that making this
> change is worth it.
> 

Done.

> Acked-by: Ben Pfaff <blp at nicira.com>

You’ll get to say what you think of the result before I push, though. I’d like you to review trie_lookup_value(), trie_insert() and trie_remove() once more. I’d like your OK on the trie manipulation logic (adding and removing nodes). Both of these are constrained by the maximum number of prefix bits a single node can hold (32). Some of the node addition detail is in the trie_branch_create(), which can add create a chain of nodes.

  Jarno

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20131209/888b2622/attachment-0003.html>


More information about the dev mailing list