[ovs-dev] [PATCH v3 2/2] Classifier: Track address prefixes.

Ben Pfaff blp at nicira.com
Fri Nov 22 01:10:41 UTC 2013


On Thu, Nov 21, 2013 at 02:25:31PM -0800, Jarno Rajahalme wrote:
> Add a prefix tree (trie) structure for tracking the used address
> space, enabling skipping classifier tables containing longer masks
> than necessary for an address field value in a flow being classified.
> This enables less unwildcarding for datapath flows in parts of the
> address space without host routes.
> 
> Trie lookup is interwoven to the staged lookup, so that a trie is
> searched only when the configured trie field field becomes relevant
> for the lookup.  The trie lookup results are retained so that each
> trie is checked at most once for each classifier lookup.
> 
> This implementation tracks the number of rules at each address prefix
> for the whole classifier.  More aggressive table skipping would be
> possible by maintaining lists of tables that have prefixes at the
> lengths encountered on tree traversal, or by maintaining separate
> tries for subsets of rules separated by metadata fields.
> 
> Prefix tracking is configured via OVSDB.  A new column "fieldspec" is
> added to the database table "Flow_Table".  "fieldspec" is a string map
> with only the "prefix" key defined so far (other keys can be added in
> future):
> 
> "prefix"   A comma separated list of field names for which prefix
>            lookup should be used.
> 
> The fields for which prefix lookup can be enabled are:
> - tun_id, tun_src, tun_dst
> - metadata
> - reg0 - reg7
> - nw_src, nw_dst (or aliases ip_src and ip_dst)
> - ipv6_src, ipv6_dst
> 
> There is a maximum number of fields that can be enabled for any one
> flow table.  Currently this limit is 3.
> 
> Examples:
> 
> ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
>  --id=@N1 create Flow_Table name=table0
> ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
>  --id=@N1 create Flow_Table name=table1
> 
> ovs-vsctl set Flow_Table table0 fieldspec:prefix=tun_id
> ovs-vsctl set Flow_Table table1 fieldspec:prefix=ip_dst,ip_src
> 
> ovs-vsctl remove Flow_Table table1 fieldspec prefix
> 
> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>

I am not done studying this code, but I have some preliminary
comments.

Some of the fields that allow prefix length are in network byte order,
and others (the registers) are in host byte order, but the code that
looks at the fields doesn't seem to distinguish.  "Prefixes" therefore
turn into suffixes for the registers, but only on little-endian
machines.

vswitch.xml should document the new Flow_Table column.

Thanks for correcting the comment on miniflow_init__().

I'm not sure that I would list the fields that can be used for prefix
lookup so explicitly in the classifier.h comment, because lists of
this kind tend to get out of date.

trie_destroy() doesn't follow the usual OVS convention that a destroy
function should accept NULL, but making it accept NULL would allow
some callers to be slightly simplified.

I don't see anything in classifier_set_prefix_fields() or its callers
to detect and disallow listing the same field twice.  Does it matter?

mf_from_id() assert-fails on a bad id, but
classifier_set_prefix_fields() still checks for a null return value.

minimask_get_prefix_len() adds restriction to the fields that can be
used for prefix matching, that is not mentioned elsewhere: it must be
a multiple of 4 bytes long.  trie_insert() adds a subtle variation
that it must be a multiple of 32 bits long.  The latter seems more
appropriate, since we could have, say, 24-bit field that is aligned on
a 4-byte boundary, and none of the code seems to deal with anything
that is not a multiple of 32 bits.

minimask_get_prefix_len() doesn't seem to be on a fast path, so it
might be better to write really clear code by using minimask_get() in
place of directly dealing with 'map' and values'.  

In minimask_get_prefix_len(), I think that:
            if (mask) {
                mask_tz = raw_ctz(mask);
                if (~mask & (~0u << mask_tz)) {
                    return 0; /* Mask not contiguous. */
                }
                nbits += 32 - mask_tz;
            } else {
                mask_tz = 32; /* End of mask seen. */
            }
could be simplified to just:
            if (~mask & (~mask + 1)) {
                return 0; /* Mask not contiguous. */
            }
            mask_tz = ctz32(mask);
            nbits += 32 - mask_tz;
using the alternative test for a non-CIDR mask found in ip_is_cidr().
(I did not test this change.)

The !mf test at the top of trie_insert() seems odd.  Is it really
possible for mf to be null here?

In trie_insert(), is the call to minimask_get_prefix_len() necessary?
I may be missing something but I'm not sure how the rule's prefix len
could be different from the rule's subtable's prefix length.

trie_prefix_equal_bits() looks to me like it doesn't handle the case
where ofs...plen crosses a 32-bit boundary, since it only looks at a
single word in the prefix[] array.  But the code for that function is
really opaque to me so maybe I just misunderstand.

trie_equal_bits() and trie_prefix_equal_bits() are very similar, I
suspect that they could be written as trivial wrappers around a common
function.

Thanks,

Ben.



More information about the dev mailing list