[ovs-dev] [PATCH v2 21/21] ovn: Change strategy for tunnel keys.

Justin Pettit jpettit at nicira.com
Mon Aug 3 08:19:01 UTC 2015


> On Jul 28, 2015, at 8:44 AM, Ben Pfaff <blp at nicira.com> wrote:
> 

Thanks, Ben.  This is a great patch.  I still owe you a review on the function physical_run(), but it's fairly complicated, and my reviewing powers are rapidly weakening.  I'll finish the review in the morning, but wanted to give you some early feedback so you could start looking at it.

> /* Finds and returns the logical_datapath with the given 'uuid', or NULL if
>  * no such logical_datapath exists. */
> static struct logical_datapath *
> -ldp_lookup(const struct uuid *uuid)
> +ldp_lookup(const struct sbrec_datapath_binding *binding)

The comment describing this function needs to be updated to match the new argument.

> /* Creates a new logical_datapath with the given 'uuid'. */
> static struct logical_datapath *
> -ldp_create(const struct uuid *uuid)
> +ldp_create(const struct sbrec_datapath_binding *binding)

This also needs a comment update.

> +        /* Translate logical table ID to physical table ID. */
> +        bool ingress = !strcmp(rule->pipeline, "ingress");
> +        uint8_t phys_table = rule->table_id + (ingress ? 16 : 48);
> +        uint8_t next_phys_table = rule->table_id < 15 ? phys_table + 1 : 0;
> +        uint8_t output_phys_table = ingress ? 32 : 64;

I think these numbers could use some #define's.  In addition to making it easier to change later on, the code will be easier to understand.
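
For instance, something along these lines (the macro names are just a
suggestion):

    /* Hypothetical names for the base OpenFlow table numbers. */
    #define OFTABLE_LOG_INGRESS_PIPELINE 16
    #define OFTABLE_LOG_INGRESS_OUTPUT   32
    #define OFTABLE_LOG_EGRESS_PIPELINE  48
    #define OFTABLE_LOG_EGRESS_OUTPUT    64
    #define LOG_PIPELINE_LAST_TABLE      15

        /* Translate logical table ID to physical table ID. */
        bool ingress = !strcmp(rule->pipeline, "ingress");
        uint8_t phys_table = rule->table_id
            + (ingress ? OFTABLE_LOG_INGRESS_PIPELINE
                       : OFTABLE_LOG_EGRESS_PIPELINE);
        uint8_t next_phys_table
            = rule->table_id < LOG_PIPELINE_LAST_TABLE ? phys_table + 1 : 0;
        uint8_t output_phys_table
            = ingress ? OFTABLE_LOG_INGRESS_OUTPUT
                      : OFTABLE_LOG_EGRESS_OUTPUT;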

> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index eac5546..5ecd13e 100644
> 
> ...
> 
> +static void
> +keys_destroy(struct hmap *keys)

Very minor, but with add_key() and allocate_key(), it would be more consistent to call this destroy_keys().  And, since I'm already commenting on this function, all the other functions use "set" instead of "keys" for this argument.  :-)

> +/* The 'key' comes from nb->header_.uuid or sb->external_ids's ' */
> +struct ovn_datapath {
> +    struct hmap_node key_node;  /* Index on 'key'. */
> +    struct uuid key;            /* nb->header_.uuid. */

The name "key" seems a bit confusing based on all the functions above referring to a "key" that's the tunnel key.

> +    uint32_t max_port_key;


I think "prev_port_key" would be a more accurate name.

> +static void
> +ovn_datapath_destroy(struct hmap *dp_map, struct ovn_datapath *od)
> +{
> +    if (od) {
> +        /* Don't remove od->list, it's only safe and only used within
> +         * build_datapaths(). */

I find this comment kind of confusing.

> +static void
> +join_datapaths(struct northd_context *ctx, struct hmap *dp_map,
> +               struct ovs_list *sb_only, struct ovs_list *nb_only,
> +               struct ovs_list *both)
> +{
> +    hmap_init(dp_map);

I don't think "dp_map" gets destroyed.

> +static void
> +build_datapaths(struct northd_context *ctx, struct hmap *dp_map)

The argument "dp_map" is different from when it was called as "datapaths".  I'm not trying to be pedantic, but it makes it a bit confusing when tracing through the calls.

> +    struct ovs_list sb_dps, nb_dps, both_dps;
> +
> +    join_datapaths(ctx, dp_map, &sb_dps, &nb_dps, &both_dps);

The implementation of join_datapaths() uses different variable names.  Since this is the only caller, it might be nice to use the same names.  The "_only" names used in join_datapaths() seem a bit better.

> +    if (!list_is_empty(&nb_dps)) {
> +        /* First index the in-use datapath tunnel keys. */
> +        struct hmap dp_keys = HMAP_INITIALIZER(&dp_keys);

I don't think "dp_keys" is ever destroyed.

> +static void
> +ovn_port_destroy(struct hmap *port_map, struct ovn_port *port)
> +{
> +    if (port) {
> +        /* Don't remove port->list, it's only safe and only used within
> +         * build_ports(). */

I also find this comment kind of confusing.

> +static uint32_t
> +ovn_port_allocate_key(struct ovn_datapath *od)
> +{
> +    return allocate_key(&od->port_keys, "port",
> +                        (1u << 16) - 1, &od->max_port_key);

Shouldn't this be 2**15, not 2**16?
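Given that the Port_Binding docs later in this patch say logical ports get
tunnel IDs 1 through 32767, I'd have expected something like:

    return allocate_key(&od->port_keys, "port",
                        (1u << 15) - 1, &od->max_port_key);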

> +static void
> +join_logical_ports(struct northd_context *ctx,
> +                   struct hmap *dp_map, struct hmap *port_map,
> +                   struct ovs_list *sb_only, struct ovs_list *nb_only,
> +                   struct ovs_list *both)
> +{
> +    hmap_init(port_map);

I don't think this gets destroyed.

> +static void
> +build_ports(struct northd_context *ctx, struct hmap *dp_map,
> +            struct hmap *port_map)

The "port_map" argument name is different from the only caller which uses "ports".

> +{
> +    struct ovs_list sb_ports, nb_ports, both_ports;
> +
> +    join_logical_ports(ctx, dp_map, port_map,
> +                       &sb_ports, &nb_ports, &both_ports);

Same comment about the argument names as join_datapaths().  Once again, I think "_only" names are clearer here.

> +#define MC_FLOOD "_MC_flood"
> +static const struct multicast_group mc_flood = { MC_FLOOD, 65535 };
> +
> +#define MC_UNKNOWN "_MC_unknown"
> +static const struct multicast_group mc_unknown = { MC_UNKNOWN, 65534 };

Should we note these two multicast groups and their values somewhere (maybe in the architecture doc)?  It might be useful for debugging.

Should we mention the apparent convention of starting a group name with an underscore?

> +/* Multicast group entry. */
> +struct ovn_multicast {
> +    struct hmap_node hmap_node; /* Index on 'datapath', 'key', */

The comment ends with a comma.

> +rule_add(struct hmap *rule_map, struct ovn_datapath *od,
> +         enum ovn_pipeline pipeline, uint8_t table_id, uint16_t priority,
> +         const char *match, const char *actions)
> +{

All the other rule functions start with "ovn_rule_".  Also, they all refer to "rules" instead of "rule_map".

> +build_rule(struct northd_context *ctx, struct hmap *datapaths,
> +           struct hmap *ports)
> {
> ...
> +    /* Ingress table 0: Admission control framework. */

In most of the other comments the priority is indicated.  It might be nice to do it here, too.
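
For example, something like this (the priority value here is just
illustrative):

    /* Ingress table 0: Admission control framework (priority 100). */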

> +    /* Ingress table 0: Ingress port security. */

Ditto.  There are a few more below, too, that I won't continue to call out.

> static void
> ovnnb_db_changed(struct northd_context *ctx)
> {
>     VLOG_DBG("ovn-nb db contents have changed.");
> 
> +    struct hmap datapaths, ports;
> +    build_datapaths(ctx, &datapaths);
> +    build_ports(ctx, &datapaths, &ports);
> +    build_rule(ctx, &datapaths, &ports);

These "build_" functions initialize "datapaths" and "ports", but I don't think anything destroys them.

I think build_rules() might be a more accurate name.

> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 0334d82..0af96a0 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> 
> +    This section describes how a packet travels from ingress into OVN from one
> +    virtual machine or container to another.  This description focuses on the

I found this first sentence confusing.  Can you just drop "from ingress"?

> +        Packets that originate from a container nested within a VM are treated
> +        in a slightly different way.  The originating container can be
> +        distinguished based on the VLAN ID, so the physical-to-logical

Do you think it's worth adding "VIF-specific" before "VLAN ID"?  That way it's a bit clearer that it's not a generic VLAN.

> +        Table 0 also processes packets that arrive from other hypervisors.  It

And gateways, too, right?

> +          If the pipeline can execute more than one <code>output</code> action,
> +          then each one is separately resubmitted to table 32.  This can be
> +          used to send multiple copies to the packet to multiple ports.  (If

s/to the packet/of the packet/

> +    For connecting to gateways, in addition to Geneve and STT, OVN supports
> +    VXLAN, because only VXLAN support is common on top-of-rack (ToR) switch.

s/switch/switches/

> +      The <code>output</code> action also introduces recursion.  Its effect
> +      depends on the current value of the <code>outport</code> field.  Suppose
> +      <code>outport</code> designates a logical port.  First, OVN compares
> +      <code>inport</code> to <code>outport</code>; if they are equal, it treats
> +      the <code>output</code> as a no-op.  In the common case, where they are
> +      different, the packet enters the egress pipeline.  This transition to the
> +      egress pipeline discards register data (<code>reg0</code>
> +      ... <code>reg5</code>).

It might be worth mentioning this is done because the egress pipeline may be on a different hypervisor, and the registers wouldn't survive.

BTW, I didn't notice the implementation doing that, but it's possible that I missed it in my quick look at physical_run().

>     <column name="logical_datapath">
>       The logical datapath to which the logical flow belongs.  A logical
>       datapath implements a logical pipeline among the ports in the <ref
> -      table="Port_Binding"/> table associated with it.  (No table represents a
> -      logical datapath.)  In practice, the pipeline in a given logical datapath
> -      implements either a logical switch or a logical router, and
> -      <code>ovn-northd</code> reuses the UUIDs for those logical entities from
> -      the <code>OVN_Northbound</code> for logical datapaths.
> +      table="Port_Binding"/> table associated with it.  In practice, the
> +      pipeline in a given logical datapath implements either a logical switch
> +      or a logical router, and <code>ovn-northd</code> reuses the UUIDs for
> +      those logical entities from the <code>OVN_Northbound</code> for logical
> +      datapaths.

Is this point about ovn-northd reusing the UUIDs still relevant here?  This discussion seems more appropriate for the new Datapath_Binding table.

> @@ -524,10 +641,21 @@
> 
>       <p><em>Symbols</em></p>
> 
> +      <p>
> +        Most of the symbols below have integer type.  Only <code>inport</code>
> +        and <code>outport</code> have string type.  <code>inport</code> names a
> +        logical port.  Thus, its value is a <ref column="logical_port"/> names
> +        from the <ref table="Port_Binding"/> and <ref table="Gateway"/> tables
> +        in a logical flow's <ref column="logical_datapath"/>.
> +        <code>outport</code> may name a logical port, as <code>inport</code>.
> +        It may also name a logical multicast group defined in the <ref
> +        table="Multicast_Group"/> table.

I found the sentence beginning with "Thus" confusing.

Also, do we know that "queue" will be an integer (and not a string)?  Should it be deleted until it's implemented?  

> @@ -562,17 +690,32 @@
>       </p>
> 
>       <p>
> -        The following actions will be initially supported:
> +	The following actions are defined:
>       </p>
> ...
> +          <p>
> +	    In an <code>ingress</code> flow, this action executes the

I think it might be clearer and more consistent if this were changed to "the ingress pipeline".

> +	    <code>egress</code> pipeline as a subroutine.  If
> +	    <code>outport</code> names a logical port, the egress pipeline
> +	    executes once; if it is a multicast group, the egress pipeline runs
> +	    once for each logical port in the group.
> +          </p>
> +
> +          <p>
> +            In an <code>egress</code> flow, this action performs the actual

Same comment here but with "the egress pipeline".

> +            Not all fields are modifiable (e.g. <code>eth.type</code> and
> +            <code>ip.proto</code> are read-only), and not all modifiable fields
> +            may be partially modified (e.g. <code>ip.ttl</code> must assigned
> +            as a whole).  The <code>outport</code> field is modifiable for an
> +            <code>ingress</code> flow but not an <code>egress</code> flow.

Less important here, but it might be more consistent to refer to these as pipelines instead of flows.

> +      In the <ref table="Rule"/> table, multicast groups may be used for output
> +      just as for individual logical ports, by assigning the group's name to
> +      <code>outport</code>,

It might be worth mentioning this only works in the ingress pipeline.

> +    <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
> +      Each row in <ref table="Datapath_Binding"/> is associated with some
> +      logical datapath.  <code>ovn-northd</code> uses this key to store the
> +      UUID of the logical datapath <ref table="Logical_Switch"
> +      db="OVN_Northbound"/> row in the <ref db="OVN_Northbound"/> database.

It could also be a "Logical_Router", right?

>     <column name="tunnel_key">
>       <p>
> -        A number that represents the logical port in the key (e.g. VXLAN VNI or
> -        STT key) field carried within tunnel protocol packets.  (This avoids
> +        A number that represents the logical port in the key (e.g. STT key or
> +        Geneve TLV) field carried within tunnel protocol packets.  This avoids
>         wasting space for a whole UUID in tunneled packets.  It also allows OVN
>         to support encapsulations that cannot fit an entire UUID in their
> -        tunnel keys.)
> +        tunnel keys (i.e. every encapsulation other than Geneve).

I don't know if this discussion about UUIDs is necessary here.  If we want to keep it, it seems like it would make more sense in the ovn-architecture document.

> -        Tunnel ID 0 is reserved for internal use within OVN.
> +        The tunnel ID must be unique within the scope of a logical datapath.
>       </p>
> +
> +      <p>
> +        Logical port tunnel IDs form a 16-bit space:
> +      </p>
> +
> +      <ul>
> +        <li>Tunnel ID 0 is reserved for internal use within OVN.</li>
> +        <li>Tunnel IDs 1 through 32767, inclusive, may be assigned to logical
> +        ports.</li>
> +        <li>Tunnel IDs 32768 through 65535, inclusive, may be assigned to
> +        logical multicast groups (see the <ref table="Multicast_Group"/>
> +        table).</li>
> +      </ul>

I think this is valuable information, but its location is a little odd.  For example, this is in the Port_Binding table, which doesn't allow a tunnel ID above 32767, yet the description of tunnel IDs above 32767 lives here.  I feel like a lot of this discussion could be moved to ovn-architecture, with just a pointer to it like the other definitions of "tunnel_key" in this document.

I'll finish the review in the morning, but any additional comments should only be around physical_run().

--Justin
