[ovs-dev] [PATCH 3/4] tc: Add header rewrite using tc pedit action

Paul Blakey paulb at mellanox.com
Tue Aug 15 08:19:11 UTC 2017



On 15/08/2017 00:56, Joe Stringer wrote:
> On 8 August 2017 at 07:21, Roi Dayan <roid at mellanox.com> wrote:
>> From: Paul Blakey <paulb at mellanox.com>
>>
>> To be later used to implement ovs action set offloading.
>>
>> Signed-off-by: Paul Blakey <paulb at mellanox.com>
>> Reviewed-by: Roi Dayan <roid at mellanox.com>
>> ---
> 
> Hi Paul and folks, some questions inline.
> 
> utilities/checkpatch.py says:
> 
> $ utilities/checkpatch.py -4
> == Checking 00ecb482f5d1 ("compat: Add act_pedit compatibility for old
> kernels") ==
> Lines checked: 127, no obvious problems found
> 
> == Checking 7503746718f6 ("odp-util: Expose ovs flow key attr len
> table for reuse") ==
> Lines checked: 195, no obvious problems found
> 
> == Checking 1e8ab68aefe1 ("tc: Add header rewrite using tc pedit action") ==
> ERROR: Improper whitespace around control block
> #359 FILE: lib/tc.c:480:
>     NL_ATTR_FOR_EACH(nla, left, keys_ex, keys_ex_size) {
> 
> Lines checked: 721, Warnings: 0, Errors: 1
> 
> == Checking f6b52ce504c2 ("netdev-tc-offloads: Add support for action set") ==
> ERROR: Improper whitespace around control block
> #908 FILE: lib/netdev-tc-offloads.c:590:
>     NL_ATTR_FOR_EACH_UNSAFE(set_attr, set_left, set, set_len) {
> 
> Lines checked: 981, Warnings: 0, Errors: 1
> 
> <snip>
> 
>> +static struct flower_key_to_pedit flower_pedit_map[] = {
>> +    [offsetof(struct tc_flower_key, ipv4.ipv4_src)] = {
>> +        TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
>> +        12,
>> +        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_src)
>> +    },
> 
> Why are these items indexed by the offset, and not just an array we
> iterate? It seems like the only places where we iterate this, we use
> "for (i=0; ... i++)", which means that we iterate several times across
> empty items, and this array is sparsely populated.
Right we can make this one not sparse as there is no direct access to 
some index. Or some of the suggestions on the other patch's map (like 
creating a fast map for both put and dump paths).

> 
>> +    [offsetof(struct tc_flower_key, ipv4.ipv4_dst)] = {
>> +        TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
>> +        16,
>> +        MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_dst)
>> +    },
>> +    [offsetof(struct tc_flower_key, ipv4.rewrite_ttl)] = {
>> +        TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
>> +        8,
>> +        MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl)
>> +    },
> 
> <snip>
> 
>> @@ -346,6 +425,96 @@ nl_parse_flower_ip(struct nlattr **attrs, struct tc_flower *flower) {
>>       }
>>   }
>>
>> +static const struct nl_policy pedit_policy[] = {
>> +            [TCA_PEDIT_PARMS_EX] = { .type = NL_A_UNSPEC,
>> +                                     .min_len = sizeof(struct tc_pedit),
>> +                                     .optional = false, },
>> +            [TCA_PEDIT_KEYS_EX]   = { .type = NL_A_NESTED,
>> +                                      .optional = false, },
>> +};
>> +
>> +static int
>> +nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
>> +{
>> +    struct nlattr *pe_attrs[ARRAY_SIZE(pedit_policy)];
>> +    const struct tc_pedit *pe;
>> +    const struct tc_pedit_key *keys;
>> +    const struct nlattr *nla, *keys_ex, *ex_type;
>> +    const void *keys_attr;
>> +    char *rewrite_key = (void *) &flower->rewrite.key;
>> +    char *rewrite_mask = (void *) &flower->rewrite.mask;
> 
> These are actually 'struct tc_flower_key', shouldn't we use the actual
> types? (I realise there is pointer arithmetic below, but usually we
> restrict the usage of void casting and pointer arithmetic as much as
> possible).
> 

Yes, It's for the pointer arithmetic. (void *) cast was for the
clangs warning "cast increases required alignment of target type"
which we couldn't find another way to suppress.
I don't think alignments an issue here.


>> +    size_t keys_ex_size, left;
>> +    int type, i = 0;
>> +
>> +    if (!nl_parse_nested(options, pedit_policy, pe_attrs,
>> +                         ARRAY_SIZE(pedit_policy))) {
>> +        VLOG_ERR_RL(&error_rl, "failed to parse pedit action options");
>> +        return EPROTO;
>> +    }
>> +
>> +    pe = nl_attr_get_unspec(pe_attrs[TCA_PEDIT_PARMS_EX], sizeof *pe);
>> +    keys = pe->keys;
>> +    keys_attr = pe_attrs[TCA_PEDIT_KEYS_EX];
>> +    keys_ex = nl_attr_get(keys_attr);
>> +    keys_ex_size = nl_attr_get_size(keys_attr);
>> +
>> +    NL_ATTR_FOR_EACH(nla, left, keys_ex, keys_ex_size) {
>> +        if (i >= pe->nkeys) {
>> +            break;
>> +        }
>> +
>> +        if (nl_attr_type(nla) == TCA_PEDIT_KEY_EX) {
>> +            ex_type = nl_attr_find_nested(nla, TCA_PEDIT_KEY_EX_HTYPE);
>> +            type = nl_attr_get_u16(ex_type);
>> +
>> +            for (int j = 0; j < ARRAY_SIZE(flower_pedit_map); j++) {
>> +                struct flower_key_to_pedit *m = &flower_pedit_map[j];
>> +                int flower_off = j;
>> +                int sz = m->size;
>> +                int mf = m->offset;
>> +
>> +                if (!sz || m->htype != type) {
>> +                   continue;
>> +                }
> 
> If flower_pedit_map was just a regular array and the offset was
> included in 'struct flower_key_to_pedit', then I don't think we need
> the above check?
> 
>> +
>> +                /* check overlap between current pedit key, which is always
>> +                 * 4 bytes (range [off, off + 3]), and a map entry in
>> +                 * flower_pedit_map (range [mf, mf + sz - 1]) */
>> +                if ((keys->off >= mf && keys->off < mf + sz)
>> +                    || (keys->off + 3 >= mf && keys->off + 3 < mf + sz)) {
>> +                    int diff = flower_off + (keys->off - mf);
>> +                    uint32_t *dst = (void *) (rewrite_key + diff);
>> +                    uint32_t *dst_m = (void *) (rewrite_mask + diff);
>> +                    uint32_t mask = ~(keys->mask);
>> +                    uint32_t zero_bits;
>> +
>> +                    if (keys->off < mf) {
>> +                        zero_bits = 8 * (mf - keys->off);
>> +                        mask &= UINT32_MAX << zero_bits;
>> +                    } else if (keys->off + 4 > mf + m->size) {
>> +                        zero_bits = 8 * (keys->off + 4 - mf - m->size);
>> +                        mask &= UINT32_MAX >> zero_bits;
>> +                    }
> 
> I think this is all trying to avoid having a giant switch case here
> which would handle all of the possible flower key attributes, similar
> to somewhere else where this kind of iteration occurs.
Right that was the objective.

  Is there any
> overlap between this code and calc_offsets()calc_offsets was to calculate the word offsets and masks for the first 
and last word to write, here its the reverse so its already in word size 
with correct masks, and easier to write back.

> 
> Could we somehow trigger an assert or a strong warning message if the
> offsets of our flower_pedit_map and the flower key don't align, given
> that there's a bunch of pointer arithmetic happening here?
> 

Not sure what you mean here, we are aligned to words (4 bytes) and
"cut" the words if they overflow the target. we also padded 
flower.rewrite key and mask so we won't overflow to flower member we 
don't mean to write.


> <snip>
> 
>> @@ -589,6 +758,10 @@ nl_parse_single_action(struct nlattr *action, struct tc_flower *flower)
>>           nl_parse_act_vlan(act_options, flower);
>>       } else if (!strcmp(act_kind, "tunnel_key")) {
>>           nl_parse_act_tunnel_key(act_options, flower);
>> +    } else if (!strcmp(act_kind, "pedit")) {
>> +        nl_parse_act_pedit(act_options, flower);
>> +    } else if (!strcmp(act_kind, "csum")) {
>> +        /* not doing anything for now */
> 
> Should we log a message that we're not handling csums?
> 

We aren't handling them because we don't need to, OVS has an implicit
csum recalculation of rewritten packet headers.

> <snip>
> 
>> +static int
>> +nl_msg_put_flower_rewrite_pedits(struct ofpbuf *request,
>> +                                 struct tc_flower *flower)
>> +{
>> +    struct {
>> +        struct tc_pedit sel;
>> +        struct tc_pedit_key keys[MAX_PEDIT_OFFSETS];
>> +        struct tc_pedit_key_ex keys_ex[MAX_PEDIT_OFFSETS];
>> +    } sel = {
>> +        .sel = {
>> +            .nkeys = 0
>> +        }
>> +    };
>> +    int i, j;
>> +
>> +    for (i = 0; i < ARRAY_SIZE(flower_pedit_map); i++) {
>> +        struct flower_key_to_pedit *m = &flower_pedit_map[i];
>> +        struct tc_pedit_key *pedit_key = NULL;
>> +        struct tc_pedit_key_ex *pedit_key_ex = NULL;
>> +        uint32_t *mask, *data, first_word_mask, last_word_mask;
>> +        int cnt = 0, cur_offset = 0;
>> +
>> +        if (!m->size) {
>> +            continue;
>> +        }
>> +
>> +        calc_offsets(flower, m, i, &cur_offset, &cnt, &last_word_mask,
>> +                     &first_word_mask, &mask, &data);
>> +
>> +        for (j = 0; j < cnt; j++,  mask++, data++, cur_offset += 4) {
>> +            uint32_t mask_word = *mask;
>> +
>> +            if (j == 0) {
>> +                mask_word &= first_word_mask;
>> +            }
>> +            if (j == cnt - 1) {
>> +                mask_word &= last_word_mask;
>> +            }
>> +            if (!mask_word) {
>> +                continue;
>> +            }
>> +            if (sel.sel.nkeys == MAX_PEDIT_OFFSETS) {
>> +                VLOG_ERR_RL(&error_rl, "reached too many pedit offsets: %d",
>> +                            MAX_PEDIT_OFFSETS);
>> +                return EOPNOTSUPP;
>> +            }
> 
> Just wondering if this should be err or warn, since presumably the
> result is that such flows avoid TC and should end up being correctly
> handled by the openvswitch kernel datapath?

Right, we'll change that.

> 
>> +
>> +            pedit_key = &sel.keys[sel.sel.nkeys];
>> +            pedit_key_ex = &sel.keys_ex[sel.sel.nkeys];
>> +            pedit_key_ex->cmd = TCA_PEDIT_KEY_EX_CMD_SET;
>> +            pedit_key_ex->htype = m->htype;
>> +            pedit_key->off = cur_offset;
>> +            pedit_key->mask = ~mask_word;
>> +            pedit_key->val = *data & mask_word;
>> +            sel.sel.nkeys++;
>> +            csum_update_flag(flower, m->htype);
>> +        }
>> +    }
>> +    nl_msg_put_act_pedit(request, &sel.sel, sel.keys_ex);
>> +
>> +    return 0;
>> +}
>> +
>> +static int
>>   nl_msg_put_flower_acts(struct ofpbuf *request, struct tc_flower *flower)
>>   {
>>       size_t offset;
>> @@ -1002,11 +1354,19 @@ nl_msg_put_flower_tunnel(struct ofpbuf *request, struct tc_flower *flower)
>>       nl_msg_put_masked_value(request, type, type##_MASK, &flower->key.member, \
>>                               &flower->mask.member, sizeof flower->key.member)
>>
>> -static void
>> +static int
>>   nl_msg_put_flower_options(struct ofpbuf *request, struct tc_flower *flower)
>>   {
>> +
>>       uint16_t host_eth_type = ntohs(flower->key.eth_type);
>>       bool is_vlan = (host_eth_type == ETH_TYPE_VLAN);
>> +    int err;
>> +
>> +    /* need to parse acts first as some acts require changing the matching */
>> +    err  = nl_msg_put_flower_acts(request, flower);
>> +    if (err) {
>> +        return err;
>> +    }
>>
>>       if (is_vlan) {
>>           host_eth_type = ntohs(flower->key.encap_eth_type);
>> @@ -1062,7 +1422,7 @@ nl_msg_put_flower_options(struct ofpbuf *request, struct tc_flower *flower)
>>           nl_msg_put_flower_tunnel(request, flower);
>>       }
>>
>> -    nl_msg_put_flower_acts(request, flower);
>> +    return 0;
>>   }
> 
> The above snippet seems like it could be logically separate and kept
> as a different patch, as it shouldn't affect behaviour at all but it
> does actually introduce additional error checking. This could assist
> git bisect if necessary.

We'll do that, thanks.

> 
>>
>>   int
>> @@ -1085,11 +1445,17 @@ tc_replace_flower(int ifindex, uint16_t prio, uint32_t handle,
>>       nl_msg_put_string(&request, TCA_KIND, "flower");
>>       basic_offset = nl_msg_start_nested(&request, TCA_OPTIONS);
>>       {
>> -        nl_msg_put_flower_options(&request, flower);
>> +        error = nl_msg_put_flower_options(&request, flower);
>> +
>> +        if (error) {
>> +            ofpbuf_uninit(&request);
>> +            return error;
>> +        }
>>       }
>>       nl_msg_end_nested(&request, basic_offset);
>>
>>       error = tc_transact(&request, &reply);
>> +
> 
> Random extra newline?

Will be removed, thanks.

> 
>>       if (!error) {
>>           struct tcmsg *tc =
>>               ofpbuf_at_assert(reply, NLMSG_HDRLEN, sizeof *tc);
>> diff --git a/lib/tc.h b/lib/tc.h
>> index 5f363d0..2269a22 100644
>> --- a/lib/tc.h
>> +++ b/lib/tc.h
>> @@ -93,6 +93,7 @@ struct tc_flower_key {
>>       struct {
>>           ovs_be32 ipv4_src;
>>           ovs_be32 ipv4_dst;
>> +        uint8_t rewrite_ttl;
>>       } ipv4;
>>       struct {
>>           struct in6_addr ipv6_src;
>> @@ -117,6 +118,17 @@ struct tc_flower {
>>       uint64_t lastused;
>>
>>       struct {
>> +        bool rewrite;
>> +        uint8_t pad1[3];
>> +        struct tc_flower_key key;
>> +        uint8_t pad2[3];
>> +        struct tc_flower_key mask;
>> +        uint8_t pad3[3];
>> +    } rewrite;
>> +
>> +    uint32_t csum_update_flags;
>> +
>> +    struct {
>>           bool set;
>>           ovs_be64 id;
>>           ovs_be16 tp_src;
> 
> This is maybe an aside, but it seems like 'rewrite_ttl' already exists
> in 'struct tc_flower_key'. Should it also be in 'struct tc_flower'
> like this?
> 

On latest master, we have ip_ttl in struct tc_flower, which is for 
matching on ipv6/ipv4 ttl/tos.
This one is for rewriting of ttl, we need to distinguish between
ipv4 ttl and ipv6 ttl to translate them to different action pedit 
offsets/layers, we didn't want to add ip protocol dependencies in the map.

> Cheers,
> Joe
> 


More information about the dev mailing list