[ovs-dev] [PATCH v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields

Ilya Maximets i.maximets at ovn.org
Sun Jul 25 23:22:17 UTC 2021


Re-adding the list and people from CC.

On 6/27/21 8:16 AM, 贺鹏 wrote:
> Hi, Ilya,
> 
> sorry for replying so late, being really busy this month.

Busy time for everyone.
BTW, I'm taking some time off next week.  Maybe someone else will be able
to take a look at the changes while I'm not here.  My replies inline.

> I've read through your comments. I personally prefer to keep the
> number of megaflows as few as possible, as megaflow is becoming
> expensive as the size of flow struct is large.
> below are my experiments.
> 
> Ilya Maximets <i.maximets at ovn.org> 于2021年5月22日周六 上午3:50写道:
>>
>> On 5/21/21 5:00 AM, 贺鹏 wrote:
>>> Hi, Ilya
>>>
>>>
>>>
>>> Ilya Maximets <i.maximets at ovn.org> 于2021年5月19日周三 下午8:50写道:
>>>>
>>>> On 2/27/21 10:34 AM, Peng He wrote:
>>>>> CT zone could be set from a field that is not included in frozen
>>>>> metadata. Consider the example rules which are typically seen in
>>>>> OpenStack security group rules:
>>>>>
>>>>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
>>>>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
>>>>>
>>>>> The zone is set from the first rule's ct action. These two rules will
>>>>> generate two megaflows: the first one uses zone=5 to query the CT module,
>>>>> the second one sets the zone-id from the first megaflow and commit to CT.
>>>>>
>>>>> The current implementation will generate a megaflow that does not use
>>>>> ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone is
>>>>> set by an Imm not a field.
>>>>>
>>>>> Consider a situation that one changes the zone id (for example to 15)
>>>>> in the first rule, however, still keep the second rule unchanged. During
>>>>> this change, there is traffic hitting the two generated megaflows, the
>>>>> revaldiator would revalidate all megaflows, however, the revalidator will
>>>>> not change the second megaflow, because zone=5 is recorded in the
>>>>> megaflow, so the xlate will still translate the commit action into zone=5,
>>>>> and the new traffic will still commit to CT as zone=5, not zone=15,
>>>>> resulting in taffic drops and other issues.
>>>>>
>>>>> Just like OVS set-field convention, if a field X is set by Y
>>>>> (Y is a variable not an Imm), we should also mask Y as a match
>>>>> in the generated megaflow. An exception is that if the zone-id is
>>>>> set by the field that is included in the frozen state (i.e. regs) and this
>>>>> upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
>>>>> as the recirc_id is a hash of the values in these fields, and it will change
>>>>> following the changes of these fields. When the recirc_id changes,
>>>>> all megaflows with the old recirc id will be invalid later.
>>>>>
>>>>> Fixes: 07659514c3 ("Add support for connection tracking.")
>>>>> Reported-by: Sai Su <susai.ss at bytedance.com>
>>>>> Signed-off-by: Peng He <hepeng.0320 at bytedance.com>
>>>>> ---
>>>>>  include/openvswitch/meta-flow.h |  1 +
>>>>>  lib/meta-flow.c                 | 13 ++++++++++
>>>>>  ofproto/ofproto-dpif-xlate.c    | 12 +++++++++
>>>>>  tests/system-traffic.at         | 45 +++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 71 insertions(+)
>>>>>
>>>>> diff --git a/include/openvswitch/meta-flow.h b/include/openvswitch/meta-flow.h
>>>>> index 95e52e358..045dce8f5 100644
>>>>> --- a/include/openvswitch/meta-flow.h
>>>>> +++ b/include/openvswitch/meta-flow.h
>>>>> @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
>>>>>                                const union mf_value *mask,
>>>>>                                struct flow *);
>>>>>  bool mf_is_tun_metadata(const struct mf_field *);
>>>>> +bool mf_is_frozen_metadata(const struct mf_field *);
>>>>>  bool mf_is_pipeline_field(const struct mf_field *);
>>>>>  bool mf_is_set(const struct mf_field *, const struct flow *);
>>>>>  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
>>>>> diff --git a/lib/meta-flow.c b/lib/meta-flow.c
>>>>> index c808d205d..e03cd8d0c 100644
>>>>> --- a/lib/meta-flow.c
>>>>> +++ b/lib/meta-flow.c
>>>>> @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
>>>>>             mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +mf_is_frozen_metadata(const struct mf_field *mf)
>>>>> +{
>>>>> +    if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
>>>>> +        return true;
>>>>> +    }
>>>>> +
>>>>> +    if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
>>>>> +        return true;
>>>>> +    }
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>>  bool
>>>>>  mf_is_pipeline_field(const struct mf_field *mf)
>>>>>  {
>>>>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>>>>> index 7108c8a30..14d00db1e 100644
>>>>> --- a/ofproto/ofproto-dpif-xlate.c
>>>>> +++ b/ofproto/ofproto-dpif-xlate.c
>>>>> @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>>>>>
>>>>>      if (ofc->zone_src.field) {
>>>>>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
>>>>> +        if (ctx->xin->frozen_state) {
>>>>> +            /* If the upcall is a resume of a recirculation, we only need to
>>>>> +             * unwildcard the fields that are not in the frozen_metadata, as
>>>>> +             * when the rules update, OVS will generate a new recirc_id,
>>>>> +             * which will invalidate the megaflow with old the recirc_id.
>>>>> +             */
>>>>> +            if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
>>>>> +                mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>> +            }
>>>>> +        } else {
>>>>> +            mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>> +        }
>>>>
>>>> IIUC, in current code above block is equal to just a single line:
>>>>
>>>>     mf_mask_field(ofc->zone_src.field, ctx->wc);
>>>>
>>>> That is because zone is not part of a frozen metadata, right?
>>>
>>> Yes, basically this is the reason.
>>> I think the original code only considers the case that the zone id is
>>> set by IMM, not
>>> by a reg.
>>>
>>>>
>>>> Can we just remove all this extra logic and unwildcard unconditionally?
>>>> I understand that this code might save us one match in case someday
>>>> zone will be part of a frozen metadata, but is that really important?
>>>> Is there any harm in unwildcarding this field unconditionally?
>>>
>>> I think the code can reduce the number of megaflows in case that the rule
>>> sets the zone-id using IMM. But I think unconditionally un-wildcarding is also
>>> OK.
>>
>> From what I managed to test, if value is taken from any register
>> populated by Imm, there are no extra matches in datapath flows.
>> Maybe I'm missing some cases, though.
>>
>> I tried to experiment a bit with this code and the test case.
>> First thing that I found is that we, probably, need to use
>> mf_write_subfield_flow() instead of mf_mask_field(), because
>> we should not unwildcard the whole field if only part of it
>> used.
>>
>> Second thing is that I tried to write following set of flows
>> that might not make much sense, but should probably work fine:
>>
>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0,commit,exec(load:0xffff0005->NXM_NX_CT_LABEL[[0..31]]))
>> priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_LABEL[[0..15]]),2
>>
>> And what I see is that in resulted datapath flows there is no
>> match on ct_label.  This happens because masks of ct_label and
>> ct_mark gets saved at the beginning of the function and restored
>> later, i.e. our un-wildcarding is lost.
> 
> Yes, good catch!
> 
> Yes, I am using the belowing code:
> 
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        union mf_subvalue value;
> +        memset(&value, 0xff, sizeof(value));
>          if (ctx->xin->frozen_state) {
>              /* If the upcall is a resume of a recirculation, we only need to
>               * unwildcard the fields that are not in the frozen_metadata, as
> @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> *ctx, struct ofpact_conntrack *ofc,
>               * which will invalidate the megaflow with old the recirc_id.
>               */
>              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +                mf_write_subfield_flow(&ofc->zone_src, &value,
> &ctx->wc->masks);
>              }
>          } else {
> -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
>          }
>      } else {
>          zone = ofc->zone_imm;
>      }
> 
> Not sure if it fits the idiom usage of mf APIs.

I'm not an expert in this part of the code, but this seems fine to me.
I would re-order the definition part though, oterwise it's hard to read.
Something like:

if (ofc->zone_src.field) {
    union mf_subvalue value;
    memset(&value, 0xff, sizeof(value));

    zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
    if (ctx->xin->frozen_state) {
        ...


> 
>>
>> The third thing is that if I'm un-wildcarding before saving or
>> after restoring, I see a bit weird datapath flow after modification
>> of the first flow to:
>>
>> priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=15,table=0,commit,exec(load:0xffff000f->NXM_NX_CT_LABEL[[0..31]]))
>>
>> There is a new datapath flow with a correct ct_label match, but
>> the old flow gets updated to have ct_label match on 0.  This might
>> make some sense for this set of flows, but I'm not sure.
>>
>> Before flow-mod:
>>
>> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=5,label=0xffff0005/0xffffffff),recirc(0x1)
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0x5/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=5,label=0x5/0xffff),ovs-p1
>>
>> After:
>>
>> recirc_id(0),in_port(ovs-p0),ct_state(-trk),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x1)
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,label=0/0xffff),ovs-p1
>>
>> recirc_id(0x1),in_port(ovs-p0),ct_state(+trk),ct_label(0xf/0xffff),eth(),eth_type(0x0800),ipv4(proto=6,frag=no),
>>     actions:ct(commit,zone=15,label=0xf/0xffff),ovs-p1
>>
> 
> I do not encounter such an issue, my changes are here:

Hmm.  Maybe I did something wrong.
Could you prepare v2, so we can continue discussion from there?

Best regards, Ilya Maximets.

> 
>  compose_conntrack_action(struct xlate_ctx *ctx, struct ofpact_conntrack *ofc,
>                           bool is_last_action)
>  {
> -    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> -    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> -    size_t ct_offset;
>      uint16_t zone;
> -
>      uint16_t zone;
> -
> -    /* Ensure that any prior actions are applied before composing the new
> -     * conntrack action. */
> -    xlate_commit_actions(ctx);
> -
> -    /* Process nested actions first, to populate the key. */
> -    ctx->ct_nat_action = NULL;
> -    ctx->wc->masks.ct_mark = 0;
> -    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> -    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> -                     is_last_action, false);
> -
>      if (ofc->zone_src.field) {
>          zone = mf_get_subfield(&ofc->zone_src, &ctx->xin->flow);
> +        union mf_subvalue value;
> +        memset(&value, 0xff, sizeof(value));
>          if (ctx->xin->frozen_state) {
>              /* If the upcall is a resume of a recirculation, we only need to
>               * unwildcard the fields that are not in the frozen_metadata, as
> @@ -6202,15 +6189,30 @@ compose_conntrack_action(struct xlate_ctx
> *ctx, struct ofpact_conntrack *ofc,
>               * which will invalidate the megaflow with old the recirc_id.
>               */
>              if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> -                mf_mask_field(ofc->zone_src.field, ctx->wc);
> +                mf_write_subfield_flow(&ofc->zone_src, &value,
> &ctx->wc->masks);
>              }
>          } else {
> -            mf_mask_field(ofc->zone_src.field, ctx->wc);
> +            mf_write_subfield_flow(&ofc->zone_src, &value, &ctx->wc->masks);
>          }
>      } else {
>          zone = ofc->zone_imm;
>      }
> 
> +    size_t ct_offset;
> +    ovs_u128 old_ct_label_mask = ctx->wc->masks.ct_label;
> +    uint32_t old_ct_mark_mask = ctx->wc->masks.ct_mark;
> +    /* Ensure that any prior actions are applied before composing the new
> +     * conntrack action. */
> +    xlate_commit_actions(ctx);
> +
> +    /* Process nested actions first, to populate the key. */
> +    ctx->ct_nat_action = NULL;
> +    ctx->wc->masks.ct_mark = 0;
> +    ctx->wc->masks.ct_label = OVS_U128_ZERO;
> +    do_xlate_actions(ofc->actions, ofpact_ct_get_action_len(ofc), ctx,
> +                     is_last_action, false);
> +
> +
> 
> and my test is OK.
> root at debian:~/ovs# ovs-appctl dpctl/dump-flows | grep label
> ct_state(+trk),ct_label(0xffff0005/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:1.323s, flags:FP.,
> actions:ct(commit,zone=5),3
> ct_state(-trk),recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:11, bytes:1004, used:1.293s, flags:SFP.,
> actions:ct(commit,zone=15,label=0xffff000f/0xffffffff),recirc(0x2)
> ct_state(+trk),ct_label(0xffff000f/0xffff),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=6,frag=no),
> packets:5, bytes:465, used:1.293s, flags:FP.,
> actions:ct(commit,zone=15),3
> 
> 
> 
>> Best regards, Ilya Maximets.
> 
> 
> 



More information about the dev mailing list