[ovs-dev] [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

Chandran, Sugesh sugesh.chandran at intel.com
Mon Jul 17 22:17:23 UTC 2017


Comments inline

Regards
_Sugesh

> -----Original Message-----
> From: Joe Stringer [mailto:joe at ovn.org]
> Sent: Monday, July 17, 2017 7:33 PM
> To: Chandran, Sugesh <sugesh.chandran at intel.com>
> Cc: Zoltán Balogh <zoltan.balogh at ericsson.com>; ovs dev
> <dev at openvswitch.org>; Andy Zhou <azhou at ovn.org>
> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
> recirc actions at xlate.
> 
> On 17 July 2017 at 10:27, Chandran, Sugesh <sugesh.chandran at intel.com>
> wrote:
> >> > -----Original Message-----
> >> > From: Joe Stringer [mailto:joe at ovn.org]
> >> > Sent: Saturday, July 15, 2017 1:46 AM
> >> > To: Sugesh Chandran <sugesh.chandran at intel.com>
> >> > Cc: ovs dev <dev at openvswitch.org>; Andy Zhou <azhou at ovn.org>;
> >> > Zoltán Balogh <zoltan.balogh at ericsson.com>
> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by
> >> > combining
> >> recirc actions at xlate.
> >> >
> >> > On 4 July 2017 at 02:46, Sugesh Chandran
> >> > <sugesh.chandran at intel.com>
> >> wrote:
> 
> <snip>
> 
> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> >> > > 4e29085..4d996c1 100644
> >> > > --- a/lib/dpif-netdev.c
> >> > > +++ b/lib/dpif-netdev.c
> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct
> >> > > dp_packet_batch *packets_,
> >> > >
> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
> >> > >          if (*depth < MAX_RECIRC_DEPTH) {
> >> > > -            struct dp_packet_batch tnl_pkt;
> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;
> >> > > -            int err;
> >> > > -
> >> > > -            if (!may_steal) {
> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
> >> > > -                packets_ = &tnl_pkt;
> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
> >> > > -            }
> >> > > -
> >> > >              dp_packet_batch_apply_cutlen(packets_);
> >> > > -
> >> > > -            err = push_tnl_action(pmd, a, packets_);
> >> > > -            if (!err) {
> >> > > -                (*depth)++;
> >> > > -                dp_netdev_recirculate(pmd, packets_);
> >> > > -                (*depth)--;
> >> > > -            }
> >> > > +            push_tnl_action(pmd, a, packets_);
> >> >
> >> > If I follow, this was the previous datapath-level code to ensure
> >> > that when packets are output to a tunnel, only the copy of the
> >> > packet that goes to the tunnel gets the cutlen applied. With the
> >> > new version of the code, we replace this by making sure that the
> >> > ofproto layer generates a clone to wrap the push_tunnel and all
> >> > subsequent bridge translation, so then it results in the equivalent
> >> > behaviour: The cutlen is only ever applied to a clone of the packets. Is
> that right?
> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are not
> > cloned when combine_tunnel_action is found impossible in ofproto layer.
> >
> > As a fix we should keep the code block in 'may_steal' to handle the
> > non tunnel-combine case.
> 
> Oh, really? I thought that we always end up with a clone wrapping the
> tunnel_push now, which means we shouldn't need this bit any more. My
> comment was intended just to ask if I understood correctly to make sure we
> don't miss anything.
> 
> > Any other suggestion to apply cut_len in ofproto?
> 
> Unfortunately the cut_len is set up through a previous truncate action, and it
> is supposed to apply to the "next output action" (and reset afterwards), so it
> is part of the datapath actions interface definition; I don't think there's a way
> to push it up to ofproto.
[Sugesh] Or at ofproto layer,  the non-combine tunnel push also placed in a clone, similar to the
combine case? That will help to keep the datapath as simple as just a push.
> 
> >> >
> >> > > +{
> >> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
> >> > > +    switch (xc_src->type) {
> >> > > +    case XC_RULE:
> >> > > +        ofproto_rule_ref(&xc_dst->rule->up);
> >> > > +        break;
> >> > > +    case XC_BOND:
> >> > > +        bond_ref(xc_dst->bond.bond);
> >> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
> >> > > +                            sizeof *xc_src->bond.flow);
> >> > > +        break;
> >> > > +    case XC_NETDEV:
> >> > > +        if (xc_src->dev.tx) {
> >> > > +            netdev_ref(xc_dst->dev.tx);
> >> > > +        }
> >> > > +        if (xc_src->dev.rx) {
> >> > > +            netdev_ref(xc_dst->dev.rx);
> >> > > +        }
> >> > > +        if (xc_src->dev.bfd) {
> >> > > +            bfd_ref(xc_dst->dev.bfd);
> >> > > +        }
> >> > > +        break;
> >> > > +    case XC_NETFLOW:
> >> > > +        netflow_ref(xc_dst->nf.netflow);
> >> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof
> >> > > +*xc_src-
> >> >nf.flow);
> >> > > +        break;
> >> > > +    case XC_MIRROR:
> >> > > +        mbridge_ref(xc_dst->mirror.mbridge);
> >> > > +        break;
> >> > > +    case XC_LEARN:
> >> > > +    case XC_TABLE:
> >> > > +    case XC_NORMAL:
> >> > > +    case XC_FIN_TIMEOUT:
> >> > > +    case XC_GROUP:
> >> > > +    case XC_TNL_NEIGH:
> >> > > +    case XC_CONTROLLER:
> >> > > +    case XC_TUNNEL_HEADER:
> >> > > +    break;
> >> > > +    default:
> >> > > +        OVS_NOT_REACHED();
> >> > > +    }
> >> > > +}
> >> > > +
> >> > > +static bool
> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
> >> > > +                                      const struct xport *xport,
> >> > > +                                      struct xport *out_dev,
> >> > > +                                      struct ovs_action_push_tnl
> >> > > +tnl_push_data) {
> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
> >> > > +    struct xlate_cache *backup_xcache;
> >> > > +    bool nested_act_flag = false;
> >> > > +    struct flow_wildcards tmp_flow_wc;
> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;
> >> > > +    bool backup_side_effects;
> >> > > +    const struct dp_packet *backup_pkt;
> >> > > +
> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
> >> > > +    backup_flow_wc_ptr = ctx->wc;
> >> > > +    ctx->wc = &tmp_flow_wc;
> >> > > +    ctx->xin->wc = NULL;
> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
> >> > > +    backup_xcache = ctx->xin->xcache;
> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;
> >> > > +    backup_pkt = ctx->xin->packet;
> >> > > +
> >> > > +    size_t push_action_size = 0;
> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> >> > > +                                           OVS_ACTION_ATTR_CLONE);
> >> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> >> > > +    push_action_size = ctx->odp_actions->size;
> >> > > +
> >> > > +    ctx->xin->resubmit_stats =  NULL;
> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary
> cache.
> >> */
> >> > > +    ctx->xin->allow_side_effects = false;
> >> > > +    ctx->xin->packet = NULL;
> >> >
> >> > I realised that you asked about this after the previous patch, but
> >> > I didn't understand the implications of it. I believe that this
> >> > packet is how, for instance, the second bridge can forward packets
> >> > to the controller. If you reset it to NULL here, then I don't think
> >> > that the controller action is executed correctly during the second
> >> > bridge translation. (That's a test we could add).
> > [Sugesh] Yes, that’s a good test case for it.
> >> >
> >> > I suspect that some multicast snooping stuff and some other pieces
> >> > that would be affected by this.
> > [Sugesh] This is interesting. So does the controller expects a encaped
> > packets in that case? Do you think, the controller action also be an
> > exception case like the TRUNC?
> 
> Yes, I think the controller expects encapped packets in this case.
> Yes, I think that the easiest way to handle this is to put controller in that
> exception case with truncate.
[Sugesh] Ok.  Will add that case as an exception.

Before that, Would like to clarify the controller output action bit more
How does the following case are currently handled that involves controller action
(We haven’t run any test case that involves a controller and very limited knowledge on that side)

MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR) 
What if there is no rules to handle the packet in 'second-br' and wanted to report to controller?

Will the packet report with VLAN 10 to controller?

> 
> > Do we have a case where send the encap traffic to controller?
> 
> Logically, the second bridge should only see the encapped traffic so if there's
> any way to send to controller, or sample for things like IPFIX and so on, then
> those external entities should observe encap traffic. I can't think of many
> cases on transmit - debugging exactly what is happening to packets would be
> the main one.
[Sugesh] ok. Thank you for this input.

> 
> >> >
> >> > > +
> >> > > +    if (ctx->xin->xcache) {
> >> >
> >> > A few lines above, this is unconditionally set. Could be removed?
> > [Sugesh] I think it is newly created and its possible to have a NULL
> > incase of malloc failure. That’s why added the case.
> 
> OK. For what it's worth, OVS xmalloc() guarantees a valid object so this
> should be unnecessary.
[Sugesh] Ok, will remove it. 


More information about the dev mailing list