[ovs-dev] [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

Chandran, Sugesh sugesh.chandran at intel.com
Mon Jul 17 21:44:46 UTC 2017


Hi Joe,
Thank you for the comments.
Please find my answers below.

Regards
_Sugesh


> -----Original Message-----
> From: Zoltán Balogh [mailto:zoltan.balogh at ericsson.com]
> Sent: Monday, July 17, 2017 3:38 PM
> To: Joe Stringer <joe at ovn.org>; Chandran, Sugesh
> <sugesh.chandran at intel.com>
> Cc: ovs dev <dev at openvswitch.org>; Andy Zhou <azhou at ovn.org>
> Subject: RE: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
> recirc actions at xlate.
> 
> 
> Hi Joe,
> 
> I used the setup below to measure Tx performance using a single core and 2
> x 10G Ethernet links. I generated traffic of 64 byte packets to utilize the 10G
> bandwidth and measured the Tx processing cost per transmitted packet in
> nsec.
> 
>             +-------------+
>       dpdk0 |             |
>          -->o    br-in    |
>             |             o--> gre0
>             +-------------+
> 
>                    --> LOCAL
>             +-----------o-+
>             |             | dpdk1
>             |    br-p1    o-->
>             |             |
>             +-------------+
> 
> This is the result of OVS master with DPDK 16.11.2:
> 
>  # dpdk0
> 
>  RX packets         : 7037641.60  / sec
>  RX packet errors   : 0  / sec
>  RX packets dropped : 7730632.90  / sec
>  RX rate            : 402.69 MB/sec
> 
>  # dpdk1
> 
>  TX packets         : 7037641.60  / sec
>  TX packet errors   : 0  / sec
>  TX packets dropped : 0  / sec
>  TX rate            : 657.73 MB/sec
>  TX processing cost per TX packets in nsec : 142.09
> 
> 
> This is the result of OVS master + patch with DPDK 16.11.2:
> 
>  # dpdk0
> 
>  RX packets         : 9386809.60  / sec
>  RX packet errors   : 0  / sec
>  RX packets dropped : 5381496.40  / sec
>  RX rate            : 537.11 MB/sec
> 
>  # dpdk1
> 
>  TX packets         : 9386809.60  / sec
>  TX packet errors   : 0  / sec
>  TX packets dropped : 0  / sec
>  TX rate            : 877.29 MB/sec
>  TX processing cost per TX packets in nsec : 106.53
> 
> As you can see the number of transmitted packets per second increased
> > from 7M to 9.3M. The gain is above 30%.
> 
> Best regards,
> Zoltan
> 
> > -----Original Message-----
> > From: Joe Stringer [mailto:joe at ovn.org]
> > Sent: Saturday, July 15, 2017 1:46 AM
> > To: Sugesh Chandran <sugesh.chandran at intel.com>
> > Cc: ovs dev <dev at openvswitch.org>; Andy Zhou <azhou at ovn.org>; Zoltán
> > Balogh <zoltan.balogh at ericsson.com>
> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
> recirc actions at xlate.
> >
> > On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran at intel.com>
> wrote:
> > > This patch set removes the recirculation of encapsulated tunnel
> > > packets if possible. It is done by computing the post tunnel actions
> > > at the time of translation. The combined nested action set are
> > > programmed in the datapath using CLONE action.
> > >
> > > Signed-off-by: Sugesh Chandran <sugesh.chandran at intel.com>
> > > Signed-off-by: Zoltán Balogh <zoltan.balogh at ericsson.com>
> > > Co-authored-by: Zoltán Balogh <zoltan.balogh at ericsson.com>
> > > ---
> >
> > Hi Sugesh, Zoltán,
> >
> > Thanks for working on this, it's subtle code but it seems like you've
> > found a useful optimization here. Hopefully we can move this forward
> > soon.
> >
[Sugesh] Thank you, Joe! I will try my best to cover most of the corner cases.
> > Would you be able to put your performance numbers into the commit
> > message here? They could be useful to refer back to in the future.
[Sugesh] Sure. Zoltan already ran the performance benchmarking and shared the results.
I will add them in the next version of the patch.
> >
> > This patch needs rebasing.
[Sugesh] Will do that in the next version.
> >
> > More feedback below.
> >
> > >  lib/dpif-netdev.c                  |  18 +--
> > >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-
> > > ofproto/ofproto-dpif-xlate-cache.h |  12 +-
> > >  ofproto/ofproto-dpif-xlate.c       | 295
> ++++++++++++++++++++++++++++++++++++-
> > >  ofproto/ofproto-dpif-xlate.h       |   1 +
> > >  ofproto/ofproto-dpif.c             |   3 +-
> > >  tests/packet-type-aware.at         |  24 +--
> > >  7 files changed, 324 insertions(+), 43 deletions(-)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
> > > 4e29085..4d996c1 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct
> > > dp_packet_batch *packets_,
> > >
> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
> > >          if (*depth < MAX_RECIRC_DEPTH) {
> > > -            struct dp_packet_batch tnl_pkt;
> > > -            struct dp_packet_batch *orig_packets_ = packets_;
> > > -            int err;
> > > -
> > > -            if (!may_steal) {
> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
> > > -                packets_ = &tnl_pkt;
> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
> > > -            }
> > > -
> > >              dp_packet_batch_apply_cutlen(packets_);
> > > -
> > > -            err = push_tnl_action(pmd, a, packets_);
> > > -            if (!err) {
> > > -                (*depth)++;
> > > -                dp_netdev_recirculate(pmd, packets_);
> > > -                (*depth)--;
> > > -            }
> > > +            push_tnl_action(pmd, a, packets_);
> >
> > If I follow, this was the previous datapath-level code to ensure that
> > when packets are output to a tunnel, only the copy of the packet that
> > goes to the tunnel gets the cutlen applied. With the new version of
> > the code, we replace this by making sure that the ofproto layer
> > generates a clone to wrap the push_tunnel and all subsequent bridge
> > translation, so then it results in the equivalent behaviour: The
> > cutlen is only ever applied to a clone of the packets. Is that right?
[Sugesh] Hmm, that’s a good catch. It looks like the packets are not
cloned when the tunnel-combine is found to be impossible in the ofproto layer.

As a fix we should keep the 'may_steal' code block to handle the non-combined
case, roughly as in the sketch below.
Do you have any other suggestion for applying the cutlen in ofproto?
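
Something like this, perhaps (a rough, untested sketch against the pre-patch
code quoted above; it keeps the clone for the !may_steal case but still drops
the explicit recirculation, since the combined case is now handled at xlate
time):

    case OVS_ACTION_ATTR_TUNNEL_PUSH:
        if (*depth < MAX_RECIRC_DEPTH) {
            struct dp_packet_batch tnl_pkt;
            struct dp_packet_batch *orig_packets_ = packets_;

            /* Only a clone of the batch gets the cutlen applied when the
             * packets may not be stolen. */
            if (!may_steal) {
                dp_packet_batch_clone(&tnl_pkt, packets_);
                packets_ = &tnl_pkt;
                dp_packet_batch_reset_cutlen(orig_packets_);
            }

            dp_packet_batch_apply_cutlen(packets_);
            push_tnl_action(pmd, a, packets_);
        }
        break;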

> >
> > <snip>
> >
> > > +/* Validate if the transalated combined actions are OK to proceed.
> > > + * If actions consist of TRUNC action, it is not allowed to do the
> > > + * tunnel_push combine as it cannot update stats correctly.
> > > + */
> > > +static bool
> > > +is_tunnel_actions_clone_ready(struct ofpbuf *actions) {
> > > +    struct nlattr *tnl_actions;
> > > +    const struct nlattr *a;
> > > +    unsigned int left;
> > > +    size_t actions_len;
> > > +
> > > +    if (!actions) {
> > > +        /* No actions, no harm in doing combine. */
> > > +        return true;
> > > +    }
> > > +    actions_len = actions->size;
> > > +
> > > +    tnl_actions =(struct nlattr *)(actions->data);
> > > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {
> > > +        int type = nl_attr_type(a);
> > > +        if (type == OVS_ACTION_ATTR_TRUNC) {
> > > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");
> > > +            return false;
> > > +            break;
> > > +        }
> > > +    }
> > > +    return true;
> > > +}
> > > +
> > > +/*
> > > + * Copy the xc entry and update the references accordingly.
> > > + */
> > > +static void
> > > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)
> >
> > I think that this whole function should probably reside in
> > ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative
> > proposal detailed below that would not require us to also take
> > expensive refcounts due to this approach.
[Sugesh] Will introduce a new function in xlate-cache to handle it.
> >
> > > +{
> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
> > > +    switch (xc_src->type) {
> > > +    case XC_RULE:
> > > +        ofproto_rule_ref(&xc_dst->rule->up);
> > > +        break;
> > > +    case XC_BOND:
> > > +        bond_ref(xc_dst->bond.bond);
> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
> > > +                            sizeof *xc_src->bond.flow);
> > > +        break;
> > > +    case XC_NETDEV:
> > > +        if (xc_src->dev.tx) {
> > > +            netdev_ref(xc_dst->dev.tx);
> > > +        }
> > > +        if (xc_src->dev.rx) {
> > > +            netdev_ref(xc_dst->dev.rx);
> > > +        }
> > > +        if (xc_src->dev.bfd) {
> > > +            bfd_ref(xc_dst->dev.bfd);
> > > +        }
> > > +        break;
> > > +    case XC_NETFLOW:
> > > +        netflow_ref(xc_dst->nf.netflow);
> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src-
> >nf.flow);
> > > +        break;
> > > +    case XC_MIRROR:
> > > +        mbridge_ref(xc_dst->mirror.mbridge);
> > > +        break;
> > > +    case XC_LEARN:
> > > +    case XC_TABLE:
> > > +    case XC_NORMAL:
> > > +    case XC_FIN_TIMEOUT:
> > > +    case XC_GROUP:
> > > +    case XC_TNL_NEIGH:
> > > +    case XC_CONTROLLER:
> > > +    case XC_TUNNEL_HEADER:
> > > +    break;
> > > +    default:
> > > +        OVS_NOT_REACHED();
> > > +    }
> > > +}
> > > +
> > > +static bool
> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
> > > +                                      const struct xport *xport,
> > > +                                      struct xport *out_dev,
> > > +                                      struct ovs_action_push_tnl
> > > +tnl_push_data) {
> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
> > > +    struct xlate_cache *backup_xcache;
> > > +    bool nested_act_flag = false;
> > > +    struct flow_wildcards tmp_flow_wc;
> > > +    struct flow_wildcards *backup_flow_wc_ptr;
> > > +    bool backup_side_effects;
> > > +    const struct dp_packet *backup_pkt;
> > > +
> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
> > > +    backup_flow_wc_ptr = ctx->wc;
> > > +    ctx->wc = &tmp_flow_wc;
> > > +    ctx->xin->wc = NULL;
> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
> > > +    backup_xcache = ctx->xin->xcache;
> > > +    backup_side_effects = ctx->xin->allow_side_effects;
> > > +    backup_pkt = ctx->xin->packet;
> > > +
> > > +    size_t push_action_size = 0;
> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> > > +                                           OVS_ACTION_ATTR_CLONE);
> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> > > +    push_action_size = ctx->odp_actions->size;
> > > +
> > > +    ctx->xin->resubmit_stats =  NULL;
> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache.
> */
> > > +    ctx->xin->allow_side_effects = false;
> > > +    ctx->xin->packet = NULL;
> >
> > I realised that you asked about this after the previous patch, but I
> > didn't understand the implications of it. I believe that this packet
> > is how, for instance, the second bridge can forward packets to the
> > controller. If you reset it to NULL here, then I don't think that the
> > controller action is executed correctly during the second bridge
> > translation. (That's a test we could add).
[Sugesh] Yes, that’s a good test case for it.
> >
> > I suspect that some multicast snooping stuff and some other pieces
> > that would be affected by this.
[Sugesh] This is interesting. So does the controller expect encapsulated
packets in that case? Do you think the controller action should also be
an exception case, like TRUNC?

Do we have a case where we send encapsulated traffic to the controller?
	
> >
> > > +
> > > +    if (ctx->xin->xcache) {
> >
> > A few lines above, this is unconditionally set. Could be removed?
[Sugesh] It is newly created here, and it could be NULL in case of a malloc
failure. That's why I added the check.
> >
> > > +        /* Push the cache entry for the tunnel first. */
> > > +        struct xc_entry *entry;
> > > +        entry = xlate_cache_add_entry(ctx->xin->xcache,
> XC_TUNNEL_HEADER);
> > > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
> > > +        entry->tunnel_hdr.operation = ADD;
> > > +    }
> > > +
> > > +    apply_nested_clone_actions(ctx, xport, out_dev);
> > > +    nested_act_flag =
> > > + is_tunnel_actions_clone_ready(ctx->odp_actions);
> > > +
> > > +    if (nested_act_flag) {
> > > +        struct dpif_flow_stats tmp_resubmit_stats;
> > > +        struct xc_entry *entry;
> > > +        struct ofpbuf entries;
> > > +
> > > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);
> >
> > I think that tmp_resubmit_stats is only ever used in the if condition
> > below, which will initialize the entire structure anyway so the
> > declaration can be moved below, and this memset line can be removed.
[Sugesh] Sure. Will move it down.
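Something like this, I think (a rough sketch; only this branch uses the
variable, so the memcpy becomes a plain struct copy):

        if (backup_resubmit_stats) {
            struct dpif_flow_stats tmp_resubmit_stats = *backup_resubmit_stats;

            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
        }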
> >
> > > +         /* Similar to the stats update in revalidation, the x_cache entries
> > > +          * are populated by the previous translation are used to update the
> > > +          * stats correctly.
> > > +         .*/
> > > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);
> >
> > Which piece of shared memory is this mutex protecting?
[Sugesh] Will remove it.
> >
> > > +        if (backup_resubmit_stats) {
> > > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,
> > > +                   sizeof tmp_resubmit_stats);
> > > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
> > > +        }
> > > +        entries = ctx->xin->xcache->entries;
> > > +        XC_ENTRY_FOR_EACH (entry, &entries) {
> > > +            struct xc_entry *xc_copy_entry;
> > > +            if (!backup_xcache) {
> > > +                break;
> >
> > Do we need to do this for every single XC entry?
[Sugesh] Will remove it in the next version of the patch.
> >
> > > +            }
> > > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry-
> >type);
> > > +            copy_xc_entry(xc_copy_entry, entry);
> >
> > I wonder if instead of open-coding this stuff here, instead there was
> > a function defined in ofproto-dpif-xlate-cache.c which would append a
> > second xlate_cache onto a first. A function prototype like this,
> > perhaps?
> >
> > /* Iterates through 'src' and removes the xc_entry elements from it,
> > and places them inside 'dst' */ void xlate_cache_steal_entries(struct
> > xlate_cache *dst, struct xlate_cache *src);
[Sugesh] Makes sense. I will create a function that appends the cache entries
and clears the src once the copying is completed, along the lines of the
sketch below.
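
A minimal sketch of such a helper in ofproto-dpif-xlate-cache.c (untested,
and assuming the xlate_cache keeps its entries in the 'entries' ofpbuf, as
the XC_ENTRY_FOR_EACH loop above suggests):

    /* Moves all xc_entry elements from 'src' into 'dst'.  Ownership of the
     * referenced objects moves along with the entries, so no additional
     * refcounts need to be taken. */
    void
    xlate_cache_steal_entries(struct xlate_cache *dst, struct xlate_cache *src)
    {
        if (!src) {
            return;
        }
        ofpbuf_put(&dst->entries, src->entries.data, src->entries.size);
        /* Clear 'src' without releasing the references it used to hold. */
        ofpbuf_clear(&src->entries);
    }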
> >
> > The current execution should own both xlate_caches so I think that it
> > should be safe to more or less just append the latter xlate_cache onto
> > the original one.
> >
> > > +        }
> > > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);
> > > +    }
> > > +    else {
> >
> > Please place the else on the same line as the close bracket.
[Sugesh] ok
> >
> > > +        /* Combine is not valid. */
> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > > +        goto out;
> > > +    }
> > > +    if (ctx->odp_actions->size > push_action_size) {
> > > +        /* Update the CLONE action only when combined. */
> > > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
> > > +    }
> > > +    else {
> >
> > Same style request.
> >
> > > +        /* No actions after the tunnel, no need of clone. */
> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> >
> > If there are no actions after the tunnel, then do we need the tunnel
> > header at all?
[Sugesh] There are test cases that do this. I kept it to cover those.
> >
> > Do we have a test case which outputs to a tunnel which goes to another
> > bridge that drops the packet, then on the first bridge continues to
> > execute some more stuff to make sure this path works in the way we
> > expect?

