[ovs-dev] [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

Zoltán Balogh zoltan.balogh at ericsson.com
Mon Jul 17 14:37:41 UTC 2017


Hi Joe,

I used the setup below to measure Tx performance using a single core and 2 x 10G Ethernet links. I generated traffic of 64-byte packets to saturate the 10G bandwidth and measured the Tx processing cost per transmitted packet in nanoseconds.

            +-------------+                  
      dpdk0 |             |                  
         -->o    br-in    |                  
            |             o--> gre0          
            +-------------+                  
                                             
                   --> LOCAL                 
            +-----------o-+                  
            |             | dpdk1            
            |    br-p1    o-->             
            |             |                  
            +-------------+  
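For reference, the wiring above could be created roughly as follows. This is a hypothetical reconstruction, not the exact commands I used: the PCI addresses and the remote_ip of the GRE tunnel are placeholders.

```shell
# Assumed reconstruction of the benchmark topology; PCI devargs and
# tunnel remote_ip are placeholders, not the values from the real setup.
ovs-vsctl add-br br-in -- set bridge br-in datapath_type=netdev
ovs-vsctl add-port br-in dpdk0 -- set interface dpdk0 type=dpdk \
    options:dpdk-devargs=0000:05:00.0
ovs-vsctl add-port br-in gre0 -- set interface gre0 type=gre \
    options:remote_ip=172.16.1.2
ovs-vsctl add-br br-p1 -- set bridge br-p1 datapath_type=netdev
ovs-vsctl add-port br-p1 dpdk1 -- set interface dpdk1 type=dpdk \
    options:dpdk-devargs=0000:05:00.1
```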

This is the result of OVS master with DPDK 16.11.2:

 # dpdk0

 RX packets         : 7037641.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 7730632.90  / sec
 RX rate            : 402.69 MB/sec

 # dpdk1

 TX packets         : 7037641.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 657.73 MB/sec
 TX processing cost per TX packet in nsec : 142.09


This is the result of OVS master + patch with DPDK 16.11.2:

 # dpdk0

 RX packets         : 9386809.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 5381496.40  / sec
 RX rate            : 537.11 MB/sec

 # dpdk1

 TX packets         : 9386809.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 877.29 MB/sec
 TX processing cost per TX packet in nsec : 106.53

As you can see, the number of transmitted packets per second increased from 7.0M to 9.4M, a gain of roughly 33%.

Best regards,
Zoltan

> -----Original Message-----
> From: Joe Stringer [mailto:joe at ovn.org]
> Sent: Saturday, July 15, 2017 1:46 AM
> To: Sugesh Chandran <sugesh.chandran at intel.com>
> Cc: ovs dev <dev at openvswitch.org>; Andy Zhou <azhou at ovn.org>; Zoltán Balogh <zoltan.balogh at ericsson.com>
> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.
> 
> On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran at intel.com> wrote:
> > This patch set removes the recirculation of encapsulated tunnel packets
> > if possible. It is done by computing the post tunnel actions at the time of
> > translation. The combined nested action set are programmed in the datapath
> > using CLONE action.
> >
> > Signed-off-by: Sugesh Chandran <sugesh.chandran at intel.com>
> > Signed-off-by: Zoltán Balogh <zoltan.balogh at ericsson.com>
> > Co-authored-by: Zoltán Balogh <zoltan.balogh at ericsson.com>
> > ---
> 
> Hi Sugesh, Zoltán,
> 
> Thanks for working on this, it's subtle code but it seems like you've
> found a useful optimization here. Hopefully we can move this forward
> soon.
> 
> Would you be able to put your performance numbers into the commit
> message here? They could be useful to refer back to in the future.
> 
> This patch needs rebasing.
> 
> More feedback below.
> 
> >  lib/dpif-netdev.c                  |  18 +--
> >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-
> >  ofproto/ofproto-dpif-xlate-cache.h |  12 +-
> >  ofproto/ofproto-dpif-xlate.c       | 295 ++++++++++++++++++++++++++++++++++++-
> >  ofproto/ofproto-dpif-xlate.h       |   1 +
> >  ofproto/ofproto-dpif.c             |   3 +-
> >  tests/packet-type-aware.at         |  24 +--
> >  7 files changed, 324 insertions(+), 43 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 4e29085..4d996c1 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
> >
> >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
> >          if (*depth < MAX_RECIRC_DEPTH) {
> > -            struct dp_packet_batch tnl_pkt;
> > -            struct dp_packet_batch *orig_packets_ = packets_;
> > -            int err;
> > -
> > -            if (!may_steal) {
> > -                dp_packet_batch_clone(&tnl_pkt, packets_);
> > -                packets_ = &tnl_pkt;
> > -                dp_packet_batch_reset_cutlen(orig_packets_);
> > -            }
> > -
> >              dp_packet_batch_apply_cutlen(packets_);
> > -
> > -            err = push_tnl_action(pmd, a, packets_);
> > -            if (!err) {
> > -                (*depth)++;
> > -                dp_netdev_recirculate(pmd, packets_);
> > -                (*depth)--;
> > -            }
> > +            push_tnl_action(pmd, a, packets_);
> 
> If I follow, this was the previous datapath-level code to ensure that
> when packets are output to a tunnel, only the copy of the packet that
> goes to the tunnel gets the cutlen applied. With the new version of
> the code, we replace this by making sure that the ofproto layer
> generates a clone to wrap the push_tunnel and all subsequent bridge
> translation, so then it results in the equivalent behaviour: The
> cutlen is only ever applied to a clone of the packets. Is that right?
> 
> <snip>
> 
> > +/* Validate if the transalated combined actions are OK to proceed.
> > + * If actions consist of TRUNC action, it is not allowed to do the
> > + * tunnel_push combine as it cannot update stats correctly.
> > + */
> > +static bool
> > +is_tunnel_actions_clone_ready(struct ofpbuf *actions)
> > +{
> > +    struct nlattr *tnl_actions;
> > +    const struct nlattr *a;
> > +    unsigned int left;
> > +    size_t actions_len;
> > +
> > +    if (!actions) {
> > +        /* No actions, no harm in doing combine. */
> > +        return true;
> > +    }
> > +    actions_len = actions->size;
> > +
> > +    tnl_actions =(struct nlattr *)(actions->data);
> > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {
> > +        int type = nl_attr_type(a);
> > +        if (type == OVS_ACTION_ATTR_TRUNC) {
> > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");
> > +            return false;
> > +            break;
> > +        }
> > +    }
> > +    return true;
> > +}
> > +
> > +/*
> > + * Copy the xc entry and update the references accordingly.
> > + */
> > +static void
> > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)
> 
> I think that this whole function should probably reside in
> ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative
> proposal detailed below that would not require us to also take
> expensive refcounts due to this approach.
> 
> > +{
> > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
> > +    switch (xc_src->type) {
> > +    case XC_RULE:
> > +        ofproto_rule_ref(&xc_dst->rule->up);
> > +        break;
> > +    case XC_BOND:
> > +        bond_ref(xc_dst->bond.bond);
> > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
> > +                            sizeof *xc_src->bond.flow);
> > +        break;
> > +    case XC_NETDEV:
> > +        if (xc_src->dev.tx) {
> > +            netdev_ref(xc_dst->dev.tx);
> > +        }
> > +        if (xc_src->dev.rx) {
> > +            netdev_ref(xc_dst->dev.rx);
> > +        }
> > +        if (xc_src->dev.bfd) {
> > +            bfd_ref(xc_dst->dev.bfd);
> > +        }
> > +        break;
> > +    case XC_NETFLOW:
> > +        netflow_ref(xc_dst->nf.netflow);
> > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src->nf.flow);
> > +        break;
> > +    case XC_MIRROR:
> > +        mbridge_ref(xc_dst->mirror.mbridge);
> > +        break;
> > +    case XC_LEARN:
> > +    case XC_TABLE:
> > +    case XC_NORMAL:
> > +    case XC_FIN_TIMEOUT:
> > +    case XC_GROUP:
> > +    case XC_TNL_NEIGH:
> > +    case XC_CONTROLLER:
> > +    case XC_TUNNEL_HEADER:
> > +    break;
> > +    default:
> > +        OVS_NOT_REACHED();
> > +    }
> > +}
> > +
> > +static bool
> > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
> > +                                      const struct xport *xport,
> > +                                      struct xport *out_dev,
> > +                                      struct ovs_action_push_tnl tnl_push_data)
> > +{
> > +    const struct dpif_flow_stats *backup_resubmit_stats;
> > +    struct xlate_cache *backup_xcache;
> > +    bool nested_act_flag = false;
> > +    struct flow_wildcards tmp_flow_wc;
> > +    struct flow_wildcards *backup_flow_wc_ptr;
> > +    bool backup_side_effects;
> > +    const struct dp_packet *backup_pkt;
> > +
> > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
> > +    backup_flow_wc_ptr = ctx->wc;
> > +    ctx->wc = &tmp_flow_wc;
> > +    ctx->xin->wc = NULL;
> > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
> > +    backup_xcache = ctx->xin->xcache;
> > +    backup_side_effects = ctx->xin->allow_side_effects;
> > +    backup_pkt = ctx->xin->packet;
> > +
> > +    size_t push_action_size = 0;
> > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> > +                                           OVS_ACTION_ATTR_CLONE);
> > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> > +    push_action_size = ctx->odp_actions->size;
> > +
> > +    ctx->xin->resubmit_stats =  NULL;
> > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache. */
> > +    ctx->xin->allow_side_effects = false;
> > +    ctx->xin->packet = NULL;
> 
> I realised that you asked about this after the previous patch, but I
> didn't understand the implications of it. I believe that this packet
> is how, for instance, the second bridge can forward packets to the
> controller. If you reset it to NULL here, then I don't think that the
> controller action is executed correctly during the second bridge
> translation. (That's a test we could add).
> 
> I suspect that some multicast snooping stuff and some other pieces
> that would be affected by this.
> 
> > +
> > +    if (ctx->xin->xcache) {
> 
> A few lines above, this is unconditionally set. Could be removed?
> 
> > +        /* Push the cache entry for the tunnel first. */
> > +        struct xc_entry *entry;
> > +        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_TUNNEL_HEADER);
> > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
> > +        entry->tunnel_hdr.operation = ADD;
> > +    }
> > +
> > +    apply_nested_clone_actions(ctx, xport, out_dev);
> > +    nested_act_flag = is_tunnel_actions_clone_ready(ctx->odp_actions);
> > +
> > +    if (nested_act_flag) {
> > +        struct dpif_flow_stats tmp_resubmit_stats;
> > +        struct xc_entry *entry;
> > +        struct ofpbuf entries;
> > +
> > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);
> 
> I think that tmp_resubmit_stats is only ever used in the if condition
> below, which will initialize the entire structure anyway so the
> declaration can be moved below, and this memset line can be removed.
> 
> > +         /* Similar to the stats update in revalidation, the x_cache entries
> > +          * are populated by the previous translation are used to update the
> > +          * stats correctly.
> > +         .*/
> > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);
> 
> Which piece of shared memory is this mutex protecting?
> 
> > +        if (backup_resubmit_stats) {
> > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,
> > +                   sizeof tmp_resubmit_stats);
> > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
> > +        }
> > +        entries = ctx->xin->xcache->entries;
> > +        XC_ENTRY_FOR_EACH (entry, &entries) {
> > +            struct xc_entry *xc_copy_entry;
> > +            if (!backup_xcache) {
> > +                break;
> 
> Do we need to do this for every single XC entry?
> 
> > +            }
> > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry->type);
> > +            copy_xc_entry(xc_copy_entry, entry);
> 
> I wonder if instead of open-coding this stuff here, instead there was
> a function defined in ofproto-dpif-xlate-cache.c which would append a
> second xlate_cache onto a first. A function prototype like this,
> perhaps?
> 
> /* Iterates through 'src' and removes the xc_entry elements from it,
> and places them inside 'dst' */
> void xlate_cache_steal_entries(struct xlate_cache *dst, struct
> xlate_cache *src);
> 
> The current execution should own both xlate_caches so I think that it
> should be safe to more or less just append the latter xlate_cache onto
> the original one.
> 
> > +        }
> > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);
> > +    }
> > +    else {
> 
> Please place the else on the same line as the close bracket.
> 
> > +        /* Combine is not valid. */
> > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > +        goto out;
> > +    }
> > +    if (ctx->odp_actions->size > push_action_size) {
> > +        /* Update the CLONE action only when combined. */
> > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
> > +    }
> > +    else {
> 
> Same style request.
> 
> > +        /* No actions after the tunnel, no need of clone. */
> > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> 
> If there are no actions after the tunnel, then do we need the tunnel
> header at all?
> 
> Do we have a test case which outputs to a tunnel which goes to another
> bridge that drops the packet, then on the first bridge continues to
> execute some more stuff to make sure this path works in the way we
> expect?
