[ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.

Fischetti, Antonio antonio.fischetti at intel.com
Wed Aug 2 15:59:29 UTC 2017


> -----Original Message-----
> From: O Mahony, Billy
> Sent: Tuesday, August 1, 2017 11:51 AM
> To: Fischetti, Antonio <antonio.fischetti at intel.com>; dev at openvswitch.org
> Subject: RE: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> recirc packets.
> 
> Hi Antonio,
> 
> Unfortunately I think the performance deltas of this patch probably need to be
> re-worked given the bug discovered & fixed in the EMC insertion algorithm,
> which according to the patch notes will significantly reduce EMC contention for
> a given number of flows.
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336452.html

[Antonio] I think this patch and the one you mentioned are 2 different 
approaches with 2 different goals that can work fine together. 

  "Fix emc replacement policy" patch
  ----------------------------------
It selects which location to overwrite better than the current code does, so
that the EMC is used in a smarter way. The use case here is general
EMC replacement management, even with very few flows, i.e. 50 - 100
active flows.
When it has to choose between 2 active flows, it decides
using a criterion based on a good random value.

  This patch
  ----------
This patch instead targets a 'congestion' use case where the EMC is already
quite full and there are also recirculation(s). A typical example is a
firewall keeping track of tens of thousands of connections. A better
example would be a scenario - as Jan S. mentioned in one of the last
Community calls - with 'more than 1' recirculation.
It also defines a criterion to avoid lookups.
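
To make that criterion concrete, here is a minimal sketch in C of the
occupancy check, not the patch itself. The mask value and the 8k EMC size
are taken from the patch; emc_skip_for_packet() and EMC_ENTRIES are
hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch: the EMC holds 8192 entries and the mask
 * 0xFFFFF000 has every bit from bit 12 upwards set. */
#define EMC_ENTRIES 8192u
#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000u

/* Hypothetical helper (not part of OVS): returns true when both EMC lookup
 * and EMC insertion should be bypassed for a packet.
 *
 * For a 32-bit counter, 'n_entries & mask' is nonzero exactly when
 * n_entries >= 4096, that is when the EMC is at least 50% full.
 * Original (non-recirculated) packets are never bypassed. */
static bool
emc_skip_for_packet(uint32_t n_entries, bool recirculated)
{
    return recirculated
           && (n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) != 0;
}
```

With these values a recirculated packet still uses the EMC at 4095 entries
and bypasses it from 4096 entries on, so the check fires at 50% occupancy
rather than strictly above it.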

I think both patches can work together.


> 
> However, before you commit more effort I would like to post a proposal to the
> list on a more generalized EMC load-shedding mechanism which I think could be
> more effective as it would be more granular than shedding just re-circulated
> traffic. I hope to post that today.

[Antonio] I'll have a look.


> 
> Regards,
> /Billy
> 
> > -----Original Message-----
> > From: ovs-dev-bounces at openvswitch.org
> > [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of antonio.fischetti at intel.com
> > Sent: Wednesday, July 19, 2017 5:05 PM
> > To: dev at openvswitch.org
> > Subject: [ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for
> > recirc packets.
> >
> > When OVS is configured as a firewall, with thousands of active concurrent
> > connections, the EMC gets quickly saturated and may come under heavy
> > thrashing because original and recirculated packets keep overwriting
> > existing active EMC entries due to its limited size (8k).
> >
> > This thrashing causes the EMC to be less efficient than the dpcls in terms of
> > lookups and insertions.
> >
> > This patch allows the EMC to be used efficiently by letting only the
> > 'original' packets hit the EMC. All recirculated packets are sent to the
> > classifier directly.
> > An empirical threshold (EMC_RECIRCT_NO_INSERT_THRESHOLD, 50%) for
> > EMC occupancy is set to trigger this logic. When EMC utilization
> > exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
> >  - EMC insertions are allowed just for original packets. EMC insertion
> >    and lookup are skipped for recirculated packets.
> >  - Recirculated packets are sent to the classifier.
> >
> > This patch is based on patch
> > "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> > Also, this patch depends on the previous one in this series.
> >
> > Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
> > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> > Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> > ---
> > In our Connection Tracker testbench set up with
> >
> >  table=0, priority=1 actions=drop
> >  table=0, priority=10,arp actions=NORMAL
> >  table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> >  table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> >  table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> >  table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> >  table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> >
> > we saw the following performance improvement.
> >
> > We measured packet Rx rate (regardless of packet loss). Bidirectional test
> > with 64B UDP packets.
> > Each row is a test with a different number of traffic streams. The traffic
> > generator is set so that each stream establishes one UDP connection.
> > The Mpps columns report the Rx rates on the 2 sides.
> >
> >           +----------------------+-----------------------+
> >           |  Original OvS-DPDK   |    Previous case      |
> >           |  + patches #1,2      |    + this patch       |
> >  ---------+------------+---------+------------+----------+
> >   Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
> >   Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
> >  ---------+------------+---------+------------+----------+
> >       10  | 2.60, 2.67 |    20   | 2.60, 2.64 |    20    |
> >      100  | 2.53, 2.58 |   200   | 2.59, 2.61 |   201    |
> >    1,000  | 2.02, 2.03 |  1929   | 2.15, 2.15 |  1997    |
> >    2,000  | 1.94, 1.96 |  3661   | 1.97, 1.98 |  3668    |
> >    3,000  | 1.87, 1.90 |  5086   | 1.96, 1.98 |  4736    |
> >    4,000  | 1.82, 1.82 |  6173   | 1.95, 1.94 |  5280    |
> >   10,000  | 1.68, 1.69 |  7826   | 1.84, 1.84 |  7102    |
> >   30,000  | 1.57, 1.58 |  8192   | 1.68, 1.70 |  8192    |
> >  ---------+------------+---------+------------+----------+
> >
> > This test setup implies 1 recirculation on each received packet.
> > We didn't check this patch in a test scenario where more than 1 recirculation
> > is occurring per packet.
> >
> >  lib/dpif-netdev.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 58 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 9562827..79efce6 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
> >      packet_batch_per_flow_update(batch, pkt, mf);
> >  }
> >
> > +/* Threshold to skip EMC for recirculated packets. */
> > +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
> > +
> >  /* Try to process all ('cnt') the 'packets' using only the exact match cache
> >   * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
> >   * miniflow is copied into 'keys' and the packet pointer is moved at the
> > @@ -4620,15 +4623,39 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> >          miniflow_extract(packet, &key->mf);
> >          key->len = 0; /* Not computed yet. */
> >
> > -        /* If EMC is disabled skip hash computation and emc_lookup */
> > +        /*
> > +         * EMC lookup is skipped when one or both of the following
> > +         * two cases occurs:
> > +         *
> > +         *   - EMC is disabled.  This is detected from cur_min.
> > +         *
> > +         *   - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
> > +         *     the packet to be classified is being recirculated.  When this
> > +         *     happens also EMC insertions are skipped for recirculated
> > +         *     packets.  So that EMC is used just to store entries which
> > +         *     are hit from the 'original' packets.  This way the EMC
> > +         *     thrashing is mitigated with a benefit on performance.
> > +         */
> >          if (OVS_LIKELY(cur_min)) {
> >              if (!md_is_valid) {
> > +                /* This is an original packet.  As it is not recirculated
> > +                 * we can retrieve the 5-tuple hash value without considering
> > +                 * the recirc id. */
> >                  key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
> >                          &key->mf);
> > +                flow = emc_lookup(flow_cache, key);
> >              } else {
> > -                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
> > +                /* Recirculated packet. */
> > +                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
> > +                    /* EMC occupancy is over the threshold.  We skip EMC
> > +                     * lookup for recirculated packets. */
> > +                    flow = NULL;
> > +                } else {
> > +                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
> > +                            &key->mf);
> > +                    flow = emc_lookup(flow_cache, key);
> > +                }
> >              }
> > -            flow = emc_lookup(flow_cache, key);
> >          } else {
> >              flow = NULL;
> >          }
> > @@ -4716,7 +4743,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >                                               add_actions->size);
> >          }
> >          ovs_mutex_unlock(&pmd->flow_mutex);
> > -        emc_probabilistic_insert(pmd, key, netdev_flow);
> > +        /* EMC insertion can be skipped by a probabilistic criteria or
> > +         * - in case of recirculated packets - depending on the number of
> > +         * EMC entries. */
> > +        if (!packet->md.recirc_id) {
> > +            emc_probabilistic_insert(pmd, key, netdev_flow);
> > +        } else {
> > +            /* Recirculated packets.  When EMC occupancy goes over
> > +             * a threshold we avoid inserting new entries. */
> > +            if (!(pmd->flow_cache.n_entries &
> > +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> > +                /* Still under the threshold. */
> > +                emc_probabilistic_insert(pmd, key, netdev_flow);
> > +            }
> > +        }
> >      }
> >  }
> >
> > @@ -4809,7 +4849,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >
> >          flow = dp_netdev_flow_cast(rules[i]);
> >
> > -        emc_probabilistic_insert(pmd, &keys[i], flow);
> > +        /* EMC insertion can be skipped by a probabilistic criteria or
> > +         * - in case of recirculated packets - depending on the number of
> > +         * EMC entries. */
> > +        if (!packet->md.recirc_id) {
> > +            emc_probabilistic_insert(pmd, &keys[i], flow);
> > +        } else {
> > +            /* Recirculated packets.  When EMC occupancy goes over
> > +             * a threshold we avoid inserting new entries. */
> > +            if (!(pmd->flow_cache.n_entries &
> > +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> > +                /* Still under the threshold. */
> > +                emc_probabilistic_insert(pmd, &keys[i], flow);
> > +            }
> > +        }
> >          dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
> >      }
> >
> > --
> > 2.4.11
> >

