[ovs-dev] [PATCH v2 3/5] dpif-netdev: Skip EMC lookup/insert for recirc packets.

antonio.fischetti at intel.com
Wed Jul 19 16:04:55 UTC 2017


When OVS is configured as a firewall with thousands of active
concurrent connections, the EMC quickly becomes saturated and may
thrash heavily, because original and recirculated packets keep
overwriting active EMC entries due to its limited size (8k entries).

This thrashing makes the EMC less efficient than the dpcls in
terms of lookups and insertions.

This patch makes more efficient use of the EMC by allowing only the
'original' packets to hit it once it fills up. Recirculated packets are
then sent directly to the classifier. An empirical threshold for EMC
occupancy (EMC_RECIRCT_NO_INSERT_THRESHOLD, set to 50%) triggers this
logic. When EMC utilization exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
 - EMC insertions are allowed only for original packets. EMC insertion
   and lookup are skipped for recirculated packets.
 - Recirculated packets are sent to the classifier.
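
The threshold check can be sketched in isolation as below. This is only an
illustration, not code from the patch: the helper name and the *_SKETCH
macros are hypothetical, and the 8192-entry capacity is assumed from the
"8k" figure above. The mask value is the one the patch defines. Since the
mask clears the low 12 bits, the test is non-zero as soon as the entry
count reaches 4096, i.e. half of the assumed capacity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EMC capacity, taken from the "8k" figure in the description. */
#define EMC_ENTRIES_SKETCH 8192u

/* Same value as EMC_RECIRCT_NO_INSERT_THRESHOLD in the patch. */
#define RECIRC_THRESHOLD_MASK_SKETCH 0xFFFFF000u

/* Hypothetical helper: returns true when a recirculated packet should
 * bypass the EMC.  Original packets always use the EMC. */
static bool
emc_skip_for_recirc(uint32_t n_entries, bool is_recirculated)
{
    return is_recirculated
           && (n_entries & RECIRC_THRESHOLD_MASK_SKETCH) != 0;
}

/* Small self-check of the boundary behaviour. */
static void
emc_skip_self_check(void)
{
    /* Just below half of the assumed capacity: EMC is still used. */
    assert(!emc_skip_for_recirc(EMC_ENTRIES_SKETCH / 2 - 1, true));
    /* At half of the assumed capacity: recirculated packets skip it. */
    assert(emc_skip_for_recirc(EMC_ENTRIES_SKETCH / 2, true));
    /* Original packets are never affected by the threshold. */
    assert(!emc_skip_for_recirc(EMC_ENTRIES_SKETCH, false));
}
```

A bitmask keeps the check to a single AND on the fast path, at the cost of
restricting the threshold to a power of two.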

This patch is based on patch
"dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
Also, this patch depends on the previous one in this series.

Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
---
In our Connection Tracker testbench set up with

 table=0, priority=1 actions=drop
 table=0, priority=10,arp actions=NORMAL
 table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
 table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
 table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
 table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1

we saw the following performance improvement.

We measured packet Rx rate (regardless of packet loss). Bidirectional
test with 64B UDP packets.
Each row is a test with a different number of traffic streams. The traffic
generator is set so that each stream establishes one UDP connection.
The Mpps columns report the Rx rates on the two sides.

          +----------------------+-----------------------+
          |  Original OvS-DPDK   |    Previous case      |
          |  + patches #1,2      |    + this patch       |
 ---------+------------+---------+------------+----------+
  Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
  Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
 ---------+------------+---------+------------+----------+
      10  | 2.60, 2.67 |    20   | 2.60, 2.64 |    20    |
     100  | 2.53, 2.58 |   200   | 2.59, 2.61 |   201    |
   1,000  | 2.02, 2.03 |  1929   | 2.15, 2.15 |  1997    |
   2,000  | 1.94, 1.96 |  3661   | 1.97, 1.98 |  3668    |
   3,000  | 1.87, 1.90 |  5086   | 1.96, 1.98 |  4736    |
   4,000  | 1.82, 1.82 |  6173   | 1.95, 1.94 |  5280    |
  10,000  | 1.68, 1.69 |  7826   | 1.84, 1.84 |  7102    |
  30,000  | 1.57, 1.58 |  8192   | 1.68, 1.70 |  8192    |
 ---------+------------+---------+------------+----------+

This test setup implies 1 recirculation on each received packet.
We didn't check this patch in a test scenario where more than 1
recirculation is occurring per packet.
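
To read the table as relative gains, the Rx rates can be compared row by
row. The helper below is a hypothetical illustration and not part of the
patch; it only restates the arithmetic, using the first Rx value of the
10,000-stream row as an example.

```c
#include <assert.h>

/* Hypothetical helper: percentage change from 'before_mpps' to
 * 'after_mpps'. */
static double
rx_gain_percent(double before_mpps, double after_mpps)
{
    return (after_mpps - before_mpps) / before_mpps * 100.0;
}

/* Example with the 10,000-stream row: 1.68 Mpps -> 1.84 Mpps. */
static void
rx_gain_self_check(void)
{
    double gain = rx_gain_percent(1.68, 1.84);

    /* (1.84 - 1.68) / 1.68 is roughly 9.5%. */
    assert(gain > 9.5 && gain < 9.6);
}
```

By the same calculation the 10-stream row shows no change on the first Rx
value (2.60 Mpps in both cases), which is consistent with the EMC staying
far below the threshold at that load.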

 lib/dpif-netdev.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9562827..79efce6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4573,6 +4573,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
     packet_batch_per_flow_update(batch, pkt, mf);
 }
 
+/* Threshold to skip EMC for recirculated packets. */
+#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
+
 /* Try to process all ('cnt') the 'packets' using only the exact match cache
  * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
  * miniflow is copied into 'keys' and the packet pointer is moved at the
@@ -4620,15 +4623,39 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
         miniflow_extract(packet, &key->mf);
         key->len = 0; /* Not computed yet. */
 
-        /* If EMC is disabled skip hash computation and emc_lookup */
+        /*
+         * EMC lookup is skipped when one or both of the following
+         * two cases occurs:
+         *
+         *   - EMC is disabled.  This is detected from cur_min.
+         *
+         *   - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
+         *     the packet to be classified is being recirculated.  When this
+         *     happens also EMC insertions are skipped for recirculated
+         *     packets.  So that EMC is used just to store entries which
+         *     are hit from the 'original' packets.  This way the EMC
+         *     thrashing is mitigated with a benefit on performance.
+         */
         if (OVS_LIKELY(cur_min)) {
             if (!md_is_valid) {
+                /* This is an original packet.  As it is not recirculated
+                 * we can retrieve the 5-tuple hash value without considering
+                 * the recirc id. */
                 key->hash = dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
                         &key->mf);
+                flow = emc_lookup(flow_cache, key);
             } else {
-                key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
+                /* Recirculated packet. */
+                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
+                    /* EMC occupancy is over the threshold.  We skip EMC
+                     * lookup for recirculated packets. */
+                    flow = NULL;
+                } else {
+                    key->hash = dpif_netdev_packet_get_rss_hash(packet,
+                            &key->mf);
+                    flow = emc_lookup(flow_cache, key);
+                }
             }
-            flow = emc_lookup(flow_cache, key);
         } else {
             flow = NULL;
         }
@@ -4716,7 +4743,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
                                              add_actions->size);
         }
         ovs_mutex_unlock(&pmd->flow_mutex);
-        emc_probabilistic_insert(pmd, key, netdev_flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, key, netdev_flow);
+        } else {
+            /* Recirculated packets.  When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, key, netdev_flow);
+            }
+        }
     }
 }
 
@@ -4809,7 +4849,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
 
         flow = dp_netdev_flow_cast(rules[i]);
 
-        emc_probabilistic_insert(pmd, &keys[i], flow);
+        /* EMC insertion can be skipped by a probabilistic criteria or
+         * - in case of recirculated packets - depending on the number of
+         * EMC entries. */
+        if (!packet->md.recirc_id) {
+            emc_probabilistic_insert(pmd, &keys[i], flow);
+        } else {
+            /* Recirculated packets.  When EMC occupancy goes over
+             * a threshold we avoid inserting new entries. */
+            if (!(pmd->flow_cache.n_entries &
+                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
+                /* Still under the threshold. */
+                emc_probabilistic_insert(pmd, &keys[i], flow);
+            }
+        }
         dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
     }
 
-- 
2.4.11


