[ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets

Fischetti, Antonio antonio.fischetti at intel.com
Tue Aug 15 13:55:10 UTC 2017



> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com]
> Sent: Monday, August 14, 2017 7:27 AM
> To: Fischetti, Antonio <antonio.fischetti at intel.com>; dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
> 
> 
> 
> -----Original Message-----
> From: <ovs-dev-bounces at openvswitch.org> on behalf of
> "antonio.fischetti at intel.com" <antonio.fischetti at intel.com>
> Date: Friday, August 11, 2017 at 8:52 AM
> To: "dev at openvswitch.org" <dev at openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> 	recirc packets
> 
>     When OVS is configured as a firewall, with thousands of active
>     concurrent connections, the EMC quickly becomes saturated and may
>     suffer heavy thrashing, because original and recirculated packets
>     keep overwriting the active EMC entries due to its limited size
>     (8k entries).
> 
> 
> The recirculated packet could have been modified; in that case, maybe we
> still want to do the EMC lookup/insert?

[Antonio] 
IMO we should still skip the EMC anyway, because the purpose is to
mitigate thrashing when the EMC is full. So any recirculated packet should
be classified at the dpcls/ofproto layers.
Am I missing something in your question?

We can expect that a modified recirculated packet, like all other
recirculated packets, could result in a miss when the EMC is full.
We would then do an EMC insertion that is likely to overwrite some
active entry. In turn, this new entry could itself be overwritten, due
to the shortage of slots, even before it is hit again. This proposal
mitigates the thrashing by reserving EMC usage for original packets
only. That way a limited resource like the EMC can hopefully be used
more efficiently, especially when there is more than one recirculation.
I guess that adding an exception for modified recirculated packets could
also slightly reduce throughput, as we would need another if statement
inside emc_processing.


> 
> 
>     This thrashing makes the EMC less efficient than the dpcls
>     in terms of lookups and insertions.
> 
>     This patch makes more efficient use of the EMC by reserving it for
>     'original' packets once its occupancy is high: recirculated packets
>     are then sent directly to the classifier.
>     An empirical threshold on EMC occupancy,
>     EMC_RECIRCT_NO_INSERT_THRESHOLD (50%), triggers this logic. When
>     EMC utilization exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD:
>      - EMC insertions are allowed only for original packets.
>        EMC insertion and lookup are skipped for recirculated packets.
>      - Recirculated packets are sent to the classifier.
> 
>     This patch is based on the patch
>     "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
>     https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
> 
>     CC: Jan Scheurich <jan.scheurich at ericsson.com>
>     Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
>     Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>     Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>     ---
>     Connection Tracker testbench set up with
> 
>      table=0, priority=1 actions=drop
>      table=0, priority=10,arp actions=NORMAL
>      table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
>      table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
>      table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
>      table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
>      table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
> 
>     2 PMDs, 3 Tx queues.
> 
>     I measured the packet Rx rate (regardless of packet loss) in a
>     bidirectional test with 64B UDP packets.
>     Each row is a test with a different number of traffic streams. The traffic
>     generator is set so that each stream establishes one UDP connection.
>     The Mpps columns report the Rx rates on the 2 sides.
> 
>     I set up the generator to loop on the destination IP address on one
>     side, and on the source IP address on the other side.
> 
>     For example, to generate 10 different flows, I sent to phy port #1:
>     UDP, IPsrc: 10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63
> 
>     and to phy port #2 (source and destination IPs are swapped):
>     UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63
> 
>     I saw the following performance improvement.
> 
>     'Original OvS-DPDK' refers to commit ID:
>       6b1babacc3ca0488e07596bf822fe356c9bab646
> 
>               +----------------------+-----------------------+
>               |  Original OvS-DPDK   |   Original OvS-DPDK   |
>               |                      |    + this patch       |
>      ---------+------------+---------+------------+----------+
>       Traffic |     Rx     |   EMC   |     Rx     |   EMC    |
>       Streams |   [Mpps]   | entries |   [Mpps]   | entries  |
>      ---------+------------+---------+------------+----------+
>          100  | 2.43, 2.49 |   200   | 2.55, 2.57 |   201    |
>        1,000  | 2.01, 2.02 |  2007   | 2.12, 2.12 |  2006    |
>        2,000  | 1.93, 1.95 |  3868   | 1.98, 1.96 |  3884    |
>        3,000  | 1.87, 1.91 |  5086   | 1.97, 1.97 |  4757    |
>        4,000  | 1.83, 1.82 |  6173   | 1.94, 1.93 |  5280    |
>       10,000  | 1.67, 1.69 |  7826   | 1.82, 1.81 |  7090    |
>       30,000  | 1.57, 1.59 |  8192   | 1.66, 1.67 |  8192    |
>      ---------+------------+---------+------------+----------+
> 
>     This test setup implies 1 recirculation per received packet.
>     We have not tested this patch in a scenario where more than 1
>     recirculation occurs per packet.
>     ---
>      lib/dpif-netdev.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
>      1 file changed, 61 insertions(+), 4 deletions(-)
> 
>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     index bea1c3f..8f6b96b 100644
>     --- a/lib/dpif-netdev.c
>     +++ b/lib/dpif-netdev.c
>     @@ -4663,6 +4663,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
>          packet_batch_per_flow_update(batch, pkt, mf);
>      }
> 
>     +/* Threshold to skip EMC for recirculated packets (50% of 8k entries). */
>     +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
>     +
>      /* Try to process all ('cnt') the 'packets' using only the exact match cache
>       * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
>       * miniflow is copied into 'keys' and the packet pointer is moved at the
>     @@ -4714,8 +4717,36 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
>              key->len = 0; /* Not computed yet. */
>              key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
> 
>     -        /* If EMC is disabled skip emc_lookup */
>     -        flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
>     +        /*
>     +         * EMC lookup is skipped when either or both of the following
>     +         * cases occur:
>     +         *
>     +         *    - EMC is disabled.  This is detected from cur_min.
>     +         *
>     +         *    - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
>     +         *      the packet to be classified is being recirculated.  In this
>     +         *      case EMC insertions are also skipped for recirculated
>     +         *      packets, so that the EMC only stores entries that are
>     +         *      hit by 'original' packets.  This mitigates EMC thrashing
>     +         *      and improves performance.
>     +         */
>     +        if (OVS_LIKELY(cur_min)) {
>     +            if (!md_is_valid) {
>     +                flow = emc_lookup(flow_cache, key);
>     +            } else {
>     +                /* Recirculated packet. */
>     +                if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
>     +                    /* EMC occupancy is over the threshold.  We skip EMC
>     +                     * lookup for recirculated packets. */
>     +                    flow = NULL;
>     +                } else {
>     +                    flow = emc_lookup(flow_cache, key);
>     +                }
>     +            }
>     +        } else {
>     +            flow = NULL;
>     +        }
>     +
>              if (OVS_LIKELY(flow)) {
>                  dp_netdev_queue_batches(packet, flow, &key->mf, batches,
>                                          n_batches);
>     @@ -4800,7 +4831,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>                                                   add_actions->size);
>              }
>              ovs_mutex_unlock(&pmd->flow_mutex);
>     -        emc_probabilistic_insert(pmd, key, netdev_flow);
>     +        /* EMC insertion can be skipped based on a probabilistic criterion
>     +         * or, for recirculated packets, based on the number of EMC
>     +         * entries. */
>     +        if (!packet->md.recirc_id) {
>     +            emc_probabilistic_insert(pmd, key, netdev_flow);
>     +        } else {
>     +            /* Recirculated packets.  When EMC occupancy goes over
>     +             * a threshold we avoid inserting new entries. */
>     +            if (!(pmd->flow_cache.n_entries &
>     +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
>     +                /* Still under the threshold. */
>     +                emc_probabilistic_insert(pmd, key, netdev_flow);
>     +            }
>     +        }
>          }
>      }
> 
>     @@ -4893,7 +4937,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> 
>              flow = dp_netdev_flow_cast(rules[i]);
> 
>     -        emc_probabilistic_insert(pmd, &keys[i], flow);
>     +        /* EMC insertion can be skipped based on a probabilistic criterion
>     +         * or, for recirculated packets, based on the number of EMC
>     +         * entries. */
>     +        if (!packet->md.recirc_id) {
>     +            emc_probabilistic_insert(pmd, &keys[i], flow);
>     +        } else {
>     +            /* Recirculated packets.  When EMC occupancy goes over
>     +             * a threshold we avoid inserting new entries. */
>     +            if (!(pmd->flow_cache.n_entries &
>     +                    EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
>     +                /* Still under the threshold. */
>     +                emc_probabilistic_insert(pmd, &keys[i], flow);
>     +            }
>     +        }
>              dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches, n_batches);
>          }
> 
>     --
>     2.4.11
> 


