[ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets
Fischetti, Antonio
antonio.fischetti at intel.com
Tue Aug 15 13:55:10 UTC 2017
> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com]
> Sent: Monday, August 14, 2017 7:27 AM
> To: Fischetti, Antonio <antonio.fischetti at intel.com>; dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
>
>
>
> -----Original Message-----
> From: <ovs-dev-bounces at openvswitch.org> on behalf of
> "antonio.fischetti at intel.com" <antonio.fischetti at intel.com>
> Date: Friday, August 11, 2017 at 8:52 AM
> To: "dev at openvswitch.org" <dev at openvswitch.org>
> Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for
> recirc packets
>
> When OVS is configured as a firewall, with thousands of active
> concurrent connections, the EMC quickly becomes saturated and may
> suffer heavy thrashing, because original and recirculated packets
> keep overwriting the existing active EMC entries due to its limited
> size (8k entries).
>
>
> The recirculated packet could have been modified; in that case, maybe we
> still want to do the EMC lookup/insert?
[Antonio]
IMO we should still skip the EMC anyway, because the purpose is to
mitigate thrashing when the EMC is full. So any recirculated packet should
be classified at the dpcls/ofproto layers.
Am I missing something in your question?
We can expect that a modified recirc pkt, like all other recirculated
pkts, could result in a miss when the EMC is full. We would then do an
EMC insertion that is likely to overwrite some active entry. In turn,
this new entry could itself be overwritten, due to the shortage of
locations, even before it is hit again. This proposal mitigates the
thrashing by reserving EMC usage for original packets only.
This way a limited resource like the EMC can hopefully be used more
efficiently, especially when there is more than 1 recirculation.
I guess that adding an exception for modified recirc pkts could also
lower the throughput a bit, as we would need another if statement inside
emc_processing.
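A toy model of the thrashing described above, as a sketch only. It assumes a hypothetical direct-mapped cache of TOY_SLOTS entries indexed by key % TOY_SLOTS (the real EMC is much larger and uses a different hashing and placement scheme), and it deliberately places each flow's original and recirculated keys in the same slot. Under that worst-case assumption, inserting recirculated keys makes every lookup miss, because each insertion evicts the entry that is about to be needed; inserting only original keys lets them hit from the second round on.

```c
#include <stdbool.h>

#define TOY_SLOTS 8        /* Hypothetical toy size, not the real 8k EMC. */
#define EMPTY_SLOT (-1)

/* Hypothetical toy: direct-mapped cache, slot = key % TOY_SLOTS.  Flow 'i'
 * has original key i and recirculated key i + TOY_SLOTS, so both keys of a
 * flow compete for the same slot (a deliberately worst case).  Assumes
 * n_flows <= TOY_SLOTS so that keys of different flows never coincide.
 * Returns the number of cache hits over 'rounds' passes of all flows. */
static int
toy_run(int n_flows, int rounds, bool insert_recirc)
{
    int slots[TOY_SLOTS];
    int hits = 0;

    for (int s = 0; s < TOY_SLOTS; s++) {
        slots[s] = EMPTY_SLOT;
    }
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < n_flows; i++) {
            int keys[2] = { i, i + TOY_SLOTS };  /* original, recirculated */

            for (int pass = 0; pass < 2; pass++) {
                int key = keys[pass];
                int slot = key % TOY_SLOTS;

                if (slots[slot] == key) {
                    hits++;
                } else if (pass == 0 || insert_recirc) {
                    slots[slot] = key;           /* Evicts the occupant. */
                }
            }
        }
    }
    return hits;
}
```

For example, with 8 flows over 10 rounds, the "insert everything" policy gets no hits at all, while the "original packets only" policy gets 72 (every original lookup after the first round).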
>
>
> This thrashing causes the EMC to be less efficient than the dpcls
> in terms of lookups and insertions.
>
> This patch makes more efficient use of the EMC by reserving it for
> 'original' packets: above a given occupancy, all recirculated packets
> are sent directly to the classifier.
> An empirical threshold, EMC_RECIRCT_NO_INSERT_THRESHOLD, corresponding
> to 50% EMC occupancy (4096 of the 8192 entries), triggers this logic.
> When EMC utilization reaches EMC_RECIRCT_NO_INSERT_THRESHOLD:
> - EMC insertions are allowed only for original packets; both EMC
>   insertion and lookup are skipped for recirculated packets.
> - Recirculated packets are sent to the classifier.
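The threshold described above is implemented in the diff as a bit mask rather than a comparison. A minimal sketch, assuming the occupancy counter is a 32-bit unsigned value, showing that `n_entries & 0xFFFFF000` is non-zero exactly when `n_entries >= 0x1000`, i.e. 4096 = 50% of the 8192 EMC entries. The helper names and EMC_ENTRIES are hypothetical, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMC_ENTRIES (1u << 13)                     /* 8192 entries (8k). */
#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000u /* Mask from the patch. */

/* Mask test as used in the patch: true if any bit at or above bit 12
 * (value 0x1000 = 4096) is set in 'n_entries'. */
static bool
over_threshold_mask(uint32_t n_entries)
{
    return (n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) != 0;
}

/* Equivalent explicit comparison against half of the EMC size. */
static bool
over_threshold_cmp(uint32_t n_entries)
{
    return n_entries >= EMC_ENTRIES / 2;
}
```

The mask form only works because the threshold is a power of two; the explicit comparison states the 50% intent directly and would keep working if the threshold were changed to a value that is not a power of two.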
>
> This patch is based on the patch
> "dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327570.html
>
> CC: Jan Scheurich <jan.scheurich at ericsson.com>
> Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> Co-authored-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> ---
> Connection Tracker testbench set up with
>
> table=0, priority=1 actions=drop
> table=0, priority=10,arp actions=NORMAL
> table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
> table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
> table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
> table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
> table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1
>
> 2 PMDs, 3 Tx queues.
>
> I measured the packet Rx rate (regardless of packet loss) in a
> bidirectional test with 64B UDP packets.
> Each row is a test with a different number of traffic streams. The traffic
> generator is set so that each stream establishes one UDP connection.
> The Mpps columns report the Rx rates on the 2 sides.
>
> I set up the generator to loop on the dest IP addr on one side
> and on the source IP addr on the other side.
>
> For example, to generate 10 different flows, I was sending to phy port #1:
> UDP, IPsrc: 10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63
>
> and to phy port #2 (source and dest IPs are now swapped):
> UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63
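A small C sketch of the 10-flow example above, which can help check that both directions describe the same connections. The struct and function names are hypothetical. The sketch covers only the 10-stream example, since the addressing used for the larger stream counts is not described here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one UDP stream from the example above. */
struct udp_stream {
    char ip_src[16];
    char ip_dst[16];
    unsigned port_src;
    unsigned port_dst;
};

/* Stream 'i' (0 <= i <= 9 in the example) sent to phy port #1: fixed
 * source IP, destination IP looping over 20.20.20.[20-29]. */
static struct udp_stream
stream_to_port1(unsigned i)
{
    struct udp_stream s = { .port_src = 63, .port_dst = 63 };

    snprintf(s.ip_src, sizeof s.ip_src, "10.10.10.10");
    snprintf(s.ip_dst, sizeof s.ip_dst, "20.20.20.%u", 20 + i);
    return s;
}

/* Stream 'i' sent to phy port #2: same tuple with source and destination
 * IPs swapped (both L4 ports are 63, so they need no swap). */
static struct udp_stream
stream_to_port2(unsigned i)
{
    struct udp_stream s = stream_to_port1(i);
    char tmp[sizeof s.ip_src];

    memcpy(tmp, s.ip_src, sizeof tmp);
    memcpy(s.ip_src, s.ip_dst, sizeof tmp);
    memcpy(s.ip_dst, tmp, sizeof tmp);
    return s;
}
```

Because stream_to_port2() is derived from stream_to_port1(), stream 'i' on port #2 is by construction the reply direction of stream 'i' on port #1, matching the single UDP connection per stream described above.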
>
> I saw the following performance improvement.
>
> 'Original OvS-DPDK' refers to commit ID:
> 6b1babacc3ca0488e07596bf822fe356c9bab646
>
>          +-----------------------+-----------------------+
>          |   Original OvS-DPDK   |   Original OvS-DPDK   |
>          |                       |     + this patch      |
> ---------+------------+----------+------------+----------+
>  Traffic |     Rx     |   EMC    |     Rx     |   EMC    |
>  Streams |   [Mpps]   | entries  |   [Mpps]   | entries  |
> ---------+------------+----------+------------+----------+
>      100 | 2.43, 2.49 |      200 | 2.55, 2.57 |      201 |
>    1,000 | 2.01, 2.02 |     2007 | 2.12, 2.12 |     2006 |
>    2,000 | 1.93, 1.95 |     3868 | 1.98, 1.96 |     3884 |
>    3,000 | 1.87, 1.91 |     5086 | 1.97, 1.97 |     4757 |
>    4,000 | 1.83, 1.82 |     6173 | 1.94, 1.93 |     5280 |
>   10,000 | 1.67, 1.69 |     7826 | 1.82, 1.81 |     7090 |
>   30,000 | 1.57, 1.59 |     8192 | 1.66, 1.67 |     8192 |
> ---------+------------+----------+------------+----------+
>
> This test setup implies 1 recirculation for each received packet.
> We did not test this patch in a scenario with more than 1
> recirculation per packet.
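To summarize the table as a single figure per row, one option is the relative gain of the summed Rx rate of both sides. This is a sketch under that assumption (summing the two sides is a choice made here, not a metric stated in the test report); the Rx values are copied from the table above and the helper names are hypothetical.

```c
#include <stddef.h>

/* Rx rates [Mpps] copied from the table above, {side 1, side 2}. */
static const double rx_orig[][2] = {
    {2.43, 2.49}, {2.01, 2.02}, {1.93, 1.95}, {1.87, 1.91},
    {1.83, 1.82}, {1.67, 1.69}, {1.57, 1.59},
};
static const double rx_patch[][2] = {
    {2.55, 2.57}, {2.12, 2.12}, {1.98, 1.96}, {1.97, 1.97},
    {1.94, 1.93}, {1.82, 1.81}, {1.66, 1.67},
};
/* Traffic stream count of each row, in the same order. */
static const unsigned n_streams[] = {
    100, 1000, 2000, 3000, 4000, 10000, 30000,
};
#define N_ROWS (sizeof n_streams / sizeof n_streams[0])

/* Relative gain, in percent, of the summed Rx rate of both sides. */
static double
rx_gain_pct(size_t row)
{
    double before = rx_orig[row][0] + rx_orig[row][1];
    double after = rx_patch[row][0] + rx_patch[row][1];

    return 100.0 * (after - before) / before;
}

/* Stream count of the row with the largest relative gain. */
static unsigned
best_gain_streams(void)
{
    size_t best = 0;

    for (size_t r = 1; r < N_ROWS; r++) {
        if (rx_gain_pct(r) > rx_gain_pct(best)) {
            best = r;
        }
    }
    return n_streams[best];
}
```

By this measure the gain is positive in every row, ranging from about 1.5% (2,000 streams) to about 8% (10,000 streams).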
> ---
> lib/dpif-netdev.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bea1c3f..8f6b96b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4663,6 +4663,9 @@ dp_netdev_queue_batches(struct dp_packet *pkt,
> packet_batch_per_flow_update(batch, pkt, mf);
> }
>
> +/* Threshold to skip EMC for recirculated packets. */
> +#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000
> +
> /* Try to process all ('cnt') the 'packets' using only the exact match cache
> * 'pmd->flow_cache'. If a flow is not found for a packet 'packets[i]', the
> * miniflow is copied into 'keys' and the packet pointer is moved at the
> @@ -4714,8 +4717,36 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
> key->len = 0; /* Not computed yet. */
> key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
>
> - /* If EMC is disabled skip emc_lookup */
> - flow = (cur_min == 0) ? NULL: emc_lookup(flow_cache, key);
> + /*
> + * EMC lookup is skipped when one or both of the following
> + * two cases occur:
> + *
> + * - EMC is disabled. This is detected from cur_min.
> + *
> + * - The EMC occupancy exceeds EMC_RECIRCT_NO_INSERT_THRESHOLD and
> + * the packet to be classified is being recirculated. When this
> + * happens, EMC insertions are also skipped for recirculated
> + * packets, so that the EMC stores only entries that are hit by
> + * 'original' packets. This mitigates EMC thrashing and benefits
> + * performance.
> + */
> + if (OVS_LIKELY(cur_min)) {
> + if (!md_is_valid) {
> + flow = emc_lookup(flow_cache, key);
> + } else {
> + /* Recirculated packet. */
> + if (flow_cache->n_entries &
> + EMC_RECIRCT_NO_INSERT_THRESHOLD) {
> + /* EMC occupancy is over the threshold. We skip EMC
> + * lookup for recirculated packets. */
> + flow = NULL;
> + } else {
> + flow = emc_lookup(flow_cache, key);
> + }
> + }
> + } else {
> + flow = NULL;
> + }
> +
> if (OVS_LIKELY(flow)) {
> dp_netdev_queue_batches(packet, flow, &key->mf, batches,
> n_batches);
> @@ -4800,7 +4831,20 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> add_actions->size);
> }
> ovs_mutex_unlock(&pmd->flow_mutex);
> - emc_probabilistic_insert(pmd, key, netdev_flow);
> + /* EMC insertion can be skipped by a probabilistic criterion or,
> + * for recirculated packets, depending on the number of EMC
> + * entries. */
> + if (!packet->md.recirc_id) {
> + emc_probabilistic_insert(pmd, key, netdev_flow);
> + } else {
> + /* Recirculated packets. When EMC occupancy goes over
> + * a threshold we avoid inserting new entries. */
> + if (!(pmd->flow_cache.n_entries &
> + EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> + /* Still under the threshold. */
> + emc_probabilistic_insert(pmd, key, netdev_flow);
> + }
> + }
> }
> }
>
> @@ -4893,7 +4937,20 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>
> flow = dp_netdev_flow_cast(rules[i]);
>
> - emc_probabilistic_insert(pmd, &keys[i], flow);
> + /* EMC insertion can be skipped by a probabilistic criterion or,
> + * for recirculated packets, depending on the number of EMC
> + * entries. */
> + if (!packet->md.recirc_id) {
> + emc_probabilistic_insert(pmd, &keys[i], flow);
> + } else {
> + /* Recirculated packets. When EMC occupancy goes over
> + * a threshold we avoid inserting new entries. */
> + if (!(pmd->flow_cache.n_entries &
> + EMC_RECIRCT_NO_INSERT_THRESHOLD)) {
> + /* Still under the threshold. */
> + emc_probabilistic_insert(pmd, &keys[i], flow);
> + }
> + }
> dp_netdev_queue_batches(packet, flow, &keys[i].mf, batches,
> n_batches);
> }
>
> --
> 2.4.11
>
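The same occupancy check appears three times in the diff: once in emc_processing(), where a recirculated packet is recognized by md_is_valid, and twice, verbatim, in the insertion paths, where it is recognized by packet->md.recirc_id. As a sketch only, the checks could be expressed as two small predicates. The helper names emc_allowed() and emc_should_lookup() are hypothetical and not part of the patch or of OVS; n_entries is assumed to be a 32-bit unsigned count, and 'is_recirc' stands for either of the two recirculation tests above.

```c
#include <stdbool.h>
#include <stdint.h>

#define EMC_RECIRCT_NO_INSERT_THRESHOLD 0xFFFFF000u  /* Mask from the patch. */

/* Hypothetical helper: may this packet use the EMC (lookup or insert)?
 * Original packets always may; recirculated packets only while fewer
 * than 4096 entries are occupied.  This is the insert-side gate; the
 * probabilistic decision is still left to emc_probabilistic_insert(). */
static bool
emc_allowed(uint32_t n_entries, bool is_recirc)
{
    return !is_recirc || !(n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD);
}

/* Hypothetical lookup gate mirroring the emc_processing() hunk, where
 * 'cur_min == 0' means the EMC is disabled. */
static bool
emc_should_lookup(uint32_t cur_min, uint32_t n_entries, bool is_recirc)
{
    return cur_min != 0 && emc_allowed(n_entries, is_recirc);
}
```

With such a helper, each insertion hunk would reduce to a single condition around emc_probabilistic_insert(), and the lookup hunk to a single conditional expression, keeping the three call sites consistent if the threshold logic changes later.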