[ovs-dev] [RFC 0/2] EMC load-shedding

O Mahony, Billy billy.o.mahony at intel.com
Mon Sep 25 12:16:22 UTC 2017


Hi Darrell,

Some more information below. I'll hold off on a v2 for now to give others time to comment.

Thanks,
Billy. 

 
> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com]
> Sent: Friday, September 22, 2017 7:20 PM
> To: O Mahony, Billy <billy.o.mahony at intel.com>; dev at openvswitch.org
> Cc: i.maximets at samsung.com; jan.scheurich at ericsson.com
> Subject: Re: [RFC 0/2] EMC load-shedding
> 
> Thanks for working on this Billy
> One comment inline.
> 
> On 9/22/17, 6:47 AM, "Billy O'Mahony" <billy.o.mahony at intel.com> wrote:
> 
>     Hi All,
> 
>     Please find attached RFC patch for EMC load-shedding [1] as promised [2].
> 
>     This applies clean on 5ff834 "Increment ct packet counters..." It also uses
>     Ilya's patch "Fix per packet cycles statistics." [3] so I've included that in
>     the patch set as it wasn't merged when I started the RFC.
> 
>     The main goal for this RFC is only to demonstrate the outline of the
> mechanism
>     and get feedback & advice for further work.
> 
>     However I did some initial testing with promising results. For 8K to 64K
> flows
>     the cycles per packet drop from ~1200 to ~1100. For small numbers of flows
>     (~16) the cycles per packet remain at ~900 which I believe means no
> increase
>     but I didn't baseline that situation.
> 
>     There are some TODOs commented in the patch with XXX.
> 
>     For one I think the mechanism should take into account the expected
> cycle-cost
>     of EMC lookup and EMC miss (dpcls lookup) when deciding how much load
> to shed.
>     Rather than the heuristic in this patch which is to keep the emc hit rate (for
>     flows which have not been diverted from the EMC) between certain
> bounds.
> 
> 
> [Darrell]
> Could you expand on the description of the algorithm and the rationale?
> I know the algorithm was discussed along with other proposed patches, but I
> think it would be beneficial if the patch (boils down to a single patch)
> described it.
[[BO'M]] 

I'll add that description and some comments to the v2 of the patch.  In the meantime reviewers should find this helpful:

As the number of flows increases there will eventually be too many flows contending for a place in the EMC. The EMC becomes a liability when

    emc_lookup_cost + (emc_miss_rate * dpcls_lookup_cost)

grows to be greater than a straightforward dpcls_lookup_cost. When this occurs, if some proportion of flows could be made to skip the EMC (i.e. be neither inserted into nor looked up in the EMC), it would result in lower lookup costs overall.
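
To make the break-even condition concrete, here is a minimal sketch of the comparison. It is not code from the patch: the function names are hypothetical and the costs are simply per-packet cycle counts supplied by the caller.

```c
#include <stdbool.h>

/* Hypothetical helper, not from the patch: expected per-packet lookup
 * cost (in cycles) when the EMC is tried first and dpcls is used on a
 * miss.  'emc_miss_rate' is a fraction in the range [0, 1]. */
static double
emc_path_cost(double emc_lookup_cost, double emc_miss_rate,
              double dpcls_lookup_cost)
{
    return emc_lookup_cost + emc_miss_rate * dpcls_lookup_cost;
}

/* Returns true when going through the EMC costs more on average than
 * going straight to dpcls, i.e. when the EMC has become a liability. */
static bool
emc_is_liability(double emc_lookup_cost, double emc_miss_rate,
                 double dpcls_lookup_cost)
{
    return emc_path_cost(emc_lookup_cost, emc_miss_rate, dpcls_lookup_cost)
           > dpcls_lookup_cost;
}
```

Rearranging the inequality, the EMC stops paying for itself once emc_miss_rate exceeds 1 - (emc_lookup_cost / dpcls_lookup_cost). With made-up costs of 100 and 1000 cycles, for example, that break-even miss rate is 0.9.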

This requires an efficient and flexible way to categorize flows into 'skip EMC' and 'use EMC' categories. The RSS hash can fulfil this role: a threshold is set, and flows whose RSS hash is under that value skip the EMC.
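
As a rough sketch of that categorization (the function name is hypothetical and this is not lifted from the patch), the check is a single unsigned comparison per packet. It assumes the RSS hash is a 32-bit value spread roughly uniformly over its range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: a flow skips the EMC (no
 * lookup, no insertion) when its 32-bit RSS hash falls below the shed
 * threshold.  A threshold of 0 therefore sheds nothing. */
static bool
emc_should_skip(uint32_t rss_hash, uint32_t shed_threshold)
{
    return rss_hash < shed_threshold;
}
```

Under the uniform-hash assumption, a threshold of 0x40000000 would divert about 4/16ths of flows, and all packets of a given flow land in the same category because they share a hash.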

The algorithm in this RFC is based on setting this shed threshold so that the hit rate on the EMC remains between 50% and 70%, which from observation gives an efficient use of the EMC (based on cycles per packet). Periodically (after each 3 million packets) the EMC hit rate is checked. If it is below 50% the shed threshold is increased (more flows are shed from the EMC), and if it is over 70% the shed threshold is decreased (fewer flows are shed from the EMC). The shed threshold has 16 different values (0x0000_0000 to 0xF000_0000), which allows for no shedding, or for 1/16th, 2/16ths, ... 15/16ths of flows to be skipped from the EMC.

Each time the shed_threshold is adjusted it is moved by just one step.
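
The adjustment step can be sketched roughly as below. This is an illustration only, not the patch code: the macro and function names are hypothetical, the caller is assumed to invoke it once per measurement interval with the hit rate as a whole percentage, and the direction follows the comment in the patch (as hit rate goes down, shed threshold goes up).

```c
#include <stdint.h>

/* Hypothetical names, not from the patch. */
#define EMC_SHED_STEP          0x10000000u  /* One of 16 steps. */
#define EMC_SHED_MAX           0xF0000000u  /* 15/16ths of flows shed. */
#define EMC_HIT_RATE_LOW_PCT   50
#define EMC_HIT_RATE_HIGH_PCT  70

/* Returns the new shed threshold given the current one and the EMC hit
 * rate observed over the last interval.  Moves by at most one step and
 * stays within [0, EMC_SHED_MAX].  Assumes 'shed_threshold' is already
 * a multiple of EMC_SHED_STEP. */
static uint32_t
emc_shed_adjust(uint32_t shed_threshold, unsigned int hit_rate_pct)
{
    if (hit_rate_pct < EMC_HIT_RATE_LOW_PCT
        && shed_threshold < EMC_SHED_MAX) {
        /* Too many flows contending: shed more from the EMC. */
        shed_threshold += EMC_SHED_STEP;
    } else if (hit_rate_pct > EMC_HIT_RATE_HIGH_PCT
               && shed_threshold > 0) {
        /* EMC has room to spare: shed fewer flows. */
        shed_threshold -= EMC_SHED_STEP;
    }
    return shed_threshold;
}
```

Inside the 50-70% band the threshold is left alone, so the mechanism should settle rather than oscillate on every interval.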

Later revisions will look at the actual lookup cost for flows in the EMC and dpcls rather than using hard-coded hit rates to define efficient use of the EMC. They may also adjust the shed rate in a proportional manner and adjust on a timed interval instead of every N packets.


> 
> Probably the code could benefit from some expanded comments as well?
> 
> I see one comment in the code
> +        /* As hit rate goes down shed thresh goes up (more is shed from EMC)
> */
> +        /* XXX consider increment more if further out of bounds *
> 
[[BO'M]] 
> 
>     Also we should decide on at least one flow distribution that would be
> useful
>     (i.e. realistic) for EMC testing. The tests above have either been carried out
>     with a random (uniform) flow distribution which doesn't play well with flow
>     caching or else a round-robin flow distribution which is actually adversarial
>     to flow caching. If I have an agreed flow distribution I can then figure out
>     how to produce it for testing :).
> 
>     [1] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DAugust_336509.html&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA
> 09CGX7JQ5Ih-
> uZnsw&m=xyXl4Y7PjflJKH8RTjHev3uMh9JgQGiFk0aPezyN3Qc&s=dr9kNy7p3
> flCYC1W1AO02TRCvTHiazrQ9A5-Ssi4A7o&e=
>     [2] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DSeptember_338380.html&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=BV
> hFA09CGX7JQ5Ih-
> uZnsw&m=xyXl4Y7PjflJKH8RTjHev3uMh9JgQGiFk0aPezyN3Qc&s=OvsyuPzDl
> kZOk_fNXK380np29aySrXUeLKUUKqHCKjw&e=
>     [3] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-
> 2DAugust_337309.html&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=BVhFA
> 09CGX7JQ5Ih-
> uZnsw&m=xyXl4Y7PjflJKH8RTjHev3uMh9JgQGiFk0aPezyN3Qc&s=eDHE9eXgz
> l8wlJhJ_pCqqSIuJYM0EfxgejqgD8xHBZ0&e=
> 
>     Billy O'Mahony (1):
>       dpif-netdev: RFC EMC load-shedding
> 
>     Ilya Maximets (1):
>       dpif-netdev: Fix per packet cycles statistics.
> 
>      lib/dpif-netdev.c | 118
> +++++++++++++++++++++++++++++++++++++++++++++++++-----
>      1 file changed, 108 insertions(+), 10 deletions(-)
> 
>     --
>     2.7.4
> 
> 


