[ovs-dev] [OVN Patch v8 3/3] northd: Restore parallel build with dp_groups

Han Zhou hzhou at ovn.org
Fri Oct 1 22:34:28 UTC 2021

On Thu, Sep 30, 2021 at 11:04 PM Anton Ivanov <
anton.ivanov at cambridgegreys.com> wrote:

> On 01/10/2021 01:32, Han Zhou wrote:
> On Thu, Sep 30, 2021 at 2:03 PM Anton Ivanov <
> anton.ivanov at cambridgegreys.com> wrote:
>> On 30/09/2021 20:48, Han Zhou wrote:
>> On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov <
>> anton.ivanov at cambridgegreys.com> wrote:
>>> Summary of findings.
>>> 1. The numbers from the perf test do not align with ovn-heater, which is
>>> much closer to a realistic load. On some tests where ovn-heater gives a
>>> 5-10% end-to-end improvement with parallelization, we get worse results
>>> with the perf-test. You spotted this one correctly.
>>> Example of the northd averages pulled out of the test report via grep and
>>> sed:
>>>    127.489353
>>>    131.509458
>>>    116.088205
>>>    94.721911
>>>    119.629756
>>>    114.896258
>>>    124.811069
>>>    129.679160
>>>    106.699905
>>>    134.490338
>>>    112.106713
>>>    135.957658
>>>    132.471111
>>>    94.106849
>>>    117.431450
>>>    115.861592
>>>    106.830657
>>>    132.396905
>>>    107.092542
>>>    128.945760
>>>    94.298464
>>>    120.455510
>>>    136.910426
>>>    134.311765
>>>    115.881292
>>>    116.918458
>>> These values are all over the place - this is not a reproducible test.
>>> 2. In its present state you need to re-run it 30+ times and take an
>>> average. The standard deviation of the values for the northd loop is >
>>> 10%. Compared to that, the reproducibility of ovn-heater is significantly
>>> better. I usually get less than 0.5% difference between runs if there were
>>> no iteration failures. I would suggest using that instead if you want to do
>>> performance comparisons, until we have figured out what affects the
>>> perf-test.
>>> 3. It is using the short-term running average value in reports, which is
>>> probably wrong because you get a very significant skew from the last
>>> several values.
>>> I will look into all of these.
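
For what it's worth, the >10% spread is easy to check directly against the dump above with a few lines of plain C (nothing OVN-specific; pipe the numbers in on stdin and build with cc stats.c -lm):

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double x, sum = 0.0, sumsq = 0.0;
        int n = 0;

        /* Read one northd time per whitespace-separated token. */
        while (scanf("%lf", &x) == 1) {
            sum += x;
            sumsq += x * x;
            n++;
        }
        if (n > 1) {
            double mean = sum / n;
            /* Sample standard deviation. */
            double stddev = sqrt((sumsq - n * mean * mean) / (n - 1));

            printf("n=%d mean=%.2f stddev=%.2f (%.1f%% of mean)\n",
                   n, mean, stddev, 100.0 * stddev / mean);
        }
        return 0;
    }

For the 26 values quoted above this comes out at roughly 11% of the mean, in line with the >10% figure.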
>> Thanks for the summary! However, I think there is a bigger problem
>> (probably related to my environment) than the stability of the test (make
>> check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in an earlier
>> email, I observed even worse results with a large-scale topology closer to a
>> real-world deployment of ovn-k8s, just testing with the command:
>>     ovn-nbctl --print-wait-time --wait=sb sync
>> This command simply triggers a change in the NB_Global table and waits for
>> northd to complete the full recompute and update the SB. It doesn't have to
>> be the "sync" command; any change to the NB DB produces a similar result
>> (e.g.: ovn-nbctl --print-wait-time --wait=sb ls-add ls1)
>> Without parallel:
>> ovn-northd completion: 7807ms
>> With parallel:
>> ovn-northd completion: 41267ms
>> Is this with current master or prior to these patches?
>> 1. There was an issue prior to these patches where the hash was not sized
>> correctly on the first iteration when loading a large existing database.
>> These numbers sound about right for when this bug was around.
> The patches are included. The commit id is 9242f27f63 as mentioned in my
> first email.
>> 2. There should be NO DIFFERENCE in a single compute cycle with an
>> existing database between a run with parallel and one without, with dp
>> groups, at present. This is because the first cycle does not use parallel
>> compute. It is disabled in order to achieve the correct hash sizing for
>> future cycles by auto-scaling the hash.
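
That auto-scaling scheme, as a minimal sketch (the helper names and globals are invented for illustration, not the actual ovn-northd code): the first cycle runs serially so the hash grows to its natural size, that size is remembered, and later parallel cycles pre-expand the hash so worker threads never have to resize it.

    #include <stdbool.h>
    #include "openvswitch/hmap.h"

    static bool lflow_first_run = true;
    static size_t lflow_expected_count;

    /* Decide whether this cycle may run in parallel and, if so, grow the
     * hash up front so worker threads never need to resize it. */
    static bool
    lflow_hash_prepare(struct hmap *lflows)
    {
        if (lflow_first_run) {
            return false;                 /* serial pass discovers the size */
        }
        hmap_reserve(lflows, lflow_expected_count);
        return true;
    }

    /* Remember the observed size for the next cycle. */
    static void
    lflow_hash_note_size(const struct hmap *lflows)
    {
        lflow_expected_count = hmap_count(lflows);
        lflow_first_run = false;
    }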
> Yes, I understand this and I did enable dp-group for the above "ovn-nbctl
> sync" test, so the number I showed above for "with parallel" was for the
> 2nd run and onwards. For the first round the result is exactly the same as
> without parallel.
> I just tried disabling dp-group for the large-scale "ovn-nbctl sync" test
> (after taking some effort to squeeze out memory space on my desktop), and
> the result shows that parallel build performs slightly better (although it
> is 3x slower than with dp-group & without parallel, which is expected).
> Summarizing the results together below:
> Without parallel, with dp-group:
> ovn-northd completion: 7807ms
> With parallel, with dp-group:
> ovn-northd completion: 41267ms
> without parallel, without dp-group:
> ovn-northd completion: 27996ms
> with parallel, without dp-group:
> ovn-northd completion: 26584ms
> Now the interesting part:
> I implemented a POC of a hash-based mutex array that replaces the rwlock
> in the function do_ovn_lflow_add_pd(), and the performance is greatly
> improved for the dp-group test:
> with parallel, with dp-group (hash-based mutex):
> ovn-northd completion: 5081ms
> This is 8x faster than the current parallel one and 30% faster than
> without parallel. This result looks much more reasonable to me. My theory
> is that when using parallel with dp-group, the rwlock contention is causing
> the low CPU utilization of the threads and the overall slowness on my
> machine. I will refine the POC into a formal patch and send it for review,
> hopefully by tomorrow.
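
As a rough illustration of the idea, a minimal lock-striping sketch in plain pthreads; the stripe count, helper name, and the way the lflow hash is keyed are assumptions, not the actual POC:

    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>

    #define LFLOW_LOCK_STRIPES 64   /* power of two, chosen arbitrarily */

    static pthread_mutex_t lflow_locks[LFLOW_LOCK_STRIPES];

    static void
    lflow_locks_init(void)
    {
        for (size_t i = 0; i < LFLOW_LOCK_STRIPES; i++) {
            pthread_mutex_init(&lflow_locks[i], NULL);
        }
    }

    /* Hypothetical helper: instead of one global rwlock serializing every
     * insert, pick a mutex by the lflow hash so two threads only contend
     * when their flows land on the same stripe. */
    static void
    lflow_hash_insert_locked(uint32_t hash /* , struct ovn_lflow *, ... */)
    {
        pthread_mutex_t *lock = &lflow_locks[hash & (LFLOW_LOCK_STRIPES - 1)];

        pthread_mutex_lock(lock);
        /* ... insert the lflow into the bucket for 'hash' ... */
        pthread_mutex_unlock(lock);
    }

The trade-off is that a striped mutex array gives up the reader parallelism of an rwlock, but for a write-heavy phase like lflow building that is usually a win.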
> Cool. The older implementation, prior to going to the rwlock, was based on
> that. I found a couple of issues with it, which is why I switched to the
> rwlock. Namely, the access to the lflow hash size is not controlled, so the
> hash size ends up corrupted because different threads modify it without a
> lock. In a worst-case scenario you end up with a dog's breakfast in that
> entire cache line.
> So you need a couple of extra macros to insert fast without touching the
> hash size.
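
Something along those lines, presumably; a sketch only, written against the fields of OVS's plain struct hmap, with an invented macro name:

    #include "openvswitch/hmap.h"

    /* Chain a node into its bucket without writing hmap->n, so parallel
     * writers never race on the count field; the count has to be fixed up
     * once, after the worker threads have joined. */
    #define LFLOW_HMAP_INSERT_FAST_NO_COUNT(HMAP, NODE, HASH)           \
        do {                                                            \
            struct hmap_node **bucket__ =                               \
                &(HMAP)->buckets[(HASH) & (HMAP)->mask];                \
            (NODE)->hash = (HASH);                                      \
            (NODE)->next = *bucket__;                                   \
            *bucket__ = (NODE);                                         \
        } while (0)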
> This in turn leaves you with a hash you cannot resize to the correct size
> for searching during lflow post-processing and reconciliation. You will
> probably need the post-processing optimization patch which I submitted a
> couple of weeks back. Instead of using an HMAPX to hold the single flows and
> modifying the lflow hash in place, it rebuilds the hash completely and
> replaces the original one. At that point the count is right and the hash is
> resized to its optimum size.
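
A rough sketch of that rebuild-and-replace step with plain hmap primitives (the function and its caller are hypothetical; the actual patch may differ):

    #include "openvswitch/hmap.h"

    /* Move every node out of the possibly mis-sized shared hash into a
     * freshly reserved hmap, then swap the two, leaving 'lflows' with a
     * correct count and an optimally sized bucket array. */
    static void
    lflow_hash_rebuild(struct hmap *lflows, size_t n_lflows)
    {
        struct hmap rebuilt = HMAP_INITIALIZER(&rebuilt);
        struct hmap_node *node;

        hmap_reserve(&rebuilt, n_lflows);

        while ((node = hmap_first(lflows))) {
            hmap_remove(lflows, node);
            hmap_insert(&rebuilt, node, node->hash);
        }

        hmap_swap(lflows, &rebuilt);   /* 'lflows' now holds the rebuilt hash */
        hmap_destroy(&rebuilt);        /* destroy the old, now-empty one */
    }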
> By the way, there is one more option - you may want to try switching to the
> fat rwlock - that is supposed to decrease contention and make things faster,
> though that probably will not be enough.
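
If that route were tried, it would mostly be a type and call substitution against OVS's lib/fat-rwlock.h; a hedged sketch, assuming the contended lock today is a plain rwlock around the shared lflow hash:

    #include "fat-rwlock.h"

    /* fat_rwlock keeps per-thread state for readers, so read acquisitions
     * do not all bounce the same cache line the way a plain rwlock does.
     * The surrounding names are illustrative. */
    static struct fat_rwlock lflow_hash_lock;

    static void
    lflow_hash_lock_init(void)
    {
        fat_rwlock_init(&lflow_hash_lock);
    }

    static void
    lflow_hash_lookup_side(void)
    {
        fat_rwlock_rdlock(&lflow_hash_lock);
        /* ... look up an existing lflow in the shared hash ... */
        fat_rwlock_unlock(&lflow_hash_lock);
    }

    static void
    lflow_hash_insert_side(void)
    {
        fat_rwlock_wrlock(&lflow_hash_lock);
        /* ... insert a new lflow into the shared hash ... */
        fat_rwlock_unlock(&lflow_hash_lock);
    }

fat_rwlock trades memory and a more expensive write side for cheaper read acquisition, which is why it may not help much if the contention is on the write side.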
> I still do not get why your results are so different from the ovn-heater
> tests, but that is something I will look into separately.
> Brgds,
Hi Anton, thanks for bringing up the tricky points. I ended up with a
relatively simple solution:

The considerations are described in the comments. Please take a look and
see if it addresses the concerns.


> Thanks,
> Han
> --
> Anton R. Ivanov
> Cambridgegreys Limited. Registered in England. Company Number 10273661
> https://www.cambridgegreys.com/
