[ovs-discuss] Incremental perf results
Mark Michelson
mmichels at redhat.com
Tue Jun 5 19:00:07 UTC 2018
On 06/05/2018 01:02 PM, Mark Michelson wrote:
> On 06/05/2018 12:40 PM, Han Zhou wrote:
>>
>>
>> On Fri, May 18, 2018 at 2:03 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>> Hi Mark,
>>
>> Thank you so much for sharing this data. Please see my comments
>> inline.
>>
>> On Fri, May 18, 2018 at 1:31 PM, Mark Michelson <mmichels at redhat.com> wrote:
>>
>> Hi Han, I finally did some tests and looked at the CPU usage
>> between master and the ip7 branch.
>>
>> On the machines running ovn-controller:
>> Master branch: Climbs to around 100% over the course of 3
>> minutes, then oscillates close to 100% for about 10 minutes, and
>> then is pegged to 100% for the rest of the test. Total test time
>> was about 23 minutes.
>> ip7 branch: oscillates between 10 and 25% for the first 10
>> minutes of the test, then hovers around 10% for the rest. Total
>> test time was about 19 minutes.
>>
>> This is aligned with my observation of ~90% improvement on CPU cost.
>>
>> For the throughput/total time, the improvement ratio is different
>> (in my test case the execution time was reduced by ~50%), but I think
>> it can be explained. The total execution time does not accurately
>> reflect the efficiency of the processing, because when the CPU is at
>> 100%, ovn-controller processing is slowed down, which may just result
>> in fewer iterations during the whole test. I think the stop-watch
>> profiling mechanism you implemented (also rebased into the
>> incremental processing branch) will be able to tell the truth. The
>> real impact of that is longer latency for handling a change in the
>> control plane, so I also use latency to evaluate the improvement. The
>> way I test latency is using ovn-nbctl --wait=hv, with the nb_cfg
>> improvement (https://patchwork.ozlabs.org/patch/899608/).
>>
>> When I switched over to tests that have ACLs:
>> Master branch: Behaves about the same as the master branch when
>> no ACLs are used. Total test time was about 28 minutes.
>> ip7 branch: CPU usage hovered around 30% for the entirety of the
>> test, hitting spikes around 50% a couple of times. Total test
>> time was about 25 minutes.
>>
>> Since I had not done it yet, I also ran perf while running the
>> incremental branch with ACLs. I am attaching the flame graph
>> here. The gist is that much like the master branch, the majority
>> of CPU time is spent processing logical flows.
>>
>> Seeing the drop in CPU usage between the master branch and the
>> ip7 branch makes me think it is worth investigating other areas
>> that may be the bottleneck. I monitored memory, disk usage, and
>> network usage on the machines, but I didn't see anything that
>> seemed obvious as being the cause for delay.
>>
>> The CPU drop between master and ip7 when testing with ACLs is, as I
>> understand it, most likely because incremental processing avoids
>> recomputing flows when irrelevant input such as pinctrl/ofctrl
>> messages (e.g. probe/echo) arrives, while in master any of these
>> inputs would trigger a recompute.
>>
>> CPU-wise, I think the biggest improvements that can be made to
>> the incremental processing branch are:
>> * Adding a change handler for the Address_Set table.
>> * ofctrl_put() improvements we have discussed.
>>
>> I think this will have noticeable improvements in our test
>> times. However, based on how much the CPU usage dropped just
>> from switching to the incremental processing branch, I think
>> there are likely some other bottlenecks in our tests that would
>> be more impactful to remove. We already know that
>> "ovn_network.bind_port" and "ovn_network.wait_port_up" in
>> ovn-scale-test terminology are the operations in our test
>> iterations that take the longest. If we can break those down
>> into smaller pieces, we can potentially zero in on what to
>> target next.
>>
>>
>> I am not sure if there are any other *big* bottlenecks, but
>> address-set/port-group and ofctrl_put() improvements are surely
>> needed :)
>> The latest patch I provided is from my ip9 branch, which is rebased
>> on master this week, with some code refactors. Feel free to try it,
>> but don't expect any performance difference.
>>
>>
>> Hi Mark,
>>
>> Do you still have the same environment to try out the address-set
>> incremental processing patches, to see if they improve the test
>> results for ACLs with per-port address set updates?
>> The patch is v3:
>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=48060
>> It is also in branch ip11.
>>
>> Thanks,
>> Han
>
> As a matter of fact, I saw the ip11 branch this past Friday and gave it
> a test during the weekend. I didn't run perf during the test, but based
> solely on the time the test took to run, it was improved. For the test,
> I ran with 3312 iterations. In the results I reported earlier in this
> thread, we were doing 864 iterations, so I don't have an
> apples-to-apples comparison at the moment. I will run an 864 iteration
> test and see how it compares to the earlier numbers. I'll report back
> when I have numbers.

I ran the test with ACLs with 864 iterations. The results are nearly
identical to those from the ip7 branch run with no ACLs: the test took
around 19 minutes, and CPU usage hovered around 10% throughout. I also
ran perf. The flame graph shows what we would expect by this point: the
majority of processing time in ovn-controller is spent in ofctrl_put().
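
To put rough numbers on that, here is a small illustrative sketch (Python)
that converts the approximate wall-clock totals reported in this thread
into seconds per iteration. It assumes every run listed used 864
iterations, as noted above, and that the 19-minute ACL run is the ip11
branch. The inputs are all "about N minutes" figures, so the results are
only good to a couple of significant digits. The function and variable
names are made up for this example.

```python
# Approximate wall-clock totals reported in this thread, in minutes.
# Assumption: all of these runs used 864 iterations.
ITERATIONS = 864
RUNS = {
    "master, no ACLs": 23,
    "ip7, no ACLs": 19,
    "master, ACLs": 28,
    "ip7, ACLs": 25,
    "ip11, ACLs": 19,  # assumed to be the ip11 branch run
}


def seconds_per_iteration(total_minutes, iterations=ITERATIONS):
    """Average wall-clock seconds spent per test iteration."""
    return total_minutes * 60 / iterations


def percent_reduction(baseline_minutes, new_minutes):
    """Percentage by which total time dropped relative to the baseline."""
    return 100 * (baseline_minutes - new_minutes) / baseline_minutes


def report(runs=RUNS):
    """Print one line per run, then the ACL comparison against master."""
    for name, minutes in runs.items():
        print(f"{name}: {seconds_per_iteration(minutes):.2f} s/iteration")
    drop = percent_reduction(runs["master, ACLs"], runs["ip11, ACLs"])
    print(f"ip11 vs master with ACLs: {drop:.0f}% less wall-clock time")
```

Calling report() works out to roughly 1.9 s per iteration on master with
ACLs versus roughly 1.3 s on ip11, i.e. about a third less total time.
As Han pointed out, total time understates the gain when the CPU is
pegged, so this is a lower bound on the processing improvement rather
than a measure of it.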
So I'd say that address set incremental processing is successful in our
tests. Great job!
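
For anyone who wants to cross-check with the latency approach Han
describes earlier in the thread, a minimal sketch (Python) of timing an
ovn-nbctl --wait=hv call might look like the following. The use of the
"sync" command as the operation being timed is an assumption made here
for illustration; Han's message does not say which ovn-nbctl operation
he times, so substitute the real change under test. The helper names are
hypothetical.

```python
import statistics
import subprocess
import time

# Assumption: "sync" stands in for the change being measured. With
# --wait=hv, ovn-nbctl blocks until the hypervisors have caught up.
DEFAULT_CMD = ("ovn-nbctl", "--wait=hv", "sync")


def time_command(cmd=DEFAULT_CMD):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(list(cmd), check=True, stdout=subprocess.DEVNULL)
    return time.monotonic() - start


def summarize(samples):
    """Reduce a list of latencies to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def measure(count=10, cmd=DEFAULT_CMD):
    """Time cmd `count` times in a row and summarize the latencies."""
    return summarize([time_command(cmd) for _ in range(count)])
```

Running measure() on a machine with access to the northbound database
while the scale test is in progress would give a per-change latency
figure to compare across branches, which should be less sensitive to
CPU saturation than total test time.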