[ovs-discuss] Incremental perf results

Tue Jun 5 19:00:07 UTC 2018

On 06/05/2018 01:02 PM, Mark Michelson wrote:
> On 06/05/2018 12:40 PM, Han Zhou wrote:
>>
>>
>> On Fri, May 18, 2018 at 2:03 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>>     Hi Mark,
>>
>>     Thank you so much for sharing this data. Please see my comments
>>     inline.
>>
>>     On Fri, May 18, 2018 at 1:31 PM, Mark Michelson <mmichels at redhat.com>
>>     wrote:
>>
>>         Hi Han, I finally did some tests and looked at the CPU usage
>>         between master and the ip7 branch.
>>
>>         On the machines running ovn-controller:
>>         Master branch: Climbs to around 100% over the course of 3
>>         minutes, then oscillates close to 100% for about 10 minutes, and
>>         then is pegged to 100% for the rest of the test. Total test time
>>         was about 23 minutes.
>>         ip7 branch: oscillates between 10 and 25% for the first 10
>>         minutes of the test, then hovers around 10% for the rest. Total
>>         test time was about 19 minutes.
>>
>>     This is aligned with my observation of ~90% improvement on CPU cost.
>>
>>     For the throughput/total time, the improvement ratio is different
>>     (in my test case the execution time was reduced by ~50%), but I
>>     think it can be explained. The total execution time does not
>>     accurately reflect the efficiency of the processing: when CPU is at
>>     100%, ovn-controller processing slows down, which may simply result
>>     in fewer iterations during the whole test. I think the stop-watch
>>     profiling mechanism you implemented (also rebased into the
>>     incremental processing branch) will reveal the real picture. The
>>     real impact of the slowdown is longer latency when handling a
>>     change in the control plane, so I also use latency to evaluate the
>>     improvement. I measure latency using ovn-nbctl --wait=hv, with the
>>     nb_cfg improvement (https://patchwork.ozlabs.org/patch/899608/).
>>
>>         When I switched over to tests that have ACLs:
>>         Master branch: Behaves about the same as it did when no ACLs
>>         were used. Total test time was about 28 minutes.
>>         ip7 branch: CPU usage hovered around 30% for the entirety of the
>>         test, hitting spikes around 50% a couple of times. Total test
>>         time was about 25 minutes.
>>
>>         Since I had not done it yet, I also ran perf while running the
>>         incremental branch with ACLs. I am attaching the flame graph
>>         here. The gist is that much like the master branch, the majority
>>         of CPU time is spent processing logical flows.
>>
>>         Seeing the drop in CPU usage between the master branch and the
>>         ip7 branch makes me think it is worth investigating other areas
>>         that may be bottlenecks. I monitored memory, disk usage, and
>>         network usage on the machines, but I didn't see anything that
>>         stood out as an obvious cause of the delay.
>>
>>     As I understand it, the CPU drop between master and ip7 when testing
>>     with ACLs is most likely because incremental processing avoids
>>     recomputing flows when irrelevant input, such as pinctrl/ofctrl
>>     messages (e.g. probe/echo), arrives, whereas in master any of these
>>     inputs triggers a full recompute.
>>
>>         CPU-wise, I think the biggest improvements that can be made to
>>         the incremental processing branch are:
>>         * Adding a change handler for the Address_Set table.
>>         * ofctrl_put() improvements we have discussed.
>>
>>         I think these will noticeably improve our test times.
>>         However, based on how much the CPU usage dropped just
>>         from switching to the incremental processing branch, I think
>>         there are likely some other bottlenecks in our tests that would
>>         be more impactful to remove. We already know that
>>         "ovn_network.bind_port" and "ovn_network.wait_port_up" in
>>         ovn-scale-test terminology are the operations in our test
>>         iterations that take the longest. If we can break those down
>>         into smaller pieces, we can potentially zero in on what to
>>         target next.
>>
>>
>>     I am not sure whether there are any other *big* bottlenecks, but
>>     address-set/port-group and ofctrl_put() improvements are surely
>>     needed :)
>>     The latest patch I provided is from my ip9 branch, which I rebased
>>     on master this week with some code refactoring. Feel free to try it,
>>     but don't expect any performance difference.
>>
>>
>> Hi Mark,
>>
>> Do you still have the same environment to try out the address-set
>> incremental processing patches, to see if they improve the test results
>> for ACLs with per-port address set updates?
>> The patch is v3: 
>> https://patchwork.ozlabs.org/project/openvswitch/list/?series=48060
>> It is also in branch ip11.
>>
>> Thanks,
>> Han
> 
> As a matter of fact, I saw the ip11 branch this past Friday and tested it
> over the weekend. I didn't run perf during the test, but based solely on
> the time the test took to run, it was an improvement. For that test, I
> ran 3312 iterations. In the results I reported earlier in this thread,
> we were doing 864 iterations, so I don't have an apples-to-apples
> comparison at the moment. I will run an 864-iteration test and see how
> it compares to the earlier numbers. I'll report back when I have numbers.

I ran the test with ACLs for 864 iterations. The results are nearly
identical to those from running the ip7 branch with no ACLs: the test took
around 19 minutes, and CPU usage hovered around 10% throughout. I also ran
perf. The flame graph shows what we would expect by this point: the
majority of processing time in ovn-controller is spent in ofctrl_put().

So I'd say that address set incremental processing is successful in our 
tests. Great job!