[ovs-dev] [PATCH ovn] ovn-controller.c: Avoid adding neighbour flows for non-local datapaths.

Dumitru Ceara dceara at redhat.com
Thu Feb 20 09:58:19 UTC 2020


On 2/19/20 6:36 PM, Han Zhou wrote:
> 
> 
> On Wed, Feb 19, 2020 at 4:50 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> On 2/19/20 12:32 AM, Han Zhou wrote:
>> > This is useful when external_ids:ovn-monitor-all is set to true.
>> >
>> > Signed-off-by: Han Zhou <hzhou at ovn.org>
>>
>> Hi Han,
>>
>> Looks good to me.
>>
>> Acked-by: Dumitru Ceara <dceara at redhat.com>
>>
>> I also tested this (together with your previous patch) on a scaled setup
>> with 150 ovn-fake-multinode nodes and ovn-monitor-all enabled.
>>
>> With OVN master I see high CPU usage on ovn-controllers from time to time:
>>
>>
>> ovn-netlab1-64/ovn-controller.log:2020-02-19T12:14:11.896Z|00017|timeval|WARN|Unreasonably long 1087ms poll interval (224ms user, 14ms system)
>> ovn-netlab1-140/ovn-controller.log:2020-02-19T12:14:12.030Z|00017|timeval|WARN|Unreasonably long 1055ms poll interval (241ms user, 11ms system)
>> ovn-netlab1-69/ovn-controller.log:2020-02-19T12:14:11.856Z|00017|timeval|WARN|Unreasonably long 1019ms poll interval (221ms user, 1ms system)
>> ovn-netlab1-25/ovn-controller.log:2020-02-19T12:14:11.857Z|00017|timeval|WARN|Unreasonably long 1053ms poll interval (230ms user, 9ms system)
>> ovn-netlab1-48/ovn-controller.log:2020-02-19T12:14:11.827Z|00017|timeval|WARN|Unreasonably long 1005ms poll interval (245ms user, 22ms system)
>> ovn-netlab1-80/ovn-controller.log:2020-02-19T12:14:11.936Z|00017|timeval|WARN|Unreasonably long 1127ms poll interval (218ms user, 2ms system)
>> ovn-netlab1-56/ovn-controller.log:2020-02-19T12:14:01.202Z|00017|timeval|WARN|Unreasonably long 1016ms poll interval (224ms user, 0ms system)
>> ovn-netlab1-24/ovn-controller.log:2020-02-19T12:14:22.623Z|00017|timeval|WARN|Unreasonably long 1022ms poll interval (227ms user, 1ms system)
>> ovn-netlab1-65/ovn-controller.log:2020-02-19T12:13:19.585Z|00017|timeval|WARN|Unreasonably long 1012ms poll interval (213ms user, 1ms system)
>> ovn-netlab1-46/ovn-controller.log:2020-02-19T12:14:11.893Z|00017|timeval|WARN|Unreasonably long 1086ms poll interval (225ms user, 0ms system)
>> ovn-netlab1-21/ovn-controller.log:2020-02-19T12:13:19.586Z|00017|timeval|WARN|Unreasonably long 1031ms poll interval (222ms user, 0ms system)
>>
>> With your changes this happens less often:
>>
>> ./localhost/ovn-netlab1-63/ovn-controller.log:2020-02-19T12:46:10.204Z|00017|timeval|WARN|Unreasonably long 1038ms poll interval (223ms user, 1ms system)
>> ./localhost/ovn-netlab1-67/ovn-controller.log:2020-02-19T12:45:59.677Z|00017|timeval|WARN|Unreasonably long 1033ms poll interval (215ms user, 0ms system)
>> ./localhost/ovn-netlab1-96/ovn-controller.log:2020-02-19T12:46:10.261Z|00017|timeval|WARN|Unreasonably long 1009ms poll interval (219ms user, 1ms system)
>> ./localhost/ovn-netlab1-43/ovn-controller.log:2020-02-19T12:46:10.194Z|00017|timeval|WARN|Unreasonably long 1044ms poll interval (222ms user, 0ms system)
>> ./localhost/ovn-netlab1-58/ovn-controller.log:2020-02-19T12:46:10.253Z|00017|timeval|WARN|Unreasonably long 1091ms poll interval (225ms user, 12ms system)
>> ./localhost/ovn-netlab1-95/ovn-controller.log:2020-02-19T12:46:10.246Z|00017|timeval|WARN|Unreasonably long 1031ms poll interval (216ms user, 16ms system)
>>
>>
>> Regards,
>> Dumitru
>>
> Thanks Dumitru for reviewing and testing it out.
> Are you seeing high CPU only after applying this patch? In theory I
> think this patch should not contribute to a CPU spike.
> Enabling ovn-monitor-all can result in higher CPU in ovn-controller in
> circumstances when not all datapaths are local. In your test case, is
> the topology ideal for ovn-monitor-all? I.e., does each node care about
> all datapaths? If the answer is yes, then could you try enabling
> ovn-monitor-all only on half of the nodes, and see if the nodes with
> ovn-monitor-all enabled show higher CPU usage than the others?
> 

Hi Han,

In my test topology all datapaths are local (i.e., all logical switches
are connected to a single cluster logical router).

The test machine I used initially was oversubscribed so I ran the tests
again on a setup with more physical machines:

1. With OVN master, ovn-monitor-all=false, bringing up 300 nodes (300
logical switches + one VIF per switch):
- SB DB CPU usage is high after a certain number of nodes come up.
Running perf on the setup points to ovsdb_monitor_get_update, which takes
up to 70% of CPU time (including children). This is due to each
ovn-controller subscribing to OVSDB updates for all datapaths individually.
- ovn-controller CPU usage is normal, i.e., no visible CPU spikes.

2. With OVN master, ovn-monitor-all=true, bringing up 300 nodes:
- SB DB CPU usage is low, no visible CPU spikes.
- ovn-controller CPU usage is normal as well.

3. With OVN master + your patches, ovn-monitor-all=true, bringing up 300
nodes:
- SB DB CPU usage is low, no visible CPU spikes.
- ovn-controller CPU usage is normal as well.
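
The difference between scenarios 1 and 2 can be illustrated with a
back-of-the-envelope count. This is only a toy model, not how
ovsdb-server is implemented: it assumes each ovn-controller holds one
subscription per local datapath when ovn-monitor-all=false and none when
ovn-monitor-all=true. The function name is hypothetical.

```python
def datapath_subscriptions(nodes, local_datapaths_per_node, monitor_all):
    """Toy count of per-datapath subscriptions the SB DB has to track.

    Assumption (not taken from the OVN sources): one subscription per
    (controller, local datapath) pair when monitoring is conditional,
    and none when the controller monitors everything unconditionally.
    """
    if monitor_all:
        return 0
    return nodes * local_datapaths_per_node


# 300 nodes, and in this topology all 300 datapaths are local everywhere.
conditional = datapath_subscriptions(300, 300, monitor_all=False)
unconditional = datapath_subscriptions(300, 300, monitor_all=True)
print(conditional, unconditional)
```

Under these assumptions the conditional case grows quadratically with
the number of nodes, which is at least consistent with the SB DB load
seen in scenario 1.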

In conclusion, all seems fine to me. Even in the worst-case scenario,
when all datapaths are local, ovn-controller CPU usage is not affected
by the extra datapath lookups introduced by your changes.
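
For anyone who wants to repeat the comparison, here is a minimal sketch
of how the "Unreasonably long ... poll interval" warnings quoted earlier
in this thread could be collected per node. It assumes grep-style input
of the form <node>/ovn-controller.log:<log line>; the helper name and
the sample text are only illustrative.

```python
import re
from collections import defaultdict

# Matches the timeval warning format seen in the ovn-controller logs above.
WARN_RE = re.compile(
    r"(?P<node>[^/\s]+)/ovn-controller\.log:"
    r"(?P<ts>\S+?)\|\d+\|timeval\|WARN\|Unreasonably\s+long\s+"
    r"(?P<poll>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<sys>\d+)ms system\)"
)


def summarize(text):
    """Return {node: [(poll_ms, user_ms, system_ms), ...]} for each warning."""
    per_node = defaultdict(list)
    for m in WARN_RE.finditer(text):
        per_node[m["node"]].append(
            (int(m["poll"]), int(m["user"]), int(m["sys"]))
        )
    return dict(per_node)


sample = (
    "ovn-netlab1-64/ovn-controller.log:2020-02-19T12:14:11.896Z|00017|"
    "timeval|WARN|Unreasonably long 1087ms poll interval "
    "(224ms user, 14ms system)\n"
    "./localhost/ovn-netlab1-63/ovn-controller.log:2020-02-19T12:46:10.204Z|"
    "00017|timeval|WARN|Unreasonably long 1038ms poll interval "
    "(223ms user, 1ms system)\n"
)
result = summarize(sample)
for node, warnings in sorted(result.items()):
    print(node, len(warnings), max(w[0] for w in warnings))
```

Counting warnings per node before and after a change gives a rough
signal only; a long poll interval with low user and system time may
point to an oversubscribed host rather than to ovn-controller itself.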

Thanks,
Dumitru

> In addition, did you see any difference in CPU usage on the SB DB?
> 
> Thanks,
> Han


