[ovs-discuss] Inquiry for DDlog status for ovn-northd

Dumitru Ceara dceara at redhat.com
Wed Aug 26 15:14:08 UTC 2020


On 8/26/20 5:11 PM, Dumitru Ceara wrote:
> On 8/25/20 7:46 PM, Ben Pfaff wrote:
>> On Tue, Aug 25, 2020 at 06:43:51PM +0200, Dumitru Ceara wrote:
>>> On 8/25/20 6:01 PM, Ben Pfaff wrote:
>>>> On Mon, Aug 24, 2020 at 04:28:22PM -0700, Han Zhou wrote:
>>>>> As I remember you were working on the new ovn-northd that utilizes DDlog
>>>>> for incremental processing. Could you share the current status?
>>>>>
>>>>> Now that some more improvements have been made in ovn-controller and OVSDB,
>>>>> ovn-northd is becoming the most obvious bottleneck for OVN in large-scale
>>>>> environments. Since you were not in the OVN meetings for the last couple
>>>>> of weeks, could you share here the status and plan moving forward?
>>>>
>>>> The status is basically that I haven't yet succeeded at getting Red
>>>> Hat's recommended benchmarks running.  I'm told that is important before
>>>> we merge it.  I find them super difficult to set up.  I tried a few
>>>> weeks ago and basically gave up.  Piles and piles of repos all linked
>>>> together in tricky ways, making it really difficult to substitute my own
>>>> branches.  I intend to try again soon, though.  I have a new computer
>>>> that should be arriving soon, which should also allow it to proceed more
>>>> quickly.
>>>
>>> Hi Ben,
>>>
>>> I can try to help with setting up ovn-heater. In theory it should be
>>> enough to export OVS_REPO, OVS_BRANCH, OVN_REPO and OVN_BRANCH, point
>>> them at your repos and branches, and then run "./do.sh install"; that
>>> should take care of installing all the dependencies and repos.
>>>
>>> I can also try to run the scale tests on our downstream if that helps.
>>
>> It's probably better if I come up with something locally, because I
>> expect to have to run it multiple times, maybe many times, since I will
>> presumably discover bottlenecks.
>>
>> This time around, I'll speak up when I run into problems.
>>
> 
> Sorry in advance for the long email.
> 
> I went ahead and added a new test scenario to ovn-heater that I think
> might be relevant in the context of ovn-northd incremental processing:
> 
> https://github.com/dceara/ovn-heater#example-run-scenario-3---scale-up-number-of-pods---stress-ovn-northd
> 
> On my test machine:
> Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
> 2 NUMA nodes - 28 cores each.
> 
> I did:
> 
> $ cd
> $ git clone https://github.com/dceara/ovn-heater
> $ cd ovn-heater
> $ cat > physical-deployments/physical-deployment.yml << EOF
> registry-node: localhost
> internal-iface: none
> 
> central-node:
>   name: localhost
> 
> worker-nodes:
>   - localhost
> EOF
> 
> # Install all the required repos and make everything work together using
> # latest OVS and OVN code from github. This generates the
> # ~/ovn-heater/runtime where all the repos are cloned and the test suite
> # is run. This step also generates the container image with OVS/OVN
> # compiled from sources. This step has to be done every time we need
> # to test with a different version of OVS/OVN and can be customized with
> # the OVS/OVN_REPO and OVS/OVN_BRANCH env vars.
> $ ./do.sh install

# Missed a step here:
$ ./do.sh rally-deploy
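
# As an aside, customizing which OVS/OVN trees get installed (via the env
# vars mentioned earlier) might look like the sketch below. The branches
# shown are the DDlog ones listed at the end of this mail; substitute your
# own forks/branches as needed.

```shell
# Point ovn-heater at custom OVS/OVN trees before (re)installing.
# These repo/branch values are the DDlog ones from the end of this mail;
# treat them as placeholders for your own.
export OVS_REPO=https://github.com/blp/ovs-reviews
export OVS_BRANCH=ovs-for-ddlog
export OVN_REPO=https://github.com/blp/ovs-reviews
export OVN_BRANCH=ddlog4

# Then rerun the installer so the container image is rebuilt from these:
# ./do.sh install
# ./do.sh rally-deploy
```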

> 
> # Start the test:
> # This brings up 30 "fake" OVN nodes and then simulates addition of
> # 1000 pods (lsps) and associated policies (port_group/address_set/acl).
> $ ./do.sh browbeat-run \
>     browbeat-scenarios/switch-per-node-30-node-1000-pods.yml debug-dceara-pods
> 
> # This takes quite long, ~1 hr on my system.
> # Results are stored at, e.g.:
> $ ls -l ~/ovn-heater/test_results/debug-dceara-pods-20200826-080650/20200826-120718/rally/plugin-workloads/all-rally-run-0.html
> 
> What I noticed while the test was running (the execution can be
> monitored by tailing ~/ovn-heater/runtime/browbeat/*.log) was that
> ovn-northd's CPU usage increased steadily and stayed above 70-80% after
> ~500 iterations.
> 
> ovn-northd logs:
> 2020-08-26T14:24:25.989Z|02119|poll_loop|INFO|wakeup due to [POLLIN] on
> fd 12 (192.16.0.1:53642<->192.16.0.1:6642) at lib/stream-ssl.c:832 (97%
> CPU usage)
> 
> 2020-08-26T14:24:31.985Z|02120|poll_loop|INFO|Dropped 54 log messages in
> last 5 seconds (most recently, 0 seconds ago) due to excessive rate
> 
> 
> 2020-08-26T14:24:31.985Z|02121|poll_loop|INFO|wakeup due to [POLLIN] on
> fd 11 (192.16.0.1:56340<->192.16.0.1:6641) at lib/stream-ssl.c:832 (99%
> CPU usage)
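
# In case it helps to correlate those spikes, here's a quick generic
# sketch for sampling a daemon's CPU usage. This is not an ovn-heater
# feature; it assumes a ps that supports the POSIX pcpu keyword, and
# the ovn-northd example usage is hypothetical.

```shell
# cpu_of: print the current %CPU of a process given its PID.
# Uses only POSIX ps options (-o pcpu= suppresses the header).
cpu_of() {
    ps -o pcpu= -p "$1" | tr -d ' '
}

# Example (hypothetical): sample ovn-northd once a second while the test runs.
#   while sleep 1; do cpu_of "$(pidof ovn-northd)"; done
```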
> 
> For troubleshooting/profiling, the easiest way I can think of to rerun
> the sequence of commands without running the whole suite is to extract
> them from the logs of the ovn-nbctl daemon, which we start on node
> ovn-central-1. I also append a short sleep after each command to avoid
> NB changes being batched before ovn-northd processes them:
> 
> $ docker exec ovn-central-1 grep "Running command" \
>     /var/log/openvswitch/ovn-nbctl.log | \
>     sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p' \
>     > commands.sh
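
# To illustrate what that sed expression emits, here's a self-contained
# example on a made-up log line (the exact log-line format shown is an
# assumption for illustration only):

```shell
# Hypothetical ovn-nbctl daemon log line:
line='2020-08-26T14:24:25Z|00012|nbctl|INFO|Running command run -- lsp-add sw0 lsp1'
# The same sed expression used above turns it into a replayable command:
printf '%s\n' "$line" |
    sed -ne 's/.*Running command run\(.*\)/ovn-nbctl\1; sleep 0.01/p'
# -> ovn-nbctl -- lsp-add sw0 lsp1; sleep 0.01
```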
> 
> # Now we can just run ovn-northd locally:
> $ ovn-ctl start_northd
> # Start an ovn-nbctl daemon locally:
> $ export OVN_NB_DAEMON=$(ovn-nbctl --detach)
> # Replay the commands:
> $ sh commands.sh
> 
> Regarding the DDlog compilation, I suspect we need to add support for
> it in ovn-fake-multinode, which builds and runs the fake nodes'
> images. I can take care of that and add the Rust compiler and DDlog
> binaries to the Docker files.
> 
> I assume these are the branches I should use for testing:
> https://github.com/blp/ovs-reviews/tree/ovs-for-ddlog
> https://github.com/blp/ovs-reviews/tree/ddlog4
> 
> Hope this helps.
> 
> Regards,
> Dumitru
> 


