[ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

Numan Siddique nusiddiq at redhat.com
Mon Jun 24 10:12:22 UTC 2019


On Mon, Jun 24, 2019 at 1:51 PM aginwala <aginwala at asu.edu> wrote:

> Hi:
> As per the IRC meeting discussion, some nice findings were already
> discussed by Numan (thanks for sharing the details). Changing external_ids
> for a claimed port, e.g. "ovn-nbctl set logical_switch_port sw0-port1
> external_ids:foo=bar", triggers re-computation on the local compute node;
> I see the same behavior. Numan is proposing a patch to skip computation
> when the external_ids column changes for an already claimed port in the
> port_binding table, because runtime_data can't handle changes for the
> input SB_port_binding and falls back to recompute (
> https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77).
> However, I don't see the external_ids of the port_binding table being set
> explicitly for the port when the Interface table is set in the test code
> that Daniel posted [1], which is what could trigger the extra
> re-computation in the current test scenario.
>

ovn-northd just copies the external_ids of a logical switch port to the
external_ids of the port_binding row.  And networking-ovn makes use of
external_ids a lot.
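
To illustrate the propagation (a rough sketch; sw0-port1 is a hypothetical
port name and the exact columns may differ between versions), something like:

---
# set an external_id on the NB logical switch port
ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar

# ovn-northd copies it into the corresponding SB port_binding row, which is
# an input to ovn-controller and is why the change reaches the chassis
ovn-sbctl --columns=logical_port,external_ids list port_binding sw0-port1
---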


>
> Also, "ovs-vsctl add-br test" triggers re-computation on the local compute
> node as well, and yes, I can see the same. Since we don't have any
> handlers for the Port and Interface tables similar to port_binding and the
> other handlers at
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769,
> adding a new bridge also causes re-computation on the local compute node.
> I'm not sure a handler is required immediately because, as per the patch
> shared by Daniel [1], I don't see any new test bridges getting created
> apart from br-int, so there won't be much impact. Or maybe I missed that
> they also create test bridges during testing. Of course, any new ovs-vsctl
> command for attaching/detaching a VIF will surely trigger a recompute on
> br-int as and when the VIF (VM) gets added/deleted, in order to program
> the flows on the local compute node.
>

The impact would depend on how the CMS creates the OVS port.

Suppose I do something like the following
---
ovs-vsctl add-port br-int foo
ovs-vsctl set Interface foo type=internal
ovs-vsctl set Interface foo external_ids:iface-id=foo-id
---
If ovn-controller gets 3 updates from ovsdb-server, this would result
in 3 recomputations.

However, if I do
---
ovs-vsctl add-port br-int foo -- set Interface foo type=internal \
    -- set Interface foo external_ids:iface-id=foo-id
---

this could result in only 1 recomputation.

I think ovn-controller should handle the local ovsdb changes for:
   1. the external_ids column of the Open_vSwitch table
   2. updates to an OVS Interface's external_ids:iface-id.

We should try to ignore any other changes to the local ovsdb.
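
One rough way to sanity-check this (a sketch, assuming the ovn-controller
daemon exposes the coverage/show appctl command and relying on the
"lflow_run" counter Han mentions below; "foo" is just the hypothetical
interface from the example above) is to watch the counter around a local
ovsdb change; if it increases, a full recompute happened:

---
ovs-appctl -t ovn-controller coverage/show | grep lflow_run
ovs-vsctl set Interface foo external_ids:something=else
ovs-appctl -t ovn-controller coverage/show | grep lflow_run
---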


> I didn't get a chance to verify that, when a chassisredirect port is
> claimed on a gateway chassis, it triggers re-computation on all computes
> registered with the SB, as per the code at
> https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722,
> which also points to the further optimization for the chassisredirect
> flow that Numan is suggesting.
>
> 1.
> https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad
>
>
I submitted the patches just now to address some of the issues -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=115737

I also ran the test with these patches, but they didn't result in any
improvement. Although the patches I submitted avoid recomputation
for some of the scenarios, I think I still need to dig further to see
what's causing the performance impact compared with the non-I-P code.
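
As a rough sketch of what that digging could look like (nothing OVN-specific,
just standard Linux perf against the ovn-controller process on a compute node
while the rally test is running):

---
# profile ovn-controller for 60 seconds during the test
perf record -g -p $(pidof ovn-controller) -- sleep 60
perf report --stdio | head -n 40
---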

Thanks
Numan

On Fri, Jun 21, 2019 at 12:32 AM Han Zhou <zhouhan at gmail.com> wrote:
>
>>
>>
>> On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <nusiddiq at redhat.com>
>> wrote:
>> >
>> >
>> >
>> > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <zhouhan at gmail.com> wrote:
>> >>
>> >>
>> >>
>> >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
>> dalvarez at redhat.com> wrote:
>> >> >
>> >> > Thanks a lot Han for the answer!
>> >> >
>> >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <zhouhan at gmail.com> wrote:
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <dceara at redhat.com>
>> wrote:
>> >> > > >
>> >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez
>> >> > > > <dalvarez at redhat.com> wrote:
>> >> > > > >
>> >> > > > > Hi Han, all,
>> >> > > > >
>> >> > > > > Lucas, Numan and I have been doing some 'scale' testing of
>> OpenStack
>> >> > > > > using OVN and wanted to present some results and issues that
>> we've
>> >> > > > > found with the Incremental Processing feature in
>> ovn-controller. Below
>> >> > > > > is the scenario that we executed:
>> >> > > > >
>> >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute
>> nodes. OVS
>> >> > > > > 2.10.
>> >> > > > > * The test consists of:
>> >> > > > >   - Create openstack network (OVN LS), subnet and router
>> >> > > > >   - Attach subnet to the router and set gw to the external
>> network
>> >> > > > >   - Create an OpenStack port and apply a Security Group (ACLs
>> to allow
>> >> > > > > UDP, SSH and ICMP).
>> >> > > > >   - Bind the port to one of the 4 compute nodes (randomly) by
>> >> > > > > attaching it to a network namespace.
>> >> > > > >   - Wait for the port to be ACTIVE in Neutron ('up == True' in
>> NB)
>> >> > > > >   - Wait until the test can ping the port
>> >> > > > > * Running browbeat/rally with 16 simultaneous processes to
>> execute the
>> >> > > > > test above 150 times.
>> >> > > > > * When all the 150 'fake VMs' are created, browbeat will
>> delete all
>> >> > > > > the OpenStack/OVN resources.
>> >> > > > >
>> >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which
>> showed
>> >> > > > > 100% success but ovn-controller is quite loaded (as expected)
>> in all
>> >> > > > > the nodes especially during the deletion phase:
>> >> > > > >
>> >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>> https://imgur.com/a/8ffKKYF
>> >> > > > >
>> >> > > > > After conducting the tests above, we replaced ovn-controller
>> in all 7
>> >> > > > > nodes by the one with the current master branch (actually from
>> last
>> >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the
>> >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The
>> expected
>> >> > > > > results were to get less ovn-controller CPU usage and also
>> better
>> >> > > > > times due to the Incremental Processing feature introduced
>> recently.
>> >> > > > > However, the results don't look very good:
>> >> > > > >
>> >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> >> > > > > - Controller node (ovn-northd and ovsdb-servers):
>> https://imgur.com/a/99kiyDp
>> >> > > > >
>> >> > > > > One thing that we can tell from the ovs-vswitchd CPU
>> consumption is
>> >> > > > > that it's much less in the Incremental Processing (IP) case
>> which
>> >> > > > > apparently doesn't make much sense. This led us to think that
>> perhaps
>> >> > > > > ovn-controller was not installing the necessary flows in the
>> switch
>> >> > > > > and we confirmed this hypothesis by looking into the dataplane
>> >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via
>> ping
>> >> > > > > when using ovn-controller from master.
>> >> > > > >
>> >> > > > > @Han, others, do you have any ideas as to what could be
>> happening
>> >> > > > > here? We'll be able to use this setup for a few more days so
>> let me
>> >> > > > > know if you want us to pull some other data/traces, ...
>> >> > > > >
>> >> > > > > Some other interesting things:
>> >> > > > > On each of the compute nodes (with an almost evenly
>> distributed
>> >> > > > > number of logical ports bound to them), the max amount of
>> logical
>> >> > > > > flows in br-int is ~90K (by the end of the test, right before
>> deleting
>> >> > > > > the resources).
>> >> > > > >
>> >> > > > > It looks like with the IP version, ovn-controller leaks some
>> memory:
>> >> > > > > https://imgur.com/a/trQrhWd
>> >> > > > > While with OVS 2.10, it remains pretty flat during the test:
>> >> > > > > https://imgur.com/a/KCkIT4O
>> >> > > >
>> >> > > > Hi Daniel, Han,
>> >> > > >
>> >> > > > I just sent a small patch for the ovn-controller memory leak:
>> >> > > > https://patchwork.ozlabs.org/patch/1113758/
>> >> > > >
>> >> > > > At least on my setup this is what valgrind was pointing at.
>> >> > > >
>> >> > > > Cheers,
>> >> > > > Dumitru
>> >> > > >
>> >> > > > >
>> >> > > > > Looking forward to hearing back :)
>> >> > > > > Daniel
>> >> > > > >
>> >> > > > > PS. Sorry for my previous email, I sent it by mistake without
>> the subject
>> >> > >
>> >> > > Thanks Daniel for the testing and reporting, and thanks Dumitru
>> for fixing the memory leak.
>> >> > >
>> >> > > Currently, ovn-controller incremental processing only handles the
>> below SB changes incrementally:
>> >> > > - logical_flow
>> >> > > - port_binding (for regular VIF binding NOT on current chassis)
>> >> > > - mc_group
>> >> > > - address_set
>> >> > > - port_group
>> >> > > - mac_binding
>> >> > >
>> >> > > So, in the test scenario you described, since each iteration creates
>> a network (SB datapath changes) and router ports (port_binding changes for
>> non-VIF ports), the incremental processing would not help much, because
>> most steps in your test should trigger a recompute. It would help if you
>> created more fake VMs in each iteration, e.g. 10 VMs or more on each LS.
>> Secondly, when VIF port binding happens on the current chassis,
>> ovn-controller will still do a re-compute, and because you have only 4
>> compute nodes, each node will still recompute for roughly 1/4 of the
>> bindings, even for regular VIF ports. With more compute nodes you would
>> see the incremental processing being more effective.
>> >> >
>> >> > Got it, it makes sense (although then, in the worst case, it should
>> >> > be at least as good as what we had before and not worse; but it could
>> >> > also be because we're mixing versions here: 2.10 vs master).
>> >> > >
>> >> > > However, what really worries me is the 10% of VMs being unreachable.
>> I have one point of confusion about the test steps. The last step you
>> described was: "Wait until the test can ping the port". So if the VM is
>> not pingable, the test won't continue?
>> >> >
>> >> > Sorry I should've explained it better. We wait for 2 minutes to the
>> >> > port to respond to pings, if it's not reachable then we continue with
>> >> > the next port (16 rally processes are running simultaneously so the
>> >> > rest of the process may be doing stuff at the same time).
>> >> >
>> >> > >
>> >> > > To debug the problem, the first thing is to identify what flows
>> are missing for the VMs that are unreachable. Could you do ovs-appctl
>> ofproto/trace for the ICMP flow of any VM with a ping failure? And then,
>> please enable debug logging for ovn-controller with ovs-appctl -t
>> ovn-controller vlog/set file:dbg. There may be too many logs, so please
>> keep it enabled only for as short a time as it takes to reproduce a VM
>> with a ping failure. If the last step "wait until the test can ping the
>> port" is there, then it should be able to detect the first occurrence
>> when a VM is not reachable within e.g. 30 sec.
>> >> >
>> >> > We'll need to hack a bit here but let's see :)
>> >> > >
>> >> > > In ovn-scale-test we didn't have a data plane test, but this
>> problem was not seen in our live environment either, at a far larger
>> scale. The major differences between your test and our environment are:
>> >> > > - We are running an older version, so there might be some
>> rebase/refactor problem that caused this. To eliminate this, I'd suggest
>> trying a branch I created for 2.10 (
>> https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), which matches
>> the 2.10 base you used in the baseline test. It may also eliminate any
>> compatibility problem, if there is one, between the OVN master branch and
>> the OVS 2.10 that you mentioned is used in the test.
>> >> > > - We don't use Security Groups (I guess the ~90k OVS flows you
>> mentioned were mainly introduced by the Security Group use, if all ports
>> were put in the same group). The incremental processing is expected to be
>> correct for security groups, and to handle them incrementally thanks to
>> the address_set and port_group incremental processing. However, since the
>> testing only relied on the regression tests, I am not 100% sure the test
>> coverage was sufficient. So could you try disabling Security Groups to
>> rule out the problem?
>> >> >
>> >> > Ok will try to repeat the tests without the SGs.
>> >> > >
>> >> > > Thanks,
>> >> > > Han
>> >> >
>> >> > Thanks once again!
>> >> > Daniel
>> >>
>> >> Hi Daniel,
>> >>
>> >> Any updates? Do you still see the 10% of VMs unreachable?
>> >>
>> >>
>> >> Thanks,
>> >> Han
>> >
>> >
>> > Hi Han,
>> >
>> > As such, there is no datapath impact. After increasing the ping wait
>> timeout value from 120 seconds to 180 seconds, it's 100% now.
>> >
>> > But the time taken to program the flows is much higher compared to
>> OVN master without the I-P patches.
>> > Here is some data -  http://paste.openstack.org/show/753224/ .  I am
>> still investigating it. I will update my findings in some time.
>> >
>> > Please see the times for the action - vm.wait_for_ping
>> >
>>
>> Thanks Numan for the investigation and update. Glad to hear there is no
>> correctness issue, but sorry for the slowness in your test scenario. I
>> expect that the operations in your test trigger recomputes, and the worst
>> case should perform similarly to the code without I-P. It is weird that it
>> turned out so much slower in your test. There can be some extra overhead
>> when it tries to do incremental processing and then falls back to a full
>> recompute, but that shouldn't cause such a big difference. It might be that
>> the main loop iteration is for some reason triggered more times than
>> necessary. I'd suggest comparing the coverage counter "lflow_run" between
>> the tests, and also checking the perf report to see if the hotspot is
>> somewhere else. (Sorry
>> that I can't provide full-time help now since I am still on vacation but I
>> will try to be useful if things are blocked)
>>
>