[ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing

Han Zhou zhouhan at gmail.com
Wed Jul 24 03:08:23 UTC 2019


On Tue, Jul 23, 2019 at 7:41 AM Numan Siddique <nusiddiq at redhat.com> wrote:
>
>
>
> On Mon, Jul 22, 2019 at 12:35 PM Daniel Alvarez Sanchez <
dalvarez at redhat.com> wrote:
>>
>> Neat! Thanks folks :)
>> I'll try to get an OSP setup where we can patch this and re-run the
>> same tests than previous time to confirm but looks promising.
>>
>> On Fri, Jul 19, 2019 at 11:12 PM Han Zhou <zhouhan at gmail.com> wrote:
>> >
>> >
>> >
>> > On Fri, Jul 19, 2019 at 12:37 PM Numan Siddique <nusiddiq at redhat.com>
wrote:
>> >>
>> >>
>> >>
>> >> On Fri, Jul 19, 2019 at 6:19 PM Numan Siddique <nusiddiq at redhat.com>
wrote:
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Jul 19, 2019 at 6:28 AM Han Zhou <zhouhan at gmail.com> wrote:
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Jul 9, 2019 at 12:13 AM Numan Siddique <nusiddiq at redhat.com>
wrote:
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > On Tue, Jul 9, 2019 at 12:25 PM Daniel Alvarez Sanchez <
dalvarez at redhat.com> wrote:
>> >>>> >>
>> >>>> >> Thanks Numan for running these tests outside OpenStack!
>> >>>> >>
>> >>>> >> On Tue, Jul 9, 2019 at 7:50 AM Numan Siddique <
nusiddiq at redhat.com> wrote:
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > On Tue, Jul 9, 2019 at 11:05 AM Han Zhou <zhouhan at gmail.com>
wrote:
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> On Fri, Jun 21, 2019 at 12:31 AM Han Zhou <zhouhan at gmail.com>
wrote:
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique <
nusiddiq at redhat.com> wrote:
>> >>>> >> >> > >
>> >>>> >> >> > >
>> >>>> >> >> > >
>> >>>> >> >> > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou <zhouhan at gmail.com>
wrote:
>> >>>> >> >> > >>
>> >>>> >> >> > >>
>> >>>> >> >> > >>
>> >>>> >> >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez <
dalvarez at redhat.com> wrote:
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > Thanks a lot Han for the answer!
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou <
zhouhan at gmail.com> wrote:
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara <
dceara at redhat.com> wrote:
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez
Sanchez
>> >>>> >> >> > >> > > > <dalvarez at redhat.com> wrote:
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > Hi Han, all,
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > Lucas, Numan and I have been doing some 'scale'
testing of OpenStack
>> >>>> >> >> > >> > > > > using OVN and wanted to present some results and
issues that we've
>> >>>> >> >> > >> > > > > found with the Incremental Processing feature in
ovn-controller. Below
>> >>>> >> >> > >> > > > > is the scenario that we executed:
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > * 7 baremetal nodes setup: 3 controllers (running
>> >>>> >> >> > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker)
+ 4 compute nodes. OVS
>> >>>> >> >> > >> > > > > 2.10.
>> >>>> >> >> > >> > > > > * The test consists of:
>> >>>> >> >> > >> > > > >   - Create openstack network (OVN LS), subnet
and router
>> >>>> >> >> > >> > > > >   - Attach subnet to the router and set gw to
the external network
>> >>>> >> >> > >> > > > >   - Create an OpenStack port and apply a
Security Group (ACLs to allow
>> >>>> >> >> > >> > > > > UDP, SSH and ICMP).
>> >>>> >> >> > >> > > > >   - Bind the port to one of the 4 compute nodes
(randomly) by
>> >>>> >> >> > >> > > > > attaching it to a network namespace (a sketch of
this step follows the list).
>> >>>> >> >> > >> > > > >   - Wait for the port to be ACTIVE in Neutron
('up == True' in NB)
>> >>>> >> >> > >> > > > >   - Wait until the test can ping the port
>> >>>> >> >> > >> > > > > * Running browbeat/rally with 16 simultaneous
processes to execute the
>> >>>> >> >> > >> > > > > test above 150 times.
>> >>>> >> >> > >> > > > > * When all the 150 'fake VMs' are created,
browbeat will delete all
>> >>>> >> >> > >> > > > > the OpenStack/OVN resources.
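>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > (For reference, a minimal sketch of how such a fake-VM binding is typically done; the port name "vm1" and the $PORT_UUID/$PORT_MAC/$PORT_IP values are hypothetical placeholders:
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > >   # Add an internal port on br-int whose iface-id matches the OVN logical port, so ovn-controller claims the binding:
>> >>>> >> >> > >> > > > >   ovs-vsctl add-port br-int vm1 -- set Interface vm1 type=internal external_ids:iface-id=$PORT_UUID
>> >>>> >> >> > >> > > > >   # Move it into a namespace and configure the logical port's MAC/IP:
>> >>>> >> >> > >> > > > >   ip netns add vm1
>> >>>> >> >> > >> > > > >   ip link set vm1 netns vm1
>> >>>> >> >> > >> > > > >   ip netns exec vm1 ip link set vm1 address $PORT_MAC
>> >>>> >> >> > >> > > > >   ip netns exec vm1 ip addr add $PORT_IP/24 dev vm1
>> >>>> >> >> > >> > > > >   ip netns exec vm1 ip link set vm1 up )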
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > We first tried with OVS/OVN 2.10 and pulled some
results which showed
>> >>>> >> >> > >> > > > > 100% success but ovn-controller is quite loaded
(as expected) in all
>> >>>> >> >> > >> > > > > the nodes especially during the deletion phase:
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR
>> >>>> >> >> > >> > > > > - Controller node (ovn-northd and
ovsdb-servers): https://imgur.com/a/8ffKKYF
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > After conducting the tests above, we replaced
ovn-controller on all 7
>> >>>> >> >> > >> > > > > nodes with one built from the current master branch
(actually from last
>> >>>> >> >> > >> > > > > week). We also replaced ovn-northd and the
ovsdb-servers, but
>> >>>> >> >> > >> > > > > ovs-vswitchd was left untouched (still on
2.10). We expected
>> >>>> >> >> > >> > > > > lower ovn-controller CPU usage and also better
times thanks to the
>> >>>> >> >> > >> > > > > Incremental Processing feature introduced recently.
>> >>>> >> >> > >> > > > > However, the results don't look very good:
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > - Compute node: https://imgur.com/a/wuq87F1
>> >>>> >> >> > >> > > > > - Controller node (ovn-northd and
ovsdb-servers): https://imgur.com/a/99kiyDp
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > One thing we can tell from the ovs-vswitchd
CPU consumption is
>> >>>> >> >> > >> > > > > that it's much lower in the Incremental
Processing (IP) case, which
>> >>>> >> >> > >> > > > > apparently doesn't make much sense. This led us
to think that perhaps
>> >>>> >> >> > >> > > > > ovn-controller was not installing the necessary
flows in the switch
>> >>>> >> >> > >> > > > > and we confirmed this hypothesis by looking into
the dataplane
>> >>>> >> >> > >> > > > > results. Out of the 150 VMs, 10% of them were
unreachable via ping
>> >>>> >> >> > >> > > > > when using ovn-controller from master.
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > @Han, others, do you have any ideas as to what
could be happening
>> >>>> >> >> > >> > > > > here? We'll be able to use this setup for a few
more days so let me
>> >>>> >> >> > >> > > > > know if you want us to pull some other
data/traces, ...
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > Some other interesting things:
>> >>>> >> >> > >> > > > > On each of the compute nodes, (with an almost
evenly distributed
>> >>>> >> >> > >> > > > > number of logical ports bound to them), the max
number of logical
>> >>>> >> >> > >> > > > > flows in br-int is ~90K (by the end of the test,
right before deleting
>> >>>> >> >> > >> > > > > the resources).
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > It looks like with the IP version,
ovn-controller leaks some memory:
>> >>>> >> >> > >> > > > > https://imgur.com/a/trQrhWd
>> >>>> >> >> > >> > > > > While with OVS 2.10, it remains pretty flat
during the test:
>> >>>> >> >> > >> > > > > https://imgur.com/a/KCkIT4O
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > Hi Daniel, Han,
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > I just sent a small patch for the ovn-controller
memory leak:
>> >>>> >> >> > >> > > > https://patchwork.ozlabs.org/patch/1113758/
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > At least on my setup this is what valgrind was
pointing at.
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > Cheers,
>> >>>> >> >> > >> > > > Dumitru
>> >>>> >> >> > >> > > >
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > Looking forward to hearing back :)
>> >>>> >> >> > >> > > > > Daniel
>> >>>> >> >> > >> > > > >
>> >>>> >> >> > >> > > > > PS. Sorry for my previous email, I sent it by
mistake without the subject
>> >>>> >> >> > >> > > > > _______________________________________________
>> >>>> >> >> > >> > > > > discuss mailing list
>> >>>> >> >> > >> > > > > discuss at openvswitch.org
>> >>>> >> >> > >> > > > >
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > Thanks Daniel for the testing and reporting, and
thanks Dumitru for fixing the memory leak.
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > Currently ovn-controller incremental processing
handles only the SB changes below incrementally:
>> >>>> >> >> > >> > > - logical_flow
>> >>>> >> >> > >> > > - port_binding (for regular VIF binding NOT on
current chassis)
>> >>>> >> >> > >> > > - mc_group
>> >>>> >> >> > >> > > - address_set
>> >>>> >> >> > >> > > - port_group
>> >>>> >> >> > >> > > - mac_binding
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > So, in the test scenario you described, since each
iteration creates a network (SB datapath changes) and router ports
(port_binding changes for non-VIF ports), incremental processing would not
help much, because most steps in your test trigger a recompute. It would help
if you created more fake VMs in each iteration, e.g. 10 VMs or more on each
LS. Secondly, when a VIF port binding happens on the current chassis,
ovn-controller still does a recompute, and because you have only 4 compute
nodes, 1/4 of the regular VIF bindings will still trigger a recompute on some
node. With more compute nodes you would see incremental processing be more
effective.
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > Got it, it makes sense (although then worst case, it
should be at
>> >>>> >> >> > >> > least what we had before and not worse but it can also
be because
>> >>>> >> >> > >> > we're mixing version here: 2.10 vs master).
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > However, what really worries me is the 10% of VMs
being unreachable. One point of confusion on the test steps: the last step
you described was "Wait until the test can ping the port". So if the VM is
not pingable, the test won't continue?
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > Sorry, I should've explained it better. We wait up to 2
minutes for the
>> >>>> >> >> > >> > port to respond to pings; if it's not reachable, then
we continue with
>> >>>> >> >> > >> > the next port (16 rally processes are running
simultaneously, so the
>> >>>> >> >> > >> > rest of the processes may be doing work at the same
time).
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > To debug the problem, the first thing is to identify
which flows are missing for the VMs that are unreachable. Could you run
ovs-appctl ofproto/trace for the ICMP flow of any VM with a ping failure?
Then, please enable debug logging for ovn-controller with ovs-appctl -t
ovn-controller vlog/set file:dbg. There may be too many logs, so please
enable it only for as short a time as it takes to reproduce a ping failure.
If the last step "wait until the test can ping the port" is in place, it
should be able to detect the first occurrence if the VM is not reachable in,
e.g., 30 sec.
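>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > (A rough sketch of those two commands; the port, MAC and IP values are placeholders to fill in from the failing VM:
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > >   # Trace a hypothetical ICMP packet from the failing VM's port through br-int:
>> >>>> >> >> > >> > >   ovs-appctl ofproto/trace br-int 'in_port=<vm-ofport>,icmp,dl_src=<vm-mac>,dl_dst=<dst-mac>,nw_src=<vm-ip>,nw_dst=<dst-ip>'
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > >   # Raise ovn-controller's file log level to debug, and restore it afterwards:
>> >>>> >> >> > >> > >   ovs-appctl -t ovn-controller vlog/set file:dbg
>> >>>> >> >> > >> > >   ovs-appctl -t ovn-controller vlog/set file:info )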
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > We'll need to hack a bit here but let's see :)
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > In ovn-scale-test we didn't have a data-plane test,
but this problem was not seen in our live environment either, at a far larger
scale. The major differences between your test and our environment are:
>> >>>> >> >> > >> > > - We are running an older version, so there might be a
rebase/refactor problem causing this. To eliminate this, I'd suggest trying a
branch I created for 2.10
(https://github.com/hzhou8/ovs/tree/ip12_rebase_on_2.10), which matches the
2.10 base of your baseline test. It would also eliminate any compatibility
problem between the OVN master branch and OVS 2.10, which you mentioned is
used in the test.
>> >>>> >> >> > >> > > - We don't use Security Groups (I guess the ~90k OVS
flows you mentioned were mainly introduced by the Security Group use, if all
ports were put in the same group). Incremental processing is expected to be
correct for security groups, and to handle them incrementally thanks to the
address_set and port_group incremental processing. However, since the testing
relied only on the regression tests, I am not 100% sure the test coverage was
sufficient. So could you try disabling Security Groups to rule out the
problem?
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > Ok will try to repeat the tests without the SGs.
>> >>>> >> >> > >> > >
>> >>>> >> >> > >> > > Thanks,
>> >>>> >> >> > >> > > Han
>> >>>> >> >> > >> >
>> >>>> >> >> > >> > Thanks once again!
>> >>>> >> >> > >> > Daniel
>> >>>> >> >> > >>
>> >>>> >> >> > >> Hi Daniel,
>> >>>> >> >> > >>
>> >>>> >> >> > >> Any updates? Do you still see 10% of VMs unreachable?
>> >>>> >> >> > >>
>> >>>> >> >> > >>
>> >>>> >> >> > >> Thanks,
>> >>>> >> >> > >> Han
>> >>>> >> >> > >
>> >>>> >> >> > >
>> >>>> >> >> > > Hi Han,
>> >>>> >> >> > >
>> >>>> >> >> > > As such there is no datapath impact. After increasing the
ping wait timeout from 120 seconds to 180 seconds, it's 100% now.
>> >>>> >> >> > >
>> >>>> >> >> > > But the time taken to program the flows is far too long
compared to OVN master without the IP patches.
>> >>>> >> >> > > Here is some data -
http://paste.openstack.org/show/753224/ . I am still investigating it and
will update my findings in some time.
>> >>>> >> >> > >
>> >>>> >> >> > > Please see the times for the action - vm.wait_for_ping
>> >>>> >> >> > >
>> >>>> >> >> >
>> >>>> >> >> > Thanks Numan for the investigation and update. Glad to hear
there is no correctness issue, but sorry for the slowness in your test
scenario. I expect the operations in your test to trigger recomputing, so the
worst case should perform similarly to not having I-P at all. It is weird
that it turned out so much slower in your test. There can be some extra
overhead when it tries to do incremental processing and then falls back to a
full recompute, but that shouldn't cause this big a difference. It might be
that for some reason the main loop iteration is triggered more times than
necessary. I'd suggest comparing the coverage counter "lflow_run" between the
tests (a command sketch follows below), and also checking a perf report to
see if the hotspot is somewhere else. (Sorry that I can't provide full-time
help now since I am still on vacation, but I will try to be useful if things
are blocked.)
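>> >>>> >> >> >
>> >>>> >> >> > (A sketch of those checks, assuming the default unixctl socket locations:
>> >>>> >> >> >
>> >>>> >> >> >   # "lflow_run" counts full logical-flow recomputes; compare it between the two runs:
>> >>>> >> >> >   ovs-appctl -t ovn-controller coverage/show | grep lflow_run
>> >>>> >> >> >
>> >>>> >> >> >   # Sample where ovn-controller spends CPU time for the perf report:
>> >>>> >> >> >   perf record -g -p $(pidof ovn-controller) -- sleep 60 && perf report )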
>> >>>> >> >>
>> >>>> >> >> Hi Numan/Daniel, do you have any new findings on why I-P got
worse results in your test? The extremely long latency (2 - 3 min) shown in
your report reminds me of a similar problem I reported before:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-April/346321.html
>> >>>> >> >>
>> >>>> >> >> The root cause of that problem was still not clear. In that
report, the extremely long latency (7 min) was observed without I-P and it
didn't happen with I-P. If it is the same problem, then I suspect it is not
related to I-P versus non-I-P, but to some problem with ovsdb monitor
condition changes. To confirm whether it is the same problem, could you:
>> >>>> >> >> 1. pause the test when the scale is big enough (e.g. when the
test is almost completed), then
>> >>>> >> >> 2. enable ovn-controller debug logging (a command sketch
follows below), then
>> >>>> >> >> 3. run one more iteration of the test, and see if the time is
spent waiting for the SB DB update notification.
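>> >>>> >> >>
>> >>>> >> >> (For step 2, a sketch; the jsonrpc vlog module logs every DB message, so the log timestamps show when the SB update notification arrives:
>> >>>> >> >>
>> >>>> >> >>   ovs-appctl -t ovn-controller vlog/set jsonrpc:file:dbg
>> >>>> >> >>   # ...run one iteration, inspect ovn-controller's log, then restore:
>> >>>> >> >>   ovs-appctl -t ovn-controller vlog/set jsonrpc:file:info )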
>> >>>> >> >>
>> >>>> >> >> Please ignore my speculation above if you already found the
root cause and it would be great if you could share it :)
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Thanks for sharing this Han.
>> >>>> >> >
>> >>>> >> > I do not have any new findings. Yesterday I ran ovn-scale-test
comparing OVN with I-P vs without I-P (using the master branch).
>> >>>> >> > The test creates a new logical switch, adds it to a router,
adds a few ACLs, creates 2 logical ports, and pings between them.
>> >>>> >> > I am using a physical deployment which creates actual namespaces
instead of sandboxes.
>> >>>> >> >
>> >>>> >> > The results don't show any huge difference between the two.
>> >>>> >> 2300 vs 2900 seconds total time or  44 vs 56 seconds for the
95%ile?
>> >>>> >> It is not negligible IMHO. It's a >25% penalty with the IP.
Maybe I
>> >>>> >> missed something from the results?
>> >>>> >>
>> >>>> >
>> >>>> > Initially I ran with ovn-nbctl running commands as one batch (i.e.
combining commands with "--"). The results were very similar, like this one:
>> >>>> >
>> >>>> > *******
>> >>>> >
>> >>>> > With non IP - ovn-nbctl NO daemon mode
>> >>>> >
>> >>>> > +--------------------------------------------------------------------------------------------------------------+
>> >>>> > |                                             Response Times (sec)                                             |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> > | ovn_network.create_routers            | 0.288 | 0.429  | 5.454  | 5.538  | 20.531 | 1.523  | 100.0%  | 1000  |
>> >>>> > | ovn.create_lswitch                    | 0.046 | 0.139  | 0.202  | 5.084  | 10.259 | 0.441  | 100.0%  | 1000  |
>> >>>> > | ovn_network.connect_network_to_router | 0.164 | 0.411  | 5.307  | 5.491  | 15.636 | 1.128  | 100.0%  | 1000  |
>> >>>> > | ovn.create_lport                      | 0.11  | 0.272  | 0.478  | 5.284  | 15.496 | 0.835  | 100.0%  | 1000  |
>> >>>> > | ovn_network.bind_port                 | 1.302 | 2.367  | 2.834  | 3.24   | 12.409 | 2.527  | 100.0%  | 1000  |
>> >>>> > | ovn_network.wait_port_up              | 0.0   | 0.001  | 0.001  | 0.001  | 0.002  | 0.001  | 100.0%  | 1000  |
>> >>>> > | ovn_network.ping_ports                | 0.04  | 10.24  | 10.397 | 10.449 | 10.82  | 6.767  | 100.0%  | 1000  |
>> >>>> > | total                                 | 2.219 | 13.903 | 23.068 | 24.538 | 49.437 | 13.222 | 100.0%  | 1000  |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >
>> >>>> >
>> >>>> > With IP - ovn-nbctl NO daemon mode
>> >>>> >
>> >>>> > concurrency - 10
>> >>>> >
>> >>>> > +--------------------------------------------------------------------------------------------------------------+
>> >>>> > |                                             Response Times (sec)                                             |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> > | ovn_network.create_routers            | 0.274 | 0.402  | 0.493  | 0.51   | 0.584  | 0.408  | 100.0%  | 1000  |
>> >>>> > | ovn.create_lswitch                    | 0.064 | 0.137  | 0.213  | 0.244  | 0.33   | 0.146  | 100.0%  | 1000  |
>> >>>> > | ovn_network.connect_network_to_router | 0.203 | 0.395  | 0.677  | 0.766  | 0.912  | 0.427  | 100.0%  | 1000  |
>> >>>> > | ovn.create_lport                      | 0.13  | 0.261  | 0.437  | 0.497  | 0.604  | 0.283  | 100.0%  | 1000  |
>> >>>> > | ovn_network.bind_port                 | 1.307 | 2.374  | 2.816  | 2.904  | 3.401  | 2.325  | 100.0%  | 1000  |
>> >>>> > | ovn_network.wait_port_up              | 0.0   | 0.001  | 0.001  | 0.001  | 0.002  | 0.001  | 100.0%  | 1000  |
>> >>>> > | ovn_network.ping_ports                | 0.028 | 10.237 | 10.422 | 10.474 | 11.281 | 6.453  | 100.0%  | 1000  |
>> >>>> > | total                                 | 2.251 | 13.631 | 14.822 | 15.008 | 15.901 | 10.044 | 100.0%  | 1000  |
>> >>>> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >
>> >>>> > *****************
>> >>>> >
>> >>>> > The results I shared in the previous email were with ACLs added
and ovn-nbctl batch mode disabled.
>> >>>> >
>> >>>> > I agree with you. Let me do a few more runs to be sure that the
results are consistent.
>> >>>> >
>> >>>> > Thanks
>> >>>> > Numan
>> >>>> >
>> >>>> >
>> >>>> >> > I will test OVN 2.9 vs 2.11 master along with what you
have suggested above and see if there are any problems related to ovsdb
monitor condition changes.
>> >>>> >> >
>> >>>> >> > Thanks
>> >>>> >> > Numan
>> >>>> >> >
>> >>>> >> > Below are the results
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > With IP master - nbctl daemon mode - No batch mode
>> >>>> >> > concurrency - 10
>> >>>> >> >
>> >>>> >> > +--------------------------------------------------------------------------------------------------------------+
>> >>>> >> > |                                             Response Times (sec)                                             |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > | ovn_network.create_routers            | 0.269 | 0.661  | 10.426 | 15.422 | 37.259 | 3.721  | 100.0%  | 1000  |
>> >>>> >> > | ovn.create_lswitch                    | 0.313 | 0.45   | 12.107 | 15.373 | 30.405 | 4.185  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.connect_network_to_router | 0.163 | 0.255  | 10.121 | 10.64  | 20.475 | 2.655  | 100.0%  | 1000  |
>> >>>> >> > | ovn.create_lport                      | 0.351 | 0.514  | 12.255 | 15.511 | 34.74  | 4.621  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.bind_port                 | 1.362 | 2.447  | 7.34   | 7.651  | 17.651 | 3.146  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.wait_port_up              | 0.086 | 2.734  | 5.272  | 7.827  | 22.717 | 2.957  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.ping_ports                | 0.038 | 10.196 | 20.285 | 20.39  | 40.74  | 7.52   | 100.0%  | 1000  |
>> >>>> >> > | total                                 | 2.862 | 27.267 | 49.956 | 56.39  | 90.884 | 28.808 | 100.0%  | 1000  |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > Load duration: 2950.4133141
>> >>>> >> > Full duration: 2951.58845997 seconds
>> >>>> >> >
>> >>>> >> > ***********
>> >>>> >> > With non IP - nbctl daemon mode - ACLs - No batch mode
>> >>>> >> >
>> >>>> >> > concurrency - 10
>> >>>> >> >
>> >>>> >> > +--------------------------------------------------------------------------------------------------------------+
>> >>>> >> > |                                             Response Times (sec)                                             |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > | ovn_network.create_routers            | 0.267 | 0.421  | 10.395 | 10.735 | 25.501 | 3.09   | 100.0%  | 1000  |
>> >>>> >> > | ovn.create_lswitch                    | 0.314 | 0.408  | 10.331 | 10.483 | 25.357 | 3.049  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.connect_network_to_router | 0.153 | 0.249  | 6.552  | 10.268 | 20.545 | 2.236  | 100.0%  | 1000  |
>> >>>> >> > | ovn.create_lport                      | 0.344 | 0.49   | 10.566 | 15.428 | 25.542 | 3.906  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.bind_port                 | 1.372 | 2.409  | 7.437  | 7.665  | 17.518 | 3.192  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.wait_port_up              | 0.086 | 1.323  | 5.157  | 7.769  | 20.166 | 2.291  | 100.0%  | 1000  |
>> >>>> >> > | ovn_network.ping_ports                | 0.034 | 2.077  | 10.347 | 10.427 | 20.307 | 5.123  | 100.0%  | 1000  |
>> >>>> >> > | total                                 | 3.109 | 21.26  | 39.245 | 44.495 | 70.197 | 22.889 | 100.0%  | 1000  |
>> >>>> >> > +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>>> >> > Load duration: 2328.11378407
>> >>>> >> > Full duration: 2334.43504095 seconds
>> >>>> >> >
>> >>>> >>
>> >>>>
>> >>>> Hi Numan/Daniel,
>> >>>>
>> >>>> I spent some time investigating the problem you reported. Thanks
Numan for the offline help sharing the details.
>> >>>>
>> >>>> Although I still didn't reproduce the slowness in my current
single-node testing env with almost the same steps and ACLs shared by
Numan, I think I may have figured out a highly probable cause of what you
have seen.
>> >>>>
>> >>>> Here is my theory: there is a difference between the I-P and
non-I-P versions in the main loop. The non-I-P version checks
ofctrl_can_put() before doing any flow computation (which was introduced to
solve a serious performance problem when there are many OVS flows on a
single node, see [1]). When working out the I-P version, I found this may
not be the best approach, since new incremental changes can keep coming and
we want to process them in the current iteration incrementally, so that we
don't need to fall back to a recompute in the next iteration. So the logic
was changed to always prioritize computing new changes and keeping the
desired flow table up to date, while the in-flight messages to ovs-vswitchd
may still be pending for an older version of the desired state. In the end
the final desired state will be synced to ovs-vswitchd again. If new changes
trigger another recompute, the recompute (which is always slow) will slow
down ofctrl_run(), which keeps sending the old pending messages to
ovs-vswitchd from the same main thread. (But it won't cause the original
performance problem any more, because the incremental processing engine will
not recompute when there is no input change.)
>> >>>>
>> >>>> However, when the test scenario triggers recomputes frequently, each
single change may take longer to be enforced in OVS because of this new
approach: later recompute iterations slow down the installation of
previously computed OVS flows. In your test you used a parallelism of 10,
which means at any point there may be new changes from one client (such as
creating a new router) that trigger a recompute, which can block the OVS
flow installation triggered earlier by another client. So overall you will
see much bigger latency for each individual test iteration.
>> >>>>
>> >>>> This can also explain why I didn't reproduce the problem in my
single-client single-node environment, since each iteration is serialized.
>> >>>>
>> >>>> [1]
https://github.com/openvswitch/ovs/commit/74c760c8fe99d554b94423d49d13d5ca3dea0d9e
>> >>>>
>> >>>> To prove this theory, could you help with two tests reusing your
environment? Thanks a lot!
>> >>>>
>> >>>
>> >>> Thanks Han. I will try these and come back to you with the results.
>> >>>
>> >>> Numan
>> >>>
>> >>>>
>> >>>> 1. Instead of a parallelism of 10, try 1, to make sure the test is
serialized. I'd expect the results to be similar with vs. without I-P.
>> >>>>
>> >>>> 2. Try the patch below on the I-P version you are testing, to see if
the problem is gone.
>> >>>> ----8><--------------------------------------------><8---------------
>> >>>> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
>> >>>> index 043abd6..0fcaa72 100644
>> >>>> --- a/ovn/controller/ofctrl.c
>> >>>> +++ b/ovn/controller/ofctrl.c
>> >>>> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
>> >>>>   * in the correct state and not backlogged with existing flow_mods.  (Our
>> >>>>   * criteria for being backlogged appear very conservative, but the socket
>> >>>>   * between ovn-controller and OVS provides some buffering.) */
>> >>>> -static bool
>> >>>> +bool
>> >>>>  ofctrl_can_put(void)
>> >>>>  {
>> >>>>      if (state != S_UPDATE_FLOWS
>> >>>> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
>> >>>> index ed8918a..2b21c11 100644
>> >>>> --- a/ovn/controller/ofctrl.h
>> >>>> +++ b/ovn/controller/ofctrl.h
>> >>>> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
>> >>>>                  const struct sbrec_meter_table *,
>> >>>>                  int64_t nb_cfg,
>> >>>>                  bool flow_changed);
>> >>>> +bool ofctrl_can_put(void);
>> >>>>  void ofctrl_wait(void);
>> >>>>  void ofctrl_destroy(void);
>> >>>>  int64_t ofctrl_get_cur_cfg(void);
>> >>>> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
>> >>>> index c4883aa..c85c6fa 100644
>> >>>> --- a/ovn/controller/ovn-controller.c
>> >>>> +++ b/ovn/controller/ovn-controller.c
>> >>>> @@ -1954,7 +1954,7 @@ main(int argc, char *argv[])
>> >>>>
>> >>>>                      stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>> >>>>                                      time_msec());
>> >>>> -                    if (ovnsb_idl_txn) {
>> >>>> +                    if (ovnsb_idl_txn && ofctrl_can_put()) {
>> >>>>                          engine_run(&en_flow_output, ++engine_run_id);
>> >>>>                      }
>> >>>>                      stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>> >>
>> >>
>> >>
>> >> Hi Han,
>> >>
>> >> So far I could do just one run after applying your suggested
patch on top of the I-P version, and the results look promising.
>> >> It seems to me the problem is gone.
>> >>
>> >>
>> >> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >> |                                             Response Times (sec)                                             |
>> >> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >> | action                                | min   | median | 90%ile | 95%ile | max    | avg    | success | count |
>> >> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >> | ovn_network.ping_ports                | 0.037 | 10.236 | 10.392 | 10.462 | 20.455 | 7.15   | 100.0%  | 1000  |
>> >> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >> | ovn_network.ping_ports                | 0.036 | 10.255 | 10.448 | 11.323 | 20.791 | 7.83   | 100.0%  | 1000  |
>> >> +---------------------------------------+-------+--------+--------+--------+--------+--------+---------+-------+
>> >>
>> >> The first row represents non-IP and the 2nd row represents IP + your
suggested patch.
>> >> The values are comparable, and a lot better than without your
patch.
>> >>
>> >> On Monday I will do more runs to be sure that the data is consistent,
and get back to you.
>> >>
>> >> If the results are consistent, I will try to run the tests which
Daniel and Lucas ran on an OpenStack deployment.
>
>
>
> Hi Han,
>
> I got some test results. I deployed devstack with OVN, configured
browbeat and patched it to
> include Daniel's test case -
https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad
>
> I ran the test 100 times with a concurrency of 25.
> The setup has 3 nodes - 1 controller and 2 compute nodes. The fake
namespace VMs are created on the compute nodes,
> and the controller node acts as the gateway node.
>
> Below are the results.
>
> ------------------------------------------------------------------------
> | Ping         | Non IP | Master (IP) | Master (IP) with Han's Fix |
> ------------------------------------------------------------------------
> | Min (sec)    | 0.023  | 0.017       | 0.022                      |
> | Median (sec) | 0.029  | 7.097       | 0.029                      |
> | 90%ile (sec) | 2.254  | 47.625      | 2.047                      |
> | 95%ile (sec) | 4.065  | 55.26       | 4.052                      |
> | Max (sec)    | 6.088  | 66.987      | 6.075                      |
> | Avg (sec)    | 0.877  | 17.732      | 0.599                      |
> ------------------------------------------------------------------------
>
>
> Your patch definitely fixes the issue.
>
> Non IP  - commit - ffbe41dbcb4882aafdf80d86afa1906b2a00199e +
a62128adc303d49901509a02f7e894d0c699e5bb
> Master IP - commit - f627cf1dd922bb644b6480bfbda67a9460cb2947
> Master (IP) with Han's fix - f627cf1dd922bb644b6480bfbda67a9460cb2947 +
Above fix from Han.
>
>
> Thanks
> Numan
>
>
>> >>
>> >> Thanks
>> >> Numan
>> >>
>> >
>> > Glad to see the test result improved! Thanks a lot and looking forward
to more data. Once it is finally confirmed, we can discuss whether this
should be submitted as a formal patch considering real world scenarios.

Thanks again, Numan, for the testing. I posted a formal patch:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360990.html

It is a slightly more complex change than the one you tested. It may (or may
not) perform slightly worse than the patch you tested in your test scenario,
but I think it is worthwhile. The previous patch works perfectly for your
test scenario, but it could make incremental processing much less effective
in more common scenarios, such as:

A port binding on the local chassis triggers a recompute, and then, before
all flows are installed, a lot of new port bindings on other chassis keep
coming.

Previously, since new port-binding changes on other chassis can be processed
incrementally, they don't block the previous flow installation. However,
with the patch I provided earlier, this would degrade to always recomputing.
In real-world scenarios I think those cases are more common than the one in
your scale test, so I don't want to sacrifice them for performance in a less
common situation. Now, with the formal patch I submitted, it tries to
process new changes incrementally first, but if a new change triggers a
recompute, it aborts there and prioritizes the previous flow installation,
so I hope it satisfies the needs of both scenarios.

Would you mind doing one more round of tests with the new patch in the same
scale environment? Forgive me for having put "Tested-by" with your name
already :)

Thanks,
Han