[ovs-dev] [PATCH v6 2/5] Measure timing of ovn-controller loops.

Mark Michelson mmichels at redhat.com
Fri Mar 23 22:22:23 UTC 2018


Sorry for the delayed response on this, but things are a bit busy/weird 
for me right now.

On 03/21/2018 04:33 PM, Han Zhou wrote:
> (Add the dev list back)
> 
> On Wed, Mar 21, 2018 at 1:48 PM, Mark Michelson <mmichels at redhat.com 
> <mailto:mmichels at redhat.com>> wrote:
>  >
>  > On 03/15/2018 12:04 PM, Han Zhou wrote:
>  >>
>  >>
>  >>
>  >> On Wed, Mar 7, 2018 at 9:01 AM, Mark Michelson <mmichels at redhat.com 
> <mailto:mmichels at redhat.com> <mailto:mmichels at redhat.com 
> <mailto:mmichels at redhat.com>>> wrote:
>  >>  >
>  >>  > This modifies ovn-controller to measure the amount of time it 
> takes to
>  >>  > detect a change in the southbound database and generate the resulting
>  >>  > flow table. This may require multiple iterations of the 
> ovn-controller
>  >>  > loop.
>  >>  >
>  >>  > The statistics can be queried using:
>  >>  >
>  >>  > ovs-appctl -t ovn-controller stopwatch/show ovn-controller-loop
>  >>  >
>  >>  > The statistics can be reset using:
>  >>  >
>  >>  > ovs-appctl -t ovn-controller stopwatch/reset ovn-controller-loop
>  >>  >
>  >> If the purpose of this patch is for measuring *SB changes* handling 
> time only, the statistics may be renamed to something like 
> ovn-controller-loop-sb. Otherwise it is a little bit misleading since 
> the controller loop handles many other things, such as local interface 
> changes, Packet-ins, OF messages, etc.
>  >
>  >
>  > Sure, this sounds reasonable. I'll talk about this a bit more down 
> below, though.
>  >
>  >>
>  >>  > Signed-off-by: Mark Michelson <mmichels at redhat.com 
> <mailto:mmichels at redhat.com> <mailto:mmichels at redhat.com 
> <mailto:mmichels at redhat.com>>>
>  >>
>  >>  > ---
>  >>  >  ovn/controller/ovn-controller.c | 17 +++++++++++++++++
>  >>  >  1 file changed, 17 insertions(+)
>  >>  >
>  >>  > diff --git a/ovn/controller/ovn-controller.c 
> b/ovn/controller/ovn-controller.c
>  >>  > index 7592bda25..abd253fca 100644
>  >>  > --- a/ovn/controller/ovn-controller.c
>  >>  > +++ b/ovn/controller/ovn-controller.c
>  >>  > @@ -57,6 +57,9 @@
>  >>  >  #include "stream.h"
>  >>  >  #include "unixctl.h"
>  >>  >  #include "util.h"
>  >>  > +#include "timeval.h"
>  >>  > +#include "timer.h"
>  >>  > +#include "stopwatch.h"
>  >>  >
>  >>  >  VLOG_DEFINE_THIS_MODULE(main);
>  >>  >
>  >>  > @@ -67,6 +70,8 @@ static unixctl_cb_func inject_pkt;
>  >>  >  #define DEFAULT_BRIDGE_NAME "br-int"
>  >>  >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>  >>  >
>  >>  > +#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-loop"
>  >>  > +
>  >>  >  static void update_probe_interval(struct controller_ctx *,
>  >>  >                                    const char *ovnsb_remote);
>  >>  >  static void parse_options(int argc, char *argv[]);
>  >>  > @@ -639,8 +644,10 @@ main(int argc, char *argv[])
>  >>  >      unixctl_command_register("inject-pkt", "MICROFLOW", 1, 1, 
> inject_pkt,
>  >>  >                               &pending_pkt);
>  >>  >
>  >>  > +    stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>  >>  >      /* Main loop. */
>  >>  >      exiting = false;
>  >>  > +    unsigned int our_seqno = 0;
>  >>  >      while (!exiting) {
>  >>  >          /* Check OVN SB database. */
>  >>  >          char *new_ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
>  >>  > @@ -659,6 +666,12 @@ main(int argc, char *argv[])
>  >>  >              .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
>  >>  >          };
>  >>  >
>  >>  > +        if (our_seqno != ovsdb_idl_get_seqno(ctx.ovnsb_idl)) {
>  >>  > +            stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
>  >>  > +                            time_msec());
>  >> When multiple iterations are needed for one SB change handling, the 
> stopwatch can be started before the previous one is stopped, if seqno 
> changes in between. Could you explain the consideration here?
>  >
>  >
>  > That actually cannot happen. Check out 
> stopwatch_start_sample_protected() in lib/stopwatch.c. If the stopwatch 
> currently has a sample in progress, then the start operation immediately 
> returns and will not overwrite the previously recorded start time. The 
> only way that the start time can be recorded is if there is no sample 
> currently in progress.
> 
> Agree. So some of the SB change processing time are silently ignored, 
> but it is totally fine since it is just for statistics, and it is ok to 
> ignore some samples in rare situation :)
>  >
>  >>  > +            our_seqno = ovsdb_idl_get_seqno(ctx.ovnsb_idl);
>  >>  > +        }
>  >>  > +
>  >>  >          update_probe_interval(&ctx, ovnsb_remote);
>  >>  >
>  >>  >          update_ssl_config(ctx.ovs_idl);
>  >>  > @@ -728,6 +741,9 @@ main(int argc, char *argv[])
>  >>  >                      ofctrl_put(&flow_table, &pending_ct_zones,
>  >>  >                                 get_nb_cfg(ctx.ovnsb_idl));
>  >>  >
>  >>  > +                    stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
>  >>  > +                                   time_msec());
>  >>  > +
>  >>
>  >> Two issues here:
>  >>
>  >> Firstly, we start the stopwatch only when seqno changes, but we will 
> always call the stop, which may result in many unnecessary messages to 
> the stopwatch thread. Maybe it is not a big issue, but it would be 
> better if we have a variable to control it to stop only when necessary.
>  >
>  >
>  > This is a good point. I had been viewing the extra stopwatch_stop() 
> calls as being inconsequential since they would be eventual no-ops. 
> However, you're definitely right that it leads to extra processing that 
> we can easily avoid by keeping track of whether we have a sample in 
> progress.
>  >
>  >>
>  >> Secondly, when multiple iterations are needed for one SB change 
> handling, the iterations will be like:
>  >> - iter1:
>  >>      - stopwatch_start
>  >>      - ofctrl_put
>  >>      - stopwatch stop
>  >> - iter2:
>  >>      - ofctrl_run
>  >>      - (ofctrl_put skipped)
>  >> ...
>  >> - iterN:
>  >>      - ofctrl_run
>  >>      - ofctrl_put (will not send to OVS, since desired and installed 
> flows should be same now)
>  >>      - stopwatch stop
>  >>
>  >> So putting stopwatch_stop here still measures only the first 
> iteration. I had a suggestion earlier about checking ofctrl.cur_cfg, but 
> I was wrong since I realized that nb_cfg changes only upon user 
> requesting waiting for HV. To really measure the multi-iteration 
> processing, we need a mechanism that ofctrl takes current SB seqno as 
> input and maintains the cur_seq_no just like how cur_cfg is maintained.
>  >
>  >
>  > Thanks for the detailed explanation. I think I understand better the 
> position you are coming from. Let me first explain what I had been thinking.
>  >
>  > - iter1:
>  >     - SB seqno has changed
>  >     - stopwatch_start()
>  >     - ofctrl_can_put() returns false, so no stopwatch_stop call
> ofctrl_can_put() return false means the SB change detected in this 
> iteration can't even be started to get added to ovs.
>  > ...
>  > - iterN:
>  >     - ofctrl_can_put() returns true
> ofctrl_can_put() returns true means the last SB change now can get added 
> to OVS, but it may just get started, and it is not completed yet when 
> multiple iterations are needed.
>  >     - stopwatch_stop()
>  >
>  > On iterations 2 through N, it does not matter if the SB seqno has 
> changed or not. The stopwatch_start() will be a no-op since there is 
> already a sample being measured. The change to the southbound database 
> that we observed in iteration 1 gets timed until iteration N.
>  >
>  > The question here is whether my poor naming of the measurement has 
> created confusion. In other words, am I expecting a measurement of one 
> thing while you expect the measurement of something else? Or is the 
> logic being used here completely incorrect?
>  >
>  > What I was intending to measure was the time it takes to see a change 
> in the southbound database, create the resultant flows, and send those 
> to vswitchd. I believe what you're thinking about is just the latter two 
> steps.
> 
> To measure the time it takes to "see a change in the southbound 
> database, create the resultant flows, and send those to vswitchd", I 
> think we need to consider the iterations in my previous description. 
> ofctrl_can_put() returns true means the flow sending to vswitchd starts 
> in this iteration for sure, but it may or may not finish in this same 
> iteration. If oftrl_put() finishes sending the flow in this same 
> iteration, your current logic works perfectly. Otherwise, if multiple 
> iterations are needed, it is ofctrl_run() in the next iterations that 
> completes the flow sendings to vswitchd. The iterations you described is 
> not a start-to-end cycle of
> SB change handling (please see my comment inlined).
> 
> As you see, to measure a real complete cycle of "SB change - flow 
> compute - adding to ovs" is a little bit tricky, and ofctrl_can_put() 
> itself doesn't provide all the information, so that's why I proposed to 
> change the scope/rename the statistics of this patch just to measure the 
> controller loop time which is straighforward, and then have the *current 
> seq_no* mechanism implemented in ofctrl later so that we can measure the 
> "SB change - flow compute - adding to ovs" cycle. Hope I didn't make it 
> more confusing :)

This makes a great amount of sense and is not confusing. Thank you.

I think the main flaw between what I actually coded and what I said is 
the part about adding the flows to OVS. In reality, I'm way more 
concerned about the time it takes to generate flows than anything else, 
since that happens 100% in ovn-controller and is something that is good 
to have measurements for. I can think of two things I can do here that 
might be good:

1) Change the scope of the stopwatch (as well as the name) so that it 
only operates within the ofctrl_can_put() conditional. This way, all it 
is measuring is flow creation. I could call the stopwatch something like 
"ovn-controller-flow-process" or something. This way, I'm zeroing in on 
just the operation I think is a good one to be measuring. The southbound 
database doesn't play into it, nor does passing the flows down to OVS.

2) Keep the stopwatch measurements basically where they are and change 
the description of what I'm measuring. What it currently measures is the 
flow creation for a specific change to the southbound database, plus 
potentially passing it down to OVS. If I move the stopwatch_stop() call 
to before ofctrl_put(), then I think I'll be measuring the time it takes 
to process a specific change to the southbound database, but not taking 
into account sending the flows to OVS.

Another option is to implement both (1) and (2). (1) can give a general 
idea of how long it takes to generate flows, no matter the circumstance. 
(2) can give a general idea of how long it takes to create flows based 
on changes seen in the southbound database.

In either case, I think you're right that putting off the ofctrl seqno 
changes until later.

What do you think?

> 
> Thanks,
> Han
> 
>  >
>  >>  >                      hmap_destroy(&flow_table);
>  >>  >                  }
>  >>  >                  if (ctx.ovnsb_idl_txn) {
>  >>  > @@ -792,6 +808,7 @@ main(int argc, char *argv[])
>  >>  >              ofctrl_wait();
>  >>  >              pinctrl_wait(&ctx);
>  >>  >          }
>  >>  > +
>  >>  >          ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
>  >>  >
>  >>  >          if (ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop) == 1) {
>  >>  > --
>  >>  > 2.14.3
>  >>  >
>  >>  > _______________________________________________
>  >>  > dev mailing list
>  >>  > dev at openvswitch.org <mailto:dev at openvswitch.org> 
> <mailto:dev at openvswitch.org <mailto:dev at openvswitch.org>>
>  >>  > https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>  >>
>  >> Acked-by: Han Zhou <hzhou8 at ebay.com <mailto:hzhou8 at ebay.com> 
> <mailto:hzhou8 at ebay.com <mailto:hzhou8 at ebay.com>>>
>  >
>  >



More information about the dev mailing list