[ovs-dev] [PATCH v2] ovn-northd: Add the option to pause and resume

Numan Siddique nusiddiq at redhat.com
Tue Jul 9 05:11:33 UTC 2019


On Mon, Jul 8, 2019 at 11:22 PM Mark Michelson <mmichels at redhat.com> wrote:

> Hi Numan,
>
> The code changes all look good, but there are a few problems with the
> documentation. I've noted them down below.
>
>
Oops. Lots of typos and mistakes.

Thanks for the review. I corrected them and submitted v3.

Thanks
Numan

On 7/8/19 11:41 AM, nusiddiq at redhat.com wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
> >
> > This patch adds 3 unixctl socket comments - pause, resume and is-paused.
> >
> > Usage: ovs-appctl -t ovn-northd pause/resume/is-paused
> >
> > This feature will be useful if the CMS wants to
> >    - deploy OVN DB servers in active/passive mode and
> >    - run ovn-northd on all these nodes and use unix ctl sockets to
> >      connect to the local OVN DB servers.
> >
> > On the nodes where OVN Db ovsdb-servers are in passive mode, the local
> ovn-northds
> > will process the DB changes and calculate logical flows to be throw out
> later
> > because write txns are not allowed by these ovsdb-servers. It results in
> > unncessary CPU usage.
> >
> > With these commands, CMS can pause ovn-northd on these node. A node
> > which becomes master, can resume the ovn-northd.
> >
> > This feature will be useful in ovn-kubernetes if the above deployment
> model
> > is chosen.
> >
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> > ---
> > v1 -> v2
> > =======
> >    * Addressed the review comments from Ben and add more documentation
> >      about the runtime options added by this patch.
> >    * v1 had an issue - When paused, it was not even waking up to process
> >      the IDL updates. In v2, the main thread, wakes up to process any
> >      IDL updates, but doesn't do any logical flow computations.
> >
> >   ovn/northd/ovn-northd.8.xml |  48 +++++++++++++++
> >   ovn/northd/ovn-northd.c     | 117 ++++++++++++++++++++++++++++--------
> >   tests/ovn-northd.at         |  38 ++++++++++++
> >   3 files changed, 177 insertions(+), 26 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > index e6417220f..0766902db 100644
> > --- a/ovn/northd/ovn-northd.8.xml
> > +++ b/ovn/northd/ovn-northd.8.xml
> > @@ -70,6 +70,23 @@
> >         <dd>
> >           Causes <code>ovn-northd</code> to gracefully terminate.
> >         </dd>
> > +
> > +      <dt><code>pause</code></dt>
> > +      <dd>
> > +        Pauses the ovn-northd operation from processing any Northbound
> and
> > +        Southbound database changes.
> > +      </dd>
> > +
> > +      <dt><code>resume</code></dt>
> > +      <dd>
> > +        Resumes the ovn-northd operation to process Northbound and
> > +        Southbound database contents and generate logical flows.
> > +      </dd>
> > +
> > +      <dt><code>is-paused</code></dt>
> > +      <dd>
> > +        Returns "true" if ovn-northd is currently paused, "false"
> otherwise.
> > +      </dd>
> >         </dl>
> >       </p>
> >
> > @@ -82,6 +99,37 @@
> >         of <code>ovn-northd</code> will automatically take over.
> >       </p>
> >
> > +    <h2> Active-Standby with multiple OVN DB servers</h2>
> > +    <p>
> > +      You may run multiple OVN DB servers in an OVN deployment with:
> > +      <ul>
> > +        <li>
> > +          OVN DB servers deployed in active/passive mode with one active
> > +          and multiple passive ovsdb-servers.
> > +        </li>
> > +
> > +        <li>
> > +          <code>ovn-northd</code> also deployed on all thes nodes
>
> "thes" should either be "these" or "the". I'm not sure which you intended.
>
> > +          using unix ctl sockets to connect to the local OVN DB servers.
> > +        </li>
> > +      </ul>
> > +    </p>
> > +
> > +    <p>
> > +      In such deployments, the ovn-northds on the passive nodes will
> process
> > +      the DB changes and calculate logical flows to be throw out later
>
> s/throw/thrown/
>
> > +      because write txns are not allowed by the passive ovsdb-servers.
>
> I think writing out the word "transaction" is preferred over "txn" for
> documentation.
>
> > +      It results in unncessary CPU usage.
>
> s/unncessary/unnecessary/
>
> > +    </p>
> > +
> > +    <p>
> > +      With the help of runtime management command <code>pause</code>,
> you can
> > +      pause <code>ovn-northd</code> on these nodes. When a passive node
> > +      becomes master, you can use the runtime management command
> > +      <code>resume</code> to resume the <code>ovn-northd</code> to
> process the
> > +      DB changes.
> > +    </p>
> > +
> >       <h1>Logical Flow Table Structure</h1>
> >
> >       <p>
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 0b0a96a3a..05ddd60e3 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -50,6 +50,9 @@
> >   VLOG_DEFINE_THIS_MODULE(ovn_northd);
> >
> >   static unixctl_cb_func ovn_northd_exit;
> > +static unixctl_cb_func ovn_northd_pause;
> > +static unixctl_cb_func ovn_northd_resume;
> > +static unixctl_cb_func ovn_northd_is_paused;
> >
> >   struct northd_context {
> >       struct ovsdb_idl *ovnnb_idl;
> > @@ -8639,6 +8642,7 @@ main(int argc, char *argv[])
> >       struct unixctl_server *unixctl;
> >       int retval;
> >       bool exiting;
> > +    bool paused;
> >
> >       fatal_ignore_sigpipe();
> >       ovs_cmdl_proctitle_init(argc, argv);
> > @@ -8653,6 +8657,10 @@ main(int argc, char *argv[])
> >           exit(EXIT_FAILURE);
> >       }
> >       unixctl_command_register("exit", "", 0, 0, ovn_northd_exit,
> &exiting);
> > +    unixctl_command_register("pause", "", 0, 0, ovn_northd_pause,
> &paused);
> > +    unixctl_command_register("resume", "", 0, 0, ovn_northd_resume,
> &paused);
> > +    unixctl_command_register("is-paused", "", 0, 0,
> ovn_northd_is_paused,
> > +                             &paused);
> >
> >       daemonize_complete();
> >
> > @@ -8809,32 +8817,49 @@ main(int argc, char *argv[])
> >
> >       /* Main loop. */
> >       exiting = false;
> > +    paused = false;
> >       while (!exiting) {
> > -        struct northd_context ctx = {
> > -            .ovnnb_idl = ovnnb_idl_loop.idl,
> > -            .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
> > -            .ovnsb_idl = ovnsb_idl_loop.idl,
> > -            .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
> > -            .sbrec_ha_chassis_grp_by_name =
> sbrec_ha_chassis_grp_by_name,
> > -        };
> > -
> > -        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > -            VLOG_INFO("ovn-northd lock acquired. "
> > -                      "This ovn-northd instance is now active.");
> > -            had_lock = true;
> > -        } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl))
> {
> > -            VLOG_INFO("ovn-northd lock lost. "
> > -                      "This ovn-northd instance is now on standby.");
> > -            had_lock = false;
> > -        }
> > -
> > -        if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > -            ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
> > -            if (ctx.ovnsb_txn) {
> > -                check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> > -                check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> > -                check_and_update_rbac(&ctx);
> > +        /* unixctl_server_run could modify the value of 'paused'.
> > +         * So store the value in local 'paused_' so that we run
> > +         * 'ovsdb_idl_loop_commit_and_wait() at the end of the loop. */
> > +        bool paused_ = paused;
> > +
> > +        if (!paused_) {
> > +            struct northd_context ctx = {
> > +                .ovnnb_idl = ovnnb_idl_loop.idl,
> > +                .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
> > +                .ovnsb_idl = ovnsb_idl_loop.idl,
> > +                .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
> > +                .sbrec_ha_chassis_grp_by_name =
> sbrec_ha_chassis_grp_by_name,
> > +            };
> > +
> > +            if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > +                VLOG_INFO("ovn-northd lock acquired. "
> > +                        "This ovn-northd instance is now active.");
> > +                had_lock = true;
> > +            } else if (had_lock &&
> !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > +                VLOG_INFO("ovn-northd lock lost. "
> > +                        "This ovn-northd instance is now on standby.");
> > +                had_lock = false;
> >               }
> > +
> > +            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> > +                ovn_db_run(&ctx, sbrec_chassis_by_name,
> &ovnsb_idl_loop);
> > +                if (ctx.ovnsb_txn) {
> > +                    check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
> > +                    check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
> > +                    check_and_update_rbac(&ctx);
> > +                }
> > +            }
> > +        } else {
> > +            /* ovn-northd is paused
> > +             *    - we still want to handle any db updates and update
> the
> > +             *      local IDL. Otherwise, when it is resumed, the local
> IDL
> > +             *      copy will be out of sync.
> > +             *    - but we don't want to create any txns.
> > +             * */
> > +            ovsdb_idl_run(ovnnb_idl_loop.idl);
> > +            ovsdb_idl_run(ovnsb_idl_loop.idl);
> >           }
> >
> >           unixctl_server_run(unixctl);
> > @@ -8842,8 +8867,16 @@ main(int argc, char *argv[])
> >           if (exiting) {
> >               poll_immediate_wake();
> >           }
> > -        ovsdb_idl_loop_commit_and_wait(&ovnnb_idl_loop);
> > -        ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
> > +
> > +        if (!paused_) {
> > +            ovsdb_idl_loop_commit_and_wait(&ovnnb_idl_loop);
> > +            ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
> > +        } else {
> > +            /* ovn-northd is paused, but we still want to wake up for
> any db
> > +             * updates. */
> > +            ovsdb_idl_wait(ovnnb_idl_loop.idl);
> > +            ovsdb_idl_wait(ovnsb_idl_loop.idl);
> > +        }
> >
> >           poll_block();
> >           if (should_service_stop()) {
> > @@ -8868,3 +8901,35 @@ ovn_northd_exit(struct unixctl_conn *conn, int
> argc OVS_UNUSED,
> >
> >       unixctl_command_reply(conn, NULL);
> >   }
> > +
> > +static void
> > +ovn_northd_pause(struct unixctl_conn *conn, int argc OVS_UNUSED,
> > +                const char *argv[] OVS_UNUSED, void *pause_)
> > +{
> > +    bool *pause = pause_;
> > +    *pause = true;
> > +
> > +    unixctl_command_reply(conn, NULL);
> > +}
> > +
> > +static void
> > +ovn_northd_resume(struct unixctl_conn *conn, int argc OVS_UNUSED,
> > +                  const char *argv[] OVS_UNUSED, void *pause_)
> > +{
> > +    bool *pause = pause_;
> > +    *pause = false;
> > +
> > +    unixctl_command_reply(conn, NULL);
> > +}
> > +
> > +static void
> > +ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED,
> > +                     const char *argv[] OVS_UNUSED, void *paused_)
> > +{
> > +    bool *paused = paused_;
> > +    if (*paused) {
> > +        unixctl_command_reply(conn, "true");
> > +    } else {
> > +        unixctl_command_reply(conn, "false");
> > +    }
> > +}
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 62e58fd0e..0dea04edc 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -898,3 +898,41 @@ as northd
> >   OVS_APP_EXIT_AND_WAIT([ovn-northd])
> >
> >   AT_CLEANUP
> > +
> > +AT_SETUP([ovn -- ovn-northd pause and resume])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +AT_CHECK([test xfalse = x`as northd ovs-appctl -t ovn-northd
> is-paused`])
> > +
> > +ovn-nbctl ls-add sw0
> > +
> > +OVS_WAIT_UNTIL([
> > +    ovn-sbctl lflow-list sw0
> > +    test 0 = $?])
> > +
> > +ovn-nbctl ls-del sw0
> > +OVS_WAIT_UNTIL([
> > +    ovn-sbctl lflow-list sw0
> > +    test 1 = $?])
> > +
> > +# Now pause the ovn-northd
> > +as northd ovs-appctl -t ovn-northd pause
> > +AT_CHECK([test xtrue = x`as northd ovs-appctl -t ovn-northd is-paused`])
> > +
> > +ovn-nbctl ls-add sw0
> > +
> > +# There should be no logical flows for sw0 datapath.
> > +OVS_WAIT_UNTIL([
> > +    ovn-sbctl lflow-list sw0
> > +    test 1 = $?])
> > +
> > +# Now resume ovn-northd
> > +as northd ovs-appctl -t ovn-northd resume
> > +AT_CHECK([test xfalse = x`as northd ovs-appctl -t ovn-northd
> is-paused`])
> > +
> > +OVS_WAIT_UNTIL([
> > +    ovn-sbctl lflow-list sw0
> > +    test 0 = $?])
> > +
> > +AT_CLEANUP
> >
>
>


More information about the dev mailing list