[ovs-dev] [PATCH] ovn-northd: Fix the ovn-northd continous looping

Numan Siddique nusiddiq at redhat.com
Tue Jul 16 18:01:51 UTC 2019


On Sat, Jul 13, 2019 at 2:16 AM Gregory Rose <gvrose8192 at gmail.com> wrote:

> On 7/12/2019 10:50 AM, nusiddiq at redhat.com wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
>
> In the subject line s/continous/continuous/
>
> >
> > ovn-northd wakes up continuously from poll_block(). This issue can be
> reproduced
> > in the sandbox with the below commands
> >
> > ovn-nbctl lr-add lr0
> > ovn-nbctl ls-add public
> > ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24
> > ovn-nbctl lsp-add public public-lr0
> > ovn-nbctl lsp-set-type public-lr0 router
> > ovn-nbctl lsp-set-addresses public-lr0 router
> > ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> > ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20
> >
> > This issue is seen after the commit [1], which makes use of the function
> -
> > sbrec_port_binding_update_nat_addresses_addvalue() to add a value to
> > Port_Binding.nat_addresses column.
> >
> > Looks like the IDL client code is sending the transactions to the
> ovsdb-server repeatedly
> > to update the Port_Binding.nat_addresses even though the Southbound DB
> has updated
> > the column when this function is used. The actual bug seems to be in the
> IDL client code
> > and that needs to be fixed. This patch as a quick fix, fixes
> ovn-northd's continuous loop
> > by not using this function, instead making use of
> sbrec_port_binding_set_nat_addresses().
> >
> > The below messages are seen continuously when the ovn-nortdh debug logs
> are enabled.
> >
> > ****
> >
> > 2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received
> reply,
> > result=[{},{"count":1},{"count":1}], id=18628
> > 2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout
> at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
> > 2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request,
> > method="transact",
> params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
> >
> {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row":
> >
> {"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses",
> > "insert",["set",["00:00:20:20:12:13 172.168.0.100
> is_chassis_resident(\"cr-lr0-public\")"]]]],
> >
> "where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}],
> id=18629
> >
> > 2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received
> reply, result=[{},{"count":1},{"count":1}], id=18629
> > 2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout
> at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
> > 2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request,
> > method="transact",
> params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
> >
> {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],
> > "row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},
> > {"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13
> 172.168.0.100
> >
> is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid",
> >
> "56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}],
> id=18630
> > 2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received
> reply, result=[{},{"count":1},{"count":1}], id=18630
> > ******
> >
> > The OpenStack CI tests for networking-ovn is frequently failing few
> tests after this
> > commit. The failure seems to be related to timing issues as ovn-northd
> is hogging
> > the CPU continuously. We are also seeing travis CI test failures after
> this commit.
> >
> > [1] - ed198fb3b92e
> >
> > Fixes: ed198fb3b92e("ovn: Send GARP for the router ports with
> reside-on-redirect-chassis options set")
> > Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> > ---
> >   ovn/northd/ovn-northd.c | 19 ++++++++++---------
> >   1 file changed, 10 insertions(+), 9 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index ce382ac89..127227712 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -2530,13 +2530,6 @@ ovn_port_update_sbrec(struct northd_context *ctx,
> >                   }
> >               }
> >
> > -            sbrec_port_binding_set_nat_addresses(op->sb,
> > -                                                 (const char **) nats,
> n_nats);
> > -            for (size_t i = 0; i < n_nats; i++) {
> > -                free(nats[i]);
> > -            }
> > -            free(nats);
> > -
> >               /* Add the router mac and IPv4 addresses to
> >                * Port_Binding.nat_addresses so that GARP is sent for
> these
> >                * IPs by the ovn-controller on which the distributed
> gateway
> > @@ -2578,10 +2571,18 @@ ovn_port_update_sbrec(struct northd_context *ctx,
> >
>  op->peer->od->l3redirect_port->json_key);
> >                   }
> >
> > -                sbrec_port_binding_update_nat_addresses_addvalue(
> > -                    op->sb, ds_cstr(&garp_info));
> > +                n_nats++;
> > +                nats = xrealloc(nats, (n_nats * sizeof *nats));
> > +                nats[n_nats - 1] = ds_steal_cstr(&garp_info);
> >                   ds_destroy(&garp_info);
> >               }
> > +
> > +            sbrec_port_binding_set_nat_addresses(op->sb,
> > +                                                 (const char **) nats,
> n_nats);
> > +            for (size_t i = 0; i < n_nats; i++) {
> > +                free(nats[i]);
> > +            }
> > +            free(nats);
> >           }
> >
> >           sbrec_port_binding_set_parent_port(op->sb,
> op->nbsp->parent_name);
>
> Works for me.  With the subject line fixed up I think it's fine.
> Tested-by: Greg Rose <gvrose8192 at gmail.com>
> Reviewed-by: Greg Rose <gvrose8192 at gmail.com>
>

Thanks for the review. I submitted v2 fixing the typo.

Numan


Thanks.


More information about the dev mailing list