[ovs-dev] [PATCH] ovn-northd: Fix the ovn-northd continous looping

Gregory Rose gvrose8192 at gmail.com
Fri Jul 12 20:46:27 UTC 2019


On 7/12/2019 10:50 AM, nusiddiq at redhat.com wrote:
> From: Numan Siddique <nusiddiq at redhat.com>

In the subject line s/continous/continuous/

>
> ovn-northd wakes up continuously from poll_block(). This issue can be reproduced
> in the sandbox with the below commands
>
> ovn-nbctl lr-add lr0
> ovn-nbctl ls-add public
> ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24
> ovn-nbctl lsp-add public public-lr0
> ovn-nbctl lsp-set-type public-lr0 router
> ovn-nbctl lsp-set-addresses public-lr0 router
> ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
> ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20
>
> This issue is seen after the commit [1], which makes use of the function -
> sbrec_port_binding_update_nat_addresses_addvalue() to add a value to
> Port_Binding.nat_addresses column.
>
> Looks like the IDL client code is sending the transactions to the ovsdb-server repeatedly
> to update the Port_Binding.nat_addresses even though the Southbound DB has updated
> the column when this function is used. The actual bug seems to be in the IDL client code
> and that needs to be fixed. This patch as a quick fix, fixes ovn-northd's continuous loop
> by not using this function, instead making use of sbrec_port_binding_set_nat_addresses().
>
> The below messages are seen continuously when the ovn-nortdh debug logs are enabled.
>
> ****
>
> 2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received reply,
> result=[{},{"count":1},{"count":1}], id=18628
> 2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
> 2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request,
> method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
> {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row":
> {"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses",
> "insert",["set",["00:00:20:20:12:13 172.168.0.100 is_chassis_resident(\"cr-lr0-public\")"]]]],
> "where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18629
>
> 2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18629
> 2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage)
> 2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request,
> method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"},
> {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],
> "row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},
> {"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13 172.168.0.100
> is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid",
> "56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18630
> 2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18630
> ******
>
> The OpenStack CI tests for networking-ovn is frequently failing few tests after this
> commit. The failure seems to be related to timing issues as ovn-northd is hogging
> the CPU continuously. We are also seeing travis CI test failures after this commit.
>
> [1] - ed198fb3b92e
>
> Fixes: ed198fb3b92e("ovn: Send GARP for the router ports with reside-on-redirect-chassis options set")
> Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> ---
>   ovn/northd/ovn-northd.c | 19 ++++++++++---------
>   1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index ce382ac89..127227712 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -2530,13 +2530,6 @@ ovn_port_update_sbrec(struct northd_context *ctx,
>                   }
>               }
>   
> -            sbrec_port_binding_set_nat_addresses(op->sb,
> -                                                 (const char **) nats, n_nats);
> -            for (size_t i = 0; i < n_nats; i++) {
> -                free(nats[i]);
> -            }
> -            free(nats);
> -
>               /* Add the router mac and IPv4 addresses to
>                * Port_Binding.nat_addresses so that GARP is sent for these
>                * IPs by the ovn-controller on which the distributed gateway
> @@ -2578,10 +2571,18 @@ ovn_port_update_sbrec(struct northd_context *ctx,
>                                     op->peer->od->l3redirect_port->json_key);
>                   }
>   
> -                sbrec_port_binding_update_nat_addresses_addvalue(
> -                    op->sb, ds_cstr(&garp_info));
> +                n_nats++;
> +                nats = xrealloc(nats, (n_nats * sizeof *nats));
> +                nats[n_nats - 1] = ds_steal_cstr(&garp_info);
>                   ds_destroy(&garp_info);
>               }
> +
> +            sbrec_port_binding_set_nat_addresses(op->sb,
> +                                                 (const char **) nats, n_nats);
> +            for (size_t i = 0; i < n_nats; i++) {
> +                free(nats[i]);
> +            }
> +            free(nats);
>           }
>   
>           sbrec_port_binding_set_parent_port(op->sb, op->nbsp->parent_name);

Works for me.  With the subject line fixed up I think it's fine.
Tested-by: Greg Rose <gvrose8192 at gmail.com>
Reviewed-by: Greg Rose <gvrose8192 at gmail.com>




More information about the dev mailing list