[ovs-discuss] ovn-controller and northd thrashing at 100% CPU due to l3 logical flow update2->transaction->update2->...

Flaviof flavio at flaviof.com
Mon Jul 18 02:25:50 UTC 2016


On Sun, Jul 17, 2016 at 10:00 PM, Flaviof <flavio at flaviof.com> wrote:

> Hi folks,
>
> It could be that I'm configuring something wrong, but I consistently get
> my test VM setup spinning at 100% CPU utilization after applying the
> following config:
>
>    3 VMs: db, compute1, compute2
>    2 ls, each with 1 lsp; 1 lr with logical ports on both ls
>
>

Minor update: I took the compute nodes out, since I can make this issue
happen without creating any OVS ports. I can now clearly see that this is
an issue in northd (or a config mistake on my part).

   https://gist.github.com/a81d48ccf6a1d3ebe120a5ecc160f287
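For reference, this is roughly how I captured the traces in that gist
(the appctl target name is my assumption for this build):

   # Confirm which daemons are spinning.
   top -b -n 1 | grep -E 'ovn-northd|ovn-controller|ovsdb-server'
   # Bump jsonrpc logging to DBG so the update2/transact traffic is visible.
   sudo ovs-appctl -t ovn-northd vlog/set jsonrpc:file:dbg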

I'm looking forward to your thoughts on this. ;)

-- flaviof



> https://gist.github.com/a5547b0b98a9e29f6e52b7142072b905
>
> # Create two logical switches, "ls1" and "ls2".
> sudo ovn-nbctl ls-add ls1
> sudo ovn-nbctl ls-add ls2
>
> # Create a logical port on each of "ls1" and "ls2".
> sudo ovn-nbctl lsp-add ls1 ls1-port1
> sudo ovn-nbctl lsp-add ls2 ls2-port1
>
> # Set a MAC address for each of the two logical ports.
> sudo ovn-nbctl lsp-set-addresses ls1-port1 00:00:00:00:00:01
> sudo ovn-nbctl lsp-set-addresses ls2-port1 00:00:00:00:00:02
>
> # Set up port security for the two logical ports.
> sudo ovn-nbctl lsp-set-port-security ls1-port1 00:00:00:00:00:01
> sudo ovn-nbctl lsp-set-port-security ls2-port1 00:00:00:00:00:02
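>
> As a quick sanity check at this point, the standard show command should
> list both switches with one port each:
>
> # Verify the NB topology created so far.
> sudo ovn-nbctl show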
>
> # Add a logical router, so 1.0.0.1 can reach 2.0.0.1
> sudo ovn-nbctl lr-add lr0
>
> sudo ovn-nbctl lrp-add lr0 lrp1 00:00:00:01:00:01 1.0.0.2/24 \
>                peer=lrp1-attachment
> sudo ovn-nbctl -- lsp-add ls1 lrp1-attachment \
>                -- set Logical_Switch_Port lrp1-attachment \
>                   type=router \
>                   options:router-port=lrp1 \
>                   addresses='"00:00:00:01:00:01 1.0.0.2"'
>
> sudo ovn-nbctl lrp-add lr0 lrp2 00:00:00:01:00:02 2.0.0.2/24 \
>                peer=lrp2-attachment
> sudo ovn-nbctl -- lsp-add ls2 lrp2-attachment \
>                -- set Logical_Switch_Port lrp2-attachment \
>                   type=router \
>                   options:router-port=lrp2 \
>                   addresses='"00:00:00:01:00:02 2.0.0.2"'
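>
> One way to watch what northd wrote to the southbound DB after this
> (just the generic db-ctl list command) is:
>
> # Dump all SB logical flows; re-running it during the spin shows the
> # lr_in_arp_resolve rows coming back under fresh UUIDs each pass.
> sudo ovn-sbctl list Logical_Flow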
>
> Note that I can make this happen even without creating any OVS ports on
> the compute nodes. The logs are available here [1]; I observe that
> northd and the ovn-controllers appear to be reacting to the ovsdb
> update2 notifications, and that northd's own transactions then trigger
> the next update2, creating this vicious cycle [2] (a live-monitor
> sketch follows the links below):
>
> central/db:  https://gist.github.com/62acf7b41860b3ed510e1f7802677264
> c1: https://gist.github.com/6556f6fc30656a7d5dd6f7d3051173d5
> c2: https://gist.github.com/9437c2b0b5fbd64c7af14d196e0cf50f
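>
> To watch the churn live, something like the standard ovsdb-client
> monitor command should work (socket path taken from the logs above):
>
> # Stream insert/delete events for the SB Logical_Flow table.
> sudo ovsdb-client monitor unix:/var/run/openvswitch/ovnsb_db.sock \
>      OVN_Southbound Logical_Flow match,actions,table_id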
>
> Output of flows and client dumps:
>
> central: https://gist.github.com/63eb195977a28d790d74cb0cf500c72d
> c1: https://gist.github.com/12fcb1dfe091d5cbcc53b657f67f080a
> c2: https://gist.github.com/87dffff7ccb6de9231031e72606e68b8
>
> Any ideas/suggestions on debugging this further? Maybe you see
> something I'm doing wrong?
>
> Thanks,
>
> -- flaviof
>
> [1]:
> https://www.dropbox.com/sh/r5neb4nfdyktgi9/AACRGmtKZNPky1EE4QRw4CwRa?dl=0
>
> [2]:
>
> central:
> 2016-07-18T01:26:28.147Z|25613|poll_loop|DBG|wakeup due to [POLLIN] on fd
> 12 (<->/var/run/openvswitch/ovnsb_db.sock) at lib/stream-fd.c:155 (47% CPU
> usage)
> 2016-07-18T01:26:28.147Z|25614|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.sock:
> received notification, method="update2",
> params=[null,{"Logical_Flow":{"e3651924-6a8c-4bc2-9696-be12719b52d9":{"delete":null},"f5b5091e-23ee-4ce8-8b9f-8e2630901bd6":{"insert":{"match":"outport
> == \"lrp2-attachment\" && reg0 ==
> 2.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","39e023cc-594a-43a1-8934-ff530d47602c"],"table_id":5,"actions":"eth.dst
> = 00:00:00:01:00:02;
> next;"}},"7a6b2ce7-f3f9-486b-a094-15eadd331b4c":{"insert":{"match":"outport
> == \"lrp1-attachment\" && reg0 ==
> 1.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","f3d761aa-4350-400c-9472-43efe9a81cc5"],"table_id":5,"actions":"eth.dst
> = 00:00:00:01:00:01;
> next;"}},"0927e555-45eb-4c8d-9ad7-22a14545e31f":{"delete":null}}}]
> 2016-07-18T01:26:28.147Z|25615|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.sock:
> received reply,
> result=[{"count":1},{"uuid":["uuid","f5b5091e-23ee-4ce8-8b9f-8e2630901bd6"]},{"count":1},{"uuid":["uuid","7a6b2ce7-f3f9-486b-a094-15eadd331b4c"]}],
> id=5049
> 2016-07-18T01:26:28.147Z|25616|poll_loop|DBG|wakeup due to 0-ms timeout at
> lib/ovsdb-idl.c:3505 (47% CPU usage)
> 2016-07-18T01:26:28.148Z|25617|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.sock:
> send request, method="transact",
> params=["OVN_Southbound",{"uuid-name":"rowae010ada_2573_4d58_babf_7cacdd193683","row":{"pipeline":"ingress","match":"outport
> == \"lrp1-attachment\" && reg0 ==
> 1.0.0.2","priority":100,"logical_datapath":["uuid","f3d761aa-4350-400c-9472-43efe9a81cc5"],"table_id":5,"actions":"eth.dst
> = 00:00:00:01:00:01;
> next;","external_ids":["map",[["stage-name","lr_in_arp_resolve"]]]},"op":"insert","table":"Logical_Flow"},{"where":[["_uuid","==",["uuid","f5b5091e-23ee-4ce8-8b9f-8e2630901bd6"]]],"op":"delete","table":"Logical_Flow"},{"uuid-name":"row9ace491f_7392_49ae_af90_c8981d5332d4","row":{"pipeline":"ingress","match":"outport
> == \"lrp2-attachment\" && reg0 ==
> 2.0.0.2","priority":100,"logical_datapath":["uuid","39e023cc-594a-43a1-8934-ff530d47602c"],"table_id":5,"actions":"eth.dst
> = 00:00:00:01:00:02;
> next;","external_ids":["map",[["stage-name","lr_in_arp_resolve"]]]},"op":"insert","table":"Logical_Flow"},{"where":[["_uuid","==",["uuid","7a6b2ce7-f3f9-486b-a094-15eadd331b4c"]]],"op":"delete","table":"Logical_Flow"}],
> id=5050
> ===
> c1:
> 2016-07-18T01:26:24.343Z|05024|jsonrpc|DBG|tcp:192.168.33.11:6642:
> received notification, method="update2",
> params=[null,{"Logical_Flow":{"0a3e67e2-fcbd-41e9-936a-f88455f170e4":{"insert":{"match":"outport
> == \"lrp2-attachment\" && reg0 ==
> 2.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","39e023cc-594a-43a1-8934-ff530d47602c"],"table_id":5,"external_ids":["map",[["stage-name","lr_in_arp_resolve"]]],"actions":"eth.dst
> = 00:00:00:01:00:02;
> next;"}},"d75e17c9-f71b-4b61-a792-bf0f79f58acf":{"delete":null},"fa8c8b4e-ec34-47a5-ae28-1ec78573aa04":{"insert":{"match":"outport
> == \"lrp1-attachment\" && reg0 ==
> 1.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","f3d761aa-4350-400c-9472-43efe9a81cc5"],"table_id":5,"external_ids":["map",[["stage-name","lr_in_arp_resolve"]]],"actions":"eth.dst
> = 00:00:00:01:00:01;
> next;"}},"dc541d60-0969-4da5-83df-cad58e541fa0":{"delete":null}}}]
> ===
> c2:
> 2016-07-18T01:26:25.884Z|06227|jsonrpc|DBG|tcp:192.168.33.11:6642:
> received notification, method="update2",
> params=[null,{"Logical_Flow":{"9ba59604-cbc8-489e-8c09-b67625df8f1e":{"delete":null},"25063148-af81-4242-883c-b3cc729488ec":{"insert":{"match":"outport
> == \"lrp1-attachment\" && reg0 ==
> 1.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","f3d761aa-4350-400c-9472-43efe9a81cc5"],"table_id":5,"external_ids":["map",[["stage-name","lr_in_arp_resolve"]]],"actions":"eth.dst
> = 00:00:00:01:00:01;
> next;"}},"8d0bff01-9672-4dc7-a761-41fa840b86af":{"delete":null},"7983f3e5-9808-457e-9ad3-23c390dd419b":{"insert":{"match":"outport
> == \"lrp2-attachment\" && reg0 ==
> 2.0.0.2","pipeline":"ingress","priority":100,"logical_datapath":["uuid","39e023cc-594a-43a1-8934-ff530d47602c"],"table_id":5,"external_ids":["map",[["stage-name","lr_in_arp_resolve"]]],"actions":"eth.dst
> = 00:00:00:01:00:02; next;"}}}}]
>