[ovs-discuss] [External] : Re: /etc/openvswitch/conf.db filling up with lots of "ovn-controller: modifying OVS tunnels" updates

Brendan Doyle brendan.doyle at oracle.com
Tue Nov 9 10:49:12 UTC 2021



On 28/10/2021 16:41, Numan Siddique wrote:
> On Thu, Oct 28, 2021 at 5:20 AM Brendan Doyle <brendan.doyle at oracle.com> wrote:
>> Numan,
>>
>> Just wondering if you got a chance to look at those logs?
> I looked into the logs, and as I had mentioned earlier, you need this
> fix - https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f
>
> Please let me know if you still see this issue with the latest OVN or
> with the version of OVN which has this fix.
> This fix is available from OVN 21.03 and onwards.
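Because the fix is tied to a release number, a quick version check can flag hosts that still need upgrading. This is a minimal sketch that assumes the `ovn-nbctl -V` output format quoted later in this thread, where the first line looks like `ovn-nbctl 20.09.0_r1.0.0`. The helper names are hypothetical, and a plain version comparison cannot detect a fix that a vendor has backported to an older release:

```python
import re
import subprocess
from typing import Tuple

# Per this thread, the fix (OVN commit e7788554a7f5) is in OVN 21.03 and later.
FIX_RELEASE = (21, 3)


def parse_ovn_version(version_output: str) -> Tuple[int, int]:
    """Return (major, minor) from `ovn-nbctl -V` output whose first line looks
    like 'ovn-nbctl 20.09.0_r1.0.0'. Vendor suffixes such as '_r1.0.0' are ignored."""
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+)\.(\d+)\.\d+", first_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def release_includes_fix(version_output: str) -> bool:
    """True if the upstream release number is 21.03 or later.
    A vendor backport to an older release is NOT detected."""
    return parse_ovn_version(version_output) >= FIX_RELEASE


def local_ovn_version_output() -> str:
    """Run `ovn-nbctl -V` on this host and return its stdout."""
    result = subprocess.run(["ovn-nbctl", "-V"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For the `ovn-nbctl 20.09.0_r1.0.0` output quoted in this thread, `release_includes_fix` returns False. For 21.09.0 it returns True.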

So we have verified with OVN 21.09.0 and OVS 2.16.90 that this does
indeed fix the "runaway" conf.db issue.

But I have a question: I still see conf.db growing over time, though in
much smaller increments. Does this DB ever shrink? At present my NB is
empty and my SB is very small (see below). Yet the conf.db on each chassis
steadily grew from a few KB to over a MB while I created and deleted
various switches, routers and gateways, for example:

ls -lh /etc/openvswitch/conf.db
-rw-r-----. 1 root root 1.6M Nov  9 10:41 /etc/openvswitch/conf.db

Will this file just keep growing until it eventually reaches GBs, or does
it ever shrink when switches, gateways and routers are deleted?
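One way to see what accounts for the growth is to tally conf.db by transaction comment. This is a minimal sketch that assumes the standalone OVSDB log layout visible in the excerpts in this thread: a header line `OVSDB JSON <length> <sha1>` followed by `<length>` bytes of JSON. Clustered databases use a different format and are not handled. The function names are hypothetical:

```python
import json
from collections import Counter
from typing import BinaryIO, Iterator, Tuple


def iter_records(f: BinaryIO) -> Iterator[Tuple[int, object]]:
    """Yield (size_in_bytes, parsed_json) for each record of a standalone
    OVSDB log file: a header line 'OVSDB JSON <length> <sha1>' followed by
    <length> bytes of JSON."""
    while True:
        header = f.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # tolerate a stray newline between records
        parts = header.split()
        if len(parts) != 4 or parts[:2] != [b"OVSDB", b"JSON"]:
            raise ValueError(f"unexpected record header: {header!r}")
        length = int(parts[2])
        body = f.read(length)
        if len(body) < length:
            return                      # last record still being written
        yield len(header) + length, json.loads(body)


def bytes_by_comment(f: BinaryIO) -> Counter:
    """Sum record sizes per '_comment'; records without one (such as the
    leading schema record) are counted under '<none>'."""
    totals: Counter = Counter()
    for size, record in iter_records(f):
        if isinstance(record, dict):
            comment = record.get("_comment", "<none>")
        else:
            comment = "<none>"
        totals[comment] += size
    return totals
```

For example, `bytes_by_comment(open("/etc/openvswitch/conf.db", "rb")).most_common(5)` lists the five comments that contribute the most bytes on a chassis. The file should be opened read-only, because ovsdb-server may still be appending to it.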


# ovn-nbctl show
#


# ovn-sbctl show
Chassis "b4ba6c5b-0c85-4db7-adeb-a722ced41fac"
     hostname: pcamn01
     Encap geneve
         ip: "253.255.0.33"
         options: {csum="true"}
Chassis pcacn005
     hostname: pcacn005
     Encap geneve
         ip: "253.255.2.68"
         options: {csum="true"}
Chassis pcacn002
     hostname: pcacn002.ovca2.us.oracle.com
     Encap geneve
         ip: "253.255.2.65"
         options: {csum="true"}
Chassis "48aa3fd0-1d0f-4c6a-a444-64e96a80ce72"
     hostname: pcamn03
     Encap geneve
         ip: "253.255.0.35"
         options: {csum="true"}
Chassis pcacn001
     hostname: pcacn001.ovca2.us.oracle.com
     Encap geneve
         ip: "253.255.2.64"
         options: {csum="true"}
Chassis pcacn003
     hostname: pcacn003.ovca2.us.oracle.com
     Encap geneve
         ip: "253.255.2.66"
         options: {csum="true"}
Chassis "07eecb27-cd68-46d6-83ff-cb1bb84b85f3"
     hostname: pcamn02
     Encap geneve
         ip: "253.255.0.34"
         options: {csum="true"}

>
> Thanks
> Numan
>
>> Thanks
>>
>> Brendan
>>
>> On 27/10/2021 11:25, Brendan Doyle wrote:
>>
>> Hi,
>>
>> I finally got some debug logs. They are truncated after the failure occurs; the truncated entries
>> are just repeated updates of the same entry.
>>
>> To shed some more light on this: it seems this is a timing issue. The test being run involves
>> creating a number of Logical Switches (LS), Routers (LR) and Distributed Router Port
>> gateways (DR), and then immediately deleting them, with the last created DR being
>> deleted first. Our CMS is using the ovsdbapp Python library to do this.
>>
>> So it occurs to me that perhaps the objects get created in NB, but before they have been
>> propagated to SB and to the HV chassis, we get the delete, and this causes updates to
>> be sent to the chassis for a logical port that does not exist? Just a hypothesis.
>>
>> ovn-nbctl has a synchronization flag (--wait) to guard against such behavior; I wonder
>> whether ovsdbapp has something similar?
>>
>> In any case, the test fails (we see a runaway conf.db) pretty regularly, but not every time.
>> The failure is always observed on the delete operations. If I put a delay after create and
>> before delete, then we don't see the failure.
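If the timing hypothesis above is right, the CMS needs a synchronization point between create and delete. This is a minimal sketch that assumes it is acceptable to shell out to `ovn-nbctl` from the test. `ovn-nbctl --wait=hv sync` waits until the chassis have processed the NB changes made so far, which is what the `--wait` flag mentioned above provides. Whether ovsdbapp offers an equivalent is not answered in this thread. `create` and `delete` are hypothetical placeholders for the caller's own ovsdbapp calls:

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(argv: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def wait_for_chassis(runner: Runner = run_checked, timeout_s: int = 60) -> None:
    """Block until ovn-northd and every chassis have processed the NB
    changes committed so far. `sync` makes --wait apply even though
    this invocation itself changes nothing."""
    runner(["ovn-nbctl", f"--timeout={timeout_s}", "--wait=hv", "sync"])


def create_then_delete(create: Callable[[], None],
                       delete: Callable[[], None],
                       runner: Runner = run_checked) -> None:
    """Hypothetical test step: `create` and `delete` wrap the CMS's own
    ovsdbapp calls; the wait sits between them instead of a fixed delay."""
    create()
    wait_for_chassis(runner)
    delete()
```

Because `--wait=hv` waits on every chassis, a chassis that is down can make the call block until `--timeout` expires. The `runner` parameter exists only so the sequencing can be tested without a live OVN deployment.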
>>
>> If anyone can shed light on this from the logs, it would be much appreciated.
>>
>> Thanks
>>
>> Brendan
>>
>> On 26/10/2021 17:11, Brendan Doyle wrote:
>>
>>
>>
>> On 26/10/2021 15:50, Numan Siddique wrote:
>>
>> On Tue, Oct 26, 2021 at 8:20 AM Brendan Doyle <brendan.doyle at oracle.com> wrote:
>>
>> Hi,
>>
>>
>> What is very odd here is that I have used ovn-nbctl to delete the NB
>> config, so both of these show nothing:
>> # ovn-nbctl show
>> # ovn-sbctl lflow-list
>>
>> Yet I still see /etc/openvswitch/conf.db growing with updates for
>> logical switch ports that no longer exist!
>>
>> "],["ct-zone-ln-ls_vcn9195577_external_ugw","220"],["ct-zone-ln-ls_vcn9206002_external_igw","110"],["ct-zone-ln-ls_vcn9210052_external_igw","110"],["ct-zone-ln-ls_vcn9232395_external_ugw","75"],["ct-zone-ln-ls_vcn9236987_external_igw","110"],["ct-zone-ln-ls_vcn9236987_external_ugw","78"],["ct-zone-ln-ls_vcn9255861_external_igw","118"],["ct-zone-ln-ls_vcn9255861_external_ugw","100"],["ct-zone-ln-ls_vcn9319435_external_igw","87"],["ct-zone-ln-ls_vcn9352502_external_igw","40"],["ct-zone-ln-ls_vcn9402504_external_ugw","99"],["ct-zone-ln-ls_vcn9403404_external_igw","133"],["ct-zone-ln-ls_vcn9403404_external_ugw","114"],["ct-zone-ln-ls_vcn9461566_external_ugw","191"],["ct-zone-ln-ls_vcn9480000_external_igw","254"],["ct-zone-ln-ls_vcn9480000_external_ugw","236"],["ct-zone-ln-ls_vcn9492134_external_igw","262"],["ct-zone-ln-ls_vcn9523503_external_igw","207"],["ct-zone-ln-ls_vcn9542102_external_igw","133"],["ct-zone-ln-ls_vcn9542102_external_ugw","115"],["ct-zone-ln-ls_vcn9559658_external_igw","125"],["ct-zone-ln-ls_vcn9559658_external_ugw","78"],["ct-zone-ln-ls_vcn9594034_external_igw","49"],["ct-zone-ln-ls_vcn9619021_external_igw","133"],["ct-zone-ln-ls_vcn9634773_external_igw","292"],["ct-zone-ln-ls_vcn9649169_external_igw","132"],["ct-zone-ln-ls_vcn9649169_external_ugw","110"],["ct-zone-ln-ls_vcn9661290_external_ugw","78"],["ct-zone-ln-ls_vcn9734192_external_ugw","114"],["ct-zone-ln-ls_vcn9774252_external_igw","262"],["ct-zone-ln-ls_vcn9796262_external_igw","72"],["ct-zone-ln-ls_vcn9796262_external_ugw","54"],["ct-zone-ln-ls_vcn9805903_external_igw","147"],["ct-zone-ln-ls_vcn9805903_external_ugw","126"],["ct-zone-ln-ls_vcn9809895_external_igw","246"],["ct-zone-ln-ls_vcn9812576_external_ugw","78"],["ct-zone-ln-ls_vcn9834728_external_igw","110"],["ct-zone-ln-ls_vcn9886683_external_ugw","114"],["ct-zone-ln-ls_vcn9903419_external_ugw","235"],["ct-zone-ln-ls_vcn9917510_external_igw","56"],["ct-zone-ln-ls_vcn9917510_external_ugw","38"]]]}},"_comment":"ovn-controller:
>> modifying OVS tunnels 'pcacn001'"}
>>
>> That is a shortened version of one entry. Could it be that switch ports must be
>> deleted before deleting the switch? I was under the impression that once a switch
>> is deleted, its ports get deleted too.
>>
>> Yes. If you delete the switch, the switch ports get deleted too.
>>
>> After deleting the logical switch (or switch ports), do you see them
>> deleted by ovn-northd in the SB DB?
>>
>> Run
>>
>> ovn-sbctl list port_binding <deleted_port>
>>
>> and/or
>>
>> ovn-sbctl list datapath_binding <deleted_lswitch>
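When a test deletes many ports, the check suggested above can be scripted. This is a minimal sketch that assumes `ovn-sbctl` is run on a host that can reach the SB DB. With `--bare --columns=logical_port`, `ovn-sbctl list Port_Binding` prints each row's logical port name on its own line, with blank lines between rows. The helper names are hypothetical, and the `reader` parameter is only there so the parsing can be tested without a live SB DB:

```python
import subprocess
from typing import Callable, Iterable, List, Sequence, Set

Reader = Callable[[Sequence[str]], str]


def read_output(argv: Sequence[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit."""
    return subprocess.run(list(argv), capture_output=True,
                          text=True, check=True).stdout


def sb_port_names(reader: Reader = read_output) -> Set[str]:
    """Logical port names of all Port_Binding rows in the SB DB."""
    out = reader(["ovn-sbctl", "--bare", "--columns=logical_port",
                  "list", "Port_Binding"])
    # Skip the blank lines that separate rows in --bare output.
    return {line.strip() for line in out.splitlines() if line.strip()}


def leftover_ports(deleted: Iterable[str],
                   reader: Reader = read_output) -> List[str]:
    """Ports the CMS deleted from NB that still have an SB Port_Binding."""
    present = sb_port_names(reader)
    return sorted(name for name in deleted if name in present)
```

An empty result from `leftover_ports` means ovn-northd has removed the bindings, which points the investigation at the chassis side.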
>>
>> I'd suggest you enable jsonrpc debug in ovn-controller and see what's happening.
>> It would be helpful if you can share the ovn-controller debug logs.
>>
>> ovn-appctl -t ovn-controller vlog/set jsonrpc:dbg
>>
>>
>>
>> So in my test I create a simple network, then delete it, so the NB DB and SB DB
>> are empty:
>>
>> # ovn-sbctl list port_binding
>> # ovn-sbctl list datapath_binding
>> #
>>
>> The network has a number of LSs and LRs and two Distributed Router (DR) ports (on
>> separate LRs). When I create just one DR all seems fine, but when I add the second into
>> the mix I get a runaway openvswitch/conf.db, but NOT on all chassis. I have 4 chassis
>> that I can schedule the DR ports to. In this latest test I observed the runaway conf.db
>> on pcacn003 & pcacn005. The logs are too large to send in email; is there an FTP server
>> that I can upload to?
>>
>> I will redo the test with debug enabled and collect updated logs. The conf.db on both pcacn003 &
>> pcacn005 is several GBs.
>>
>>
>> The only way to recover is to stop the OVS/OVN procs, then delete /etc/openvswitch/conf.db
>> and restart them.
>>
>> Brendan
>>
>>
>>
>>
>> Thanks
>> Numan
>>
>>
>> switch 712757c3-2481-4f8b-940c-05dc13ce37a5 (ls_vcn9319435_external_ugw)
>>        port ls_vcn9319435_external_ugw-lr_vcn9319435
>>            type: router
>>            router-port: lr_vcn9319435-ls_vcn9319435_external_ugw
>>        port ln-ls_vcn9319435_external_ugw
>>            type: localnet
>>            addresses: ["unknown"]
>>
>> router 80c281af-319b-416b-8a17-0ce7b8901bb1 (lr_vcn9319435)
>>        port lr_vcn9319435-ls_vcn9319435_external_ugw
>>            mac: "00:13:97:88:31:90"
>>            networks: ["253.255.80.4/16"]
>>            gateway chassis: [pcacn002 pcacn003 pcacn001]
>>        port lr_vcn9319435-lsb_vcn9319435
>>            mac: "00:13:97:d4:26:ec"
>>            networks: ["253.255.29.2/25"]
>>        nat 6c87050f-cd27-423e-815e-deda74bd9bc6
>>            external ip: "253.255.80.4"
>>            logical ip: "10.221.0.0/16"
>>            type: "snat"
>>
>> Does each port have to be deleted, or is it OK to just delete the switch
>> and router?
>>
>> Brendan
>>
>> On 25/10/2021 16:10, Brendan Doyle wrote:
>>
>>
>> On 25/10/2021 15:08, Numan Siddique wrote:
>>
>> On Fri, Oct 22, 2021 at 9:30 AM Brendan Doyle
>> <brendan.doyle at oracle.com> wrote:
>>
>> Hi,
>>
>>
>> Looking at /etc/openvswitch/conf.db, I see it getting very large:
>>
>> [root@pcacn001 ~]# ls -l /etc/openvswitch/conf.db
>> -rw-r--r--. 1 root root 6069248828 Oct 22 11:55 /etc/openvswitch/conf.db
>>
>> It contains lots and lots of (mostly) "ovn-controller: modifying OVS tunnels"
>> update entries, like the one below.
>> What are these? It does not seem normal.
>> OVSDB JSON 4687 00e8788dd5d9af2aac5ca7724759017c52ddd580
>> {"_date":1634903752117,"Bridge":{"745726c4-0451-4f52-a52b-1f9c5e85c703":{"external_ids":["map",[["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_dnat","4"],["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_snat","1"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_dnat","4"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_snat","5"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_dnat","17"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_snat","7"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_dnat","9"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_snat","10"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_dnat","3"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_snat","4"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_dnat","19"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_snat","18"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_dnat","10"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_snat","16"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_dnat","9"],["ct-zone-423896cb-5573
>>
>> -4c54-b6e2-38f192eacae3_snat","12"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_dnat","10"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_snat","11"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_dnat","2"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_snat","5"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_dnat","18"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_snat","3"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_dnat","12"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_snat","11"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_dnat","10"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_snat","12"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_dnat","1"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_snat","11"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_dnat","2"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_snat","13"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_dnat","11"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_snat","12"],["ct-zone-9498aca9-762
>>
>> 3-4ce0-a0ff-d4d5c17d7223_dnat","19"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_snat","15"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_dnat","18"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_snat","17"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_dnat","13"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_snat","10"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_dnat","8"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_snat","14"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_dnat","12"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_snat","11"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_dnat","13"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_snat","10"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_dnat","9"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_snat","1"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_dnat","16"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_snat","15"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_dnat","5"],["ct-zone-cfa46699-c
>>
>> c79-445e-a902-f1e37ff99806_snat","2"],["ct-zone-cr-lr_vcn0747157-ls_vcn0747157_external_ugw","9"],["ct-zone-cr-lr_vcn1645571_igw-ls_vcn1645571_external_igw","21"],["ct-zone-cr-lr_vcn7319607-ls_vcn7319607_external_ugw","14"],["ct-zone-cr-lr_vcn7319607_igw-ls_vcn7319607_external_igw","21"],["ct-zone-cr-lr_vcn7395327_igw-ls_vcn7395327_external_igw","21"],["ct-zone-cr-lr_vcn9567153-ls_vcn9567153_external_ugw","1"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_dnat","9"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_snat","8"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_dnat","3"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_snat","6"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_dnat","12"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_snat","13"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_dnat","15"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_snat","14"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_dnat","16"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_snat","17
>>
>> "],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_dnat","6"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_snat","7"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_dnat","13"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_snat","11"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_dnat","20"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_snat","19"],["ct-zone-ln-ls_vcn6603036_external_ugw","7"],["ct-zone-ln-ls_vcn7319607_external_igw","20"],["ct-zone-ln-ls_vcn7395327_external_ugw","7"],["ct-zone-ln-ls_vcn7836024_external_igw","20"],["ct-zone-ln-ls_vcn9567153_external_igw","21"],["ct-zone-ln-ls_vcn9567153_external_ugw","8"]]]}},"_comment":"ovn-controller:
>>
>> modifying OVS tunnels 'pcacn001'"}
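Each of these entries is one transaction that rewrites the `external_ids` column of a Bridge row. The keys are `ct-zone-*` names, and the values are conntrack zone numbers stored as strings. OVSDB JSON encodes this map as `["map", [[key, value], ...]]`. The excerpts above are truncated and line-wrapped by the mail archive, so they cannot be parsed as-is. This is a minimal sketch for decoding one complete record, assuming it has already been read from conf.db and parsed with `json.loads`. The function names are hypothetical:

```python
from typing import Any, Dict


def ovsdb_map_to_dict(value: Any) -> Dict[Any, Any]:
    """Convert OVSDB's JSON map encoding ["map", [[k, v], ...]] to a dict."""
    if not (isinstance(value, list) and len(value) == 2 and value[0] == "map"):
        raise ValueError(f"not an OVSDB map: {value!r}")
    return {k: v for k, v in value[1]}


def ct_zones_in_record(record: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """For each Bridge row whose external_ids this transaction record sets,
    return its ct-zone-* entries as {key: zone_number}."""
    result: Dict[str, Dict[str, int]] = {}
    for row_uuid, row in (record.get("Bridge") or {}).items():
        if not row or "external_ids" not in row:
            continue                    # row deleted, or external_ids untouched
        ids = ovsdb_map_to_dict(row["external_ids"])
        result[row_uuid] = {k: int(v) for k, v in ids.items()
                            if k.startswith("ct-zone-")}
    return result
```

Judging by the excerpts above, each record carries the whole map rather than a delta. When a map with many keys is rewritten over and over, the file therefore grows quickly.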
>>
>> In which OVN version are you seeing this?
>>
>> # ovs-vsctl -V
>> ovs-vsctl (Open vSwitch) 2.14.0_r0.0.0
>> DB Schema 8.2.0
>> # ovn-nbctl -V
>> ovn-nbctl 20.09.0_r1.0.0
>> Open vSwitch Library 2.14.0
>> DB Schema 5.27.0
>>
>>
>>
>> I wonder if you're seeing this issue -
>> https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f
>>
>> I have to step out for a bit; I will look at this when I can.
>> What I can say is that we are using ovsdbapp to configure central, and
>> I see /etc/openvswitch/conf.db getting up to several GB! So much so
>> that systemd times out when you try to start the service using it.
>> I am also seeing ovs-vswitchd getting a SEGV on a regular basis, which
>> I think is related.
>> I'm wondering if this patch might help:
>>
>> [External] : Re: [ovs-dev] [PATCH branch-2.14] python: idl: Avoid sending transactions when the DB is not synced up.
>>
>> I'm not sure. /etc/openvswitch/conf.db is the local ovsdb-server database
>> and not the OVN database.
>>
>> Numan
>>
>> If you run a tail on /etc/openvswitch/conf.db, do you see the ct zone
>> ids toggling between 2 values constantly?
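The question above can be answered by scanning the whole file rather than watching `tail`. This is a minimal sketch that assumes the same `OVSDB JSON <length> <sha1>` record layout seen in the excerpts above. It collects the `ct-zone-*` entries of each Bridge update and reports keys whose value flips back and forth between exactly two zone numbers. The function names and the `min_flips` threshold are illustrative choices, not part of OVS:

```python
import json
from typing import BinaryIO, Dict, Iterable, Iterator, List


def ct_zone_snapshots(f: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the ct-zone-* entries of every Bridge external_ids update in a
    standalone OVSDB log (conf.db), in file order."""
    while True:
        header = f.readline()
        if not header:
            return
        if not header.strip():
            continue
        length = int(header.split()[2])     # 'OVSDB JSON <length> <sha1>'
        body = f.read(length)
        if len(body) < length:
            return                          # incomplete final record
        record = json.loads(body)
        if not isinstance(record, dict):
            continue
        for row in (record.get("Bridge") or {}).values():
            ids = (row or {}).get("external_ids")
            if isinstance(ids, list) and len(ids) == 2 and ids[0] == "map":
                yield {k: v for k, v in ids[1] if k.startswith("ct-zone-")}


def toggling_keys(snapshots: Iterable[Dict[str, str]],
                  min_flips: int = 4) -> Dict[str, List[str]]:
    """Return ct-zone keys whose value changed at least `min_flips` times
    while only ever taking two distinct values, i.e. A, B, A, B, ..."""
    history: Dict[str, List[str]] = {}
    for snap in snapshots:
        for key, value in snap.items():
            seq = history.setdefault(key, [])
            if not seq or seq[-1] != value:
                seq.append(value)           # record only changes
    return {key: seq for key, seq in history.items()
            if len(seq) - 1 >= min_flips and len(set(seq)) == 2}
```

To check a chassis, pass `ct_zone_snapshots(open("/etc/openvswitch/conf.db", "rb"))` to `toggling_keys`. A non-empty result shows the kind of flip-flopping asked about above.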
>>
>> Thanks
>> Numan
>>
>> Thanks
>>
>> Brendan
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


