[ovs-discuss] [ovs-dev] ovsdb-server core dump and ovsdb corruption using raft cluster

Girish Moodalbail gmoodalbail at gmail.com
Wed Aug 1 00:39:19 UTC 2018


Hello Ben/Guru,

Wanted to check if you were able to reproduce the issue on your end, and
whether you guys needed any more info from me.
If you guys have any patch, then we are more than happy to verify it.
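
For reference, the steps quoted below can be condensed into a rough script
along these lines. This is only a sketch: the ovn-nbctl and ovs-appctl
invocations are the ones from the quoted mail, while the script name
(repro.sh), the DRY_RUN and COUNT variables, and the function names are made
up for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Rough reproduction sketch. Only the ovn-nbctl and ovs-appctl invocations
# come from the quoted mail; DRY_RUN, COUNT, and the function names are
# illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = run them
COUNT="${COUNT:-50}"      # number of logical switches/ports
CTL=/var/run/openvswitch/ovnnb_db.ctl

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Leader: create COUNT logical switches with one port each.
populate() {
    for i in $(seq 1 "$COUNT"); do
        run ovn-nbctl ls-add "ls$i"
        run ovn-nbctl lsp-add "ls$i" "port0_$i"
    done
}

# Leader: delete the ports again.
delete_ports() {
    for i in $(seq 1 "$COUNT"); do
        run ovn-nbctl lsp-del "port0_$i"
    done
}

# Follower: compact the NB database (run while delete_ports is in progress).
compact_nb() {
    run ovs-appctl -t "$CTL" ovsdb-server/compact OVN_Northbound
}

# Dispatch on the first argument; with no argument nothing is executed.
case "${1:-}" in
    populate) populate ;;
    delete)   delete_ports ;;
    compact)  compact_nb ;;
esac
```

With OVN_NB_DB set on the leader, "DRY_RUN=0 sh repro.sh populate" followed by
"DRY_RUN=0 sh repro.sh delete" on the leader, and "DRY_RUN=0 sh repro.sh compact"
on a follower while the deletion is still running, should correspond to the
quoted steps.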

regards,
~Girish

On Thu, Jul 26, 2018 at 11:14 PM, Girish Moodalbail <gmoodalbail at gmail.com>
wrote:

> Hello Ben,
>
> Sorry, got distracted with something else at work. I am still able to
> reproduce the issue, and this is what I have and what I did
> (if you need the core, let me know and I can share it with you)
>
> - 3-node RAFT cluster on Ubuntu VMs (2 VCPUs with 8GB RAM)
>   $ uname -a
>   Linux u1804-HVM-domU 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> - On all of the VMs, I have installed openvswitch-switch=2.9.2,
> openvswitch-dbg=2.9.2, and ovn-central=2.9.2
>   (all of these packages are from http://packages.wand.net.nz/)
>
> - I bring up the nodes in the cluster one after the other: the leader first,
> followed by the two followers
> - I check the cluster status and everything is healthy
> - ovn-nbctl show and ovn-sbctl show are both empty
>
> - on the leader, with OVN_NB_DB set to the comma-separated NB connection
> strings, I did
>    for i in `seq 1 50`; do ovn-nbctl ls-add ls$i; ovn-nbctl lsp-add ls$i port0_$i; done
>
> - Check for the presence of 50 logical switches and 50 logical ports (one
> on each switch). Compact the database on all the nodes.
>
> - Next I try to delete the ports, and while the deletion is happening I
> run compact on one of the followers
>
>   leader_node# for i in `seq 1 50`; do ovn-nbctl lsp-del port0_$i; done
>   follower_node# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact OVN_Northbound
>
> - On the follower node I see the crash:
>
> ● ovn-central.service - LSB: OVN central components
>    Loaded: loaded (/etc/init.d/ovn-central; generated)
>    Active: active (running) since Thu 2018-07-26 22:48:53 PDT; 19min ago
>      Docs: man:systemd-sysv-generator(8)
>   Process: 21883 ExecStop=/etc/init.d/ovn-central stop (code=exited, status=0/SUCCESS)
>   Process: 21934 ExecStart=/etc/init.d/ovn-central start (code=exited, status=0/SUCCESS)
>     Tasks: 10 (limit: 4915)
>    CGroup: /system.slice/ovn-central.service
>            ├─22047 ovsdb-server: monitoring pid 22134 (1 crashes: pid 22048 died, killed (Aborted), core dumped
>            ├─22059 ovsdb-server: monitoring pid 22060 (healthy)
>            ├─22060 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log -
>            ├─22072 ovn-northd: monitoring pid 22073 (healthy)
>            ├─22073 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=tcp:10.0.7.33:6641,tcp:10.0.7.
>            └─22134 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-nb.log
>
>
> Same call trace and reason:
>
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f79599a1801 in __GI_abort () at abort.c:79
> #2  0x00005596879c017c in json_serialize (json=<optimized out>, s=<optimized out>) at ../lib/json.c:1554
> #3  0x00005596879c01eb in json_serialize_object_member (i=<optimized out>, s=<optimized out>, node=<optimized out>, node=<optimized out>) at ../lib/json.c:1583
> #4  0x00005596879c0132 in json_serialize_object (s=0x7ffc17013bf0, object=0x55968993dcb0) at ../lib/json.c:1612
> #5  json_serialize (json=<optimized out>, s=0x7ffc17013bf0) at ../lib/json.c:1533
> #6  0x00005596879c249c in json_to_ds (json=json@entry=0x559689950670, flags=flags@entry=0, ds=ds@entry=0x7ffc17013c80) at ../lib/json.c:1511
> #7  0x00005596879ae8df in ovsdb_log_compose_record (json=json@entry=0x559689950670, magic=0x55968993dc60 "CLUSTER", header=header@entry=0x7ffc17013c60, data=data@entry=0x7ffc17013c80) at ../ovsdb/log.c:570
> #8  0x00005596879aebbf in ovsdb_log_write (file=0x5596899b5df0, json=0x559689950670) at ../ovsdb/log.c:618
> #9  0x00005596879aed3e in ovsdb_log_write_and_free (log=log@entry=0x5596899b5df0, json=0x559689950670) at ../ovsdb/log.c:651
> #10 0x00005596879b0954 in raft_write_snapshot (raft=raft@entry=0x5596899151a0, log=0x5596899b5df0, new_log_start=new_log_start@entry=166, new_snapshot=new_snapshot@entry=0x7ffc17013e30) at ../ovsdb/raft.c:3588
> #11 0x00005596879b0ec3 in raft_save_snapshot (raft=raft@entry=0x5596899151a0, new_start=new_start@entry=166, new_snapshot=new_snapshot@entry=0x7ffc17013e30) at ../ovsdb/raft.c:3647
> #12 0x00005596879b8aed in raft_store_snapshot (raft=0x5596899151a0, new_snapshot_data=new_snapshot_data@entry=0x5596899505f0) at ../ovsdb/raft.c:3849
> #13 0x00005596879a579e in ovsdb_storage_store_snapshot__ (storage=0x5596899137a0, schema=0x559689938ca0, data=0x559689946ea0) at ../ovsdb/storage.c:541
> #14 0x00005596879a625e in ovsdb_storage_store_snapshot (storage=0x5596899137a0, schema=schema@entry=0x559689938ca0, data=data@entry=0x559689946ea0) at ../ovsdb/storage.c:568
> #15 0x000055968799f5ab in ovsdb_snapshot (db=0x5596899137e0) at ../ovsdb/ovsdb.c:519
> #16 0x0000559687999f23 in ovsdb_server_compact (conn=0x559689938440, argc=<optimized out>, argv=<optimized out>, dbs_=0x7ffc170141c0) at ../ovsdb/ovsdb-server.c:1443
> #17 0x00005596879d9cc0 in process_command (request=<optimized out>, conn=0x559689938440) at ../lib/unixctl.c:315
> #18 run_connection (conn=0x559689938440) at ../lib/unixctl.c:349
> #19 unixctl_server_run (server=0x559689937370) at ../lib/unixctl.c:400
> #20 0x0000559687996e1e in main_loop (is_backup=0x7ffc1701412e, exiting=0x7ffc1701412f, run_process=0x0, remotes=0x7ffc17014180, unixctl=0x559689937370, all_dbs=0x7ffc170141c0, jsonrpc=0x559689915120, config=0x7ffc170141e0) at ../ovsdb/ovsdb-server.c:201
> #21 main (argc=<optimized out>, argv=<optimized out>) at ../ovsdb/ovsdb-server.c:457
>
> Thanks,
> ~Girish
>
>
>
> On Wed, Jul 25, 2018 at 3:06 PM, Ben Pfaff <blp at ovn.org> wrote:
>
>> On Wed, Jul 18, 2018 at 10:48:08AM -0700, Girish Moodalbail wrote:
>> > Hello all,
>> >
>> > We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
>> > server or OVSDB SB server dumps core while it is trying to compact the
>> > database.
>> >
>> > You can reproduce the issue by using:
>> >
>> > root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
>> >
>> > 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
>> > ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of file)
>>
>> Hmm.  I've now spent some time playing with clustered OVSDB, in 3-server
>> and 5-server configurations, and triggering compaction at various points
>> while starting and stopping servers.  But I haven't yet managed to
>> trigger this crash.
>>
>> Is there anything else that seems to be an important element?
>>
>> Thanks,
>>
>> Ben.
>>
>
>