[ovs-discuss] [ovs-dev] ovsdb-server core dump and ovsdb corruption using raft cluster

Girish Moodalbail gmoodalbail at gmail.com
Fri Jul 27 06:14:59 UTC 2018


Hello Ben,

Sorry, got distracted with something else at work. I am still able to
reproduce the issue, and this is what I have and what I did
(if you need the core, let me know and I can share it with you)

- 3-node RAFT cluster on Ubuntu VMs (2 VCPUs with 8GB RAM)
  $ uname -a
  Linux u1804-HVM-domU 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

- On all of the VMs, I have installed openvswitch-switch=2.9.2,
openvswitch-dbg=2.9.2, and ovn-central=2.9.2
  (all of these packages are from http://packages.wand.net.nz/)

- I bring up the nodes in the cluster one after the other -- the leader first,
followed by the two followers
- I check the cluster status and everything is healthy
- ovn-nbctl show and ovn-sbctl show both return empty output

- On the leader, with OVN_NB_DB set to the comma-separated NB connection
strings, I ran
   for i in `seq 1 50`; do ovn-nbctl ls-add ls$i; ovn-nbctl lsp-add ls$i port0_$i; done

- I check for the presence of 50 logical switches and 50 logical ports (one
on each switch). Then I compact the database on all the nodes.
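
The check in the step above could be scripted roughly as follows. This is only a sketch: the helper names check_count and verify_nb are hypothetical, and it assumes the ls1..lsN naming from the creation loop, that the database holds no other logical switches, and that `ovn-nbctl ls-list` and `ovn-nbctl lsp-list` print one object per line.

```shell
# check_count LABEL ACTUAL EXPECTED: print a verdict, return non-zero on mismatch.
check_count() {
    if [ "$2" -eq "$3" ]; then
        echo "$1: ok ($2)"
    else
        echo "$1: expected $3, got $2"
        return 1
    fi
}

# verify_nb [N]: count logical switches and ports through ovn-nbctl.
# Assumes switches are named ls1..lsN (as created by the loop above) and
# that ls-list / lsp-list emit one line per object.
verify_nb() {
    n=${1:-50}
    # grep -c . counts non-empty lines; "|| true" keeps a zero count from
    # being treated as a failure.
    switches=$(ovn-nbctl ls-list | grep -c . || true)
    ports=0
    i=1
    while [ "$i" -le "$n" ]; do
        p=$(ovn-nbctl lsp-list "ls$i" | grep -c . || true)
        ports=$((ports + p))
        i=$((i + 1))
    done
    check_count "logical switches" "$switches" "$n" || return 1
    check_count "logical ports" "$ports" "$n"
}
```

Running `verify_nb 50` on the leader, with OVN_NB_DB set as above, would report whether both counts match before moving on to the deletion step.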

- Next I delete the ports and, while the deletion is in progress, I run
compact on one of the followers

  leader_node# for i in `seq 1 50`; do ovn-nbctl lsp-del port0_$i; done
  follower_node# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact OVN_Northbound
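
For anyone trying to reproduce this, the two commands above can be wrapped in small helpers. This is a sketch under assumptions, not a tested harness: the function names, the DRY_RUN switch, and the NB_CTL variable are hypothetical conveniences; only the ovn-nbctl and ovs-appctl invocations come from the steps above. With DRY_RUN=1 (the default here) the commands are printed instead of executed.

```shell
# Path to the NB ovsdb-server control socket (taken from the step above).
NB_CTL=${NB_CTL:-/var/run/openvswitch/ovnnb_db.ctl}
# DRY_RUN=1 prints commands; set DRY_RUN=0 to actually run them.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# delete_ports [N]: run on the leader; deletes port0_1..port0_N one by one.
delete_ports() {
    n=${1:-50}
    i=1
    while [ "$i" -le "$n" ]; do
        run ovn-nbctl lsp-del "port0_$i"
        i=$((i + 1))
    done
}

# compact_nb: run on a follower while delete_ports is still in progress.
compact_nb() {
    run ovs-appctl -t "$NB_CTL" ovsdb-server/compact OVN_Northbound
}
```

The idea is to start `DRY_RUN=0 delete_ports 50` on the leader and, while it is still looping, call `DRY_RUN=0 compact_nb` on one follower.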

- On the follower node I see the crash:

● ovn-central.service - LSB: OVN central components
   Loaded: loaded (/etc/init.d/ovn-central; generated)
   Active: active (running) since Thu 2018-07-26 22:48:53 PDT; 19min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 21883 ExecStop=/etc/init.d/ovn-central stop (code=exited, status=0/SUCCESS)
  Process: 21934 ExecStart=/etc/init.d/ovn-central start (code=exited, status=0/SUCCESS)
    Tasks: 10 (limit: 4915)
   CGroup: /system.slice/ovn-central.service
           ├─22047 ovsdb-server: monitoring pid 22134 (1 crashes: pid 22048 died, killed (Aborted), core dumped
           ├─22059 ovsdb-server: monitoring pid 22060 (healthy)
           ├─22060 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log -
           ├─22072 ovn-northd: monitoring pid 22073 (healthy)
           ├─22073 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=tcp:10.0.7.33:6641,tcp:10.0.7.
           └─22134 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-nb.log


Same call trace and reason:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f79599a1801 in __GI_abort () at abort.c:79
#2  0x00005596879c017c in json_serialize (json=<optimized out>, s=<optimized out>) at ../lib/json.c:1554
#3  0x00005596879c01eb in json_serialize_object_member (i=<optimized out>, s=<optimized out>, node=<optimized out>, node=<optimized out>) at ../lib/json.c:1583
#4  0x00005596879c0132 in json_serialize_object (s=0x7ffc17013bf0, object=0x55968993dcb0) at ../lib/json.c:1612
#5  json_serialize (json=<optimized out>, s=0x7ffc17013bf0) at ../lib/json.c:1533
#6  0x00005596879c249c in json_to_ds (json=json@entry=0x559689950670, flags=flags@entry=0, ds=ds@entry=0x7ffc17013c80) at ../lib/json.c:1511
#7  0x00005596879ae8df in ovsdb_log_compose_record (json=json@entry=0x559689950670, magic=0x55968993dc60 "CLUSTER", header=header@entry=0x7ffc17013c60,
    data=data@entry=0x7ffc17013c80) at ../ovsdb/log.c:570
#8  0x00005596879aebbf in ovsdb_log_write (file=0x5596899b5df0, json=0x559689950670) at ../ovsdb/log.c:618
#9  0x00005596879aed3e in ovsdb_log_write_and_free (log=log@entry=0x5596899b5df0, json=0x559689950670) at ../ovsdb/log.c:651
#10 0x00005596879b0954 in raft_write_snapshot (raft=raft@entry=0x5596899151a0, log=0x5596899b5df0, new_log_start=new_log_start@entry=166,
    new_snapshot=new_snapshot@entry=0x7ffc17013e30) at ../ovsdb/raft.c:3588
#11 0x00005596879b0ec3 in raft_save_snapshot (raft=raft@entry=0x5596899151a0, new_start=new_start@entry=166, new_snapshot=new_snapshot@entry=0x7ffc17013e30)
    at ../ovsdb/raft.c:3647
#12 0x00005596879b8aed in raft_store_snapshot (raft=0x5596899151a0, new_snapshot_data=new_snapshot_data@entry=0x5596899505f0) at ../ovsdb/raft.c:3849
#13 0x00005596879a579e in ovsdb_storage_store_snapshot__ (storage=0x5596899137a0, schema=0x559689938ca0, data=0x559689946ea0) at ../ovsdb/storage.c:541
#14 0x00005596879a625e in ovsdb_storage_store_snapshot (storage=0x5596899137a0, schema=schema@entry=0x559689938ca0, data=data@entry=0x559689946ea0)
    at ../ovsdb/storage.c:568
#15 0x000055968799f5ab in ovsdb_snapshot (db=0x5596899137e0) at ../ovsdb/ovsdb.c:519
#16 0x0000559687999f23 in ovsdb_server_compact (conn=0x559689938440, argc=<optimized out>, argv=<optimized out>, dbs_=0x7ffc170141c0) at ../ovsdb/ovsdb-server.c:1443
#17 0x00005596879d9cc0 in process_command (request=<optimized out>, conn=0x559689938440) at ../lib/unixctl.c:315
#18 run_connection (conn=0x559689938440) at ../lib/unixctl.c:349
#19 unixctl_server_run (server=0x559689937370) at ../lib/unixctl.c:400
#20 0x0000559687996e1e in main_loop (is_backup=0x7ffc1701412e, exiting=0x7ffc1701412f, run_process=0x0, remotes=0x7ffc17014180, unixctl=0x559689937370, all_dbs=0x7ffc170141c0,
    jsonrpc=0x559689915120, config=0x7ffc170141e0) at ../ovsdb/ovsdb-server.c:201
#21 main (argc=<optimized out>, argv=<optimized out>) at ../ovsdb/ovsdb-server.c:457

Thanks,
~Girish



On Wed, Jul 25, 2018 at 3:06 PM, Ben Pfaff <blp at ovn.org> wrote:

> On Wed, Jul 18, 2018 at 10:48:08AM -0700, Girish Moodalbail wrote:
> > Hello all,
> >
> > We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
> > server or OVSDB SB server dumps core while it is trying to compact the
> > database.
> >
> > You can reproduce the issue by using:
> >
> > root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
> >
> > 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
> > ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of file)
>
> Hmm.  I've now spent some time playing with clustered OVSDB, in 3-server
> and 5-server configurations, and triggering compaction at various points
> while starting and stopping servers.  But I haven't yet managed to
> trigger this crash.
>
> Is there anything else that seems to be an important element?
>
> Thanks,
>
> Ben.
>