[ovs-dev] [ovs-discuss] ovsdb-server core dump and ovsdb corruption using raft cluster

Girish Moodalbail gmoodalbail at gmail.com
Wed Jul 18 17:48:08 UTC 2018


Hello all,

We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
server or OVSDB SB server dumps core while it is trying to compact the
database.

You can reproduce the issue by using:

root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound

2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with
unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of
file)
root@u1804-HVM-domU:/var/crash#
root@u1804-HVM-domU:/var/crash#
root@u1804-HVM-domU:/var/crash# ERROR: apport (pid 17393) Wed Jul 18
10:34:23 2018: called for pid 14683, signal 6, core limit 0, dump mode 1
ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: executable:
/usr/sbin/ovsdb-server (command line "ovsdb-server -vconsole:off
-vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log
--remote=punix:/var/run/openvswitch/ovnsb_db.sock
--pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach
--monitor --remote=db:OVN_Southbound,SB_Global,connections
--private-key=db:OVN_Southbound,SSL,private_key
--certificate=db:OVN_Southbound,SSL,certificate
--ca-cert=db:OVN_Southbound,SSL,ca_cert
--ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
--ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
--remote=ptcp:6642:10.0.7.33 /etc/openvswitch/ovnsb_db.db")
ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: is_closing_session():
no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 17393) Wed Jul 18 10:34:29 2018: wrote report
/var/crash/_usr_sbin_ovsdb-server.0.crash

Looking through the crash we see the following stack:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f7c9a43c801 in __GI_abort () at abort.c:79
#2  0x00007f7c9aaa633c in json_serialize (json=<optimized out>,
s=<optimized out>) at lib/json.c:1554
#3  0x00007f7c9aaa63ab in json_serialize_object_member (i=<optimized out>,
s=<optimized out>, node=<optimized out>, node=<optimized out>)
    at lib/json.c:1583
#4  0x00007f7c9aaa62f2 in json_serialize_object (s=0x7ffca2173ea0,
object=0x5568dc5d5b10) at lib/json.c:1612
#5  json_serialize (json=<optimized out>, s=0x7ffca2173ea0) at
lib/json.c:1533
#6  0x00007f7c9aaa863c in json_to_ds (json=json@entry=0x5568dc5d4a20,
flags=flags@entry=0, ds=ds@entry=0x7ffca2173f30) at lib/json.c:1511
#7  0x00007f7c9ae6750f in ovsdb_log_compose_record
(json=json@entry=0x5568dc5d4a20,
magic=0x5568dc5d5a60 "CLUSTER",
    header=header@entry=0x7ffca2173f10, data=data@entry=0x7ffca2173f30) at
ovsdb/log.c:570
#8  0x00007f7c9ae677ef in ovsdb_log_write (file=0x5568dc5d5a80,
json=0x5568dc5d4a20) at ovsdb/log.c:618
#9  0x00007f7c9ae6796e in ovsdb_log_write_and_free
(log=log@entry=0x5568dc5d5a80,
json=0x5568dc5d4a20) at ovsdb/log.c:651
#10 0x00007f7c9ae6d684 in raft_write_snapshot (raft=raft@entry=0x5568dc1e3720,
log=0x5568dc5d5a80, new_log_start=new_log_start@entry=539578,
    new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3588
#11 0x00007f7c9ae6dbf3 in raft_save_snapshot (raft=raft@entry=0x5568dc1e3720,
new_start=new_start@entry=539578,
    new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3647
#12 0x00007f7c9ae757bd in raft_store_snapshot (raft=0x5568dc1e3720,
new_snapshot_data=new_snapshot_data@entry=0x5568dc5d49a0)
    at ovsdb/raft.c:3849
#13 0x00007f7c9ae7c7ae in ovsdb_storage_store_snapshot__
(storage=0x5568dc6b2fb0, schema=0x5568dd66f5a0, data=0x5568dca67880)
    at ovsdb/storage.c:541
#14 0x00007f7c9ae7d1de in ovsdb_storage_store_snapshot
(storage=0x5568dc6b2fb0, schema=schema@entry=0x5568dd66f5a0,
    data=data@entry=0x5568dca67880) at ovsdb/storage.c:568
#15 0x00007f7c9ae69cab in ovsdb_snapshot (db=0x5568dc6b3020) at
ovsdb/ovsdb.c:519
#16 0x00005568daec1f82 in main_loop (is_backup=0x7ffca21742be,
exiting=0x7ffca21742bf, run_process=0x0, remotes=0x7ffca2174310,
    unixctl=0x5568dc71ade0, all_dbs=0x7ffca2174350, jsonrpc=0x5568dc1e36a0,
config=0x7ffca2174370) at ovsdb/ovsdb-server.c:239
#17 main (argc=<optimized out>, argv=<optimized out>) at
ovsdb/ovsdb-server.c:457

Walking through the JSON objects being serialized we see that
"prev_servers" is malformed.

(gdb) print *((struct shash *)0x5568dc5d5b10)
$3 = {
  map = {
    buckets = 0x5568dc5d1d30,
    one = 0x0,
    mask = 7,
    n = 9
  }
}

(gdb) x/6a 0x5568dc5d1d30
0x5568dc5d1d30:    0x5568dc5d6000    0x0
0x5568dc5d1d40:    0x0    0x5568dc5d5f30
0x5568dc5d1d50:    0x5568dc5d5e30    0x5568dc5d5bc0

Let us look at the node in bucket 4 (0x5568dc5d5e30):

(gdb) print *((struct shash_node *)0x5568dc5d5e30)
$7 = {
  node = {
    hash = 2043875868,
    next = 0x0
  },
  name = 0x5568dc5d5e10 "prev_servers",
  data = 0x5568dc688cd0
}
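
As a cross-check on the walk above, here is a small sketch of the bucket
arithmetic. It assumes that OVS's hmap selects a bucket with "hash & mask"
and that pointers are 8 bytes wide on this x86_64 host; the helper names
are made up for illustration and are not OVS functions.

```python
# Values copied from the gdb session above.
NODE_HASH = 2043875868          # shash_node.node.hash for "prev_servers"
MAP_MASK = 7                    # hmap.mask
BUCKETS_BASE = 0x5568dc5d1d30   # hmap.buckets
POINTER_SIZE = 8                # assumption: 64-bit pointers (x86_64)


def bucket_index(hash_value: int, mask: int) -> int:
    """Bucket index, assuming the hmap uses hash & mask."""
    return hash_value & mask


def bucket_slot_address(base: int, index: int, ptr_size: int = POINTER_SIZE) -> int:
    """Address of the bucket slot that holds the head-of-chain pointer."""
    return base + index * ptr_size


index = bucket_index(NODE_HASH, MAP_MASK)
slot = bucket_slot_address(BUCKETS_BASE, index)
print(f"bucket {index}, slot at {slot:#x}")
```

Under those assumptions the node hashes to bucket 4, whose slot is at
0x5568dc5d1d50, and the x/6a dump shows 0x5568dc5d5e30 stored there. So the
hash map itself looks consistent for this entry.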

(gdb) print *((struct json *)0x5568dc688cd0)
$10 = {
  type = 3697839232,
  count = 34,
  u = {
    object = 0x5568dc688cb0,
    array = {
      n = 93908862799024,
      n_allocated = 93908862798944,
      elems = 0x5568dc22f050
    },
    integer = 93908862799024,
    real = 4.6397142949016804e-310,
    string = 0x5568dc688cb0 "\a"
  }
}

So the JSON value stored under "prev_servers" is malformed: its type field
(3697839232) is not a valid JSON type, which would explain the abort in
json_serialize().
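
To make the check on the type field explicit, here is a minimal sketch. The
list of type names is my recollection of 'enum json_type' in lib/json.h for
this release and should be treated as an assumption; the function name is
hypothetical.

```python
# Assumed ordering of enum json_type in OVS 2.9 (not verified against the tree).
JSON_TYPE_NAMES = [
    "JSON_NULL", "JSON_FALSE", "JSON_TRUE", "JSON_OBJECT",
    "JSON_ARRAY", "JSON_INTEGER", "JSON_REAL", "JSON_STRING",
]


def json_type_name(value: int):
    """Return the enum name for a type value, or None if it is out of range."""
    if 0 <= value < len(JSON_TYPE_NAMES):
        return JSON_TYPE_NAMES[value]
    return None


observed_type = 3697839232       # 'type' printed by gdb for "prev_servers"
union_integer = 93908862799024   # 'u.integer' printed by gdb

print(json_type_name(observed_type))   # None: not a valid type
print(hex(observed_type))              # 0xdc688c80
print(hex(union_integer))              # 0x5568dc688cb0
```

The type value is far outside the enum range, and in hex (0xdc688c80) it
looks like the low 32 bits of a heap address close to the object itself.
That may point to memory that was freed and reused, but this is only a
guess from the dump.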

That information comes from the snap.servers member of 'struct raft'.

Has anyone seen this before?


On Fri, Jul 13, 2018 at 3:49 PM, Yun Zhou <yunz at nvidia.com> wrote:

> Hi,
>
> We are running into some issues while trying out a 3-node Raft OVSDB
> cluster in our lab, and we hope to get some help from the community.
>
> We are using ovs 2.9.2.
> -------------------------
>
> We found that on one of the 3 nodes, the SB ovsdb-server was not running
> and could not be restarted because its database was already corrupted:
>
>    "ovsdb-server: syntax "{"encaps":["uuid","7f0f7605-c1d1-43fb-826a-1718ea70e088"],"hostname":"nd-sdn-dgx-010"}": syntax error: hostname is not a UUID"
>
> According to the ovsdb-server-sb log file history, the SB ovsdb-server
> dumped core several days ago:
>
>        "2018-07-08T06:58:15.267Z|00002|daemon_unix(monitor)|ERR|1
> crashes: pid 937 died, killed (Aborted), core dumped, restarting"
>
> Unfortunately, no core dump was generated.
>
> FWIW, we saw core dumps for the NB ovsdb on all 3 cluster nodes; here is
> one of the stacks:
>
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007fc48f8c2801 in __GI_abort () at abort.c:79
> #2  0x00007fc48ff2c33c in ?? () from /usr/lib/x86_64-linux-gnu/
> libopenvswitch-2.9.so.0
> #3  0x00007fc48ff2c2f2 in ?? () from /usr/lib/x86_64-linux-gnu/
> libopenvswitch-2.9.so.0
> #4  0x00007fc48ff2e63c in json_to_ds ()
>    from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> #5  0x00007fc4902ed50f in ovsdb_log_compose_record ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #6  0x00007fc4902ed7ef in ovsdb_log_write ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #7  0x00007fc4902ed96e in ovsdb_log_write_and_free ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #8  0x00007fc4902f3684 in ?? () from /usr/lib/x86_64-linux-gnu/
> libovsdb-2.9.so.0
> #9  0x00007fc4902f3bf3 in ?? () from /usr/lib/x86_64-linux-gnu/
> libovsdb-2.9.so.0
> #10 0x00007fc4902fb7bd in raft_store_snapshot ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #11 0x00007fc4903027ae in ?? () from /usr/lib/x86_64-linux-gnu/
> libovsdb-2.9.so.0
> #12 0x00007fc4903031de in ovsdb_storage_store_snapshot ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #13 0x00007fc4902efcab in ovsdb_snapshot ()
>    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> #14 0x0000561e47a8cf82 in ?? ()
> #15 0x00007fc48f8a3b97 in __libc_start_main (main=0x561e47a8bef0, argc=17,
>     argv=0x7ffe000ce2c8, init=<optimized out>, fini=<optimized out>,
>     rtld_fini=<optimized out>, stack_end=0x7ffe000ce2b8) at
> ../csu/libc-start.c:310
> #16 0x0000561e47a8db9a in ?? ()
>
> Please let us know if any more information is needed. Thanks very much!
>
> - Yun

