[ovs-dev] [ovs-discuss] ovsdb-server core dump and ovsdb corruption using raft cluster

Yifeng Sun pkusunyifeng at gmail.com
Tue Jul 24 20:40:36 UTC 2018


Hi Yun and Girish,

I have submitted a patch. Would you mind testing and reviewing it? Thanks.

[PATCH] dynamic-string: Fix a bug that leads to assertion fail

diff --git a/lib/dynamic-string.c b/lib/dynamic-string.c
index 6f7b610a9908..4564e420544d 100644
--- a/lib/dynamic-string.c
+++ b/lib/dynamic-string.c
@@ -158,7 +158,7 @@ ds_put_format_valist(struct ds *ds, const char *format, va_list args_)
     if (needed < available) {
         ds->length += needed;
     } else {
-        ds_reserve(ds, ds->length + needed);
+        ds_reserve(ds, ds->allocated + needed);

         va_copy(args, args_);
         available = ds->allocated - ds->length + 1;

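For context, ds_put_format_valist() formats in two passes. It first calls vsnprintf() into whatever space is left in the buffer. Only if the output did not fit does it grow the buffer with ds_reserve() and format again from a fresh va_copy() of the arguments. What follows is a minimal sketch of that pattern on a simplified stand-in for struct ds. mini_ds, mini_ds_reserve(), mini_ds_put_format_valist() and mini_ds_put_format() are hypothetical names, not OVS APIs. The growth policy is only modeled loosely on ds_reserve(), and the sketch uses the pre-patch reservation of length + needed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ds: 'string' holds 'length' characters
 * plus a null terminator and has room for 'allocated' characters plus the
 * terminator. */
struct mini_ds {
    char *string;
    size_t length;
    size_t allocated;
};

/* Ensures room for at least 'min_length' characters (plus terminator). */
static void
mini_ds_reserve(struct mini_ds *ds, size_t min_length)
{
    if (!ds->string || min_length > ds->allocated) {
        size_t new_allocated = ds->allocated * 2;
        if (new_allocated < min_length) {
            new_allocated = min_length;
        }
        if (new_allocated < 8) {
            new_allocated = 8;
        }
        char *p = realloc(ds->string, new_allocated + 1);
        if (!p) {
            abort();
        }
        ds->string = p;
        ds->allocated = new_allocated;
        ds->string[ds->length] = '\0';
    }
}

/* Two-pass formatting, modeled on the shape of ds_put_format_valist(). */
static void
mini_ds_put_format_valist(struct mini_ds *ds, const char *format,
                          va_list args_)
{
    va_list args;
    size_t available = ds->string ? ds->allocated - ds->length + 1 : 0;
    int needed;

    /* Pass 1: format into whatever space is left (possibly none). */
    va_copy(args, args_);
    needed = vsnprintf(ds->string ? &ds->string[ds->length] : NULL,
                       available, format, args);
    va_end(args);
    assert(needed >= 0);

    if ((size_t) needed < available) {
        ds->length += needed;       /* It fit on the first try. */
        return;
    }

    /* Pass 2: grow to hold 'needed' more characters, then format again
     * from a fresh copy of the arguments (the first copy is consumed). */
    mini_ds_reserve(ds, ds->length + needed);
    va_copy(args, args_);
    available = ds->allocated - ds->length + 1;
    needed = vsnprintf(&ds->string[ds->length], available, format, args);
    va_end(args);

    assert(needed >= 0 && (size_t) needed < available);
    ds->length += needed;
}

static void
mini_ds_put_format(struct mini_ds *ds, const char *format, ...)
{
    va_list args;

    va_start(args, format);
    mini_ds_put_format_valist(ds, format, args);
    va_end(args);
}
```

Starting from an empty struct mini_ds { NULL, 0, 0 }, calling mini_ds_put_format(&ds, "%s", "prev_servers") takes the second pass and leaves ds.length == 12. On the sizing: once the buffer holds at least length + needed characters, available = allocated - length + 1 is at least needed + 1, which is what the final check requires. Because allocated is never smaller than length, the patched argument allocated + needed requests at least as much space.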

Thanks,
Yifeng Sun

On Wed, Jul 18, 2018 at 10:48 AM, Girish Moodalbail <gmoodalbail at gmail.com> wrote:

> Hello all,
>
> We can reproduce this issue at will on OVS 2.9.2. The OVSDB NB or SB
> server dumps core while it is compacting the database.
>
> You can reproduce the issue with:
>
> root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
>
> 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
> ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of file)
> root@u1804-HVM-domU:/var/crash#
> root@u1804-HVM-domU:/var/crash#
> root@u1804-HVM-domU:/var/crash# ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: called for pid 14683, signal 6, core limit 0, dump mode 1
> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: executable:
> /usr/sbin/ovsdb-server (command line "ovsdb-server -vconsole:off
> -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log
> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
> --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl
> --detach
> --monitor --remote=db:OVN_Southbound,SB_Global,connections
> --private-key=db:OVN_Southbound,SSL,private_key
> --certificate=db:OVN_Southbound,SSL,certificate
> --ca-cert=db:OVN_Southbound,SSL,ca_cert
> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
> --remote=ptcp:6642:10.0.7.33 /etc/openvswitch/ovnsb_db.db")
> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
> ERROR: apport (pid 17393) Wed Jul 18 10:34:29 2018: wrote report /var/crash/_usr_sbin_ovsdb-server.0.crash
>
> Looking at the crash, we see the following stack:
>
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f7c9a43c801 in __GI_abort () at abort.c:79
> #2  0x00007f7c9aaa633c in json_serialize (json=<optimized out>, s=<optimized out>) at lib/json.c:1554
> #3  0x00007f7c9aaa63ab in json_serialize_object_member (i=<optimized out>, s=<optimized out>, node=<optimized out>, node=<optimized out>)
>     at lib/json.c:1583
> #4  0x00007f7c9aaa62f2 in json_serialize_object (s=0x7ffca2173ea0, object=0x5568dc5d5b10) at lib/json.c:1612
> #5  json_serialize (json=<optimized out>, s=0x7ffca2173ea0) at lib/json.c:1533
> #6  0x00007f7c9aaa863c in json_to_ds (json=json@entry=0x5568dc5d4a20, flags=flags@entry=0, ds=ds@entry=0x7ffca2173f30) at lib/json.c:1511
> #7  0x00007f7c9ae6750f in ovsdb_log_compose_record (json=json@entry=0x5568dc5d4a20, magic=0x5568dc5d5a60 "CLUSTER",
>     header=header@entry=0x7ffca2173f10, data=data@entry=0x7ffca2173f30) at ovsdb/log.c:570
> #8  0x00007f7c9ae677ef in ovsdb_log_write (file=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:618
> #9  0x00007f7c9ae6796e in ovsdb_log_write_and_free (log=log@entry=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:651
> #10 0x00007f7c9ae6d684 in raft_write_snapshot (raft=raft@entry=0x5568dc1e3720, log=0x5568dc5d5a80, new_log_start=new_log_start@entry=539578,
>     new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3588
> #11 0x00007f7c9ae6dbf3 in raft_save_snapshot (raft=raft@entry=0x5568dc1e3720, new_start=new_start@entry=539578,
>     new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3647
> #12 0x00007f7c9ae757bd in raft_store_snapshot (raft=0x5568dc1e3720, new_snapshot_data=new_snapshot_data@entry=0x5568dc5d49a0)
>     at ovsdb/raft.c:3849
> #13 0x00007f7c9ae7c7ae in ovsdb_storage_store_snapshot__ (storage=0x5568dc6b2fb0, schema=0x5568dd66f5a0, data=0x5568dca67880)
>     at ovsdb/storage.c:541
> #14 0x00007f7c9ae7d1de in ovsdb_storage_store_snapshot (storage=0x5568dc6b2fb0, schema=schema@entry=0x5568dd66f5a0,
>     data=data@entry=0x5568dca67880) at ovsdb/storage.c:568
> #15 0x00007f7c9ae69cab in ovsdb_snapshot (db=0x5568dc6b3020) at ovsdb/ovsdb.c:519
> #16 0x00005568daec1f82 in main_loop (is_backup=0x7ffca21742be, exiting=0x7ffca21742bf, run_process=0x0, remotes=0x7ffca2174310,
>     unixctl=0x5568dc71ade0, all_dbs=0x7ffca2174350, jsonrpc=0x5568dc1e36a0, config=0x7ffca2174370) at ovsdb/ovsdb-server.c:239
> #17 main (argc=<optimized out>, argv=<optimized out>) at ovsdb/ovsdb-server.c:457
>
> Walking through the JSON objects being serialized, we see that
> "prev_servers" is malformed.
>
> (gdb) print *((struct shash *)0x5568dc5d5b10)
> $3 = {
>   map = {
>     buckets = 0x5568dc5d1d30,
>     one = 0x0,
>     mask = 7,
>     n = 9
>   }
> }
>
> (gdb) x/6a 0x5568dc5d1d30
> 0x5568dc5d1d30:    0x5568dc5d6000    0x0
> 0x5568dc5d1d40:    0x0    0x5568dc5d5f30
> 0x5568dc5d1d50:    0x5568dc5d5e30    0x5568dc5d5bc0
>
> Let us look at the next one:
>
> (gdb) print *((struct shash_node *)0x5568dc5d5e30)
> $7 = {
>   node = {
>     hash = 2043875868,
>     next = 0x0
>   },
>   name = 0x5568dc5d5e10 "prev_servers",
>   data = 0x5568dc688cd0
> }
>
> (gdb) print *((struct json *)0x5568dc688cd0)
> $10 = {
>   type = 3697839232,
>   count = 34,
>   u = {
>     object = 0x5568dc688cb0,
>     array = {
>       n = 93908862799024,
>       n_allocated = 93908862798944,
>       elems = 0x5568dc22f050
>     },
>     integer = 93908862799024,
>     real = 4.6397142949016804e-310,
>     string = 0x5568dc688cb0 "\a"
>   }
> }
>
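> The type field (3697839232) is not a valid JSON type, so this object is
> malformed. Somehow "prev_servers" is getting corrupted.
>
> That information comes from the servers of the Raft snapshot, i.e.
> raft->snap.servers in struct raft.
>
> Has anyone seen this before?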
>
>
> On Fri, Jul 13, 2018 at 3:49 PM, Yun Zhou <yunz at nvidia.com> wrote:
>
> > Hi,
> >
> > We are running into some issues while trying out a 3-node Raft OVSDB
> > cluster in our lab, and we hope the community can help.
> >
> > We are using OVS 2.9.2.
> > -------------------------
> >
> > We found that on one of the 3 nodes, the SB ovsdb-server was not running
> > and could not be restarted because its database was corrupted:
> >
> >    "ovsdb-server: syntax "{"encaps":["uuid","7f0f7605-c1d1-43fb-826a-1718ea70e088"],"hostname":"nd-sdn-dgx-010"}": syntax error: hostname is not a UUID"
> >
> > According to the ovsdb-server-sb log history, the SB ovsdb-server dumped
> > core several days ago:
> >
> >        "2018-07-08T06:58:15.267Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 937 died, killed (Aborted), core dumped, restarting"
> >
> > Unfortunately, no core dump was actually generated.
> >
> > FWIW, we saw NB ovsdb-server core dumps on all 3 cluster nodes. Here is
> > one of the stacks:
> >
> > (gdb) bt
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #1  0x00007fc48f8c2801 in __GI_abort () at abort.c:79
> > #2  0x00007fc48ff2c33c in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #3  0x00007fc48ff2c2f2 in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #4  0x00007fc48ff2e63c in json_to_ds ()
> >    from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #5  0x00007fc4902ed50f in ovsdb_log_compose_record ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #6  0x00007fc4902ed7ef in ovsdb_log_write ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #7  0x00007fc4902ed96e in ovsdb_log_write_and_free ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #8  0x00007fc4902f3684 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #9  0x00007fc4902f3bf3 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #10 0x00007fc4902fb7bd in raft_store_snapshot ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #11 0x00007fc4903027ae in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #12 0x00007fc4903031de in ovsdb_storage_store_snapshot ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #13 0x00007fc4902efcab in ovsdb_snapshot ()
> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #14 0x0000561e47a8cf82 in ?? ()
> > #15 0x00007fc48f8a3b97 in __libc_start_main (main=0x561e47a8bef0, argc=17,
> >     argv=0x7ffe000ce2c8, init=<optimized out>, fini=<optimized out>,
> >     rtld_fini=<optimized out>, stack_end=0x7ffe000ce2b8) at ../csu/libc-start.c:310
> > #16 0x0000561e47a8db9a in ?? ()
> >
> > Please let us know if any more information is needed. Thanks very much!
> >
> > - Yun
> >
> >
> > _______________________________________________
> > discuss mailing list
> > discuss at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
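In the gdb session above, the struct json stored under "prev_servers" (at 0x5568dc688cd0) reports type = 3697839232. One hedged way to read those raw values is sketched here. It compares the type against a hypothetical mirror of OVS's enum json_type (assumed to list JSON_NULL through JSON_STRING, followed by JSON_N_TYPES) and applies a simple heuristic, itself an assumption: pointers into the same heap in a 64-bit process usually share their upper 32 bits. All names are illustrative, not OVS APIs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of enum json_type (assumed order); only the number
 * of valid enumerators matters for this check. */
enum mini_json_type {
    MINI_JSON_NULL,
    MINI_JSON_FALSE,
    MINI_JSON_TRUE,
    MINI_JSON_OBJECT,
    MINI_JSON_ARRAY,
    MINI_JSON_INTEGER,
    MINI_JSON_REAL,
    MINI_JSON_STRING,
    MINI_JSON_N_TYPES
};

/* A well-formed struct json carries one of the enumerators above. */
static bool
raw_type_is_valid(uint64_t raw_type)
{
    return raw_type < MINI_JSON_N_TYPES;
}

/* Heuristic (assumption): a nonzero value that shares its upper 32 bits
 * with 'struct_addr' is more likely a heap pointer than a small count. */
static bool
looks_like_nearby_pointer(uint64_t value, uint64_t struct_addr)
{
    return value != 0 && (value >> 32) == (struct_addr >> 32);
}

/* Prints a verdict for raw field values copied out of a gdb session. */
static void
report(uint64_t struct_addr, uint64_t raw_type, uint64_t array_n)
{
    printf("type = %" PRIu64 " (0x%" PRIx64 "): %s\n", raw_type, raw_type,
           raw_type_is_valid(raw_type) ? "valid" : "INVALID");
    printf("array.n = %" PRIu64 " (0x%" PRIx64 "): %s\n", array_n, array_n,
           looks_like_nearby_pointer(array_n, struct_addr)
           ? "looks like a heap pointer, not an element count"
           : "plausible element count");
}
```

Called as report(0x5568dc688cd0, 3697839232, 93908862799024) with the values from the dump, it flags the type (0xdc688c80 in hex) as invalid and array.n (0x5568dc688cb0, the same bits gdb shows as u.object) as pointer-like. That is consistent with the struct's memory having been overwritten, although the heuristic alone does not identify what overwrote it.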

