[ovs-discuss] [ovs-dev] ovsdb-server core dump and ovsdb corruption using raft cluster

Yifeng Sun pkusunyifeng at gmail.com
Tue Jul 24 23:41:46 UTC 2018


My apologies, the patch has some issues. I need to dig further.

Yifeng

On Tue, Jul 24, 2018 at 1:40 PM, Yifeng Sun <pkusunyifeng at gmail.com> wrote:

> Hi Yun and Girish,
>
> I submitted a patch. Would you mind testing and reviewing it? Thanks.
>
> [PATCH] dynamic-string: Fix a bug that leads to assertion fail
>
> diff --git a/lib/dynamic-string.c b/lib/dynamic-string.c
> index 6f7b610a9908..4564e420544d 100644
> --- a/lib/dynamic-string.c
> +++ b/lib/dynamic-string.c
> @@ -158,7 +158,7 @@ ds_put_format_valist(struct ds *ds, const char *format, va_list args_)
>      if (needed < available) {
>          ds->length += needed;
>      } else {
> -        ds_reserve(ds, ds->length + needed);
> +        ds_reserve(ds, ds->allocated + needed);
>
>          va_copy(args, args_);
>          available = ds->allocated - ds->length + 1;
>
>
> Thanks,
> Yifeng Sun
>
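The hunk quoted above sits in the retry path of ds_put_format_valist(): format into the free space, and if the output did not fit, grow the buffer and format again. What follows is a minimal standalone sketch of that pattern, not the OVS code. `struct mini_ds`, `mini_ds_reserve()` and `mini_ds_put_format()` are hypothetical names, and the sketch assumes that the reserve call takes a minimum total capacity rather than an increment. Under that assumption, reserving `length + needed` already leaves enough room for the second attempt.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable string; not the real struct ds. */
struct mini_ds {
    char *string;      /* NUL-terminated buffer, or NULL if never used. */
    size_t length;     /* Bytes used, excluding the terminator. */
    size_t allocated;  /* Bytes usable, excluding the terminator. */
};

/* Ensure capacity for at least 'min_length' bytes in total.
 * Assumption: the argument is a total, not an amount to add. */
static void
mini_ds_reserve(struct mini_ds *ds, size_t min_length)
{
    if (min_length > ds->allocated || !ds->string) {
        size_t grow = min_length > ds->allocated ? min_length : ds->allocated;
        char *p;

        ds->allocated += grow;
        if (ds->allocated < 8) {
            ds->allocated = 8;
        }
        p = realloc(ds->string, ds->allocated + 1);
        if (!p) {
            abort();
        }
        if (!ds->string) {
            p[0] = '\0';   /* Fresh buffer: start as an empty string. */
        }
        ds->string = p;
    }
}

/* Append formatted text, retrying once after growing the buffer. */
static void
mini_ds_put_format_valist(struct mini_ds *ds, const char *format,
                          va_list args_)
{
    va_list args;
    size_t available;
    int needed;

    /* First attempt: write into whatever space is free right now. */
    va_copy(args, args_);
    available = ds->string ? ds->allocated - ds->length + 1 : 0;
    needed = vsnprintf(ds->string ? &ds->string[ds->length] : NULL,
                       available, format, args);
    va_end(args);
    if (needed < 0) {
        abort();
    }

    if ((size_t) needed < available) {
        ds->length += needed;
    } else {
        /* Output was truncated: grow, then format again. */
        mini_ds_reserve(ds, ds->length + needed);

        va_copy(args, args_);
        available = ds->allocated - ds->length + 1;
        needed = vsnprintf(&ds->string[ds->length], available, format, args);
        va_end(args);

        /* The second attempt must fit if the reserve call did its job. */
        assert(needed >= 0 && (size_t) needed < available);
        ds->length += needed;
    }
}

static void
mini_ds_put_format(struct mini_ds *ds, const char *format, ...)
{
    va_list args;

    va_start(args, format);
    mini_ds_put_format_valist(ds, format, args);
    va_end(args);
}
```

Starting from a zeroed `struct mini_ds`, two calls such as `mini_ds_put_format(&ds, "%s", "prev_servers")` and `mini_ds_put_format(&ds, "=%d", 539578)` both go through the retry branch, and the assertion in that branch is the kind of check that fails if the buffer is not grown enough.
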
> On Wed, Jul 18, 2018 at 10:48 AM, Girish Moodalbail <gmoodalbail at gmail.com> wrote:
>
>> Hello all,
>>
>> We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
>> server or OVSDB SB server dumps core while it is trying to compact the
>> database.
>>
>> You can reproduce the issue by using:
>>
>> root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
>>
>> 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
>> ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of file)
>> root@u1804-HVM-domU:/var/crash#
>> root@u1804-HVM-domU:/var/crash#
>> root@u1804-HVM-domU:/var/crash# ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: called for pid 14683, signal 6, core limit 0, dump mode 1
>> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: executable: /usr/sbin/ovsdb-server (command line "ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log --remote=punix:/var/run/openvswitch/ovnsb_db.sock --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:10.0.7.33 /etc/openvswitch/ovnsb_db.db")
>> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
>> ERROR: apport (pid 17393) Wed Jul 18 10:34:29 2018: wrote report /var/crash/_usr_sbin_ovsdb-server.0.crash
>>
>> Looking through the crash we see the following stack:
>>
>> (gdb) bt
>> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> #1  0x00007f7c9a43c801 in __GI_abort () at abort.c:79
>> #2  0x00007f7c9aaa633c in json_serialize (json=<optimized out>, s=<optimized out>) at lib/json.c:1554
>> #3  0x00007f7c9aaa63ab in json_serialize_object_member (i=<optimized out>, s=<optimized out>, node=<optimized out>, node=<optimized out>)
>>     at lib/json.c:1583
>> #4  0x00007f7c9aaa62f2 in json_serialize_object (s=0x7ffca2173ea0, object=0x5568dc5d5b10) at lib/json.c:1612
>> #5  json_serialize (json=<optimized out>, s=0x7ffca2173ea0) at lib/json.c:1533
>> #6  0x00007f7c9aaa863c in json_to_ds (json=json@entry=0x5568dc5d4a20, flags=flags@entry=0, ds=ds@entry=0x7ffca2173f30) at lib/json.c:1511
>> #7  0x00007f7c9ae6750f in ovsdb_log_compose_record (json=json@entry=0x5568dc5d4a20, magic=0x5568dc5d5a60 "CLUSTER",
>>     header=header@entry=0x7ffca2173f10, data=data@entry=0x7ffca2173f30) at ovsdb/log.c:570
>> #8  0x00007f7c9ae677ef in ovsdb_log_write (file=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:618
>> #9  0x00007f7c9ae6796e in ovsdb_log_write_and_free (log=log@entry=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:651
>> #10 0x00007f7c9ae6d684 in raft_write_snapshot (raft=raft@entry=0x5568dc1e3720, log=0x5568dc5d5a80, new_log_start=new_log_start@entry=539578,
>>     new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3588
>> #11 0x00007f7c9ae6dbf3 in raft_save_snapshot (raft=raft@entry=0x5568dc1e3720, new_start=new_start@entry=539578,
>>     new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3647
>> #12 0x00007f7c9ae757bd in raft_store_snapshot (raft=0x5568dc1e3720, new_snapshot_data=new_snapshot_data@entry=0x5568dc5d49a0)
>>     at ovsdb/raft.c:3849
>> #13 0x00007f7c9ae7c7ae in ovsdb_storage_store_snapshot__ (storage=0x5568dc6b2fb0, schema=0x5568dd66f5a0, data=0x5568dca67880)
>>     at ovsdb/storage.c:541
>> #14 0x00007f7c9ae7d1de in ovsdb_storage_store_snapshot (storage=0x5568dc6b2fb0, schema=schema@entry=0x5568dd66f5a0,
>>     data=data@entry=0x5568dca67880) at ovsdb/storage.c:568
>> #15 0x00007f7c9ae69cab in ovsdb_snapshot (db=0x5568dc6b3020) at ovsdb/ovsdb.c:519
>> #16 0x00005568daec1f82 in main_loop (is_backup=0x7ffca21742be, exiting=0x7ffca21742bf, run_process=0x0, remotes=0x7ffca2174310,
>>     unixctl=0x5568dc71ade0, all_dbs=0x7ffca2174350, jsonrpc=0x5568dc1e36a0, config=0x7ffca2174370) at ovsdb/ovsdb-server.c:239
>> #17 main (argc=<optimized out>, argv=<optimized out>) at ovsdb/ovsdb-server.c:457
>>
>> Walking through the JSON objects being serialized we see that
>> "prev_servers" is malformed.
>>
>> (gdb) print *((struct shash *)0x5568dc5d5b10)
>> $3 = {
>>   map = {
>>     buckets = 0x5568dc5d1d30,
>>     one = 0x0,
>>     mask = 7,
>>     n = 9
>>   }
>> }
>>
>> (gdb) x/6a 0x5568dc5d1d30
>> 0x5568dc5d1d30:    0x5568dc5d6000    0x0
>> 0x5568dc5d1d40:    0x0    0x5568dc5d5f30
>> 0x5568dc5d1d50:    0x5568dc5d5e30    0x5568dc5d5bc0
>>
>> Let us look at the next one
>>
>> (gdb) print *((struct shash_node *)0x5568dc5d5e30)
>> $7 = {
>>   node = {
>>     hash = 2043875868,
>>     next = 0x0
>>   },
>>   name = 0x5568dc5d5e10 "prev_servers",
>>   data = 0x5568dc688cd0
>> }
>>
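The gdb steps above walk a hash table by hand: with mask = 7 there are eight buckets, and a node's bucket is chosen by its hash. A small sketch of that lookup, assuming the bucket index is simply `hash & mask` (the types and function names here are hypothetical, not the OVS ones). With the numbers from the dump, 2043875868 & 7 is 4, which matches the fifth pointer in the `x/6a` output, 0x5568dc5d5e30.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified chained hash table for illustration only. */
struct mini_node {
    size_t hash;              /* Full hash value of the key. */
    struct mini_node *next;   /* Next node in the same bucket, or NULL. */
};

struct mini_map {
    struct mini_node **buckets;  /* Array of mask + 1 bucket heads. */
    size_t mask;                 /* Bucket count minus one (power of 2). */
};

/* Bucket that a node with 'hash' belongs to (assumed: hash & mask). */
static size_t
mini_bucket_index(const struct mini_map *map, size_t hash)
{
    return hash & map->mask;
}

/* First node in the table whose stored hash equals 'hash', or NULL. */
static struct mini_node *
mini_first_with_hash(const struct mini_map *map, size_t hash)
{
    struct mini_node *node = map->buckets[mini_bucket_index(map, hash)];

    while (node && node->hash != hash) {
        node = node->next;
    }
    return node;
}
```

In gdb the same walk is done by printing the bucket array and then following each node's `next` pointer until it is 0x0.
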
>> (gdb) print *((struct json *)0x5568dc688cd0)
>> $10 = {
>>   type = 3697839232,
>>   count = 34,
>>   u = {
>>     object = 0x5568dc688cb0,
>>     array = {
>>       n = 93908862799024,
>>       n_allocated = 93908862798944,
>>       elems = 0x5568dc22f050
>>     },
>>     integer = 93908862799024,
>>     real = 4.6397142949016804e-310,
>>     string = 0x5568dc688cb0 "\a"
>>   }
>> }
>>
>> So the JSON value stored under "prev_servers" is malformed.
>>
>> That value comes from the 'snap.servers' member of 'struct raft'.
>>
>> Has anyone seen this before?
>>
>>
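One way to read the `struct json` dump above: frame #2 aborts inside json_serialize(), presumably because the type tag is not one of the known JSON types. The tag printed is 3697839232, which is 0xdc688c80 in hex, and its upper bytes match the low 32 bits of heap addresses in the same dump such as 0x5568dc688cb0. That looks like pointer data sitting where the tag should be, which might point to the memory having been freed or overwritten; this is a guess, not a finding. The sketch below only shows the range check and the hex comparison. The enum is a hypothetical mirror whose member order is an assumption, and the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of a JSON type enum; the order is an assumption. */
enum mini_json_type {
    MINI_JSON_NULL,
    MINI_JSON_FALSE,
    MINI_JSON_TRUE,
    MINI_JSON_OBJECT,
    MINI_JSON_ARRAY,
    MINI_JSON_INTEGER,
    MINI_JSON_REAL,
    MINI_JSON_STRING,
    MINI_JSON_N_TYPES
};

/* True if 'raw' could be a valid type tag, false if it is out of range. */
static bool
mini_json_type_is_valid(uint32_t raw)
{
    return raw < MINI_JSON_N_TYPES;
}

/* Low 32 bits of a 64-bit address, for comparison with a 32-bit tag. */
static uint32_t
mini_low32(uint64_t address)
{
    return (uint32_t) (address & 0xffffffffu);
}
```

For example, `mini_json_type_is_valid(3697839232u)` is false, and `mini_low32(0x5568dc688cb0)` differs from the bad tag only in its lowest byte.
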
>> On Fri, Jul 13, 2018 at 3:49 PM, Yun Zhou <yunz at nvidia.com> wrote:
>>
>> > Hi,
>> >
>> > We are running into some issues while trying out a 3-node Raft
>> > ovsdb cluster in our lab, and we hope to get some help from the
>> > community.
>> >
>> > We are using ovs 2.9.2.
>> > -------------------------
>> >
>> > We found that on one of the 3 nodes, the SB ovsdb-server was not started
>> > and could not be restarted because its database was already corrupted:
>> >
>> >    "ovsdb-server: syntax "{"encaps":["uuid","7f0f7605-c1d1-43fb-826a-1718ea70e088"],"hostname":"nd-sdn-dgx-010"}": syntax error: hostname is not a UUID"
>> >
>> > According to the ovsdb-server-sb log file history, the SB ovsdb-server
>> > dumped core several days ago:
>> >
>> >        "2018-07-08T06:58:15.267Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 937 died, killed (Aborted), core dumped, restarting"
>> >
>> > Unfortunately, no core dump file was generated.
>> >
>> > FWIW, we saw core dumps for the NB ovsdb on all 3 cluster nodes. Here is
>> > one of the stacks:
>> >
>> > (gdb) bt
>> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> > #1  0x00007fc48f8c2801 in __GI_abort () at abort.c:79
>> > #2  0x00007fc48ff2c33c in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
>> > #3  0x00007fc48ff2c2f2 in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
>> > #4  0x00007fc48ff2e63c in json_to_ds ()
>> >    from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
>> > #5  0x00007fc4902ed50f in ovsdb_log_compose_record ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #6  0x00007fc4902ed7ef in ovsdb_log_write ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #7  0x00007fc4902ed96e in ovsdb_log_write_and_free ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #8  0x00007fc4902f3684 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #9  0x00007fc4902f3bf3 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #10 0x00007fc4902fb7bd in raft_store_snapshot ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #11 0x00007fc4903027ae in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #12 0x00007fc4903031de in ovsdb_storage_store_snapshot ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #13 0x00007fc4902efcab in ovsdb_snapshot ()
>> >    from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
>> > #14 0x0000561e47a8cf82 in ?? ()
>> > #15 0x00007fc48f8a3b97 in __libc_start_main (main=0x561e47a8bef0, argc=17,
>> >     argv=0x7ffe000ce2c8, init=<optimized out>, fini=<optimized out>,
>> >     rtld_fini=<optimized out>, stack_end=0x7ffe000ce2b8) at ../csu/libc-start.c:310
>> > #16 0x0000561e47a8db9a in ?? ()
>> >
>> > Please let us know if any more information is needed. Thanks very much!
>> >
>> > - Yun