[ovs-dev] [replication SMv2 7/7] ovsdb: Replication usability improvements
Numan Siddique
nusiddiq at redhat.com
Wed Aug 31 12:06:10 UTC 2016
On Wed, Aug 31, 2016 at 12:03 AM, Andy Zhou <azhou at ovn.org> wrote:
>
>
> On Tue, Aug 30, 2016 at 4:17 AM, Numan Siddique <nusiddiq at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Aug 30, 2016 at 1:11 AM, Andy Zhou <azhou at ovn.org> wrote:
>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 3:14 AM, Numan Siddique <nusiddiq at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Aug 27, 2016 at 4:45 AM, Andy Zhou <azhou at ovn.org> wrote:
>>>>
>>>>> Added the '--no-sync' option based on feedback on the current
>>>>> implementation.
>>>>>
>>>>> Added appctl command "ovsdb-server/sync-status" based on feedback
>>>>> on the current implementation.
>>>>>
>>>>> Added a test to simulate the integration of HA manager with OVSDB
>>>>> server using replication.
>>>>>
>>>>> Other documentation and API improvements.
>>>>>
>>>>> Signed-off-by: Andy Zhou <azhou at ovn.org>
>>>>> ------
>>>>>
>>>>> I hope to get some review comments on the command line and appctl
>>>>> interfaces for replication. Since 2.6 is the first release of those
>>>>> interfaces, it is easier to make changes now than in future
>>>>> releases.
>>>>>
>>>>> ----
>>>>> v1->v2: Fix crashes reported at:
>>>>> http://openvswitch.org/pipermail/dev/2016-August/078591.html
>>>>> ---
>>>>>
>>>>
>>>> I haven't tested these patches yet. This patch seems to produce a
>>>> whitespace warning when applied.
>>>>
>>> Thanks for the report. I will fold the fix into the next version when
>>> posting.
>>>
>>> In case it helps, you can also access the patches from my private repo
>>> at:
>>> https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v2
>>>
>>>
>>
>> Hi Andy,
>>
>> I am seeing the crash below when:
>>
>> - The ovsdb-server changes from master to standby, and the active
>> ovsdb-server it is about to connect to is killed just before that or
>> is not reachable.
>>
>> - The pacemaker OCF script calls the sync-status command soon after that.
>>
>>
>> Please let me know if you need more information.
>>
>>
>> Core was generated by `ovsdb-server -vdbg --log-file=/opt/stack/logs/ovsdb-server-sb.log
>> --remote=puni'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0 0x000000000041241d in replication_status () at ovsdb/replication.c:875
>> 875 SHASH_FOR_EACH (node, replication_dbs) {
>> Missing separate debuginfos, use: dnf debuginfo-install
>> glibc-2.23.1-10.fc24.x86_64 openssl-libs-1.0.2h-3.fc24.x86_64
>> (gdb) bt
>> #0 0x000000000041241d in replication_status () at ovsdb/replication.c:875
>> #1 0x0000000000406eda in ovsdb_server_get_sync_status (conn=0x1421fd0,
>> argc=<optimized out>, argv=<optimized out>, config_=<optimized out>)
>> at ovsdb/ovsdb-server.c:1480
>> #2 0x00000000004324ee in process_command (request=0x1421f30,
>> conn=0x1421fd0) at lib/unixctl.c:313
>> #3 run_connection (conn=0x1421fd0) at lib/unixctl.c:347
>> #4 unixctl_server_run (server=server@entry=0x141e140) at
>> lib/unixctl.c:400
>> #5 0x0000000000405bdc in main_loop (is_backup=0x7fff08062256,
>> exiting=0x7fff08062257, run_process=0x0, remotes=0x7fff080622a0,
>> unixctl=0x141e140,
>> all_dbs=0x7fff080622e0, jsonrpc=0x13f6f00) at ovsdb/ovsdb-server.c:182
>> #6 main (argc=<optimized out>, argv=<optimized out>) at
>> ovsdb/ovsdb-server.c:430
>>
> Numan, thanks for the report. I think I spotted the bug:
>
> Currently, when the replication state machine is reset, the state update
> takes place only after another round of the main loop. This time lag
> could lead to the backtrace above if a unixctl command is issued
> during the lag. I have a fix that adds another
> state to represent the reset condition. The fix is at:
>
> https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v3
>
> Would you please let me know if this version works any better? Thanks!
>
Sure. I will test and let you know.
Thanks
Numan
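
The race described in the quoted reply (a sync-status request arriving after the replication state machine is reset but before the next main loop pass updates the state) can be sketched roughly as below. This is only an illustration of the "extra reset state" idea, not the code in ovsdb/replication.c or in the v3 branch: every name here (sketch_state, sketch_replication, sketch_reset, sketch_run, sketch_status) is hypothetical, and the sketch assumes that the per-database table is simply not valid during the lag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical states.  SKETCH_RESET is the extra state that marks the
 * window between a reset request and the next main loop pass. */
enum sketch_state {
    SKETCH_RESET,
    SKETCH_CONNECTING,
    SKETCH_REPLICATING
};

/* Hypothetical stand-in for the replication module's private data. */
struct sketch_replication {
    enum sketch_state state;
    const char **dbs;   /* NULL while the table is torn down. */
    size_t n_dbs;
};

/* Reset request: drop the table and record that fact immediately,
 * instead of waiting for the main loop to notice. */
static void
sketch_reset(struct sketch_replication *r)
{
    r->dbs = NULL;
    r->n_dbs = 0;
    r->state = SKETCH_RESET;
}

/* One main loop pass: rebuild the table and leave the reset state. */
static void
sketch_run(struct sketch_replication *r, const char **dbs, size_t n_dbs)
{
    if (r->state == SKETCH_RESET) {
        r->dbs = dbs;
        r->n_dbs = n_dbs;
        r->state = SKETCH_CONNECTING;
    }
}

/* Status handler: checks the state before walking the table, so a
 * request that lands inside the window gets an answer, not a crash. */
static void
sketch_status(const struct sketch_replication *r, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    if (r->state == SKETCH_RESET) {
        snprintf(buf, size, "resetting");
        return;
    }

    size_t used = (size_t) snprintf(buf, size, "%s:",
                                    r->state == SKETCH_REPLICATING
                                    ? "replicating" : "connecting");
    for (size_t i = 0; i < r->n_dbs && used < size; i++) {
        used += (size_t) snprintf(buf + used, size - used, " %s",
                                  r->dbs[i]);
    }
}
```

Calling sketch_status() right after sketch_reset() reports "resetting" without touching the table; after one sketch_run() pass it lists the databases again. Whether the real fix reports a status string or handles the window differently is not stated in this thread.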