[ovs-dev] [PATCH] ovsdb-server: Write manager status information to Manager table.

Ben Pfaff blp at nicira.com
Tue Feb 1 17:55:24 UTC 2011


On Mon, Jan 31, 2011 at 05:05:58PM -0800, Andrew Evans wrote:
> This commit makes the status of manager connections visible via the Manager
> table in the database.  Two new columns have been created for this purpose:
> 'is_connected' and 'status'.  The former is a boolean flag, and the latter is a
> string-string map which may contain the keys "last_error", "state", and
> "time_in_state".
> 
> Feature #3692.
> Requested-by: Keith Amidon <keith at nicira.com>

Thanks for writing this up!

I think that parse_db_column() will segfault if its 'name_' argument
does not contain a colon.  Indeed:

    blp at blp:~/nicira/openflow/_build(139)$ valgrind ovsdb/ovsdb-server --remote=x conf.db --unixctl=$PWD/socket
    ==32657== Syscall param timer_create(evp) points to uninitialised byte(s)
    ==32657==    at 0x4854008: timer_create (timer_create.c:99)
    ==32657==    by 0x8077022: set_up_timer (timeval.c:139)
    ==32657==    by 0x807723C: refresh_monotonic_if_ticked (timeval.c:175)
    ==32657==    by 0x80772D0: time_msec (timeval.c:227)
    ==32657==    by 0x807AEA8: vlog_init (vlog.c:455)
    ==32657==    by 0x80628A8: daemonize_start (daemon.c:474)
    ==32657==    by 0x8055E54: main (ovsdb-server.c:105)
    ==32657==  Address 0xfe98d3a4 is on thread 1's stack
    ==32657== 
    Feb 01 09:44:37|00001|lockfile|WARN|.conf.db.~lock~: waited 8 ms for lock file
    Feb 01 09:44:38|00002|reconnect|INFO|x: connecting...
    Feb 01 09:44:38|00003|reconnect|WARN|x: connection attempt failed (Address family not supported by protocol)
    Feb 01 09:44:38|00004|reconnect|INFO|x: waiting 1 seconds before reconnect
    ==32657== Invalid read of size 1
    ==32657==    at 0x47DC0B8: strlen (mc_replace_strmem.c:282)
    ==32657==    by 0x80794B1: xstrdup (util.c:99)
    ==32657==    by 0x8055070: parse_db_column (ovsdb-server.c:196)
    ==32657==    by 0x805625E: main (ovsdb-server.c:492)
    ==32657==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ==32657== 
    ==32657== 
    ==32657== Process terminating with default action of signal 11 (SIGSEGV)
    ==32657==  Access not within mapped region at address 0x0
    ==32657==    at 0x47DC0B8: strlen (mc_replace_strmem.c:282)
    ==32657==    by 0x80794B1: xstrdup (util.c:99)
    ==32657==    by 0x8055070: parse_db_column (ovsdb-server.c:196)
    ==32657==    by 0x805625E: main (ovsdb-server.c:492)
    ==32657==  If you believe this happened as a result of a stack
    ==32657==  overflow in your program's main thread (unlikely but
    ==32657==  possible), you can try to increase the size of the
    ==32657==  main thread stack using the --main-stacksize= flag.
    ==32657==  The main thread stack size used in this run was 8388608.
    Segmentation fault

I'm not sure why you removed the warning in add_manager_options().  Did
it just get logged too much?  You could use VLOG_INFO_ONCE instead.

What's the reason for doing the update in two phases, first collecting
rows in get_remote_rows() and then updating them in the caller?  Does
this somehow work better than just iterating over the rows and modifying
them in a single pass?  It looks like this way actually involves more
code and more work, so I'm curious about the design motivation here.

I think that write_string_string_column() could use
ovsdb_datum_sort_assert(), since its caller never adds a duplicate key.

In update_remote_status(), I'm not sure that it's guaranteed that
'status' will be nonnull.  Are you sure about that?




More information about the dev mailing list