[ovs-dev] [PATCH v3] ovs-vsctl: Improve error reporting

Fri Mar 28 03:29:54 UTC 2014

On Thu, Mar 27, 2014 at 05:10:31PM +0100, Thomas Graf wrote:
> From: Andy Zhou <azhou at nicira.com>
> 
> ovs-vsctl is a command-line interface to the Open vSwitch database,
> and as such it just modifies the desired Open vSwitch configuration in
> the database.  ovs-vswitchd, on the other hand, monitors the database
> and implements the actual configuration specified in the database.
> This can lead to surprises when the user makes a change to the
> database, with ovs-vsctl, that ovs-vswitchd cannot actually
> implement. In such a case, the ovs-vsctl command silently succeeds
> (because the database was successfully updated) but its desired
> effects don't actually take place. One good example of such a change
> is attempting to add a port with a misspelled name (e.g. ``ovs-vsctl
> add-port br0 fth0'', where fth0 should be eth0); another is creating
> a bridge or a port whose name is longer than supported
> (e.g. ``ovs-vsctl add-br'' with a 16-character bridge name on
> Linux). It can take users a long time to realize the error, because it
> requires looking through the ovs-vswitchd log.
> 
> The patch improves the situation by checking whether operations that
> ovs executes succeed and report an error when
> they do not.  This patch only report add-br and add-port
> operation errors by examining the `ofport' value that
> ovs-vswitchd stores into the database record for the newly created
> interface.  Until ovs-vswitchd finishes implementing the new
> configuration, this column is empty, and after it finishes it is
> either -1 (on failure) or a positive number (on success).
> 
> Signed-off-by: Andy Zhou <azhou at nicira.com>
> Signed-off-by: Thomas Graf <tgraf at redhat.com>
> 
> ---
> V3: Fixed tests/ovs-vsctl.at to expect improved error message
> V2: Improved error message to indicate error occurs during setup and
>     that additional details can be found in the log.

I found this patch inspiring.  For some reason I always thought that
implementing this behavior would involve a lot of work, and a
significant amount of tricky code, but this patch demonstrated that it
isn't that bad.

I learned a new word, "neoteric".

On the other hand, I have a lot of actual comments, but on the third
hand I'm going to post a v4 that incorporates those comments in a
minute.

So here are the comments.

I'm afraid that checking for success on an interface based on just the
name of the interface could have false positives in cases where clients
are adding and deleting interfaces, that might be different, with common
names.  These cases aren't common, but it's possible to avoid them by
using the UUID of the inserted row, as returned by the database
transactions, instead of the name of the row, and it's not difficult to
do it that way, so I'd prefer to do it that way.  That's the bulk of
what I changed.

When we use UUID instead of name, there's no need to "unexpect"
anything, since the database can tell us that the rows were never
actually constructed, which simplifies things slightly.

I don't think it's a good idea to actually exit with an error code,
because it's a change in behavior that could break scripts.  For
example, I bet that some people use "ovs-vsctl add-port br0 <name>" and
then separately "ovs-vsctl set interface <name> type=gre options:...".
Even though it would be better to add and configure in a single command,
it used to work fine, with exit code 0, to do add in one and configure
in another, and I wouldn't want to break that if the script checks exit
status (e.g. with "set -e").

I improved the log message slightly.