[ovs-dev] [PATCH 3/5] ovsdb-idl: IDL Compound Indexes Implementation

Wed Apr 6 20:22:58 UTC 2016

> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: martes, 22 de marzo de 2016 14:55
> To: Rodriguez Betancourt, Esteban <estebarb at hpe.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 3/5] ovsdb-idl: IDL Compound Indexes 
> Implementation
> 
> On Wed, Mar 09, 2016 at 12:03:54AM +0000, Rodriguez Betancourt, 
> Esteban
> wrote:
> > In the C IDL, allows to create multicolumn indexes in the tables, 
> > that are keep synched with the data in the replica.
> >
> > Signed-off-by: Esteban Rodriguez Betancourt <estebarb at hpe.com>
> 
> Until now, ovsdb-idl.h has been divided into sections separated by 
> page- breaks (control-L), and each section has had a comment 
> explaining basically what that section describes.  Some of the 
> sections, like the one for transactions, has a lot of detail.  This 
> commit adds a new section, but it doesn't add a page break or an 
> explanatory comment.  I think it should do both.  The design document 
> from patch 1 might be a good place from which to draw for the comment.

Changed in v2 patch.

> This patch and patch 2 has a couple of typedefs for function pointers.
> CodingStyle.md says:
> 
>       A function type is a good use for a typedef because it can clarify
>     code.  The type should be a function type, not a pointer-to-function
>     type.  That way, the typedef name can be used to declare function
>     prototypes.  (It cannot be used for function definitions, because that
>     is explicitly prohibited by C89 and C99.)

Changed in v2 patch.

> Do you think that the sorting order in struct ovsdb_idl_index is 
> useful, that is, do you see a common use for indexes in descending 
> order?  If indexes in descending order are only rarely useful, then 
> they could be implemented through a custom comparison function.

This is more needed by tools that expose the data to a user, and the user wants to sort the data in different ways.
Also this enables the option to have in a single index some columns in one direction and other columns in the other direction.

> I don't understand, in ovsdb_idl_index_compare(), why NULL != NULL.  
> It seems to me that this is risky and likely to cause confusion, if 
> not now then later.
Changed in v2 patch.

> I don't understand why indexes have string names.  It seems like kind 
> of a curious design.  Why isn't the client expected to retain a 
> pointer to the index that it wants to traverse, instead of a name?

One reason is to prevent the developer from modifying the values of the index by
referencing it using a string instead of giving direct access to the index data structure.
The other one is to allow a more intuitive way to refer to the indexes, and to
select them when programming. 

> Please use xmalloc() instead of malloc().
Changed in v2 patch.

> I found myself wondering whether there's value in doing dynamic 
> allocation of the three parallel arrays inside struct ovsdb_idl_index.
> It would be very unusual to have an index over more than, say, 3 
> columns, and so it might be easier to have fixed-size arrays of 3 
> elements (or a few more, to allow room for expansion).  I imagine that 
> the columns in indexes will be fixed at build time, after all.
> 
> ovsdb_idl_index_add_column() seems to be working hard to avoid using 
> xrealloc(), but I don't know why.

Changed in v2 patch. The indexes support any number of "columns" for doing the comparison. But that number isn't limited to the number of columns in a row, because eventually one column can have several values indexed, for example, different keys from a map column.

For some "queries", the need of filtering the data first raise the number of columns need to 5 or more.

> I don't understand why only certain columns have a default comparison 
> function.  Why not use ovsdb_datum_compare_3way() as the default for 
> every column?  I guess that the ovsdb_datums and ovsdb_type would have 
> to be passed into column_comparator instead of a pair of void * 
> pointers, but that would arguably be an improvement anyhow, certainly 
> from a type- safety viewpoint.

We decided against using the datums for the comparison because it seems intuitive that the parsed values in the ovsrec are easier and faster to compare.

In some cases, a table could have a really really high number of rows, and receive a lot of updates per second, so is important that the comparison functions are fast.

> Do you have an example of a useful use for a custom comparison function?

Sorting a string column, that contains IPv4 or IPv6 values.
Sort an specific member inside a map column Sort a string as a number Sort strings as dates Sort by a value contained in a referenced row

> Did you consider defining a struct of three members instead of using 
> three parallel arrays in struct ovsdb_idl_index?  It might well be 
> cleaner, and certainly would be for memory allocation issues.

Changed in v2 patch

> It looks like the clients don't actually need to see the definition of 
> struct ovsdb_idl_index, so perhaps it should be moved into ovsdb-idl-provider.h.

Changed in v2 patch

> This patch has some minor coding style issues; I suggest reading 
> CodingStyle.md.

Changed in v2 patch

> The generic comparison functions takes 'row_sync' into account.  Do 
> custom comparison functions also need to take 'row_sync' into account?
> (Why or why not?)

No they don't. The row_sync is used exclusively by the generic comparison function.
More about it below.

> It looks like row_sync is needed because the skiplist data structure 
> implemented in the previous patch does not tolerate insertion of duplicates.
> That seems like a serious drawback for a data structure that is 
> supposed to be used as an index that may contain duplicates.
> Did you consider implementing a skip list (or other balanced tree
> structure) that gracefully supports duplicates?  That would be a much 
> more straightforward solution to the problem.

Allowing duplicates in the skiplist would make difficult to have a fast mechanism for deleting/modifying rows.

If the "key" is duplicated then we would need to iterate over all the equal keys, until we found the one with the value (pointer to row) that we want. If the "key"
is uncommon (e.g.: an UUID) that couldn't be very bad, but if the "key" is common (e.g.: a Boolean) each update/deletion of a row would need to iterate over many rows, potentially thousands.

That would be O(n), not O(log(n)). The row_sync solution was a balance between good performance for compound indexes, and adding a data structure for general use.

> Thanks,
> 
> Ben.

I'm going to send the updated patches soon, that takes in consideration that observations made here and in the responses to the other patches. The pull request was updated as well.

Thanks,
Esteban