[ovs-dev] [RFC 1/3] OVN: introduce Controller_Event table
Lorenzo Bianconi
lorenzo.bianconi at redhat.com
Wed May 29 14:05:07 UTC 2019
> On Thu, May 16, 2019 at 06:05:24PM +0200, Lorenzo Bianconi wrote:
> > Add Controller_Event table to OVN SBDB in order to
> > report CMS related event.
> > Introduce event_table hashmap array and controller_event related
> > structures to ovn-controller in order to track pending events
> > forwarded by ovs-vswitchd. Moreover integrate event_table hashmap
> > array with event_table ovn-sbdb table
> >
> > Signed-off-by: Mark Michelson <mmichels at redhat.com>
> > Co-authored-by: Mark Michelson <mmichels at redhat.com>
> > Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi at redhat.com>
Hi Ben,
thanks for the review, few comments inline.
Regards,
Lorenzo
>
> This seems like one potentially reasonable way for ovn-controller to
> report events to the CMS. I want to point out some design aspects that
> might need to be considered:
>
> 1. Controller_Event doesn't include a column to attribute an event to a
> particular Chassis. This might be important for accounting: how can
> an ovn-controller tell which events it owns and should eventually
> delete? Currently it looks like every ovn-controller always iterates
> over every Controller_Event; is that really desirable and scalable?
> Such a column would also allow update_sb_monitors() to monitor only
> the rows associated with the current chassis, increasing scalability.
ack, I will add it in v2
>
> Keep in mind that ovn-controller probably can't just keep in memory
> which events belong to it, because it might be restarted and needs to
> be able to recover that information.
>
> 2. Are these events (or each individual event, anyway) single-consumer?
> I believe that this design only works for single-consumer events
> because only one consumer can mark them as 'handled'.
Yes, I guess a single-consumer approach would work, at least for the
architecture we were thinking about
>
> 3. Is there value in having 'handled', instead of just having the CMS
> delete rows that it has processed? Using 'handled' requires an extra
> round-trip between ovn-controller and the database.
The main reason is to manage the Controller_Event rows in the controller, but I
guess we can delete 'handled' rows even in the CMS. I will fix it in v2
>
> 4. What is the tolerance for events that are never delivered or that are
> delivered more than once? What can actually be guaranteed, given
> that the database can die and that ovn-controller can die? (Also,
> OVSDB transactions cannot guarantee exactly-once semantics in corner
> cases unless the transactions are idempotent.)
If the ovn-controller dies I think there is no too much we can do, events will
be lost until the controller restarts properly.
If ovn-northd or the connection to the db dies, controller_event_run() will not
manage the Controller_Event table and pinctrl_handle_event() will queue the
pending events in the event_table hash until the upper limit is reached.
We can probably add a garbage collector for the pending events in the table.
What do you think?
>
> 5. How is the number of per-controller rows limited?
If it enough to add a limit to the controller hash size
(https://github.com/LorenzoBianconi/ovs/blob/ovn_unidling/ovn/controller/pinctrl.c#L3488)
or do you mean to add a limit to per-controller rows in the Controller_Event
table?
>
> I think the workflow for these events should be clearly specified. I
> guess it's something like this:
>
> 1. ovn-controller detects that an event has occurred and adds a
> corresponding row to the Controller_Event table with false 'handled'.
>
> 2. CMS consumes the row and acts on it.
>
> 3. CMS changes 'handled' to true.
>
> 4. ovn-controller deletes row.
>
> although I think that "3. CMS deletes row." seems more straightforward.
>
> Did you consider the inter-version compatibility issues of making
> event_type an enum? It will force an upgrade order of first ovn-northd,
> then the database schema, then ovn-controller, because any other order
> will make it possible that ovn-controller tries to add an invalid event
> type (which ovsdb-server will reject) or that an event that ovn-northd
> doesn't understand nevertheless reaches it. However,
> Documentation/intro/install/ovn-upgrades.rst recommends a different
> order. In any case the upgrade implications need to be considered.
I guess there are no compatibility issues during upgrade, since:
- if we upgrade the ovn-northd first and then the controller 'unknown' events
will not be forwarded by ovn-controller
- if we upgrade ovn-controller first and then ovn-northd 'unknown' events will
not be forwarded since the related logical flows are missing in ovn-northd
Am I missing something?
>
> Some typo fixes:
applied, thx
>
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index ae5225e18fd0..a010c95b221b 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -3476,8 +3476,8 @@ tcp.flags = RST;
> </table>
> <table name="Controller_Event" title="Controller Event table">
> <p>
> - Database table used by ovn-controller to report CMS related
> - events
> + Database table used by <code>ovn-controller</code> to report CMS related
> + events. The workflow for event processing is:
> </p>
> <column name="event_type"
> type='{"type": "string", "enum": ["set", ["empty_lb_backends"]]}'>
> @@ -3485,7 +3485,7 @@ tcp.flags = RST;
> </column>
> <column name="event_info">
> <p>
> - Key-value pairs used to spcify evento info to the CMS.
> + Key-value pairs used to specify event info to the CMS.
> Possible values are:
> </p>
> <ul>
> @@ -3498,7 +3498,7 @@ tcp.flags = RST;
> <code>empty_lb_backends</code> event
> </li>
> <li>
> - <code>load_balancer</code>: UUID fo the load balancer reported for
> + <code>load_balancer</code>: UUID of the load balancer reported for
> the <code>empty_lb_backends</code> event
> </li>
> </ul>
More information about the dev
mailing list