[ovs-dev] [PATCH ovn] [RFC] Add chassis liveness monitoring mechanism

Wed Mar 11 17:31:27 UTC 2020

On Wed, Mar 11, 2020 at 6:41 AM Lucas Alvares Gomes <lucasagomes at gmail.com>
wrote:
>
> Hi Han,
>
> Thank you very much for the feedback, much appreciated!
>
> On Wed, Mar 11, 2020 at 7:06 AM Han Zhou <hzhou at ovn.org> wrote:
> >
> > Thanks Lucas for working on this! Please see my comments below.
> >
> > On Thu, Feb 20, 2020 at 1:56 AM Lucas Alvares Gomes
> > <lucasagomes at gmail.com> wrote:
> > >
> > > Thanks for the review, Dumitru!
> > >
> > > > On Thu, Feb 20, 2020 at 9:19 AM Dumitru Ceara <dceara at redhat.com>
> > > > wrote:
> > > >
> > > > On 2/19/20 4:37 PM, lmartins at redhat.com wrote:
> > > > > From: Lucas Alvares Gomes <lucasagomes at gmail.com>
> > > > >
> > > > > NOTE: SENDING THIS PATCH AS AN RFC TO SEE WHAT OTHERS MIGHT THINK
> > > > > ABOUT THE IDEA PRIOR TO WRITING TESTS FOR IT.
> > > > >
> > > > > CMSes integrating with OVN often use the nb_cfg mechanism as a way
> > > > > to check the health status of the ovn-controller process, but the
> > > > > current implementation isn't ideal for that purpose because it
> > > > > floods the control plane with update notifications every time the
> > > > > nb_cfg value is incremented.
> > > > >
> > > > > This patch is merging two ideas:
> > > > >
> > > > > 1) Han Zhou proposed a patch [0] creating a table called
> > > > >    Chassis_Private where each hypervisor *only* writes and monitors
> > > > >    its own record to avoid this flooding problem.
> > > > >
> > > > > 2) Having ovn-controller periodically log that it's alive instead
> > > > >    of relying on the nb_cfg mechanism.
> > > > >
> > > > > By using this mechanism, a CMS can more easily just read this new
> > > > > Chassis_Private table and figure out the status of each Chassis
> > > > > (ovn-controller) in the cluster.
> > > > >
> > > >
> > > > Hi Lucas,
> > > >
> > > > Thanks a lot for working on this!
> > > >
> > > > > Here are some reasons why I believe this approach is better than
> > > > > having to bump the nb_cfg value:
> > > > >
> > > > > 1) Simple integration. Before, the CMS had to increment the nb_cfg
> > > > >    value in the NB_Global table and wait for it to be propagated
> > > > >    to the Southbound database (ovn-northd and then ovn-controllers)
> > > > >    Chassis table in order to know the status of the Chassis.
> > > > >
> > > > >    Now, it can just read the Chassis_Private table and compare the
> > > > >    alive_at column against the current time.
> >
> > I think the nb_cfg mechanism doesn't have to wait. There can be one
> > monitor updating the counter periodically, and any other worker can
> > compare the desired value with each chassis's value and see which ones
> > are stale, e.g. if the gap is bigger than 3, meaning the chassis missed
> > 3 rounds of heartbeats.
> > But it is true that it is more costly because there are 2 directions of
> > communication. It may or may not be a concern, depending on the scale
> > and the frequency of the heartbeat.
> >
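A minimal sketch of the gap check Han describes above, written here only as an illustration. The function name, the dictionary input, and the `MAX_MISSED_ROUNDS` constant are hypothetical; a real CMS would read the desired value from NB_Global and the per-chassis values from the Southbound database.

```python
from typing import Dict, List

# Hypothetical threshold taken from the "gap is bigger than 3" example.
MAX_MISSED_ROUNDS = 3


def stale_chassis(desired_nb_cfg: int,
                  chassis_nb_cfg: Dict[str, int],
                  max_missed: int = MAX_MISSED_ROUNDS) -> List[str]:
    """Return the chassis whose nb_cfg lags the desired value by more
    than max_missed rounds, sorted by name."""
    return sorted(
        name
        for name, seen in chassis_nb_cfg.items()
        if desired_nb_cfg - seen > max_missed
    )


if __name__ == "__main__":
    # "hv3" lags by 4 rounds, so it is the only chassis reported.
    print(stale_chassis(10, {"hv1": 10, "hv2": 7, "hv3": 6}))
```

Because both values already live in the databases, any worker can run this comparison without keeping state of its own.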
>
> That's true, it can work as you suggested.
>
> In the OVN neutron driver we relaxed the checks to account for some
> mis-synchronization [0] and added some code to guard against the
> frequency of the checks as well [1].
>
> [0] https://review.opendev.org/#/c/703612/
> [1] https://review.opendev.org/#/c/704530/
>
> > > >
> > > > I guess the only comment I have about this approach is that it
> > > > doesn't feel completely OK to store a timestamp representation as a
> > > > string in "alive_at". I'm afraid this might be too inflexible and
> > > > force CMSes to compare their local time with the value in this
> > > > column.
> > > >
> > > > I know we discussed offline that using a counter that is
> > > > periodically incremented by ovn-controller instead of a timestamp
> > > > would complicate the implementation in OpenStack, but maybe other
> > > > people on this mailing list have ideas on how to deal with this in
> > > > a more generic way.
> > > >
> > >
> > > Agreed, let's see if people have more ideas here.
> > >
> > > Also, let me expand my explanation a little. The reason why I think
> > > having a counter is not ideal is because services would need to track
> > > this number to know what the value was before and then compare with
> > > the current value to figure out whether it's being incremented or not.
> > >
> > > In OpenStack, we have multiple API workers spread across the
> > > controller nodes, and an API request to check the services'
> > > health status can land on any of those workers. This means that I
> > > can't keep track of the counter in memory. Most likely, I would
> > > need to set an IDL event on the field being incremented and store the
> > > status of those chassis somewhere else accessible by all API workers.
> > >
> > > The advantage of the nb_cfg mechanism compared with the incremental
> > > counter is that those values are already in the OVSDB, so we can
> > > compare what's in the NB DB with the SB DB to figure out whether the
> > > services are alive or not. But the price is that we need more code to
> > > deal with waiting for the synchronization of the OVSDBs and, if we
> > > move it to the Chassis_Private table, it won't be backwards compatible.
> > >
> > > Therefore I think a timestamp would be better. It's easy to understand
> > > for either a service or even a person. When you look at the alive_at
> > > field, you know the last time the service signaled it was alive. For
> > > CMSes, the service can just read the Chassis_Private table and compare
> > > the alive_at field with the current time to figure out the status of
> > > the service (right now it's UTC, but we can change it to include the
> > > TZ info if people think it's easier). No writes, no sync issues, and
> > > it's also backwards compatible with the nb_cfg approach.
> > >
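The timestamp comparison described above could look roughly like the sketch below. The exact string format of alive_at is not stated in this thread, so ISO 8601 is assumed here; `is_alive` and `max_age` are hypothetical names, and a timestamp without an offset is treated as UTC, matching the "right now it's UTC" remark.

```python
from datetime import datetime, timedelta, timezone


def is_alive(alive_at: str, now: datetime, max_age: timedelta) -> bool:
    """Return True if the chassis reported in within max_age of now.

    alive_at is assumed to be an ISO 8601 string; now must be
    timezone-aware.
    """
    last_seen = datetime.fromisoformat(alive_at)
    if last_seen.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        last_seen = last_seen.replace(tzinfo=timezone.utc)
    return now - last_seen <= max_age


if __name__ == "__main__":
    now = datetime(2020, 3, 11, 17, 31, 27, tzinfo=timezone.utc)
    print(is_alive("2020-03-11T17:31:00", now, timedelta(seconds=60)))
    print(is_alive("2020-03-11T17:29:00", now, timedelta(seconds=60)))
```

Note that this relies on the reader's clock and the chassis clocks being reasonably in sync, which is the concern Dumitru raises about comparing local time with the stored value.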
> > > > >
> > > > > 2) Less costly. Just one read from the db is needed, no writing.
> > > > >    Code using the nb_cfg mechanism had to implement a few
> > > > >    safeguards to make it less error prone. See [1] and [2] for
> > > > >    example.
> > > > >
> > > > > 3) Backwards compatibility. The patch [0] was moving the nb_cfg
> > > > >    value to a new table, so systems relying on it would need to
> > > > >    update their code upon updating OVN.
> > > >
> > I have a different opinion on the backward compatibility. It is true
> > that moving nb_cfg to a new table may cause compatibility issues.
> > However, I think it shouldn't be a problem in reality, mainly because
> > the flooding problem of nb_cfg today prevents it from being used in
> > live production environments.
> > In fact, the nb_cfg mechanism has its unique capability of performing
> > a "control plane" ping, and it can be utilized to measure the
> > end-to-end latency of the whole control plane, i.e. when there is a
> > change in NB, how long it takes for all chassis to enforce that change.
> > It is very useful in scalability evaluation. Because of this, I
> > strongly suggest moving the nb_cfg mechanism to the chassis_private
> > table as well. I think it is not harmful to have both mechanisms
> > co-exist (but both should be in the chassis_private table to avoid
> > causing the flooding problem), and each would have its own advantages
> > in different use cases. We could even combine both approaches:
> >
> > - Whenever nb_cfg is updated, each chassis should update its own record
> >   in SB with the same value. When updating this value, it updates the
> >   timestamp column as well, which could tell the latency of the specific
> >   chassis for handling the update.
> >
> > - If nb_cfg is not updated, but chassis_liveness_interval is set, each
> >   chassis should update the timestamp only (the nb_cfg value in its own
> >   record should stay the same as the current NB nb_cfg value. If for
> >   some reason they are different, then it should be updated at the
> >   same time).
> >
> > What do you think?
> >
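The two combined rules Han proposes above can be sketched as a single update decision on the chassis side. This is only an illustration under stated assumptions: `ChassisRecord`, `next_record`, and the use of float seconds for the timestamp are hypothetical, and chassis_liveness_interval is modeled as an optional number of seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChassisRecord:
    """Hypothetical stand-in for one chassis's own row."""
    nb_cfg: int
    timestamp: float  # seconds; the representation is an assumption


def next_record(record: ChassisRecord, nb_cfg: int, now: float,
                liveness_interval: Optional[float]) -> ChassisRecord:
    """Return the row the chassis should hold after this check."""
    if nb_cfg != record.nb_cfg:
        # Rule 1: nb_cfg changed (or drifted), so write value and timestamp.
        return ChassisRecord(nb_cfg, now)
    if (liveness_interval is not None
            and now - record.timestamp >= liveness_interval):
        # Rule 2: nb_cfg unchanged, refresh the timestamp only.
        return ChassisRecord(record.nb_cfg, now)
    # Nothing to write.
    return record


if __name__ == "__main__":
    row = ChassisRecord(nb_cfg=5, timestamp=100.0)
    print(next_record(row, 6, 101.0, None))
    print(next_record(row, 5, 110.0, 10.0))
```

With the liveness interval unset, the record changes only when nb_cfg does, which keeps the plain "control plane ping" behaviour.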
>
> I think you've made great points. I hadn't thought about the latency
> before and it seems pretty useful for that indeed.
>
> Regarding the suggestions, I like them but I don't want to create an
> over-engineered mechanism. One of the main reasons why I thought
> about using timestamps instead of the nb_cfg was the backward
> compatibility issue, but as you've said the nb_cfg mechanism at the
> moment is pretty much unusable in production, so maybe we shouldn't
> care too much about backward compatibility and fix that mechanism
> instead.
>
> Moving the nb_cfg mechanism to the Chassis_Private table should be
> enough to fix our problems and the timestamp idea can be discarded
> (fewer knobs to configure OVN).
>
> If you don't mind, I will rebase your Chassis_Private patch against
> master and re-submit it.
>
Sounds good to me.

Thanks,
Han