[ovs-dev] [PATCH] reconnect: Add graceful reconnect.
Dumitru Ceara
dceara at redhat.com
Wed Jul 21 10:58:18 UTC 2021
On 7/20/21 4:32 PM, Dumitru Ceara wrote:
> On 7/20/21 4:29 PM, Ilya Maximets wrote:
>> On 7/20/21 4:25 PM, Ilya Maximets wrote:
>>> On 7/20/21 3:39 PM, Dumitru Ceara wrote:
>>>> On 7/20/21 3:18 PM, Ilya Maximets wrote:
>>>>> On 7/13/21 5:03 PM, Dumitru Ceara wrote:
>>>>>> On 7/12/21 10:36 PM, Ilya Maximets wrote:
>>>>>>> On 6/29/21 1:20 PM, Dumitru Ceara wrote:
>>>>>>>> Until now clients that needed to reconnect immediately could only use
>>>>>>>> reconnect_force_reconnect(). However, reconnect_force_reconnect()
>>>>>>>> doesn't reset the backoff for connections that were alive long enough
>>>>>>>> (more than backoff seconds).
>>>>>>>>
>>>>>>>> Moreover, the reconnect library cannot determine the exact reason why a
>>>>>>>> client wishes to initiate a reconnection. In most cases reconnection
>>>>>>>> happens because of a fatal error when communicating with the remote,
>>>>>>>> e.g., in the ovsdb-cs layer, when invalid messages are received from
>>>>>>>> ovsdb-server. In such cases it makes sense to not reset the backoff
>>>>>>>> because the remote seems to be unhealthy.
>>>>>>>>
>>>>>>>> There are however cases when reconnection is needed for other reasons.
>>>>>>>> One such example is when ovsdb-clients require "leader-only" connections
>>>>>>>> to clustered ovsdb-server databases. Whenever the client determines
>>>>>>>> that the remote is not a leader anymore, it decides to reconnect to a
>>>>>>>> new remote from its list, searching for the new leader. Using
>>>>>>>> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
>>>>>>>> not reset backoff even though the former leader is still likely in good
>>>>>>>> shape.
>>>>>>>>
>>>>>>>> Since 3c2d6274bcee ("raft: Transfer leadership before creating
>>>>>>>> snapshots.") leadership changes inside the clustered database happen
>>>>>>>> more often and therefore "leader-only" clients need to reconnect more
>>>>>>>> often too. Not resetting the backoff every time a leadership change
>>>>>>>> happens will cause all reconnections to happen with the maximum backoff
>>>>>>>> (8 seconds) resulting in significant latency.
>>>>>>>>
>>>>>>>> This commit also updates the Python reconnect and IDL implementations
>>>>>>>> and adds tests for force-reconnect and graceful-reconnect.
>>>>>>>>
>>>>>>>> Reported-at: https://bugzilla.redhat.com/1977264
>>>>>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>>>>>>>> ---
>>>>>>>
>>>>>>> Hi, Dumitru.
>>>>>>>
>>>>>>
>>>>>> Hi Ilya,
>>>>>>
>>>>>>> Thanks for working on this issue. I've seen it in practice while running
>>>>>>> OVN tests, but I still don't quite understand why it happens. Could you
>>>>>>> please describe how state transitioning works here for the ovsdb-idl case?
>>>>>>>
>>>>>>
>>>>>> Without the patch, assuming a current backoff of X seconds, the sequence
>>>>>> of events (even for a connection that has seen activity after backoff)
>>>>>> is:
>>>>>>
>>>>>> - reconnect_force_reconnect() -> move to S_RECONNECT
>>>>>> - ovsdb_cs_run()
>>>>>> -> jsonrpc_session_run()
>>>>>> -> reconnect_run()
>>>>>> -> reconnect_disconnected()
>>>>>> -> because state is not S_ACTIVE | S_IDLE backoff is not
>>>>>> changed, stays X.
>>>>>
>>>>> Hmm, I see. Thanks for the explanation!
>>>>>
>>>>>>
>>>>>>>> +# Forcefully reconnect.
>>>>>>>> +force-reconnect
>>>>>>>> + in RECONNECT for 0 ms (2000 ms backoff)
>>>>>>>> + 1 successful connections out of 3 attempts, seqno 2
>>>>>>>> + disconnected
>>>>>>>> +run
>>>>>>>> + should disconnect
>>>>>>>> +connecting
>>>>>>>> + in CONNECTING for 0 ms (2000 ms backoff)
>>>>>>>
>>>>>>> Especially this part seems wrong to me. Because after 'should disconnect'
>>>>>>> there should be 'disconnect' or 'connect-fail', but not 'connecting'. We
>>>>>>> literally should disconnect here, otherwise it's a violation of the reconnect
>>>>>>> API. And my concern is that ovsdb-cs or jsonrpc violates the API somewhere
>>>>>>> by not calling reconnect_disconnected() when it is required, or there is some
>>>>>>> other bug that makes the 'reconnect' module jump over a few states in the FSM.
>>>>>>>
>>>>>>> The logical workflow for the force-reconnect, from what I see in the code
>>>>>>> should be:
>>>>>>>
>>>>>>> 1. force-reconnect --> transition to S_RECONNECT
>>>>>>> 2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT
>>>>>>> 3. disconnect -> check the state, update backoff and transition to S_BACKOFF
>>>>>>> 4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT
>>>>>>> 5. connected ....
>>>>>>
>>>>>> This is just a bug in the test case I added, I need to issue
>>>>>> "disconnect" in the test case to trigger reconnect_disconnected()
>>>>>> to be called (like ovsdb-cs does).
>>>>>>
>>>>>>>
>>>>>>> Something is fishy here, because ovsdb-cs somehow jumps over step #3 and
>>>>>>> maybe also #4.
>>>>>>
>>>>>> As mentioned above, it's not the case for ovsdb-cs.
>>>>>>
>>>>>> However, while checking the test case I realized that the patch will
>>>>>> cause all "graceful" reconnects to happen with an initial backoff of 2
>>>>>> seconds for long-lived sessions (instead of 1s).
>>>>>>
>>>>>> That's because:
>>>>>>
>>>>>> reconnect_graceful_reconnect()
>>>>>> -> reconnect_reset_backoff__()
>>>>>> -> session was active after the initial 'backoff' seconds, so
>>>>>> reset backoff to minimum.
>>>>>> -> reconnect_transition__(..., S_RECONNECT);
>>>>>> ovsdb_cs_run()
>>>>>> -> jsonrpc_session_run()
>>>>>> -> reconnect_run()
>>>>>> -> reconnect_disconnected():
>>>>>> -> if (!reconnect_reset_backoff__(..)) then double backoff.
>>>>>>
>>>>>> I see a couple of potential ways to fix that:
>>>>>> 1. reconnect_graceful_reconnect() could bump fsm->backoff_free_tries to
>>>>>> allow a single backoff free reconnect.
>>>>>> 2. reconnect_graceful_reconnect() could reset fsm->backoff to 0 if it
>>>>>> was active recently (instead of calling reconnect_reset_backoff__()).
>>>>>>
>>>>>> However, this is all under the assumption that we want to add support
>>>>>> for two types of reconnect "semantics":
>>>>>>
>>>>>> a. forced, e.g., when the client detects inconsistencies in the data
>>>>>> received from the server (the server is in "bad shape") in which we
>>>>>> should never reset the backoff.
>>>>>>
>>>>>> b. graceful, e.g., when the client reconnects because it needs a
>>>>>> leader-only connection and a leadership transfer happened on the
>>>>>> server side (the server is likely in "good shape" just not a leader
>>>>>> anymore).
>>>>>>
>>>>>> An alternative to all this is to change reconnect_disconnected() to
>>>>>> reset backoff for connections in states S_ACTIVE | S_IDLE | S_RECONNECT
>>>>>> (used to be just active and idle).
>>>>>>
>>>>>> I guess we could treat this as a bug fix and continue discussion for a
>>>>>> separate follow up patch to add the "graceful vs force" semantics if
>>>>>> they turn out to make sense for the future.
>>>>>>
>>>>>> In essence (excluding potential python and test changes) the patch would
>>>>>> become:
>>>>>>
>>>>>> diff --git a/lib/reconnect.c b/lib/reconnect.c
>>>>>> index a929ddfd2d..3a6d93f9d1 100644
>>>>>> --- a/lib/reconnect.c
>>>>>> +++ b/lib/reconnect.c
>>>>>> @@ -385,7 +385,7 @@ reconnect_disconnected(struct reconnect *fsm, long long int now, int error)
>>>>>> if (fsm->backoff_free_tries > 1) {
>>>>>> fsm->backoff_free_tries--;
>>>>>> fsm->backoff = 0;
>>>>>> - } else if (fsm->state & (S_ACTIVE | S_IDLE)
>>>>>> + } else if (fsm->state & (S_ACTIVE | S_IDLE | S_RECONNECT)
>>>>>> && (fsm->last_activity - fsm->last_connected >= fsm->backoff
>>>>>> || fsm->passive)) {
>>>>>> fsm->backoff = fsm->passive ? 0 : fsm->min_backoff;
>>>>>> ---
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> The problem I see with adding S_RECONNECT to this condition is that
>>>>> 'last_activity' and 'last_connected' don't belong to the current
>>>>> connection in case we're forcefully interrupting a connection in the
>>>>> S_CONNECTING state. So, we would reset backoff based on outdated
>>>>> information. Also, I think, in this case, 'last_connected' might
>>>>> be larger than 'last_activity' and we would have a negative value here.
>>>>
>>>> You're right, I missed this.
>>>>
>>>>> All values are signed, so it should not be an issue, but it's not
>>>>> a clean solution.
>>>>
>>>> It's not, I agree.
>>>>
>>>>>
>>>>> Bumping the number of free tries seems like a better solution to me,
>>>>> because:
>>>>>
>>>>> 1. I'm not sure that we need 2 types of reconnect. I mean, backoff
>>>>> is intended to avoid overloading the already overloaded server with
>>>>> connection attempts. In case of forceful re-connection the server
>>>>> is not overloaded, so we should not increase a backoff.
>>>>> IMO, backoff should only be involved in cases where we have
>>>>> problems on the server side related to load, not the leader-only
>>>>> stuff or even database inconsistency problems.
>>>>>
>>>>> 2. force-reconnect should not consume free tries for exactly the same
>>>>> reasons as in point 1. Free tries are to avoid backoff, but
>>>>> backoff exists only to avoid storming the remote server with
>>>>> connections while it's overloaded.
>>>>> No overload -> No need for backoff -> No need to consume free tries.
>>>>>
>>>>> 3. With bumping of free tries we're avoiding logical issues around
>>>>> using 'last_activity' and 'last_connected' from the old connection.
>>>>>
>>>>> 4. Faster reconnection, since backoff will be zero. And that is
>>>>> fine, because as far as we know, the server is not overloaded,
>>>>> it's just not suitable for our needs for some reason.
>>>>> If it is overloaded, we will backoff after the first failed
>>>>> connection attempt.
>>>>
>>>> This is even better (instead of min_backoff, I mean).
>>>>
>>>>>
>>>>> One potential problem I see with this solution is that if there
>>>>> are no servers on the list that are suitable for ovsdb-cs, we will
>>>>> have a log full of annoying reconnect messages. Backoff kind of
>>>>> rate-limits ovsdb-cs right now and we have no internal rate limit
>>>>> inside ovsdb-cs. This looks more like an issue of ovsdb-cs and
>>>>> not of the reconnect module itself. But we need to have a
>>>>> solution for this.
>>>>>
>>>>> And actually, the problem here is that from the jsonrpc point of
>>>>> view the connection is perfectly fine and functional, but ovsdb-cs
>>>>> implements high-level logic on top of it and decides that the
>>>>> connection is not good at a much later stage, based on the actual
>>>>> received data. So, we need to somehow propagate this information
>>>>> from ovsdb-cs down to the reconnect module or allow ovsdb-cs to
>>>>> control the state machine; otherwise we will need two separate backoffs:
>>>>
>>>> That's what I was trying to implement with the "graceful reconnect":
>>>> allow ovsdb-cs to decide how to reconnect. But there are corner cases
>>>> like the ones pointed out during the review of this patch.
>>>>
>>>>> one inside the reconnect for usual connection problems and one
>>>>> in ovsdb-cs for high-level data inconsistencies and leader changes.
>>>>>
>>>>> Thinking this way led me to a different solution. We could expose
>>>>> something like jsonrpc_session_set_backoff_free_tries() and allow
>>>>> ovsdb-cs to make a decision. E.g.:
>>>>>
>>>>> diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
>>>>> index f13065c6c..900597b96 100644
>>>>> --- a/lib/ovsdb-cs.c
>>>>> +++ b/lib/ovsdb-cs.c
>>>>> @@ -729,6 +729,17 @@ void
>>>>> ovsdb_cs_force_reconnect(struct ovsdb_cs *cs)
>>>>> {
>>>>> if (cs->session) {
>>>>> + if (cs->state == CS_S_MONITORING) {
>>>>> + /* The ovsdb-cs was in MONITORING state, so we either had data
>>>>> + * inconsistency on this server, or it stopped being the cluster
>>>>> + * leader, or the user requested to re-connect. Avoid backoff
>>>>> + * in these cases, as we need to re-connect as soon as possible.
>>>>> + * Connections that are not in MONITORING state should keep their
>>>>> + * backoff to avoid a constant flood of re-connection attempts in
>>>>> + * case there is no suitable database server. */
>>>>> + jsonrpc_session_set_backoff_free_tries(
>>>>> + cs->session, jsonrpc_session_get_n_remotes(cs->session));
>>>>> + }
>>>>> jsonrpc_session_force_reconnect(cs->session);
>>>>> }
>>>>> }
>>>>> ---
>>>>>
>>>>> This way, if the server loses leadership or an inconsistency is detected,
>>>>> the client will have 3 free attempts to find a new suitable server.
>>>>> After that it will start to backoff as it does now. No changes in
>>>>> reconnect module required.
>>>>>
>>>>> Thoughts?
>>>>
>>>> This works for me. I just have a question regarding the new API: should
>>>> we allow jsonrpc users to set the free tries to any value or shall we
>>>> make it more strict, e.g., jsonrpc_session_reset_backoff_free_tries(),
>>>> which would reset the number of free tries to 'n_remotes'?
>>
>> And it should probably set the number of free tries to n_remotes + 1,
>> because it will count the current disconnection as an attempt.
>>
>
> Makes sense.
>
>>>
>>> jsonrpc_session_reset_backoff_free_tries() seems better, because ovsdb-cs
>>> doesn't actually set the number of free tries for jsonrpc from the start.
>>> Maybe we can name it just jsonrpc_session_reset_backoff()? I'm not
>>> sure, but I just don't like this very long name.
>>>
>>>>
>>>> Will you be sending a patch or shall I add your "Suggested-by"?
>>>
>>> I'm OK with "Suggested-by".
>>> It would also be great to have some kind of test for this functionality.
>>>
>
> Let me try to come up with something. I'll post a v2 these days.
>
v2 available at:
https://patchwork.ozlabs.org/project/openvswitch/patch/20210721105614.25498-1-dceara@redhat.com/
Thanks,
Dumitru