[ovs-dev] [PATCH] raft: Avoid sending equal snapshots.

Fri Jun 5 14:54:57 UTC 2020

On 5/28/20 7:06 PM, Ilya Maximets wrote:
> On 5/23/20 8:36 PM, Han Zhou wrote:
>>
>>
>> On Sat, May 23, 2020 at 10:34 AM Ilya Maximets <i.maximets at ovn.org> wrote:
>>>
>>> Snapshots are huge.  In some cases we could receive several outdated
>>> append replies from the remote server.  This could happen in high
>>> scale cases if the remote server is overloaded and not able to process
>>> all the raft requests in time.  In response to each outdated append
>>> reply we send a full database snapshot.  While the remote server is
>>> already overloaded, those snapshots get stuck in the jsonrpc backlog
>>> for a long time, making it grow up to a few GB.  Since the remote
>>> server wasn't able to process incoming messages in time, it will
>>> likely not be able to process the snapshots either, leading to the
>>> same situation with low chances to recover.  The remote server will
>>> likely get stuck in 'candidate' state, and other servers will grow
>>> their memory consumption due to growing jsonrpc backlogs:
>>
>> Hi Ilya, this patch LGTM. I'm just not clear about the last part of the commit message. Why would the remote server get stuck in 'candidate' state if there are pending messages from the leader for it to handle? If the follower was busy processing older messages, it wouldn't have had a chance to see the election timer expire without receiving a heartbeat from the leader, so it shouldn't try to start voting, right? Otherwise:
>>
>> Acked-by: Han Zhou <hzhou at ovn.org>
> 
> Thanks!  Applied to master.

As agreed during the OVN weekly IRC meeting, I also backported this fix
to branch-2.13.

Best regards, Ilya Maximets.
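
For readers following the thread, the idea in the quoted commit message
can be sketched roughly as below.  This is only an illustration under
stated assumptions, not the code from the patch: the Leader class, the
(term, index) snapshot identity, and the last_sent bookkeeping are all
hypothetical names, and the backlog list merely stands in for the
jsonrpc backlog.  The leader remembers which snapshot it last queued
for each follower and skips the send when an outdated append reply
would trigger an identical one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical snapshot identity: (last_term, last_index).
SnapshotId = Tuple[int, int]


@dataclass
class Leader:
    # Identity of the leader's current snapshot.
    snapshot_id: SnapshotId
    # Assumed bookkeeping: snapshot last queued for each follower.
    last_sent: Dict[str, SnapshotId] = field(default_factory=dict)
    # Stand-in for the outgoing message backlog.
    backlog: List[Tuple[str, SnapshotId]] = field(default_factory=list)

    def on_outdated_append_reply(self, follower: str) -> bool:
        """Queue a snapshot unless an equal one was already queued.

        Returns True if a snapshot was queued, False if it was skipped.
        """
        if self.last_sent.get(follower) == self.snapshot_id:
            return False  # An equal snapshot is already on its way.
        self.last_sent[follower] = self.snapshot_id
        self.backlog.append((follower, self.snapshot_id))
        return True

    def take_snapshot(self, snapshot_id: SnapshotId) -> None:
        """Record a newer snapshot; later replies may trigger a send again."""
        self.snapshot_id = snapshot_id


leader = Leader(snapshot_id=(3, 100))
# Five outdated replies from the same overloaded follower.
results = [leader.on_outdated_append_reply("s2") for _ in range(5)]
print(results, len(leader.backlog))
```

With this bookkeeping, a burst of outdated replies adds one snapshot to
the backlog instead of one per reply.  A real implementation would also
have to decide when a resend is legitimate, for example after a
reconnect; the sketch does not cover that.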