[ovs-dev] [PATCH 1/2] bond: Fix broken rebalancing after link state changes.

Mon Jun 7 11:01:33 UTC 2021

On 8/2/17 11:09 PM, Andy Zhou wrote:
> On Thu, Jul 20, 2017 at 10:21 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>> There are 3 constraints for moving hashes from one slave to another:
>>
>>         1. The load difference is larger than ~3% of one slave's load.
>>         2. The load difference between slaves exceeds 100000 bytes.
>>         3. Moving of the hash makes the load difference lower by > 10%.
>>
>> In current implementation if one of the slaves goes DOWN state, all
>> the hashes assigned to it will be moved to other slaves. After that,
>> if slave will go UP it will wait for rebalancing to get some hashes.
>> But in case where we have more than 10 equally loaded hashes it
>> will never fit constraint #3, because each hash will handle less than
>> 10% of the load. Situation become worse when number of flows grows
>> higher and it's almost impossible to migrate any hash when all the
>> 256 hash entries are used which is very likely when we have few
>> hundreds/thousands of flows.
>>
>> As a result, if one of the slaves goes down and up while traffic
>> flows, it will never be used again for packet transmission.
>> Situation will not be fixed even if we'll stop traffic completely
>> and start it again because first two constraints will block
>> rebalancing on the earlier stages while we have low amount of traffic.
>>
>> Moving of one hash if destination has no hashes as it was before
>> commit c460a6a7bc75 ("ofproto/bond: simplify rebalancing logic")
>> will not help because having one hash isn't enough to make load
>> difference less than 10% of total load and this slave will
>> handle only that one hash forever.
>>
>> To fix this lets try to move few hashes simultaniously to fit
>> constraint #3.
> 
> Thanks for working on this.

Sorry for not replying for almost 4 years. :)

And sorry for resurrecting the thread, but the issue still exists and
I think that we still need to fix it.  The first patch needs a
minor rebase, but it still works fine.  The test in the second patch
is still valid.

> 
> Current interface tries to only rebalance one hash bucket at a time.
> There are two reasons I can think of:
> 
> In general, It is good idea to keep flow binding stable. Moving flow
> introduces side effects, such as packet re-ordering, that impacts
> application performances significantly.

Yes.  I agree with that.  We should try to minimize moving of flows.
Current algorithm moves hashes until difference between heaviest
member and the lightest one is more than ~3% or it's more than ~1Mbps
and there is at least one hash that could be moved to make the load
difference better by 10%.  That is, in general, a lot of hash migrations.
New algorithm keeps all the previous restrictions, but allows to move
more than 1 hash at a time to fulfill the last 10% restriction.
It's hard to tell if it will result in more movements in general case.
But, I think, we can tune it better to keep as close to the current one
as possible. See below. 

> 
> The second reason is the bandwidth each flow consumes tends to
> be more dynamic in practice.  It is usually pointless to use a snapshot
> of bandwidth consumption to come up a "perfect" bond port binding.
> For example, TCP flows rate adapt to available bandwidth.  It is
> probably better to make smaller changes in each rebalancing step.

Yes, I agree with this statement too.  But new algorithm doesn't try
to create a perfect distribution.  It stops collecting hashes for
migration when distribution is close enough.  In this patch it's 10%
difference with the "ideal" value.  I agree that this might be too
much.  We can, probably, tune this value.  For example, 90% difference
with "ideal" for a single movement will be somewhat close to the
current algorithm as we will move hashes that reduces the difference
by 10%.   I'm afraid, though, that this will result in much more
movements, so it might make sense to settle somewhere in the middle,
e.g. 40-50% of improvement for a single migration.  This should be
fair enough.

Thoughts?

> 
> Is it still possible to address the issue mentioned while maintaining
> current interface? Can we find some way to break the tie in case current
> algorithm fail to nominate any bucket to move?

It's hard to tell.  I've been thinking about this for a long time
before sending this patch and didn't came up with anything elegant.
It would be easy to identify this kind of case where all hashes are
smaller than 10% of the difference, but the difference itself is high,
and use the new algorithm only for this case, but I suspect that in
this case we will always move hashes one by one several times with
the old algorithm and use the new one on the last iteration to move
a bunch of smaller hashes.  The end result might be not any different
than just running new algorithm from the beginning.  Also this schema
will take much more processing time than just running a new algorithm
from the start.

For now I'm going to rebase this patch-set and, probably, change the
moving threshold from 10% to 40-50%, so we will not move too many
small hashes at a time.  Will send v2.

Best regards, Ilya Maximets.