[ovs-dev] [PATCH 1/2] bond: Fix broken rebalancing after link state changes.
blp at ovn.org
Thu Jun 10 23:24:15 UTC 2021
On Mon, Jun 07, 2021 at 01:01:33PM +0200, Ilya Maximets wrote:
> On 8/2/17 11:09 PM, Andy Zhou wrote:
> > On Thu, Jul 20, 2017 at 10:21 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
> >> There are 3 constraints for moving hashes from one slave to another:
> >> 1. The load difference is larger than ~3% of one slave's load.
> >> 2. The load difference between slaves exceeds 100000 bytes.
> >> 3. Moving the hash reduces the load difference by more than 10%.
> >> In the current implementation, if one of the slaves goes to the DOWN
> >> state, all the hashes assigned to it are moved to the other slaves.
> >> After that, if the slave goes UP again, it has to wait for
> >> rebalancing to get some hashes.
> >> But when there are more than 10 equally loaded hashes, no single
> >> hash will ever satisfy constraint #3, because each hash carries less
> >> than 10% of the load. The situation becomes worse as the number of
> >> flows grows: it is almost impossible to migrate any hash when all
> >> 256 hash entries are in use, which is very likely with a few hundred
> >> or thousand flows.
> >> As a result, if one of the slaves goes down and up while traffic
> >> flows, it will never be used again for packet transmission. The
> >> situation is not fixed even by stopping the traffic completely and
> >> starting it again, because the first two constraints block
> >> rebalancing in the early stages, while the amount of traffic is low.
> >> Moving a single hash when the destination has no hashes, as was done
> >> before commit c460a6a7bc75 ("ofproto/bond: simplify rebalancing
> >> logic"), will not help either, because one hash is not enough to make
> >> the load difference less than 10% of the total load, and this slave
> >> would then handle only that one hash forever.
> >> To fix this, let's try to move a few hashes simultaneously to
> >> satisfy constraint #3.
> > Thanks for working on this.
> Sorry for not replying for almost 4 years. :)
> And sorry for resurrecting the thread, but the issue still exists and
> I think that we still need to fix it. The first patch needs a
> minor rebase, but it still works fine. The test in the second patch
> is still valid.
I don't think Andy is working on OVS these days. I'm the original
author of the rebalancing algorithm, so I went back and took a look at
patch 1. I see a little bit of coding style I'd do differently these
days (e.g. declare 'i' in the 'for' loops rather than at the top of a
block), but the code and the rationale for it seem solid to me.
I'll read v2 but I expect to ack it.