[ovs-dev] [PATCH v2] bond: Fix broken rebalancing after link state changes.
Ben Pfaff
blp at ovn.org
Thu Jul 15 23:51:08 UTC 2021
On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote:
> There are 3 constraints for moving hashes from one member to another:
>
> 1. The load difference exceeds ~ 3% of the load of one member.
> 2. The difference in load between members exceeds 100,000 bytes.
> 3. Moving the hash reduces the load difference by more than 10%.
>
> In the current implementation, if one of the members transitions to
> the DOWN state, all hashes assigned to it will be moved to the other
> members. After that, if the member goes UP, it will wait for
> rebalancing to get hashes. But in case we have more than 10 equally
> loaded hashes, it will never meet constraint # 3, because each hash
> will handle less than 10% of the load. The situation gets worse when
> the number of flows grows and it is almost impossible to transfer any
> hash when all 256 hash records are used, which is very likely when we
> have few hundred/thousand flows.
>
> As a result, if one of the members goes down and back up while traffic
> flows, it will never be used to transmit packets again. This will not
> be fixed even if we completely stop the traffic and start it again,
> because the first two constraints will block rebalancing in the
> earlier stages, while we have low traffic volume.
>
> Moving a single hash if the destination does not have any hashes,
> as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the
> rebalancing logic"), will not help, because a single hash is not
> enough to make the difference in load less than 10% of the total load,
> and this member will handle only that one hash forever.
>
> To fix this, let's try to move multiple hashes at the same time to
> meet constraint # 3.
>
> The implementation includes sorting the "records" to be able to
> collect records with a cumulative load close enough to the ideal value.
>
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
I reread this and it still looks good to me.
I spotted one typo in a comment:
diff --git a/ofproto/bond.c b/ofproto/bond.c
index b9bfa45493b8..c3e2083575b0 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member *from, uint64_t to_tx_bytes,
}
if (!cnt) {
- /* There is no entry which load less than or equal to 'ideal_delta'.
+ /* There is no entry with load less than or equal to 'ideal_delta'.
* Lets try closest one. The closest is the last in sorted list. */
struct bond_entry *closest;
Acked-by: Ben Pfaff <blp at ovn.org>
More information about the dev
mailing list