[ovs-dev] [PATCH v2] bond: Fix broken rebalancing after link state changes.

Ben Pfaff blp at ovn.org
Thu Jul 15 23:51:08 UTC 2021


On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote:
> There are 3 constraints for moving hashes from one member to another:
> 
>   1. The load difference exceeds ~ 3% of the load of one member.
>   2. The difference in load between members exceeds 100,000 bytes.
>   3. Moving the hash reduces the load difference by more than 10%.
> 
> In the current implementation, if one of the members transitions to
> the DOWN state, all hashes assigned to it will be moved to the other
> members.  After that, if the member goes UP, it will wait for
> rebalancing to get hashes.  But in case we have more than 10 equally
> loaded hashes, it will never meet constraint # 3, because each hash
> will handle less than 10% of the load.  The situation gets worse when
> the number of flows grows and it is almost impossible to transfer any
> hash when all 256 hash records are used, which is very likely when we
> have few hundred/thousand flows.
> 
> As a result, if one of the members goes down and back up while traffic
> flows, it will never be used to transmit packets again.  This will not
> be fixed even if we completely stop the traffic and start it again,
> because the first two constraints will block rebalancing in the
> earlier stages, while we have low traffic volume.
> 
> Moving a single hash if the destination does not have any hashes,
> as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the
> rebalancing logic"), will not help, because a single hash is not
> enough to make the difference in load less than 10% of the total load,
> and this member will handle only that one hash forever.
> 
> To fix this, let's try to move multiple hashes at the same time to
> meet constraint # 3.
> 
> The implementation includes sorting the "records" to be able to
> collect records with a cumulative load close enough to the ideal value.
> 
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

I reread this and it still looks good to me.

I spotted one typo in a comment:

diff --git a/ofproto/bond.c b/ofproto/bond.c
index b9bfa45493b8..c3e2083575b0 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member *from, uint64_t to_tx_bytes,
     }
 
     if (!cnt) {
-        /* There is no entry which load less than or equal to 'ideal_delta'.
+        /* There is no entry with load less than or equal to 'ideal_delta'.
          * Lets try closest one. The closest is the last in sorted list. */
         struct bond_entry *closest;
 

Acked-by: Ben Pfaff <blp at ovn.org>


More information about the dev mailing list