[ovs-dev] [PATCH v1] Avoid dp_hash recirculation for balance-tcp bond selection mode

Manohar Krishnappa Chidambaraswamy manohar.krishnappa.chidambaraswamy at ericsson.com
Tue Jun 19 06:17:20 UTC 2018


Hi Ilya,

Thanks for taking a look. Please see my replies inline.

Thanks
Manu

On 18/06/18, 4:04 PM, "Ilya Maximets" <i.maximets at samsung.com> wrote:

    > Hi
    
    Hi,
    I just wanted to clarify a few things about the RSS hash. See inline.
    
    One more thing:
    Unlike the usual OVS bonding, this implementation doesn't support
    shifting the load between ports. Am I right?
    This could be an issue, because a few heavy flows could be mapped to
    a single port while the other ports are underloaded. This will
    be a bad case for tunnelling, where we have only a few heavy flows.
    As I understand it, this version of bonding doesn't support any load
    statistics.
[manu] Yes, that’s correct. This implementation does not yet support
accumulation of per-slave stats (in "struct bond_entry"). Since load
balancing is done without using the dp_hashed flows, rule-level stats
can't be used and bond_rebalance() won't take effect. I am planning
to add per-slave stats collection/accumulation in the
OVS_ACTION_ATTR_LB_OUTPUT handling. This will be done in another patch set.
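
As an illustration of the per-slave accounting described above, here is a
minimal sketch in C. It is not taken from the patch: the struct and function
names (lb_bucket, lb_bond, lb_bond_account, lb_bond_slave_bytes) are
hypothetical, and the real "struct bond_entry" in OVS has a different layout.
It only shows the idea of counting bytes per hash bucket at output time so
that a rebalancer could later sum them per member port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOND_BUCKETS 256            /* One bucket per 8-bit hash value. */

/* Hypothetical per-bucket state; NOT the actual OVS "struct bond_entry". */
struct lb_bucket {
    uint16_t slave_id;              /* Member port this bucket maps to. */
    uint64_t n_bytes;               /* Bytes sent through this bucket. */
};

/* Hypothetical per-bond state holding all buckets. */
struct lb_bond {
    struct lb_bucket buckets[BOND_BUCKETS];
};

/* Charge one packet to the bucket selected by the low 8 bits of 'hash'
 * and return the member port the packet should leave on. */
static inline uint16_t
lb_bond_account(struct lb_bond *bond, uint32_t hash, uint32_t pkt_len)
{
    assert(bond != NULL);
    struct lb_bucket *b = &bond->buckets[hash & (BOND_BUCKETS - 1)];

    b->n_bytes += pkt_len;
    return b->slave_id;
}

/* Sum the byte counters of every bucket currently mapped to 'slave_id'. */
static uint64_t
lb_bond_slave_bytes(const struct lb_bond *bond, uint16_t slave_id)
{
    uint64_t total = 0;

    for (size_t i = 0; i < BOND_BUCKETS; i++) {
        if (bond->buckets[i].slave_id == slave_id) {
            total += bond->buckets[i].n_bytes;
        }
    }
    return total;
}
```

Keeping the counters per bucket rather than per slave would let a rebalancing
step move individual buckets between ports; whether the eventual patch does
it this way is an open question here.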
    
    Best regards, Ilya Maximets.
    
    > Problem:
    > --------
    > In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
    > (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
    > "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
    > forwarded to the bond member port based on 8 bits of the datapath hash
    > value computed through dp_hash. This causes performance degradation in the
    > following ways:
    > 
    > 1. L4-hash computation in software is CPU-intensive; it consumes
    > considerable CPU cycles of the PMD.
    
    RSS is in use in most cases in current master and 2.9. Details below.
[manu] OK, thanks. I was working on an earlier version of OVS and didn’t notice
this while porting to master.
    
    > 
    > 2. The recirculation of the packet implies another lookup of the packet’s
    > flow key in the exact match cache (EMC) and potentially Megaflow classifier
    > (DPCLS). This is the biggest cost factor.
    > 
    > 3. The recirculated packets have a new “RSS” hash and compete with the
    > original packets for the scarce number of EMC slots. This implies more
    > EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
    > 
    > 4. The 256 extra megaflow entries per bond for dp_hash bond selection put
    > additional load on the revalidation threads.
    >  
    > Owing to this performance degradation, deployments stick to “balance-slb”
    > bond mode even though it does not do active-active load balancing for 
    > VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
    > source MAC address.
    >  
    > Proposed optimization:
    > ---------------------- 
    > This proposal has 2 main optimizations in balance-tcp handling at egress.
    >  
    > 1. When feasible, re-use the existing L4 RSS-hash of the packet for bond
    > selection instead of computing another L4-hash in software.
    
    This is already done. See commit
    95a6cb3497c3 ("odp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.")
    
    It was done a year ago and, currently, if RSS is available, it is used for
    OVS_ACTION_ATTR_HASH when handling balanced bonding.
    
    So, at the least, you should reword a lot of the RSS-related comments around the code.
[manu] Given this, I think OVS_ACTION_ATTR_HASH can be reused and only the
OVS_ACTION_ATTR_RECIRC action needs to be replaced with OVS_ACTION_ATTR_LB_OUTPUT.
So it will be "HASH + LB-OUTPUT" instead of the existing "HASH + RECIRC".
I will evaluate this and then send v2 diffs.
    
    >  
    > 2. Introduce a new load-balancing output action instead of recirculation:
    >    
    > Maintain one table per bond (could just be an array of uint16's) and
    > program it the same way internal flows are created today for each possible
    > hash value (256 entries) from the ofproto layer. Use this table to load-balance
    > flows as part of output action processing.
    >  
    > Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
    > -> bond_may_recirc() and compose_output_action__() generate
    > “dp_hash(hash_l4(0))” and “recirc(<RecircID>)” actions. In this case the
    > RecircID identifies the bond. For the recirculated packets the ofproto layer
    > installs megaflow entries that match on RecircID and masked dp_hash and send
    > them to the corresponding output port.
    >  
    > Instead, we will now generate a new action "lb_output(bond,<bond id>)" which
    > combines hash computation (only if needed, else re-use RSS hash) and inline
    > load-balancing over the bond. This action is used *only* for balance-tcp bonds
    > in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
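
The per-bond table described in the proposal can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
patch's code: lb_output_table, lb_table_program and lb_output_select are
hypothetical names, and the round-robin fill merely stands in for whatever
bucket-to-port assignment the ofproto layer would program.

```c
#include <assert.h>
#include <stdint.h>

#define LB_HASH_MASK 0xffu                  /* Low 8 bits of the hash. */
#define LB_N_BUCKETS (LB_HASH_MASK + 1)     /* 256 entries per bond. */

/* Hypothetical per-bond table: one output port number per hash bucket. */
struct lb_output_table {
    uint16_t port[LB_N_BUCKETS];
};

/* Control path: fill all 256 buckets. Round-robin over the members is a
 * placeholder for the assignment the ofproto layer would choose. */
static void
lb_table_program(struct lb_output_table *t,
                 const uint16_t *members, unsigned int n_members)
{
    assert(n_members > 0);
    for (unsigned int i = 0; i < LB_N_BUCKETS; i++) {
        t->port[i] = members[i % n_members];
    }
}

/* Fast path: pick the output port with one masked array lookup,
 * replacing the recirculation and the second flow lookup. */
static inline uint16_t
lb_output_select(const struct lb_output_table *t, uint32_t hash)
{
    return t->port[hash & LB_HASH_MASK];
}
```

For example, with members {10, 20, 30}, a packet whose hash has low byte 2
would be sent to port 30. The appeal of this layout is that the selection
costs a single array index per packet, with no extra EMC or DPCLS entries.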
    
    


