[ovs-dev] [PATCH v1] Avoid dp_hash recirculation for balance-tcp bond selection mode

Ilya Maximets i.maximets at samsung.com
Mon Jun 18 10:34:21 UTC 2018

> Hi

I just wanted to clarify a few things about the RSS hash. See inline.

One more thing:
Unlike the usual OVS bonding, this implementation doesn't support
shifting the load between ports. Am I right?
This could be an issue, because a few heavy flows could be mapped to
a single port while the other ports remain underloaded. This would
be a bad case for tunnelling, where we have only a few heavy flows.
As I understand it, this version of bonding doesn't support any load
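To illustrate the concern: with 256 hash buckets and a fixed bucket-to-member mapping, heavy flows whose buckets happen to belong to the same member stay there. The sketch below is a minimal illustration, not the OVS rebalancing code; `member_load()`, `rebalance_once()`, `N_BUCKETS` and `MAX_MEMBERS` are hypothetical names. It sums per-bucket byte counters per member and moves one whole bucket from the most loaded to the least loaded member when that narrows the gap. It also shows the limit of any bucket-based scheme: a single heavy flow occupies one bucket and cannot be split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS   256   /* one bucket per 8-bit hash value (assumption) */
#define MAX_MEMBERS 8     /* hypothetical upper bound on bond members */

/* Sum per-bucket byte counters into per-member load. */
static void
member_load(const uint16_t map[N_BUCKETS], const uint64_t bytes[N_BUCKETS],
            size_t n_members, uint64_t load[MAX_MEMBERS])
{
    assert(n_members > 0 && n_members <= MAX_MEMBERS);
    for (size_t m = 0; m < n_members; m++) {
        load[m] = 0;
    }
    for (size_t b = 0; b < N_BUCKETS; b++) {
        assert(map[b] < n_members);
        load[map[b]] += bytes[b];
    }
}

/* Move the largest bucket of the most loaded member that still narrows the
 * gap to the least loaded member.  Returns the moved bucket, or -1. */
static int
rebalance_once(uint16_t map[N_BUCKETS], const uint64_t bytes[N_BUCKETS],
               size_t n_members)
{
    uint64_t load[MAX_MEMBERS];
    size_t hi = 0, lo = 0;

    member_load(map, bytes, n_members, load);
    for (size_t m = 1; m < n_members; m++) {
        if (load[m] > load[hi]) {
            hi = m;
        }
        if (load[m] < load[lo]) {
            lo = m;
        }
    }

    /* Moving x bytes shrinks the gap d = load[hi] - load[lo] only if 0 < x < d. */
    uint64_t gap = load[hi] - load[lo];
    int best = -1;
    for (size_t b = 0; b < N_BUCKETS; b++) {
        if (map[b] == hi && bytes[b] > 0 && bytes[b] < gap
            && (best < 0 || bytes[b] > bytes[best])) {
            best = (int) b;
        }
    }
    if (best >= 0) {
        map[best] = (uint16_t) lo;
    }
    return best;
}
```

With two heavy buckets on member 0, one of them is moved to member 1; with one very heavy bucket, nothing can be moved.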

Best regards, Ilya Maximets.

> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash of the TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8 bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
> 1. L4 hash computation in software is CPU-intensive; it consumes
> considerable PMD CPU cycles.

The RSS hash is already used in most cases on current master and in 2.9. Details below.

> 2. Recirculating the packet implies another lookup of the packet’s
> flow key in the exact match cache (EMC) and potentially the megaflow classifier
> (DPCLS). This is the biggest cost factor.
> 3. The recirculated packets have a new “RSS” hash and compete with the
> original packets for the scarce EMC slots. This implies more
> EMC misses and potentially EMC thrashing, causing costly DPCLS lookups.
> 4. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
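The per-packet cost in point 1 is a hash over the L4 5-tuple, of which only the low 8 bits are used to pick one of 256 bond buckets. The sketch below assumes a hypothetical `struct five_tuple` and uses FNV-1a as a stand-in; OVS uses its own hash functions, so the exact values differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple; fields are hashed one by one to skip padding bytes. */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* FNV-1a over n bytes (stand-in for the OVS hash functions). */
static uint32_t
fnv1a(uint32_t h, const void *data, size_t n)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;               /* FNV 32-bit prime */
    }
    return h;
}

/* Software L4 hash computed for every packet when no RSS hash is reused. */
static uint32_t
l4_hash(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;         /* FNV 32-bit offset basis */
    h = fnv1a(h, &t->src_ip, sizeof t->src_ip);
    h = fnv1a(h, &t->dst_ip, sizeof t->dst_ip);
    h = fnv1a(h, &t->src_port, sizeof t->src_port);
    h = fnv1a(h, &t->dst_port, sizeof t->dst_port);
    h = fnv1a(h, &t->proto, sizeof t->proto);
    return h;
}

/* Only 8 bits of the hash select the bond bucket (256 buckets). */
static uint8_t
bond_bucket(uint32_t hash)
{
    return (uint8_t) (hash & 0xff);
}
```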
> Owing to this performance degradation, deployments stick to “balance-slb”
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic, because all tunnel packets have the same
> source MAC address.
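Why balance-slb does not spread tunnel traffic can be shown with a tiny model, assuming SLB keys only on the outer source MAC and VLAN. Every encapsulated packet from one VTEP carries the same outer L2 key, so all inner flows land on the same member. `struct l2_key`, `slb_hash()` and `slb_member()` are hypothetical, and FNV-1a again stands in for the real hash.

```c
#include <assert.h>
#include <stdint.h>

#define BOND_BUCKETS 256

/* Hypothetical SLB input: only outer L2 fields are considered. */
struct l2_key {
    uint8_t src_mac[6];
    uint16_t vlan;
};

static uint32_t
slb_hash(const struct l2_key *k)
{
    uint32_t h = 2166136261u;         /* FNV-1a stand-in */
    for (int i = 0; i < 6; i++) {
        h = (h ^ k->src_mac[i]) * 16777619u;
    }
    h = (h ^ (k->vlan & 0xff)) * 16777619u;
    h = (h ^ (k->vlan >> 8)) * 16777619u;
    return h;
}

/* Map the L2 key to a bucket, then the bucket to a member. */
static unsigned
slb_member(const struct l2_key *k, unsigned n_members)
{
    assert(n_members > 0);
    return (slb_hash(k) & (BOND_BUCKETS - 1)) % n_members;
}
```

Two VXLAN packets from the same VTEP carry different inner 5-tuples, but their outer `l2_key` is identical, so `slb_member()` returns the same member for both.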
> Proposed optimization:
> ---------------------- 
> This proposal has two main optimizations in balance-tcp handling at egress.
> 1. When feasible, re-use the existing L4 RSS-hash of the packet for bond
> selection instead of computing another L4-hash in software.

This is already done. See commit
95a6cb3497c3 ("odp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.")

It was merged a year ago, and currently, if the RSS hash is available, it is
used for OVS_ACTION_ATTR_HASH during balance-tcp bond handling.
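The reuse described above amounts to: if the NIC already supplied a valid RSS hash, take it; otherwise compute an L4 hash in software. The sketch below models only that decision. `struct pkt`, `software_l4_hash()` and `l4_hash_for_bond()` are hypothetical, the mixing function is a placeholder, and it does not reproduce the code from the commit above (which may, for example, mix in a hash basis).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pkt {                    /* hypothetical packet metadata */
    bool rss_valid;             /* NIC supplied an RSS hash */
    uint32_t rss_hash;
    uint32_t sw_hash_calls;     /* counts software fallbacks, for illustration */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Placeholder 32-bit mixing step, not the OVS hash. */
static uint32_t
mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    return h;
}

static uint32_t
software_l4_hash(struct pkt *p)
{
    uint32_t h = 0x9e3779b9u;
    p->sw_hash_calls++;
    h = mix32(h, p->src_ip);
    h = mix32(h, p->dst_ip);
    h = mix32(h, ((uint32_t) p->src_port << 16) | p->dst_port);
    h = mix32(h, p->proto);
    return h;
}

/* Reuse the RSS hash when valid; compute and cache a software hash otherwise. */
static uint32_t
l4_hash_for_bond(struct pkt *p)
{
    if (!p->rss_valid) {
        p->rss_hash = software_l4_hash(p);
        p->rss_valid = true;
    }
    return p->rss_hash;
}
```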

So, at the very least, you should reword many of the RSS-related comments throughout the code.

> 2. Introduce a new load-balancing output action instead of recirculation:
> Maintain one table per bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value (256 entries) from the ofproto layer. Use this table to load-balance
> flows as part of output action processing.
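A per-bond table of 256 uint16 entries, as proposed, could look like the sketch below. The names `struct bond_lb_table`, `bond_lb_table_program()`, `bond_lb_table_set()` and `bond_lb_table_lookup()` are hypothetical, and the initial round-robin spread is an arbitrary choice for illustration; the patch would program entries from the ofproto layer's own bucket assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOND_BUCKETS 256

struct bond_lb_table {                      /* hypothetical per-bond table */
    uint16_t member_port[BOND_BUCKETS];     /* output port per hash bucket */
};

/* Fill all buckets, spreading them round-robin over the member ports. */
static void
bond_lb_table_program(struct bond_lb_table *t, const uint16_t *ports,
                      size_t n_ports)
{
    assert(n_ports > 0);
    for (size_t b = 0; b < BOND_BUCKETS; b++) {
        t->member_port[b] = ports[b % n_ports];
    }
}

/* Re-point a single bucket, e.g. after a rebalancing decision. */
static void
bond_lb_table_set(struct bond_lb_table *t, uint8_t bucket, uint16_t port)
{
    t->member_port[bucket] = port;
}

/* Output-time lookup: the low 8 bits of the hash select the bucket. */
static uint16_t
bond_lb_table_lookup(const struct bond_lb_table *t, uint32_t hash)
{
    return t->member_port[hash & (BOND_BUCKETS - 1)];
}
```

Because a bucket can be re-pointed individually, such a table would not by itself rule out shifting load between members; it only needs the ofproto layer to update entries.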
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> “dp_hash(hash_l4(0))” and “recirc(<RecircID>)” actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
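The current two-pass path can be modelled as follows: the first lookup matches the original flow, whose actions set dp_hash and recirculate; the second lookup matches on the recirc_id and the masked dp_hash. `struct recirc_rules`, `struct pkt_md` and both pass functions are hypothetical simplifications of the datapath, kept only to make the two lookups per packet explicit.

```c
#include <assert.h>
#include <stdint.h>

#define BOND_BUCKETS 256

/* Stand-in for one bond's post-recirculation megaflows: entry b matches
 * recirc_id == rules.recirc_id && (dp_hash & 0xff) == b. */
struct recirc_rules {
    uint32_t recirc_id;
    uint16_t out_port[BOND_BUCKETS];
};

struct pkt_md {                 /* hypothetical packet metadata */
    uint32_t dp_hash;
    uint32_t recirc_id;
    unsigned lookups;           /* flow lookups performed for this packet */
};

/* Pass 1: lookup hits the original flow; actions are hash + recirc. */
static void
pass1_hash_and_recirc(struct pkt_md *md, uint32_t l4_hash, uint32_t recirc_id)
{
    md->lookups++;
    md->dp_hash = l4_hash;
    md->recirc_id = recirc_id;
}

/* Pass 2: second lookup on (recirc_id, dp_hash & 0xff).  Returns -1 on miss. */
static int
pass2_lookup(struct pkt_md *md, const struct recirc_rules *r)
{
    md->lookups++;
    if (md->recirc_id != r->recirc_id) {
        return -1;
    }
    return r->out_port[md->dp_hash & (BOND_BUCKETS - 1)];
}
```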
> Instead, we will now generate a new action "lb_output(bond,<bond id>)" which
> combines hash computation (only if needed, else re-use RSS hash) and inline
> load-balancing over the bond. This action is used *only* for balance-tcp bonds
> in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
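In contrast, the proposed lb_output action would resolve the member in the same pass: find the bond by id, reuse the RSS hash or compute one, and index the bond's table. The sketch assumes hypothetical `struct bond_table`, `struct pkt`, `sw_l4_hash()` and `lb_output()`; the hash is a placeholder, and it is not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOND_BUCKETS 256

struct bond_table {                     /* hypothetical, filled by ofproto */
    uint32_t bond_id;
    uint16_t member_port[BOND_BUCKETS];
};

struct pkt {                            /* hypothetical packet metadata */
    bool rss_valid;
    uint32_t rss_hash;
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Placeholder FNV-1a-style mix over the 5-tuple fields, not the OVS hash. */
static uint32_t
sw_l4_hash(const struct pkt *p)
{
    const uint32_t fields[] = { p->src_ip, p->dst_ip, p->src_port,
                                p->dst_port, p->proto };
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        h = (h ^ fields[i]) * 16777619u;
    }
    return h;
}

/* lb_output(bond,<bond id>): hash only if needed, then pick the member
 * inline, with no recirculation.  Returns -1 for an unknown bond. */
static int
lb_output(struct pkt *p, const struct bond_table *bonds, size_t n_bonds,
          uint32_t bond_id)
{
    for (size_t i = 0; i < n_bonds; i++) {
        if (bonds[i].bond_id != bond_id) {
            continue;
        }
        if (!p->rss_valid) {
            p->rss_hash = sw_l4_hash(p);
            p->rss_valid = true;
        }
        return bonds[i].member_port[p->rss_hash & (BOND_BUCKETS - 1)];
    }
    return -1;
}
```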
