[ovs-dev] [PATCH v1] Avoid dp_hash recirculation for balance-tcp bond selection mode
Manohar Krishnappa Chidambaraswamy
manohar.krishnappa.chidambaraswamy at ericsson.com
Tue Jun 19 06:17:20 UTC 2018
Hi Ilya,
Thanks for taking a look. Please see inline.
Thanks
Manu
On 18/06/18, 4:04 PM, "Ilya Maximets" <i.maximets at samsung.com> wrote:
> Hi
Hi,
I just wanted to clarify a few things about the RSS hash. See inline.
One more thing:
Unlike the usual OVS bonding, this implementation doesn't support
shifting the load between ports. Am I right?
This could be an issue, because a few heavy flows could be mapped to
a single port while the other ports are underloaded. This will
be a bad case for tunnelling, where we have only a few heavy flows.
As I understand it, this version of bonding doesn't support any load
statistics.
[manu] Yes, that’s correct. This implementation does not yet support
accumulation of per-slave stats (in "struct bond_entry"). Since load
balancing is done without using the dp_hashed flows, rule level stats
can't be used and bond_rebalance() won't take effect. I was planning
to add per-slave stats collection/accumulation in OVS_ACTION_ATTR_LB_OUTPUT
handling. This will be done in another patch set.
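As a rough illustration of what such accumulation could look like (only a
sketch, not the planned patch): the names lb_bucket, lb_output_account() and
lb_member_bytes() below are hypothetical and do not exist in OVS. The idea is
to count bytes per hash bucket while executing the output action, so that
per-slave load can be summed later for rebalancing.

```c
#include <stdint.h>

#define N_BUCKETS 256   /* One bucket per 8-bit hash value. */

/* Hypothetical per-bucket state kept for one bond. */
struct lb_bucket {
    uint16_t member_port;   /* Port this bucket currently maps to. */
    uint64_t n_bytes;       /* Bytes sent through this bucket so far. */
};

/* Account one packet against its bucket and return the output port. */
static uint16_t
lb_output_account(struct lb_bucket buckets[N_BUCKETS], uint32_t hash,
                  uint32_t pkt_len)
{
    struct lb_bucket *b = &buckets[hash & (N_BUCKETS - 1)];

    b->n_bytes += pkt_len;
    return b->member_port;
}

/* Sum the bytes of all buckets currently mapped to 'port'. */
static uint64_t
lb_member_bytes(const struct lb_bucket buckets[N_BUCKETS], uint16_t port)
{
    uint64_t total = 0;

    for (int i = 0; i < N_BUCKETS; i++) {
        if (buckets[i].member_port == port) {
            total += buckets[i].n_bytes;
        }
    }
    return total;
}
```

A rebalancing pass could then compare lb_member_bytes() across the slaves and
move individual buckets from the most loaded slave to the least loaded one.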
Best regards, Ilya Maximets.
> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
>
> 1. L4-Hash computation in software is CPU intensive; it consumes
> considerable CPU cycles of the PMD.
RSS is in use in most cases in current master and 2.9. Details below.
[manu] OK, thanks. I was working on an earlier version of OVS and didn’t notice it
while porting to master.
>
> 2. The recirculation of the packet implies another lookup of the packet’s
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
>
> 3. The recirculated packets have a new “RSS” hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
>
> 4. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
>
> Owing to this performance degradation, deployments stick to “balance-slb”
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
> source MAC address.
>
> Proposed optimization:
> ----------------------
> This proposal has 2 main optimizations in balance-tcp handling at egress.
>
> 1. When feasible, re-use the existing L4 RSS-hash of the packet for bond
> selection instead of computing another L4-hash in software.
This is already done. See commit
95a6cb3497c3 ("odp-execute: Reuse rss hash in OVS_ACTION_ATTR_HASH.")
It was done a year ago and, currently, if RSS is available, it's used for
OVS_ACTION_ATTR_HASH when handling balanced bonding.
So, at least, you should reword a lot of the RSS-related comments around the code.
[manu] With this I think OVS_ACTION_ATTR_HASH can be reused and only
OVS_ACTION_ATTR_RECIRC action can be replaced with OVS_ACTION_ATTR_LB_OUTPUT.
So it will be "HASH + LB-OUTPUT" instead of the existing "HASH + RECIRC".
Will evaluate this and then send v2 diffs.
>
> 2. Introduce a new load-balancing output action instead of recirculation:
>
> Maintain one table per bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value (256 entries) from the ofproto layer. Use this table to load-balance
> flows as part of output action processing.
>
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> “dp_hash(hash_l4(0))” and “recirc(<RecircID>)” actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
>
> Instead, we will now generate a new action "lb_output(bond,<bond id>)" which
> combines hash computation (only if needed, else re-use RSS hash) and inline
> load-balancing over the bond. This action is used *only* for balance-tcp bonds
> in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
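To make the per-bond table idea concrete, here is a minimal sketch under my
own assumptions. The names lb_bond_table, lb_bond_table_init() and
lb_bond_select() are hypothetical and are not taken from the patch; the
round-robin fill only stands in for whatever mapping the ofproto layer would
actually program.

```c
#include <assert.h>
#include <stdint.h>

#define BOND_BUCKETS 256                 /* One bucket per 8-bit hash value. */
#define BOND_MASK (BOND_BUCKETS - 1)

/* Hypothetical per-bond table: maps each hash bucket to a member port. */
struct lb_bond_table {
    uint16_t member_port[BOND_BUCKETS];
};

/* Fill the table round-robin across the given member ports.  In the
 * proposal the ofproto layer would program these entries instead. */
static void
lb_bond_table_init(struct lb_bond_table *t, const uint16_t *ports,
                   int n_ports)
{
    assert(n_ports > 0);
    for (int i = 0; i < BOND_BUCKETS; i++) {
        t->member_port[i] = ports[i % n_ports];
    }
}

/* Select the output port from the low 8 bits of the packet hash (the RSS
 * hash when available, otherwise a software-computed L4 hash). */
static uint16_t
lb_bond_select(const struct lb_bond_table *t, uint32_t hash)
{
    return t->member_port[hash & BOND_MASK];
}
```

With this shape the output decision is a single array lookup on the
already-available hash, so no second EMC/DPCLS lookup is needed for the
packet.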