[ovs-dev] [PATCH v2] Avoid dp_hash recirculation for balance-tcp bond selection mode

Ilya Maximets i.maximets at samsung.com
Mon Jul 8 13:09:35 UTC 2019


On 25.06.2019 12:28, Vishal Deep Ajmera wrote:
> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type "balance-tcp"
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
> 
> 1. The recirculation of the packet implies another lookup of the packet's
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
> 
> 2. The recirculated packets have a new "RSS" hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
> 
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
> 
> Owing to this performance degradation, deployments stick to "balance-slb"
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same
> source MAC address.
> 
> Proposed optimization:
> ----------------------
> This proposal introduces a new load-balancing output action instead of
> recirculation.
> 
> Maintain one table per-bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value(256 entries) from ofproto layer. Use this table to load-balance
> flows as part of output action processing.
> 
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> "dp_hash(hash_l4(0))" and "recirc(<RecircID>)" actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
> 
> Instead, we will now generate actions as
>     "hash(l4(0)),lb_output(bond,<bond id>)"
> 
> This combines hash computation (only if needed, else re-use RSS hash) and
> inline load-balancing over the bond. This action is used *only* for balance-tcp
> bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
> 
> Example:
> --------
> Current scheme:
> ---------------
> With 1 IP-UDP flow:
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2828969, bytes:181054016, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:2828937, bytes:181051968, used:0.000s, actions:2
> 
> With 8 IP-UDP flows (with random UDP src port): (all hitting same DPCL):
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:326404, bytes:20889856, used:0.001s, actions:1
> 
> New scheme:
> -----------
> We can do with a single flow entry (for any number of new flows):
> 
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:hash(l4(0)),lb_output(bond,1)
> 
> A new CLI has been added to dump the per-PMD bond cache as given below.
> 
> "sudo ovs-appctl dpif-netdev/pmd-bond-show"
> 
> root at ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/pmd-bond-show
> pmd thread numa_id 0 core_id 4:
> Bond cache:
>         bond-id 1 :
>                 bucket 0 - slave 2
>                 bucket 1 - slave 1
>                 bucket 2 - slave 2
>                 bucket 3 - slave 1
> 
> Performance improvement:
> ------------------------
> With a prototype of the proposed idea, the following perf improvement is seen
> with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
> is even more enhanced (due to reduced number of flows).
> 
> 1 VM:
> *****
> +--------------------------------------+
> |                 mpps                 |
> +--------------------------------------+
> | Flows  master  with-opt.   %delta    |
> +--------------------------------------+
> | 1      4.53    5.89        29.96
> | 10     4.16    5.89        41.51
> | 400    3.55    5.55        56.22
> | 1k     3.44    5.45        58.30
> | 10k    2.50    4.63        85.34
> | 100k   2.29    4.27        86.15
> | 500k   2.25    4.27        89.23
> +--------------------------------------+
> mpps: million packets per second.
> 
> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
> 
> CC: Jan Scheurich <jan.scheurich at ericsson.com>
> CC: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
> CC: Nitin Katiyar <nitin.katiyar at ericsson.com>
> ---
>  datapath/linux/compat/include/linux/openvswitch.h |   2 +
>  lib/dpif-netdev.c                                 | 491 ++++++++++++++++++++--
>  lib/dpif-provider.h                               |   5 +
>  lib/dpif.c                                        |  36 ++
>  lib/dpif.h                                        |   5 +
>  lib/odp-execute.c                                 |   2 +
>  lib/odp-util.c                                    |   4 +
>  ofproto/bond.c                                    |  34 +-
>  ofproto/bond.h                                    |   8 +
>  ofproto/ofproto-dpif-sflow.c                      |   1 +
>  ofproto/ofproto-dpif-xlate.c                      |  67 ++-
>  ofproto/ofproto-dpif.c                            |  31 ++
>  ofproto/ofproto-dpif.h                            |  12 +-
>  tests/lacp.at                                     |   9 +
>  vswitchd/bridge.c                                 |   4 +
>  vswitchd/vswitch.xml                              |  10 +
>  16 files changed, 666 insertions(+), 55 deletions(-)
> 

Hi.
Few general comments for this patch:

* dp_hash already uses RSS for hashing and "bond-hash-rss=false" will not
  disable that, so the config "bond-hash-rss" is very misleading.

* You should not access 'pmd->bond_cache' out of PMD thread, however this
  code constantly does. Caches are specially for lockless access from the
  PMD thread, no-one should modify or iterate over them except the owning
  PMD thread.

* About this comment:

> +                     * NOTE:
> +                     * Do not use load-balanced-output action when tunnel push
> +                     * recirculation is avoided (via CLONE action), as L4 hash
> +                     * for bond balancing needs to be computed post tunnel
> +                     * encapsulation.
> +                     */

 How should I ever know that I'm doing something wrong?


Best regards, Ilya Maximets.


More information about the dev mailing list