[ovs-dev] [PATCH v5] Avoid dp_hash recirculation for balance-tcp bond selection mode

Matteo Croce mcroce at redhat.com
Mon Aug 26 16:18:29 UTC 2019


On Thu, Aug 8, 2019 at 10:57 AM Vishal Deep Ajmera
<vishal.deep.ajmera at ericsson.com> wrote:
>
> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
>
> 1. The recirculation of the packet implies another lookup of the packet’s
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
>
> 2. The recirculated packets have a new “RSS” hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
>
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
>
> Owing to this performance degradation, deployments stick to “balance-slb”
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same
> source MAC address.
>
> Proposed optimization:
> ----------------------
> This proposal introduces a new load-balancing output action instead of
> recirculation.
>
> Maintain one table per-bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value(256 entries) from ofproto layer. Use this table to load-balance
> flows as part of output action processing.
>
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> “dp_hash(hash_l4(0))” and “recirc(<RecircID>)” actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
>
> Instead, we will now generate actions as
>     "hash(l4(0)),lb_output(bond,<bond id>)"
>
> This combines hash computation (only if needed, else re-use RSS hash) and
> inline load-balancing over the bond. This action is used *only* for balance-tcp
> bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
>
> Example:
> --------
> Current scheme:
> ---------------
> With 1 IP-UDP flow:
>
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2828969, bytes:181054016, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
>
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:2828937, bytes:181051968, used:0.000s, actions:2
>
> With 8 IP-UDP flows (with random UDP src port): (all hitting same DPCL):
>
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
>
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:326404, bytes:20889856, used:0.001s, actions:1
>
> New scheme:
> -----------
> We can do with a single flow entry (for any number of new flows):
>
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:hash(l4(0)),lb_output(bond,1)
>
> A new CLI has been added to dump the per-PMD bond cache as given below.
>
> “sudo ovs-appctl dpif-netdev/pmd-bond-show”
>
> root at ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/pmd-bond-show
> pmd thread numa_id 0 core_id 4:
> Bond cache:
>         bond-id 1 :
>                 bucket 0 - slave 2
>                 bucket 1 - slave 1
>                 bucket 2 - slave 2
>                 bucket 3 - slave 1
>
> Performance improvement:
> ------------------------
> With a prototype of the proposed idea, the following perf improvement is seen
> with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
> is even more enhanced (due to reduced number of flows).
>
> 1 VM:
> *****
> +--------------------------------------+
> |                 mpps                 |
> +--------------------------------------+
> | Flows  master  with-opt.   %delta    |
> +--------------------------------------+
> | 1      4.53    5.89        29.96
> | 10     4.16    5.89        41.51
> | 400    3.55    5.55        56.22
> | 1k     3.44    5.45        58.30
> | 10k    2.50    4.63        85.34
> | 100k   2.29    4.27        86.15
> | 500k   2.25    4.27        89.23
> +--------------------------------------+
> mpps: million packets per second.
>
> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
>
> CC: Jan Scheurich <jan.scheurich at ericsson.com>
> CC: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
> CC: Nitin Katiyar <nitin.katiyar at ericsson.com>

Hi Vishal,

I quickly tested your patch on two servers with 2 ixgbe cards each
linked via a juniper switch with LACP.
With testpmd using a single core the switching rate raised from ~2.0
Mpps to ~2.4+ Mpps, so I read at least a +20% gain.

Please add an example command on how to enable it in the commit message, e.g.

  ovs-vsctl set port bond0 other_config:opt-bond-tcp=true

Thanks,
-- 
Matteo Croce
per aspera ad upstream


More information about the dev mailing list