[ovs-git] [openvswitch/ovs] 9df650: userspace: Avoid dp_hash recirculation for balance...

sbasavapatna noreply at github.com
Mon Jun 22 11:50:37 UTC 2020

  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: 9df65060cf4c27553ee5e29f74ef6807dd5af992
  Author: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
  Date:   2020-06-22 (Mon, 22 Jun 2020)

  Changed paths:
    M NEWS
    M datapath/linux/compat/include/linux/openvswitch.h
    M lib/dpif-netdev.c
    M lib/dpif-netlink.c
    M lib/dpif-provider.h
    M lib/dpif.c
    M lib/dpif.h
    M lib/odp-execute.c
    M lib/odp-util.c
    M ofproto/bond.c
    M ofproto/bond.h
    M ofproto/ofproto-dpif-ipfix.c
    M ofproto/ofproto-dpif-sflow.c
    M ofproto/ofproto-dpif-xlate.c
    M ofproto/ofproto-dpif.c
    M ofproto/ofproto-dpif.h
    M tests/lacp.at
    M tests/odp.at
    M vswitchd/bridge.c
    M vswitchd/vswitch.xml

  Log Message:
  userspace: Avoid dp_hash recirculation for balance-tcp bond mode.


In OVS, flows with output over a bond interface of type “balance-tcp”
get translated by the ofproto layer into "HASH" and "RECIRC" datapath
actions. After recirculation, the packet is forwarded to the bond
member port based on 8 bits of the datapath hash value computed through
dp_hash. This causes performance degradation in the following ways:

1. The recirculation of the packet implies another lookup of the
packet’s flow key in the exact match cache (EMC) and potentially in
the megaflow classifier (DPCLS). This is the biggest cost factor.

2. The recirculated packets have a new “RSS” hash and compete with the
original packets for the scarce number of EMC slots. This implies more
EMC misses and potentially EMC thrashing causing costly DPCLS lookups.

3. The 256 extra megaflow entries per bond for dp_hash bond selection
put additional load on the revalidation threads.

Owing to this performance degradation, deployments stick to “balance-slb”
bond mode even though it does not do active-active load balancing for
VXLAN- and GRE-tunnelled traffic because all tunnel packets have the
same source MAC address.

Proposed optimization:

This proposal introduces a new load-balancing output action instead of
recirculation.

Maintain one table per-bond (could just be an array of uint16's) and
program it the same way internal flows are created today for each
possible hash value (256 entries) from ofproto layer. Use this table to
load-balance flows as part of output action processing.
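
As a rough illustration of the table described above, the following C
sketch keeps one 256-entry array of uint16 member port numbers per bond
and indexes it with the low 8 bits of the packet hash. All names here
(bond_table, bond_table_set, bond_table_lookup, BOND_BUCKETS) are
hypothetical and are not taken from the OVS sources.

```c
#include <assert.h>
#include <stdint.h>

#define BOND_BUCKETS 256                    /* One bucket per 8-bit hash value. */
#define BOND_BUCKET_MASK (BOND_BUCKETS - 1)

/* Hypothetical per-bond table: bucket index -> member port number. */
struct bond_table {
    uint32_t bond_id;
    uint16_t member[BOND_BUCKETS];
};

/* Programs one bucket, as the ofproto layer might do on a rebalance. */
static void
bond_table_set(struct bond_table *bt, unsigned int bucket, uint16_t port)
{
    assert(bucket < BOND_BUCKETS);
    bt->member[bucket] = port;
}

/* Picks the member port for a packet from the low 8 bits of its hash. */
static uint16_t
bond_table_lookup(const struct bond_table *bt, uint32_t hash)
{
    return bt->member[hash & BOND_BUCKET_MASK];
}
```

With such a table the output decision is a single array read per
packet, with no second flow lookup.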

Currently xlate_normal() -> output_normal() ->
bond_update_post_recirc_rules() -> bond_may_recirc() and
compose_output_action__() generate 'dp_hash(hash_l4(0))' and
'recirc(<RecircID>)' actions. In this case the RecircID identifies the
bond. For the recirculated packets the ofproto layer installs megaflow
entries that match on RecircID and masked dp_hash and send them to the
corresponding output port.

Instead, we will now generate the following action:
    'lb_output(<bond id>)'

This combines hash computation (only if needed, else re-use RSS hash)
and inline load-balancing over the bond. This action is used *only* for
balance-tcp bonds in userspace datapath (the OVS kernel datapath
remains unchanged).
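
The "compute the hash only if needed" part can be sketched as follows.
This is a minimal illustration under assumptions: struct pkt, its
fields, and the mixing function are hypothetical stand-ins, not the OVS
packet structure or its real 5-tuple hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata; not the OVS dp_packet layout. */
struct pkt {
    bool rss_valid;          /* True if rss_hash was already filled in. */
    uint32_t rss_hash;
    uint32_t ip_src, ip_dst;
    uint16_t tp_src, tp_dst;
    uint8_t ip_proto;
};

/* Stand-in multiplicative mix; any reasonable L4 hash would do here. */
static uint32_t
mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x01000193u;
    return h;
}

/* Software hash over the 5-tuple, used only when no RSS hash exists. */
static uint32_t
pkt_l4_hash(const struct pkt *p)
{
    uint32_t h = 0x811c9dc5u;

    h = mix32(h, p->ip_src);
    h = mix32(h, p->ip_dst);
    h = mix32(h, ((uint32_t) p->tp_src << 16) | p->tp_dst);
    h = mix32(h, p->ip_proto);
    return h;
}

/* Re-uses the RSS hash when present, otherwise computes and caches one. */
static uint32_t
pkt_get_hash(struct pkt *p)
{
    assert(p != NULL);
    if (!p->rss_valid) {
        p->rss_hash = pkt_l4_hash(p);
        p->rss_valid = true;
    }
    return p->rss_hash;
}
```

The returned value would then be masked to 8 bits to select a bucket in
the per-bond table.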

Current scheme:

With 8 UDP flows (with random UDP src port):

  flow-dump from pmd on cpu core: 2
  recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)

  recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
  recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
  recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1
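
For readers unfamiliar with the value/mask notation in the dump above:
an entry such as dp_hash(0xf8e02b7e/0xff) matches any packet whose hash
agrees with the value in the bits set in the mask, here the low 8 bits.
A small C sketch of that comparison follows; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if pkt_hash equals value in every bit position set in mask. */
static bool
dp_hash_matches(uint32_t pkt_hash, uint32_t value, uint32_t mask)
{
    return ((pkt_hash ^ value) & mask) == 0;
}

/* The bucket a dump entry stands for under the 0xff mask. */
static unsigned int
dp_hash_bucket(uint32_t value)
{
    return value & 0xff;
}

/* Self-check against the first entry of the dump above. */
static void
check_dump_entries(void)
{
    assert(dp_hash_bucket(0xf8e02b7e) == 0x7e);
    assert(dp_hash_matches(0x0000007e, 0xf8e02b7e, 0xff));
    assert(!dp_hash_matches(0x0000007f, 0xf8e02b7e, 0xff));
}
```

Each distinct low byte seen in traffic therefore needs its own megaflow
entry in the current scheme, up to 256 per bond.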

New scheme:
A single flow entry suffices (for any number of new flows):

  in_port(7),<...> actions:lb_output(1)

A new CLI command has been added to dump the datapath bond cache, as shown below.

 # ovs-appctl dpif-netdev/bond-show [dp]

   Bond cache:
     bond-id 1 :
       bucket 0 - slave 2
       bucket 1 - slave 1
       bucket 2 - slave 2
       bucket 3 - slave 1

Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
Tested-by: Matteo Croce <mcroce at redhat.com>
Tested-by: Adrian Moreno <amorenoz at redhat.com>
Acked-by: Eelco Chaudron <echaudro at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 3950e350d240232b5ad8b7b5a701549ad0e84378
  Author: Matteo Croce <mcroce at redhat.com>
  Date:   2020-06-22 (Mon, 22 Jun 2020)

  Changed paths:
    M tests/ofproto-dpif.at

  Log Message:
  ofproto-dpif.at: Add unit test for lb_output action.

Extend the balance-tcp test so it covers the lb_output action too.
The test checks that the option is shown in bond/show,
and that the lb_output action is programmed in the datapath.

Signed-off-by: Matteo Croce <mcroce at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 029273855939cce35cba7c2ab1831bd92b0502cb
  Author: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
  Date:   2020-06-22 (Mon, 22 Jun 2020)

  Changed paths:
    M Documentation/howto/dpdk.rst
    M NEWS
    M lib/netdev-offload-dpdk.c

  Log Message:
  netdev-offload-dpdk: Support offload of VLAN PUSH/POP actions.

Parse VLAN PUSH/POP OVS datapath actions and add respective RTE actions.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
Acked-by: Eli Britstein <elibr at mellanox.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

Compare: https://github.com/openvswitch/ovs/compare/1fe429756398...029273855939
