[ovs-dev] [PATCH v13 1/1] Avoid dp_hash recirculation for balance-tcp bond selection mode

Ilya Maximets i.maximets at ovn.org
Thu Jun 11 22:03:21 UTC 2020


On 5/22/20 10:50 AM, Vishal Deep Ajmera wrote:
> Problem:
> --------
> In OVS-DPDK, flows with output over a bond interface of type “balance-tcp”
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
> 
> 1. The recirculation of the packet implies another lookup of the packet’s
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
> 
> 2. The recirculated packets have a new “RSS” hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
> 
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
> 
> Owing to this performance degradation, deployments stick to “balance-slb”
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packet have the same
> source MAC address.
> 
> Proposed optimization:
> ----------------------
> This proposal introduces a new load-balancing output action instead of
> recirculation.
> 
> Maintain one table per-bond (could just be an array of uint16's) and
> program it the same way internal flows are created today for each possible
> hash value(256 entries) from ofproto layer. Use this table to load-balance
> flows as part of output action processing.
> 
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> “dp_hash(hash_l4(0))” and “recirc(<RecircID>)” actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
> 
> Instead, we will now generate action as
>     "lb_output(<bond id>)"
> 
> This combines hash computation (only if needed, else re-use RSS hash) and
> inline load-balancing over the bond. This action is used *only* for balance-tcp
> bonds in OVS-DPDK datapath (the OVS kernel datapath remains unchanged).
> 
> Example:
> --------
> Current scheme:
> ---------------
> With 1 IP-UDP flow:
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2828969, bytes:181054016, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:2828937, bytes:181051968, used:0.000s, actions:2
> 
> With 8 IP-UDP flows (with random UDP src port): (all hitting same DPCL):
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:326404, bytes:20889856, used:0.001s, actions:1
> 
> New scheme:
> -----------
> We can do with a single flow entry (for any number of new flows):
> 
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no), packets:2674009, bytes:171136576, used:0.000s, actions:lb_output(1)
> 
> A new CLI has been added to dump datapath bond cache as given below.
> 
> “sudo ovs-appctl dpif-netdev/bond-show [dp]”
> 
> root at ubuntu-190:performance_scripts # sudo ovs-appctl dpif-netdev/bond-show
> Bond cache:
>         bond-id 1 :
>                 bucket 0 - slave 2
>                 bucket 1 - slave 1
>                 bucket 2 - slave 2
>                 bucket 3 - slave 1
> 
> Performance improvement:
> ------------------------
> With a prototype of the proposed idea, the following perf improvement is seen
> with Phy-VM-Phy UDP traffic, single flow. With multiple flows, the improvement
> is even more enhanced (due to reduced number of flows).
> 
> 1 VM:
> *****
> +--------------------------------------+
> |                 mpps                 |
> +--------------------------------------+
> | Flows  master  with-opt.   %delta    |
> +--------------------------------------+
> | 1      4.34    5.59        28.86
> | 10     4.09    5.43        32.61
> | 400    3.44    5.35        55.25
> | 1k     3.31    5.25        58.41
> | 10k    2.46    4.43        79.78
> | 100k   2.32    4.27        83.59
> | 500k   2.29    4.23        84.57
> +--------------------------------------+
> mpps: million packets per second.
> packet size 64 bytes.
> 
> Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc at gmail.com>
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
> 
> CC: Matteo Croce <mcroce at redhat.com>
> CC: Eelco Chaudron <echaudro at redhat.com>
> CC: Ilya Maximets <i.maximets at redhat.com>
> CC: Jan Scheurich <jan.scheurich at ericsson.com>
> CC: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
> CC: Ben Pfaff <blp at ovn.org>
> CC: Ian Stokes <ian.stokes at intel.com>
> CC: David Marchand <david.marchand at redhat.com>
> Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera at ericsson.com>
> ---
>  datapath/linux/compat/include/linux/openvswitch.h |   1 +
>  lib/dpif-netdev.c                                 | 413 +++++++++++++++++++---
>  lib/dpif-netlink.c                                |   3 +
>  lib/dpif-provider.h                               |  11 +
>  lib/dpif.c                                        |  49 +++
>  lib/dpif.h                                        |  13 +
>  lib/odp-execute.c                                 |   2 +
>  lib/odp-util.c                                    |  14 +
>  ofproto/bond.c                                    |  97 ++++-
>  ofproto/bond.h                                    |   5 +
>  ofproto/ofproto-dpif-ipfix.c                      |   1 +
>  ofproto/ofproto-dpif-sflow.c                      |   3 +-
>  ofproto/ofproto-dpif-xlate.c                      |  32 +-
>  ofproto/ofproto-dpif.c                            |  32 ++
>  ofproto/ofproto-dpif.h                            |  10 +-
>  ofproto/ofproto-provider.h                        |   3 +
>  ofproto/ofproto.c                                 |  10 +
>  ofproto/ofproto.h                                 |   1 +
>  tests/lacp.at                                     |   9 +
>  tests/odp.at                                      |   1 +
>  vswitchd/bridge.c                                 |  13 +
>  vswitchd/vswitch.xml                              |  23 ++
>  22 files changed, 675 insertions(+), 71 deletions(-)
> 

Hi.  Thanks for the new version it works fine.  In order to speed things
up I prepared an incremental patch with one bug fix and some changes
that I think are necessary to keep the code clean.  Don't be afraid of
the patch size, there are very few functional changes, most of other
changes are style fixes or readability improvements.

This patch + my incremental was reviewed by Eelco and tested by Adrian.
I'll add proper tags while applying to master.

So, what I'm proposing for you is to take a look at the incremental patch
and if it's OK for you, I could squash it into your v13 and apply to master.

What do you think?

Best regards, Ilya Maximets.

Incremental patch itself:
------------------------------
Summary:
  * Dropped 'dp->bond_mutex' while removing stale ports.  We're not
    touching bonds there at all.
  * Added new argument 'update' to dp_netdev_add_bond_tx_to_pmd() to
    avoid unnecessary re-allocation of the same slaves during
    reconfiguration.  We might also want to avoid adding tx bonds to
    PMDs that doesn't poll anything inside dpif_netdev_bond_add(), but
    this might be done later.
  * Added actual error codes to dpif_netdev_bond_*() functions.
    They are not used right now, but it's better to have them anyway.
  * Refined API around dpif_bond_stats_get().  Now it always clears
    n_bytes array regardless of success.  Memory clearing removed from
    the caller.
  * dpif_bond_*() functions should not check dpif and dpif_class
    pointers.  Rewritten to be more like other code.  Error codes
    must be positive and we're usually using EOPNOTSUPP instead of
    ENOTSUP for unsupported operations.
  * bond/show output changed to reflect actual values, not the
    configured ones.  Dropped printing of recirc_id if recirc_id is not
    supported (actually if lb_output is not supported).
  * Fixed wrong array size in bond_add_lb_output_buckets().
    MASK mistakenly used as an array size.
  * Reorganized compose_output_action__() to make it more readable
    and not fetch hash algo if not needed. (incremental diff is big,
    but it mostly restores previous code :) )
  * Removed is_lb_output_action_supported from the ofproto-provider
    API.  This was not a pretty solution.  Replaced with new
    ovs_lb_output_action_supported() function called from the
    ofproto/bond.c.  For this purpose validation of lb-output-action
    knob moved down to bond_reconfigure() out of bridge.c.
  * 'lb-output-action' replaced with 'lb_output action' in docs and
    comments and in the output of bond/show to make things more
    readable and easier to understand.
  * Tabs in output of bond/show command replaced with spaces.
    OVS doesn't use tabs in the output anywhere.
  * Random style fixes, improvements.

Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
---
 NEWS                         |  3 ++
 lib/dpif-netdev.c            | 56 ++++++++++++++++++++++--------------
 lib/dpif-provider.h          |  3 +-
 lib/dpif.c                   | 30 +++++++------------
 lib/dpif.h                   |  9 +++---
 ofproto/bond.c               | 40 +++++++++++++++-----------
 ofproto/bond.h               |  2 +-
 ofproto/ofproto-dpif-xlate.c | 41 +++++++++++++-------------
 ofproto/ofproto-dpif.c       | 14 ++++-----
 ofproto/ofproto-dpif.h       |  6 ++--
 ofproto/ofproto-provider.h   |  3 --
 ofproto/ofproto.c            |  9 ------
 ofproto/ofproto.h            |  1 -
 tests/lacp.at                | 18 ++++++------
 vswitchd/bridge.c            | 14 ++-------
 vswitchd/vswitch.xml         |  2 +-
 16 files changed, 120 insertions(+), 131 deletions(-)

diff --git a/NEWS b/NEWS
index 88b273a0a..207bd503c 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ Post-v2.13.0
        by enabling interrupt mode.
    - Userspace datapath:
      * Add support for conntrack zone-based timeout policy.
+     * New configuration knob 'other_config:lb-output-action' for bond ports
+       that enables new datapath action 'lb_output' to avoid recirculation
+       in balance-tcp mode.  Disabled by default.
    - Tunnels: TC Flower offload
      * Tunnel Local endpoint address masked match are supported.
      * Tunnel Romte endpoint address masked match are supported.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c55d45233..1086efd47 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -857,9 +857,11 @@ static int
 dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd,
                                    bool force);
 static void dp_netdev_add_bond_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
-                                         struct tx_bond *bond);
+                                         struct tx_bond *bond, bool update)
+    OVS_EXCLUDED(pmd->bond_mutex);
 static void dp_netdev_del_bond_tx_from_pmd(struct dp_netdev_pmd_thread *pmd,
-                                           uint32_t bond_id);
+                                           uint32_t bond_id)
+    OVS_EXCLUDED(pmd->bond_mutex);
 
 static void reconfigure_datapath(struct dp_netdev *dp)
     OVS_REQUIRES(dp->port_mutex);
@@ -1452,15 +1454,15 @@ dpif_netdev_bond_show(struct unixctl_conn *conn, int argc,
         struct tx_bond *dp_bond_entry;
         uint32_t slave_id;
 
-        ds_put_cstr(&reply, "\nBonds:\n");
+        ds_put_cstr(&reply, "Bonds:\n");
         CMAP_FOR_EACH (dp_bond_entry, node, &dp->tx_bonds) {
-            ds_put_format(&reply, "\tbond-id %"PRIu32":\n",
+            ds_put_format(&reply, "  bond-id %"PRIu32":\n",
                           dp_bond_entry->bond_id);
             for (int bucket = 0; bucket < BOND_BUCKETS; bucket++) {
                 slave_id =
                     odp_to_u32(dp_bond_entry->slave_buckets[bucket].slave_id);
-                ds_put_format(&reply, "\t\tbucket %d - slave %"PRIu32"\n",
-                             bucket, slave_id);
+                ds_put_format(&reply, "    bucket %d - slave %"PRIu32"\n",
+                              bucket, slave_id);
             }
         }
     }
@@ -4521,8 +4523,8 @@ tx_port_lookup(const struct hmap *hmap, odp_port_t port_no)
 static struct tx_bond *
 tx_bond_lookup(const struct cmap *tx_bonds, uint32_t bond_id)
 {
-    struct tx_bond *tx;
     uint32_t hash = hash_bond_id(bond_id);
+    struct tx_bond *tx;
 
     CMAP_FOR_EACH_WITH_HASH (tx, node, hash, tx_bonds) {
         if (tx->bond_id == bond_id) {
@@ -5068,11 +5070,9 @@ reconfigure_datapath(struct dp_netdev *dp)
 
     /* Remove from the pmd threads all the ports that have been deleted or
      * need reconfiguration. */
-    ovs_mutex_lock(&dp->bond_mutex);
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         pmd_remove_stale_ports(dp, pmd);
     }
-    ovs_mutex_unlock(&dp->bond_mutex);
 
     /* Reload affected pmd threads.  We must wait for the pmd threads before
      * reconfiguring the ports, because a port cannot be reconfigured while
@@ -5195,7 +5195,7 @@ reconfigure_datapath(struct dp_netdev *dp)
             }
 
             CMAP_FOR_EACH (bond, node, &dp->tx_bonds) {
-                dp_netdev_add_bond_tx_to_pmd(pmd, bond);
+                dp_netdev_add_bond_tx_to_pmd(pmd, bond, false);
             }
         }
         ovs_mutex_unlock(&pmd->port_mutex);
@@ -6454,12 +6454,19 @@ dp_netdev_del_port_tx_from_pmd(struct dp_netdev_pmd_thread *pmd,
 /* Add bond to the tx bond cmap of 'pmd'. */
 static void
 dp_netdev_add_bond_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
-                             struct tx_bond *bond)
+                             struct tx_bond *bond, bool update)
+    OVS_EXCLUDED(pmd->bond_mutex)
 {
     struct tx_bond *tx;
 
     ovs_mutex_lock(&pmd->bond_mutex);
     tx = tx_bond_lookup(&pmd->tx_bonds, bond->bond_id);
+
+    if (tx && !update) {
+        /* It's not an update and the entry already exists.  Do nothing. */
+        goto unlock;
+    }
+
     if (tx) {
         struct tx_bond *new_tx = xmemdup(bond, sizeof *bond);
 
@@ -6479,6 +6486,7 @@ dp_netdev_add_bond_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
         tx = xmemdup(bond, sizeof *bond);
         cmap_insert(&pmd->tx_bonds, &tx->node, hash_bond_id(bond->bond_id));
     }
+unlock:
     ovs_mutex_unlock(&pmd->bond_mutex);
 }
 
@@ -6486,6 +6494,7 @@ dp_netdev_add_bond_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
 static void
 dp_netdev_del_bond_tx_from_pmd(struct dp_netdev_pmd_thread *pmd,
                                uint32_t bond_id)
+    OVS_EXCLUDED(pmd->bond_mutex)
 {
     struct tx_bond *tx;
 
@@ -7344,8 +7353,8 @@ dp_execute_output_action(struct dp_netdev_pmd_thread *pmd,
     dp_packet_batch_apply_cutlen(packets_);
 #ifdef DPDK_NETDEV
     if (OVS_UNLIKELY(!dp_packet_batch_is_empty(&p->output_pkts)
-             && packets_->packets[0]->source
-             != p->output_pkts.packets[0]->source)) {
+                     && packets_->packets[0]->source
+                        != p->output_pkts.packets[0]->source)) {
         /* XXX: netdev-dpdk assumes that all packets in a single
          *      output batch has the same source. Flush here to
          *      avoid memory access issues. */
@@ -7375,9 +7384,9 @@ dp_execute_lb_output_action(struct dp_netdev_pmd_thread *pmd,
                             struct dp_packet_batch *packets_,
                             bool should_steal, uint32_t bond)
 {
-    struct dp_packet *packet;
     struct tx_bond *p_bond = tx_bond_lookup(&pmd->tx_bonds, bond);
     struct dp_packet_batch out;
+    struct dp_packet *packet;
 
     if (!p_bond) {
         COVERAGE_ADD(datapath_drop_invalid_bond,
@@ -7428,12 +7437,12 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
     switch ((enum ovs_action_attr)type) {
     case OVS_ACTION_ATTR_OUTPUT:
         dp_execute_output_action(pmd, packets_, should_steal,
-                                     nl_attr_get_odp_port(a));
+                                 nl_attr_get_odp_port(a));
         return;
 
     case OVS_ACTION_ATTR_LB_OUTPUT:
         dp_execute_lb_output_action(pmd, packets_, should_steal,
-                                        nl_attr_get_u32(a));
+                                    nl_attr_get_u32(a));
         return;
 
     case OVS_ACTION_ATTR_TUNNEL_PUSH:
@@ -8050,9 +8059,9 @@ static int
 dpif_netdev_bond_add(struct dpif *dpif, uint32_t bond_id,
                      odp_port_t *slave_map)
 {
+    struct tx_bond *new_tx = xzalloc(sizeof *new_tx);
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *pmd;
-    struct tx_bond *new_tx = xzalloc(sizeof *new_tx);
 
     /* Prepare new bond mapping. */
     new_tx->bond_id = bond_id;
@@ -8068,14 +8077,13 @@ dpif_netdev_bond_add(struct dpif *dpif, uint32_t bond_id,
                      hash_bond_id(bond_id));
         ovsrcu_postpone(free, old_tx);
     } else {
-        cmap_insert(&dp->tx_bonds, &new_tx->node,
-                    hash_bond_id(bond_id));
+        cmap_insert(&dp->tx_bonds, &new_tx->node, hash_bond_id(bond_id));
     }
     ovs_mutex_unlock(&dp->bond_mutex);
 
     /* Update all PMDs with new bond mapping. */
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-        dp_netdev_add_bond_tx_to_pmd(pmd, new_tx);
+        dp_netdev_add_bond_tx_to_pmd(pmd, new_tx, true);
     }
     return 0;
 }
@@ -8084,8 +8092,8 @@ static int
 dpif_netdev_bond_del(struct dpif *dpif, uint32_t bond_id)
 {
     struct dp_netdev *dp = get_dp_netdev(dpif);
-    struct tx_bond *tx;
     struct dp_netdev_pmd_thread *pmd;
+    struct tx_bond *tx;
 
     ovs_mutex_lock(&dp->bond_mutex);
     /* Check if bond existed. */
@@ -8096,7 +8104,7 @@ dpif_netdev_bond_del(struct dpif *dpif, uint32_t bond_id)
     } else {
         /* Bond is not present. */
         ovs_mutex_unlock(&dp->bond_mutex);
-        return 0;
+        return ENOENT;
     }
     ovs_mutex_unlock(&dp->bond_mutex);
 
@@ -8114,6 +8122,10 @@ dpif_netdev_bond_stats_get(struct dpif *dpif, uint32_t bond_id,
     struct dp_netdev *dp = get_dp_netdev(dpif);
     struct dp_netdev_pmd_thread *pmd;
 
+    if (!tx_bond_lookup(&dp->tx_bonds, bond_id)) {
+        return ENOENT;
+    }
+
     /* Search the bond in all PMDs. */
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         struct tx_bond *pmd_bond_entry
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 781c7ca0b..0e024c1c9 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -624,7 +624,8 @@ struct dpif_class {
     /* Removes bond identified by 'bond_id' from 'dpif'. */
     int (*bond_del)(struct dpif *dpif, uint32_t bond_id);
 
-    /* Reads bond stats from 'dpif'. */
+    /* Reads bond stats from 'dpif'.  'n_bytes' should be an array with size
+     * sufficient to store BOND_BUCKETS number of elements. */
     int (*bond_stats_get)(struct dpif *dpif, uint32_t bond_id,
                           uint64_t *n_bytes);
 };
diff --git a/lib/dpif.c b/lib/dpif.c
index c8dc8ed71..c529a93f1 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1992,36 +1992,26 @@ dpif_meter_del(struct dpif *dpif, ofproto_meter_id meter_id,
 int
 dpif_bond_add(struct dpif *dpif, uint32_t bond_id, odp_port_t *slave_map)
 {
-    int error = -ENOTSUP;
-
-    if (dpif && dpif->dpif_class && dpif->dpif_class->bond_del) {
-        error = dpif->dpif_class->bond_add(dpif, bond_id, slave_map);
-    }
-
-    return error;
+    return dpif->dpif_class->bond_del
+           ? dpif->dpif_class->bond_add(dpif, bond_id, slave_map)
+           : EOPNOTSUPP;
 }
 
 int
 dpif_bond_del(struct dpif *dpif, uint32_t bond_id)
 {
-    int error = -ENOTSUP;
-
-    if (dpif && dpif->dpif_class && dpif->dpif_class->bond_del) {
-        error = dpif->dpif_class->bond_del(dpif, bond_id);
-    }
-
-    return error;
+    return dpif->dpif_class->bond_del
+           ? dpif->dpif_class->bond_del(dpif, bond_id)
+           : EOPNOTSUPP;
 }
 
 int
 dpif_bond_stats_get(struct dpif *dpif, uint32_t bond_id,
                     uint64_t *n_bytes)
 {
-    int error = -ENOTSUP;
+    memset(n_bytes, 0, BOND_BUCKETS * sizeof *n_bytes);
 
-    if (dpif && dpif->dpif_class && dpif->dpif_class->bond_stats_get) {
-        error = dpif->dpif_class->bond_stats_get(dpif, bond_id, n_bytes);
-    }
-
-    return error;
+    return dpif->dpif_class->bond_stats_get
+           ? dpif->dpif_class->bond_stats_get(dpif, bond_id, n_bytes)
+           : EOPNOTSUPP;
 }
diff --git a/lib/dpif.h b/lib/dpif.h
index 1cb5a7188..2d52f0186 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -898,11 +898,10 @@ int dpif_meter_del(struct dpif *, ofproto_meter_id meter_id,
 #define BOND_MASK 0xff
 #define BOND_BUCKETS (BOND_MASK + 1)
 
-int dpif_bond_add(struct dpif *dpif, uint32_t bond_id, odp_port_t *slave_map);
-int dpif_bond_del(struct dpif *dpif, uint32_t bond_id);
-int dpif_bond_stats_get(struct dpif *dpif, uint32_t bond_id,
-                        uint64_t *n_bytes);
-bool dpif_supports_lb_output_action(const struct dpif *dpif);
+int dpif_bond_add(struct dpif *, uint32_t bond_id, odp_port_t *slave_map);
+int dpif_bond_del(struct dpif *, uint32_t bond_id);
+int dpif_bond_stats_get(struct dpif *, uint32_t bond_id, uint64_t *n_bytes);
+bool dpif_supports_lb_output_action(const struct dpif *);
 
 

 /* Miscellaneous. */
diff --git a/ofproto/bond.c b/ofproto/bond.c
index 74e8d2e43..9947e7531 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -122,7 +122,7 @@ struct bond {
     enum lacp_status lacp_status; /* Status of LACP negotiations. */
     bool bond_revalidate;       /* True if flows need revalidation. */
     uint32_t basis;             /* Basis for flow hash function. */
-    bool use_lb_output_action;  /* Use lb-output-action to avoid recirculation.
+    bool use_lb_output_action;  /* Use lb_output action to avoid recirculation.
                                    Applicable only for Balance TCP mode. */
 
     /* SLB specific bonding info. */
@@ -183,10 +183,10 @@ static struct bond_slave *choose_output_slave(const struct bond *,
                                               struct flow_wildcards *,
                                               uint16_t vlan)
     OVS_REQ_RDLOCK(rwlock);
-static void update_recirc_rules__(struct bond *bond);
+static void update_recirc_rules__(struct bond *);
 static bool bond_is_falling_back_to_ab(const struct bond *);
-static void bond_add_lb_output_buckets(const struct bond *bond);
-static void bond_del_lb_output_buckets(const struct bond *bond);
+static void bond_add_lb_output_buckets(const struct bond *);
+static void bond_del_lb_output_buckets(const struct bond *);
 
 /* Attempts to parse 's' as the name of a bond balancing mode.  If successful,
  * stores the mode in '*balance' and returns true.  Otherwise returns false
@@ -340,8 +340,6 @@ update_recirc_rules__(struct bond *bond)
     struct ofpbuf ofpacts;
     int i;
 
-    ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub);
-
     HMAP_FOR_EACH(pr_op, hmap_node, &bond->pr_rule_ops) {
         pr_op->op = DEL;
     }
@@ -350,9 +348,8 @@ update_recirc_rules__(struct bond *bond)
         if (bond_use_lb_output_action(bond)) {
             bond_add_lb_output_buckets(bond);
             /* No need to install post recirculation rules as we are using
-             * lb-output-action with bond buckets.
+             * lb_output action with bond buckets.
              */
-            ofpbuf_uninit(&ofpacts);
             return;
         } else {
             for (i = 0; i < BOND_BUCKETS; i++) {
@@ -370,6 +367,8 @@ update_recirc_rules__(struct bond *bond)
         }
     }
 
+    ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub);
+
     HMAP_FOR_EACH_SAFE(pr_op, next_op, hmap_node, &bond->pr_rule_ops) {
         int error;
         switch (pr_op->op) {
@@ -485,8 +484,14 @@ bond_reconfigure(struct bond *bond, const struct bond_settings *s)
         bond->recirc_id = 0;
     }
     if (bond->use_lb_output_action != s->use_lb_output_action) {
-        bond->use_lb_output_action = s->use_lb_output_action;
-        revalidate = true;
+        if (s->use_lb_output_action &&
+            !ovs_lb_output_action_supported(bond->ofproto)) {
+            VLOG_WARN("%s: Datapath does not support 'lb_output' action, "
+                      "disabled.", bond->name);
+        } else {
+            bond->use_lb_output_action = s->use_lb_output_action;
+            revalidate = true;
+        }
     }
 
     if (bond->balance == BM_AB || !bond->hash || revalidate) {
@@ -965,7 +970,7 @@ bond_recirculation_account(struct bond *bond)
     OVS_REQ_WRLOCK(rwlock)
 {
     int i;
-    uint64_t n_bytes[BOND_BUCKETS] = {0};
+    uint64_t n_bytes[BOND_BUCKETS];
     bool use_lb_output_action = bond_use_lb_output_action(bond);
 
     if (use_lb_output_action) {
@@ -1384,6 +1389,7 @@ bond_print_details(struct ds *ds, const struct bond *bond)
     struct shash slave_shash = SHASH_INITIALIZER(&slave_shash);
     const struct shash_node **sorted_slaves = NULL;
     const struct bond_slave *slave;
+    bool use_lb_output_action;
     bool may_recirc;
     uint32_t recirc_id;
     int i;
@@ -1398,9 +1404,11 @@ bond_print_details(struct ds *ds, const struct bond *bond)
                   may_recirc ? "yes" : "no", may_recirc ? recirc_id: -1);
 
     ds_put_format(ds, "bond-hash-basis: %"PRIu32"\n", bond->basis);
-    ds_put_format(ds, "lb-output-action: %s, bond-id: %u\n",
-                  bond->use_lb_output_action ? "enabled" : "disabled",
-                  recirc_id);
+
+    use_lb_output_action = bond_use_lb_output_action(bond);
+    ds_put_format(ds, "lb_output action: %s, bond-id: %d\n",
+                  use_lb_output_action ? "enabled" : "disabled",
+                  use_lb_output_action ? recirc_id : -1);
 
     ds_put_format(ds, "updelay: %d ms\n", bond->updelay);
     ds_put_format(ds, "downdelay: %d ms\n", bond->downdelay);
@@ -1982,13 +1990,13 @@ bond_get_changed_active_slave(const char *name, struct eth_addr *mac,
 bool
 bond_use_lb_output_action(const struct bond *bond)
 {
-    return (bond_may_recirc(bond)) && bond->use_lb_output_action;
+    return bond_may_recirc(bond) && bond->use_lb_output_action;
 }
 
 static void
 bond_add_lb_output_buckets(const struct bond *bond)
 {
-    ofp_port_t slave_map[BOND_MASK];
+    ofp_port_t slave_map[BOND_BUCKETS];
 
     for (int i = 0; i < BOND_BUCKETS; i++) {
         struct bond_slave *slave = bond->hash[i].slave;
diff --git a/ofproto/bond.h b/ofproto/bond.h
index 5176b08d5..40c3258dc 100644
--- a/ofproto/bond.h
+++ b/ofproto/bond.h
@@ -58,7 +58,7 @@ struct bond_settings {
                                 /* The MAC address of the interface
                                    that was active during the last
                                    ovs run. */
-    bool use_lb_output_action;  /* Use lb-output-action. Only applicable for
+    bool use_lb_output_action;  /* Use lb_output action. Only applicable for
                                    bond mode BALANCE TCP. */
 };
 
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 57ac574c5..ad6c87850 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4207,7 +4207,18 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
         /* Commit accumulated flow updates before output. */
         xlate_commit_actions(ctx);
 
-        if (xr) {
+        if (xr && bond_use_lb_output_action(xport->xbundle->bond)) {
+            /*
+             * If bond mode is balance-tcp and optimize balance tcp is enabled
+             * then use the hash directly for slave selection and avoid
+             * recirculation.
+             *
+             * Currently support for netdev datapath only.
+             */
+            nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_LB_OUTPUT,
+                           xr->recirc_id);
+        } else if (xr) {
+            /* Recirculate the packet. */
             struct ovs_action_hash *act_hash;
 
             /* Hash action. */
@@ -4216,27 +4227,15 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
                 /* Algorithm supported by all datapaths. */
                 hash_alg = OVS_HASH_ALG_L4;
             }
+            act_hash = nl_msg_put_unspec_uninit(ctx->odp_actions,
+                                                OVS_ACTION_ATTR_HASH,
+                                                sizeof *act_hash);
+            act_hash->hash_alg = hash_alg;
+            act_hash->hash_basis = xr->hash_basis;
 
-            if (bond_use_lb_output_action(xport->xbundle->bond)) {
-                /*
-                 * If bond mode is balance-tcp and optimize balance tcp
-                 * is enabled then use the hash directly for slave selection
-                 * and avoid recirculation.
-                 *
-                 * Currently support for netdev datapath only.
-                 */
-                nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_LB_OUTPUT,
-                               xr->recirc_id);
-            } else {
-                act_hash = nl_msg_put_unspec_uninit(ctx->odp_actions,
-                                                    OVS_ACTION_ATTR_HASH,
-                                                    sizeof *act_hash);
-                act_hash->hash_alg = hash_alg;
-                act_hash->hash_basis = xr->hash_basis;
-                /* Recirculate the packet. */
-                nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC,
-                               xr->recirc_id);
-            }
+            /* Recirc action. */
+            nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC,
+                           xr->recirc_id);
         } else if (is_native_tunnel) {
             /* Output to native tunnel port. */
             native_tunnel_output(ctx, xport, flow, odp_port, truncate);
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 5380c9c3c..4f0638f23 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -868,6 +868,12 @@ ovs_explicit_drop_action_supported(struct ofproto_dpif *ofproto)
     return ofproto->backer->rt_support.explicit_drop_action;
 }
 
+bool
+ovs_lb_output_action_supported(struct ofproto_dpif *ofproto)
+{
+    return ofproto->backer->rt_support.lb_output_action;
+}
+
 /* Tests whether 'backer''s datapath supports recirculation.  Only newer
  * datapaths support OVS_KEY_ATTR_RECIRC_ID in keys.  We need to disable some
  * features on older datapaths that don't support this feature.
@@ -6634,13 +6640,6 @@ meter_del(struct ofproto *ofproto_, ofproto_meter_id meter_id)
     ovsrcu_postpone(free_meter_id, arg);
 }
 
-static bool
-is_lb_output_action_supported(const struct ofproto *ofproto_)
-{
-    struct ofproto_dpif *ofproto = ofproto_dpif_cast(ofproto_);
-    return ofproto->backer->rt_support.lb_output_action;
-}
-
 const struct ofproto_class ofproto_dpif_class = {
     init,
     enumerate_types,
@@ -6748,5 +6747,4 @@ const struct ofproto_class ofproto_dpif_class = {
     ct_flush,                   /* ct_flush */
     ct_set_zone_timeout_policy,
     ct_del_zone_timeout_policy,
-    is_lb_output_action_supported,
 };
diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h
index 9fc6f88c8..4e5ae0c9e 100644
--- a/ofproto/ofproto-dpif.h
+++ b/ofproto/ofproto-dpif.h
@@ -385,11 +385,11 @@ int ofproto_dpif_add_internal_flow(struct ofproto_dpif *,
                                    struct rule **rulep);
 int ofproto_dpif_delete_internal_flow(struct ofproto_dpif *, struct match *,
                                       int priority);
-int ofproto_dpif_add_lb_output_buckets(struct ofproto_dpif *dpif,
-                                       uint32_t bond_id,
+int ofproto_dpif_add_lb_output_buckets(struct ofproto_dpif *, uint32_t bond_id,
                                        const ofp_port_t *slave_map);
-int ofproto_dpif_delete_lb_output_buckets(struct ofproto_dpif *dpif,
+int ofproto_dpif_delete_lb_output_buckets(struct ofproto_dpif *,
                                           uint32_t bond_id);
+bool ovs_lb_output_action_supported(struct ofproto_dpif *);
 
 bool ovs_native_tunneling_is_on(struct ofproto_dpif *);
 
diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
index 571da24d8..afecb24cb 100644
--- a/ofproto/ofproto-provider.h
+++ b/ofproto/ofproto-provider.h
@@ -1905,9 +1905,6 @@ struct ofproto_class {
     /* Deletes the timeout policy associated with 'zone' in datapath type
      * 'dp_type'. */
     void (*ct_del_zone_timeout_policy)(const char *dp_type, uint16_t zone);
-
-    /* Return 'true' if datapath supports 'lb-output-action'. */
-    bool (*is_lb_output_action_supported)(const struct ofproto *);
 };
 
 extern const struct ofproto_class ofproto_dpif_class;
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index b34522a87..d8e7e8d09 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -1498,15 +1498,6 @@ ofproto_is_mirror_output_bundle(const struct ofproto *ofproto, void *aux)
             : false);
 }
 
-/* Returns true if datapath supports lb-output-action. */
-bool
-ofproto_is_lb_output_action_supported(const struct ofproto *ofproto)
-{
-    return (ofproto->ofproto_class->is_lb_output_action_supported
-            ? ofproto->ofproto_class->is_lb_output_action_supported(ofproto)
-            : false);
-}
-
 

 /* Configuration of OpenFlow tables. */
 
diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
index dda2733a6..2dd253167 100644
--- a/ofproto/ofproto.h
+++ b/ofproto/ofproto.h
@@ -505,7 +505,6 @@ unsigned int ofproto_aa_vlan_get_queue_size(struct ofproto *ofproto);
 
 int ofproto_set_flood_vlans(struct ofproto *, unsigned long *flood_vlans);
 bool ofproto_is_mirror_output_bundle(const struct ofproto *, void *aux);
-bool ofproto_is_lb_output_action_supported(const struct ofproto *ofproto);
 
 /* Configuration of OpenFlow tables. */
 struct ofproto_table_settings {
diff --git a/tests/lacp.at b/tests/lacp.at
index d7d98f7c3..df1691731 100644
--- a/tests/lacp.at
+++ b/tests/lacp.at
@@ -121,7 +121,7 @@ AT_CHECK([ovs-appctl bond/show], [0], [dnl
 bond_mode: active-backup
 bond may use recirculation: no, Recirc-ID : -1
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 0
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -287,7 +287,7 @@ slave: p3: current attached
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 1
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -303,7 +303,7 @@ slave p1: enabled
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 2
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -426,7 +426,7 @@ slave: p3: current attached
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 1
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -444,7 +444,7 @@ slave p1: enabled
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 2
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -560,7 +560,7 @@ slave: p3: current attached
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 1
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -578,7 +578,7 @@ slave p1: enabled
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 2
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -699,7 +699,7 @@ slave: p3: current attached
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 1
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
@@ -717,7 +717,7 @@ slave p1: enabled
 bond_mode: balance-tcp
 bond may use recirculation: yes, <del>
 bond-hash-basis: 0
-lb-output-action: disabled, bond-id: 2
+lb_output action: disabled, bond-id: -1
 updelay: 0 ms
 downdelay: 0 ms
 lacp_status: negotiated
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index de14fd4a7..f312efd8e 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -4601,18 +4601,10 @@ port_configure_bond(struct port *port, struct bond_settings *s)
         s->active_slave_mac = eth_addr_zero;
     }
 
-    /* lb-output-action is disabled by default. */
+    /* lb_output action is disabled by default. */
     s->use_lb_output_action = (s->balance == BM_TCP)
-                          && smap_get_bool(&port->cfg->other_config,
-                                         "lb-output-action", false);
-
-    /* Verify if datapath supports lb-output-action. */
-    if (s->use_lb_output_action
-        && !ofproto_is_lb_output_action_supported(port->bridge->ofproto)) {
-        VLOG_WARN("port %s: Unsupported lb-output-action in datapath, "
-                  "defaulting to false.", port->name);
-        s->use_lb_output_action = false;
-    }
+                              && smap_get_bool(&port->cfg->other_config,
+                                               "lb-output-action", false);
 }
 
 /* Returns true if 'port' is synthetic, that is, if we constructed it locally
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 5a5a47422..7a729631c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -1996,7 +1996,7 @@
 
       <column name="other_config" key="lb-output-action"
               type='{"type": "boolean"}'>
-        Enable/disable usage of optimized lb-output-action for
+        Enable/disable usage of optimized <code>lb_output</code> action for
         balancing flows among output slaves in load balanced bonds in
         <code>balance-tcp</code>. When enabled, it uses optimized path for
         balance-tcp mode by using rss hash and avoids recirculation.
-- 
2.25.4


More information about the dev mailing list