[ovs-git] [openvswitch/ovs] fb7a75: ofproto-dpif-xlate: Terminate native tunnels only ...

Ilya Maximets noreply at github.com
Fri Nov 19 17:40:30 UTC 2021


  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: fb7a75e52321d748189ab9faed154f176d139b93
      https://github.com/openvswitch/ovs/commit/fb7a75e52321d748189ab9faed154f176d139b93
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-11-19 (Fri, 19 Nov 2021)

  Changed paths:
    M ofproto/ofproto-dpif-xlate.c
    M tests/packet-type-aware.at
    M tests/tunnel-push-pop.at

  Log Message:
  -----------
  ofproto-dpif-xlate: Terminate native tunnels only on ports with IP addresses.

Commit dc0bd12f5b04 removed the restriction that a tunnel endpoint must
be a bridge port.  So, currently OVS has to check whether a native
tunnel needs to be terminated regardless of the output port.
Unfortunately, there is a side effect: tnl_port_map_lookup() always
adds at least a 'dl_dst' match to the megaflow that ends up in the
corresponding datapath flow.  And since tunneling works at the L3 level
and is not restricted to any particular bridge, this extra match
criterion is added to every datapath flow on every bridge, even if that
bridge cannot be part of tunnel processing.

For example, if OVS has at least one tunnel configured and we're
adding a completely separate bridge with 2 ports and simple rules
to forward packets between the two ports, there will still be a match
on the destination MAC address:

 1. <create a tunnel configuration in OVS>
 2. ovs-vsctl add-br br-non-tunnel -- set bridge datapath_type=netdev
 3. ovs-vsctl add-port br-non-tunnel port0
           -- add-port br-non-tunnel port1
 4. ovs-ofctl del-flows br-non-tunnel
 5. ovs-ofctl add-flow br-non-tunnel in_port=port0,actions=port1
 6. ovs-ofctl add-flow br-non-tunnel in_port=port1,actions=port0

 # ovs-appctl ofproto/trace br-non-tunnel in_port=port0

 Flow: in_port=1,vlan_tci=0x0000,
       dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,dl_type=0x0000

 bridge("br-non-tunnel")
 -----------------------
  0. in_port=1, priority 32768
     output:2

 Final flow: unchanged
 Megaflow: recirc_id=0,eth,in_port=1,dl_dst=00:00:00:00:00:00,dl_type=0x0000
 Datapath actions: 5                 ^^^^^^^^^^^^^^^^^^^^^^^^

This increases the number of upcalls and installed datapath flows,
since a separate flow needs to be installed per destination MAC,
reducing switching performance.  It also blocks datapath performance
optimizations that rely on the simplicity of datapath flows.

In general, in order to be a tunnel endpoint, a port has to have an IP
address.  Hence, native tunnel termination should be attempted only on
such ports.  This avoids the extra matches in most cases; a rough
sketch of the idea follows.
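
Below is a minimal, hypothetical sketch of such a check, not the actual
patch.  It assumes the output port's netdev can be queried for
configured IP addresses with the OVS helper netdev_get_addr_list();
the wrapper function name is purely illustrative.

  /* Sketch only: skip native tunnel termination (and thus the extra
   * dl_dst megaflow match) when the output port has no IP address. */
  #include <stdlib.h>
  #include <stdbool.h>
  #include <netinet/in.h>

  struct netdev;                          /* Opaque OVS netdev handle. */
  int netdev_get_addr_list(const struct netdev *, struct in6_addr **,
                           struct in6_addr **, int *);

  static bool
  output_port_may_terminate_tunnel(const struct netdev *netdev)
  {
      struct in6_addr *addrs = NULL, *masks = NULL;
      int n_addrs = 0;
      bool has_ip;

      /* A port without IP addresses cannot be a tunnel endpoint, so
       * there is no need to call tnl_port_map_lookup() for it. */
      if (netdev_get_addr_list(netdev, &addrs, &masks, &n_addrs)) {
          return false;                   /* Lookup failed; assume no IPs. */
      }
      has_ip = n_addrs > 0;
      free(addrs);
      free(masks);
      return has_ip;
  }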

Fixes: dc0bd12f5b04 ("userspace: Enable non-bridge port as tunnel endpoint.")
Reported-by: Sriharsha Basavapatna <sriharsha.basavapatna at broadcom.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388904.html
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
Acked-by: Mike Pattrick <mkp at redhat.com>


  Commit: 7b8aeadd60c82f99a9d791a7df7c617254654f9d
      https://github.com/openvswitch/ovs/commit/7b8aeadd60c82f99a9d791a7df7c617254654f9d
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-11-19 (Fri, 19 Nov 2021)

  Changed paths:
    M lib/ovsdb-idl-provider.h
    M lib/ovsdb-idl.c

  Log Message:
  -----------
  ovsdb-idl: Re-parse backrefs of inserted rows only once.

While adding new rows, ovsdb-idl re-parses all the other rows that
reference the new one.  For example, current ovn-kubernetes creates
load balancers and adds the same load balancer to all logical switches
and logical routers.  So, when a new load balancer is added, the rows
for all logical switches and routers are re-parsed.

During the initial database connection (or re-connection with
monitor/monitor_cond, or with monitor_cond_since and an outdated last
transaction id), the client downloads the whole content of the
database.  In the case of OVN, there might already be thousands of
load balancers configured.  ovsdb-idl will process rows in that
initial monitor reply one by one.  Therefore, for each load balancer
row, it will re-parse all the rows for switches and routers.

Assuming that we have 120 Logical Switches and 30K load balancers,
processing of the initial monitor reply will take 120 (switch rows) *
30K (load balancer references in a switch row) * 30K (load balancer
rows) ≈ 10^11 operations, which may take hours.  ovn-kubernetes will
use LB groups eventually, but there are other, less obvious cases that
cannot be changed that easily.

Re-parsing doesn't change any internal structures of the IDL.  It
destroys and re-creates exactly the same arcs between rows.  The only
thing that changes is the application-facing array of pointers.

Since the internal structures remain intact, the suggested solution is
to postpone the re-parsing of back references until all the monitor
updates have been processed.  This way each row is re-parsed only
once; a rough sketch of the idea follows.
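
The following is an illustrative sketch of the deferral, not the
actual ovsdb-idl code: rows whose back references are affected get
queued with a flag that deduplicates them, and they are re-parsed in
one pass after the whole monitor reply has been applied.  All names
below (idl_row, queue_backref_reparse(), flush_pending_reparses()) are
hypothetical.

  #include <stdbool.h>

  struct idl_row {
      bool reparse_pending;               /* Already queued? */
      struct idl_row *next_pending;       /* Singly linked pending list. */
  };

  static struct idl_row *pending_head;    /* Rows waiting to be re-parsed. */

  /* Called for every back-referencing row touched while applying a
   * monitor update, instead of re-parsing it right away. */
  static void
  queue_backref_reparse(struct idl_row *row)
  {
      if (!row->reparse_pending) {
          row->reparse_pending = true;    /* The flag deduplicates in O(1). */
          row->next_pending = pending_head;
          pending_head = row;
      }
  }

  /* Called once after the whole monitor reply has been processed, so
   * each affected row is re-parsed exactly once. */
  static void
  flush_pending_reparses(void (*reparse)(struct idl_row *))
  {
      while (pending_head) {
          struct idl_row *row = pending_head;
          pending_head = row->next_pending;
          row->reparse_pending = false;
          reparse(row);
      }
  }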

Tested in a sandbox with 120 LSs, 120 LRs and 3K LBs, where each
load balancer is added to each LS and LR, by re-starting ovn-northd
and measuring the time spent in ovsdb_idl_run().

Before the change:

  OVN_Southbound: ovsdb_idl_run took: 924 ms
  OVN_Northbound: ovsdb_idl_run took: 825118 ms  --> 13.75 minutes!

After:

  OVN_Southbound: ovsdb_idl_run took: 692 ms
  OVN_Northbound: ovsdb_idl_run took: 1698 ms

Acked-by: Dumitru Ceara <dceara at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/9fe0ce4f7258...7b8aeadd60c8

