[ovs-discuss] balance-tcp vs. balance-slb w/ lacp and megaflows (or the lack of)

Hans van Kranenburg hans.van.kranenburg at mendix.com
Tue Aug 30 16:05:24 UTC 2016


Hi,

tldr:
 1) using balance-tcp prevents ovs from using megaflows?
 2) balance-slb documentation is unclear when used with lacp?
 -> questions at the end

== background ==

I'm using openvswitch in its default out of the box mode as a mac
learning L2 switch with multiple vlans for virtual machines on
Debian/Xen hypervisors.

OpenvSwitch on a physical server is connected with 2x1G ethernet ports
to a 2x cisco 3750-X switch stack, using a bond with lacp.
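
For reference, the bond is set up roughly like this (bridge, bond and
interface names here are just examples, not the actual production
names):

  ovs-vsctl add-br br0
  ovs-vsctl add-bond br0 bond0 eth0 eth1 \
      bond_mode=balance-tcp lacp=active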

Average traffic levels are measured in Mb/s, not Gb/s. Traffic between
the physical servers and the switch averages between ~20 and ~150
Mbit/s.

Most of the servers use Debian Jessie (linux 3.16-ckt), Xen 4.4 and
OpenvSwitch 2.3.0. A few of them are still running Debian Wheezy (linux
3.2), Xen 4.1 and OpenvSwitch 1.4.2.

== traffic interruptions ==

Over the last few weeks I've been investigating short network
interruptions in our production network, lasting a few seconds each,
which started to occur a few times a week during the last two months.
Symptoms were flapping behaviour of vrrp on routers and flapping load
balancer health checks.

The first issue found was short bursts of unicast flooding on the
network, mainly caused by a specific case of asymmetric routing. The
asymmetric routing caused asymmetric L2 traffic, which let mac address
to port mappings expire, resulting in unicast flooding.

It seems that ovs having to duplicate a stream of traffic into 100+
virtual nics does not help network stability(tm).

The asymmetric routing case could quite easily be solved by adding
routing policies that send traffic back the way it came in, instead of
taking a shortcut via a directly connected network. This solved almost
all of the unicast flooding.
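
To give an idea (addresses, interface and table numbers below are made
up for the example), the extra policies boil down to plain Linux policy
routing, something like:

  # send replies for this subnet back out via the router the
  # traffic came in through, instead of the directly connected route
  ip route add default via 192.0.2.1 dev eth0 table 100
  ip rule add from 192.0.2.0/24 table 100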

== balance-tcp, flow counts, misses and lost flows ==

While investigating, I also started having a closer look at the
behaviour of openvswitch. I added a little plugin to our monitoring
(munin) to graph counters from the output of ovs-dpctl show.

Especially the "lost" counter, which was increasing from time to time
in several places, caught my attention.
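
The plugin is not much more than scraping the counters out of the
"lookups:" line; a simplified sketch of the idea (not the exact plugin
we run):

  #!/bin/sh
  # munin-style plugin: graph the ovs datapath lookup counters
  if [ "$1" = "config" ]; then
      echo "graph_title ovs datapath lookups"
      echo "hit.label hit"
      echo "missed.label missed"
      echo "lost.label lost"
      exit 0
  fi
  # ovs-dpctl show prints a line like:
  #   lookups: hit:123 missed:45 lost:6
  ovs-dpctl show | awk '/lookups:/ {
      for (i = 2; i <= NF; i++) {
          split($i, kv, ":")
          print kv[1] ".value " kv[2]
      }
  }'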

When looking at the flows themselves (with ovs-dpctl dump-flows), I
found out that the "megaflow" optimization that was introduced in ovs a
long time ago did not seem to be applied at all in our case.

A significant number of flows are related to dns resolver traffic and
match on all possible fields, including source and destination ports,
meaning each of these packets goes to userspace and the resulting flows
are not reused at all.

Example:

in_port(1),eth(src=02:00:52:5e:bc:05,dst=02:00:52:5e:bc:03),
eth_type(0x8100),vlan(vid=10,pcp=0),encap(eth_type(0x0800),
ipv4(src=82.94.188.6,dst=82.94.240.117,proto=17,tos=0,ttl=64,frag=no),
udp(src=53,dst=50464)), packets:0, bytes:0, used:never,
actions:pop_vlan,213
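
A quick and dirty way to see how many datapath flows there are and how
many of them are dns related (the exact match string may of course
differ per setup):

  # total number of datapath flows right now
  ovs-dpctl dump-flows | wc -l
  # flows for traffic coming back from a resolver
  ovs-dpctl dump-flows | grep -c 'udp(src=53,'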

So, in a worst case scenario, when I need to do a dns request to a
resolver that's behind a load balancer, with two routers in between,
and all of them living on a different physical server, about 14
extremely specific flows need to be set up and torn down again. If I do
1000 requests, it's 14000, etc...

When searching for information about this, I came across an old mailing
list post from Ben,
http://openvswitch.org/pipermail/discuss/2014-January/012769.html
suggesting that using balance-tcp prevents the use of megaflows. The
idea made sense when I read it, because the hashing algorithm probably
wants to have the L4 info available.

== balance-slb ==

In a test environment, I tried to see what happens when changing the
hashing method from balance-tcp to balance-slb. After all, with our
traffic volumes, spreading the traffic a bit based on mac/vlan alone is
sufficient.
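
The change itself is a one-liner on the bond port (bond0 is the name of
our bond port; yours may differ):

  # switch the hashing method of the existing lacp bond
  ovs-vsctl set port bond0 bond_mode=balance-slb
  # check what the bond looks like afterwards
  ovs-appctl bond/show bond0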

The results were a dramatic decrease in flow misses, lost flows and
flow counts in general, in all graphs.

Yesterday, I applied the change to the production network, with the same
results. Well, except for the wheezy ones, with older ovs, but we're
emptying them anyway.

Some graphs: https://syrinx.knorrie.org/~knorrie/keep/ovs/

The switch from balance-tcp to balance-slb is where most of the lines
show a steep drop. The lost flows (yellow) seen yesterday but not today
are the result of fixing a case of asymmetric traffic spikes in the
meantime. The wheezy boxes (with the greek letter names) do not show
improvement, but we're emptying them anyway. There are a few busy
routers (traffic complexity, not volume) on there, which need to move
first.

== questions ==

1. Does using balance-tcp hashing on a bond disable megaflows? If so,
why isn't there a huge warning about this and the significant resulting
performance hit in the man page?

2. The documentation about balance-slb is confusing. In "Bonding
Configuration", the text suggests that balance-slb and active-backup
require the following: "On the upstream switch, do not configure the
interfaces as a bond".

Also, the vswitchd/INTERNALS file lists "Bond Balance Modes". It feels
like the two concepts of choosing which ports are active (bond mode)
and, on the other hand, choosing which traffic to throw at the active
ports (hashing algorithm) are used interchangeably in a confusing way.

Am I right to assume that all the potential problems listed at "SLB
Bonding" in vswitchd/INTERNALS do not apply to my situation, if I use
LACP and the balance-slb hashing method on top? There's a single line of
hope inside the documentation, which seems to suggest this: "after LACP
negotiation is complete, there is no need for special handling of
received packets".
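
For completeness, this is how I check whether LACP negotiation actually
completed (again assuming the bond port is called bond0):

  # look for "negotiated" in the status output
  ovs-appctl lacp/show bond0
  ovs-appctl bond/show bond0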

Thanks,
Hans van Kranenburg


