[ovs-discuss] Megaflow Inspection

Matan Rosenberg matan129 at gmail.com
Mon Jan 13 08:10:46 UTC 2020


After seeing this article (
https://developers.redhat.com/blog/2017/04/06/direct-kernel-open-vswitch-flow-programming/),
I've experimented a bit more.
I've set up a single in-kernel datapath, and added two veth interfaces to
it (like in the example I've provided before).

Then, in order to emulate an OpenFlow flood rule, I've added two
(mega)flows manually:

in_port(1),eth() 2
in_port(2),eth() 1
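
For completeness, the manual setup was roughly along these lines (the
datapath and interface names here are placeholders, and the port numbers
assume the two veths end up as ports 1 and 2):

# ovs-dpctl add-dp dp0
# ovs-dpctl add-if dp0 veth-a
# ovs-dpctl add-if dp0 veth-b
# ovs-dpctl add-flow dp0 "in_port(1),eth()" 2
# ovs-dpctl add-flow dp0 "in_port(2),eth()" 1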

(1)
When passing non-vlan-tagged packets, the datapath forwards them as
expected.
In contrast, vlan-tagged packets, or packets that are malformed beyond the
Ethernet layer, are *lost*, as confirmed by dpctl stats.
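(For reference, the counter I'm reading is the "lost" one in the datapath
stats, which counts misses that could not be handed off to userspace -
roughly:

# ovs-dpctl show
system@dp0:
  lookups: hit:... missed:... lost:...
  ...

The values are elided here, and the datapath name is whatever you created.)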

This means that, for some reason, the above flow keys + single mask fail to
match these packets, even though the rules only look at the lowest-level
fields.
(Actually, this behaviour is hinted at in
http://docs.openvswitch.org/en/latest/topics/datapath/)

(2)
After learning this, I've set up a similar bridge again, now with
ovs-vswitchd (using the managed ovs-system datapath).
I've injected the exact same packets into this bridge, and the results are
not surprising:
the vlan-tagged/malformed packets are no longer dropped, but they *do* cause
upcalls, and the megaflows they are assigned are highly specific (matching
vlan and encap()ed fields, with an empty encap() if malformed).
(Of course, this also explains the packet loss from (1) - there's no
vswitchd to upcall to in the manual scenario).
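
To give a concrete picture of what "highly specific" means here, the
vlan-handling megaflows look roughly like this (paraphrased from memory;
the vid and the elided fields are made up, not pasted from my setup):

recirc_id(0),in_port(2),eth(),eth_type(0x8100),vlan(vid=100,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), actions:3
recirc_id(0),in_port(2),eth(),eth_type(0x8100),vlan(vid=100,pcp=0),encap(), actions:3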

What I discern from the above is that it's not only a vswitchd problem but
also a kernel one; I somehow have to make the kernel correctly match against
my flow keys.

I've thought about writing a small daemon that will wrap all the packets
with a fixed Ethernet/IP header, just to make sure OVS does not look into
the payload, but it's very hacky.
Also, it'll have to support decapsulation at the other end of the tunnel...

Is there any way to make OVS ignore these fields?
If necessary, I'm able to implement patches to OVS, but I'll need a bit of
guidance here.

Matan



On Thu, 9 Jan 2020 at 09:52, Matan Rosenberg <matan129 at gmail.com> wrote:

> Thanks for the quick responses.
>
> Levi - you've provided a lot of info, and I'm still looking into some of
> the points. At this point, this is what I know:
>
> 1) No, Scapy is used only to create the packets. I can make a very
> diverse pcap and then send it with tcpreplay; it should be fast enough -
> I'll look into it.
> 2) In general, I don't think that the veth pair performance is the
> bottleneck here.
> 3) In production, according to dpctl/dump-flows I see ~7k megaflows and
> about ~3k masks (!). The masks hit/pkt is around 1k, which is *huge* (see
> the note after this list on where that figure comes from).
> 4) About the TCP offloading: I don't think this is it, but I'll check.
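>
> (A note on the hit/pkt figure in (3): I'm reading it from the masks line
> of the datapath stats, roughly:
>
> # ovs-dpctl show
> system@ovs-system:
>   lookups: hit:... missed:... lost:...
>   masks: hit:... total:... hit/pkt:...
>
> with the actual numbers elided here.)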
>
> Ben - I've taken a second look at the megaflows in production, and mapped
> the fields that are matched beyond the default dl_type and IP fragmentation.
> I have some bridges with flood rules, some with normal (MAC learning), and
> some of them also contain vxlan ports (remote_ip and vni are defined per
> port, not flow based).
>
> The fields that I see matched are:
>
> Both actions=flood/normal (or even just output to manually specified
> ports):
>
>    - VLAN IDs
>       - We use about 3K vlans out of the available 4K vlan range, so it's
>         quite a lot.
>       - Most of the traffic is vlan-tagged, so this applies to most
>         megaflows.
>       - Just to clarify, I don't actually need OVS to care about the
>         vlans.
>    - If a packet is vlan-tagged, its eth_type and fragmentation are also
>      matched via encap(eth_type(...)).
>       - This makes a Cartesian product: the handful of eth_types we have
>         times the number of active vlans.
>    - Tunnel-related fields, but that's normal for the vxlan ports.
>    - I also see some other IP flags being matched, like tos and tclass.
>
> Only with actions=normal (MAC learning):
>
>    - I obviously also see dl_src/dst addresses, which is sensible.
>    - Additionally, I see OVS matching against specific ARP fields (for
>      example, src/dst IP).
>
>
>
> On Wed, 8 Jan 2020 at 02:25, Levente Csikor <levente.csikor at gmail.com>
> wrote:
>
>> Hi Matan,
>>
>> I guess you were referring to my talk at OVS Fall 2018 :)
>> As Ben has pointed out in his last email, even if you are matching only
>> on the in_port, because of your (not-manually-inserted) default drop
>> rule(s) you will still have a couple of megaflow entries to handle
>> different packets (as you could see, usually an IPv6-related discovery
>> message and an ARP).
>>
>> Before going into the megaflow cache code, could you confirm the
>> following things about your setup?
>>
>> 1) by using scapy for generating the packets, are you actually able to
>> achieve the intended packet rate at the generator?
>>
>> 2) if YES: without OVS, can you see the same rate at the other end of
>> your veth pair that you are getting at the generator?
>> ----
>> These two things can easily be the bottleneck, so we have to verify
>> that they are not the bad guys in your case.
>>
>>
>> 3) After checking the megaflow entries with the command Ben has shared
>> (ovs-dpctl show/dump-flows), how many entries/masks did you see?
>> (Note: I did not go through your flow rules and packets thoroughly.)
>> If the number is just a handful, then megaflow won't be your issue!
>> Even if the number is more than ~100, it would usually not be an issue;
>> however, if it is, it can be caused by one of two things:
>>  - you are using an OVS version delivered by your distribution -> we
>> realized (in 2018, with Ben et al.) that the default kernel module coming
>> with the distribution has the microflow cache switched OFF (the main
>> networking guys responsible for the kernel modules are not huge fans of
>> caching) - so either enable it (if possible) or simply install OVS from
>> source.
>>  - OR there are still some issues with your veths! We experienced
>> similarly bad performance with a relatively low number of masks and
>> little traffic when TCP offload was switched off on the physical NIC, or
>> when we were using UDP packets (as there is no offloading function for
>> UDP).
>> Have you tried playing with these settings for your veth
>> (ethtool -K <iface>)?
>> TL;DR: recently I have seen that switching off TCP offloading for a veth
>> (which I don't think should have an effect) produced better throughput :/
>>
>> After you check these things, we will be much smarter ;)
>>
>> Cheers,
>> Levi
>>
>> On Wed, 2020-01-08 at 00:52 +0200, Matan Rosenberg wrote:
>> > Running ofproto/trace unfortunately does not explain why OVS chose to
>> > look at these fields.
>> > Using the same setup, for example:
>> >
>> > # ovs-appctl ofproto/trace br0 in_port=a-blue,dl_src=11:22:33:44:55:66,dl_dst=aa:bb:cc:dd:ee:ff,ipv4,nw_src=1.2.3.4
>> > Flow: ip,in_port=4,vlan_tci=0x0000,dl_src=11:22:33:44:55:66,dl_dst=aa:bb:cc:dd:ee:ff,nw_src=1.2.3.4,nw_dst=0.0.0.0,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=0
>> >
>> > bridge("br0")
>> > -------------
>> >  0. in_port=4, priority 32768
>> >     output:5
>> >
>> > Final flow: unchanged
>> > Megaflow: recirc_id=0,eth,ip,in_port=4,nw_frag=no
>> > Datapath actions: 3
>> >
>> > It seems that the OpenFlow rule (not to be confused with the megaflow
>> > entry) was correctly identified, and no other actions take place.
>> > Since the relevant OpenFlow rule has nothing to do with the IP layer,
>> > I don't understand why the megaflow is aware of it.
>> >
>> > I'll try to look at the classifier/megaflow code (?) tomorrow, but
>> > I'd like to know if there's a high-level way to avoid such trouble.
>> >
>> > Thanks
>> >
>> > On Wed, 8 Jan 2020 at 00:39, Ben Pfaff <blp at ovn.org> wrote:
>> > > On Tue, Jan 07, 2020 at 10:44:57PM +0200, Matan Rosenberg wrote:
>> > > > Actually, I do think I have a megaflow (or other caching) issue.
>> > > >
>> > > > We use OVS for L2 packet forwarding; that is, given a packet, we
>> > > > don't need OVS to look at other protocols beyond the Ethernet layer.
>> > > > Additionally, we use VXLAN to establish L2 overlay networks across
>> > > > multiple OVS servers.
>> > > >
>> > > > Just to make things clear, these are some typical flow rules that
>> > > > you might see on a bridge:
>> > > >
>> > > > - in_port=1,actions=2,3
>> > > > - in_port=42,actions=FLOOD
>> > > > - actions=NORMAL
>> > > >
>> > > > No IP matching, conntrack, etc.
>> > > >
>> > > > We're experiencing severe performance issues with OVS - in this use
>> > > > case, it cannot handle more than a couple thousand packets/s.
>> > > > After some exploring, I've noticed that the installed megaflows try
>> > > > to match on fields that are not present in the rules, apparently
>> > > > for no reason.
>> > > > Here's a complete example to reproduce, using OVS 2.12.0:
>> > > >
>> > > > # ip link add dev a-blue type veth peer name a-red
>> > > > # ip link add dev b-blue type veth peer name b-red
>> > > >
>> > > > # ovs-vsctl add-br br0
>> > > > # ovs-vsctl add-port br0 a-blue
>> > > > # ovs-vsctl add-port br0 b-blue
>> > > >
>> > > > # ovs-ofctl del-flows br0
>> > > > # ovs-ofctl add-flow br0 in_port=a-blue,actions=b-blue
>> > > > # ovs-ofctl add-flow br0 in_port=b-blue,actions=a-blue
>> > > >
>> > > > After injecting ~100 random packets (IP, IPv6, TCP, UDP, ARP with
>> > > > random addresses) to one of the red interfaces (with
>> > > > https://pastebin.com/Y6dPFCKJ), these are the installed flows:
>> > > > # ovs-dpctl dump-flows
>> > > > recirc_id(0),in_port(2),eth(),eth_type(0x0806), packets:54, bytes:2268, used:1.337s, actions:3
>> > > > recirc_id(0),in_port(2),eth(),eth_type(0x86dd),ipv6(frag=no), packets:28, bytes:1684, used:1.430s, flags:S, actions:3
>> > > > recirc_id(0),in_port(2),eth(),eth_type(0x0800),ipv4(frag=no), packets:15, bytes:610, used:1.270s, flags:S, actions:3
>> > > >
>> > > > As you can see, for some reason, OVS has split the single relevant
>> > > > OpenFlow rule into three separate megaflows, one for each eth_type
>> > > > (and even other fields - IP fragmentation?).
>> > > > In my production scenario, the packets are even more diversified,
>> > > > and we see OVS installing flows which match on even more fields,
>> > > > including specific Ethernet and IP addresses.
>> > > >
>> > > > This leads to a large number of flows with an extremely low hit
>> > > > rate - each flow handles no more than ~100 packets (!) during its
>> > > > entire lifetime.
>> > > >
>> > > > We suspect that this causes the performance penalty; either
>> > > > 1) The EMC/megaflow table is full, so vswitchd upcalls are all over
>> > > > the place, or
>> > > > 2) The huge number of inefficient megaflows leads to terrible lookup
>> > > > times in the in-kernel megaflow table itself (due to the large
>> > > > number of masks, etc.)
>> > > >
>> > > > In short: how can I just make OVS oblivious to these fields? Why
>> > > > does it try to match on irrelevant fields?
>> > >
>> > > I can see how this would be distressing.
>> > >
>> > > You can use ofproto/trace with a few examples to help figure out
>> > > why OVS
>> > > is matching on more fields than you expect.
>> >
>> > _______________________________________________
>> > discuss mailing list
>> > discuss at openvswitch.org
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>
>>