[ovs-discuss] [OVN] Too many resubmits for packets coming from "external" network

Krzysztof Klimonda kklimonda at syntaxhighlighted.com
Tue Sep 29 11:07:38 UTC 2020


On Tue, Sep 29, 2020, at 12:40, Dumitru Ceara wrote:
> On 9/29/20 12:14 PM, Daniel Alvarez Sanchez wrote:
> > 
> > 
> > On Tue, Sep 29, 2020 at 11:14 AM Krzysztof Klimonda
> > <kklimonda at syntaxhighlighted.com> wrote:
> > 
> >     On Tue, Sep 29, 2020, at 10:40, Dumitru Ceara wrote:
> >     > On 9/29/20 12:42 AM, Krzysztof Klimonda wrote:
> >     > > Hi Dumitru,
> >     > >
> >     > > This cluster is IPv4-only for now - there are no IPv6 networks
> >     defined at all - overlay or underlay.
> >     > >
> >     > > However, once I increase a number of routers to ~250, a similar
> >     behavior can be observed when I send ARP packets for non-existing
> >     IPv4 addresses. The following warnings will flood ovs-vswitchd.log
> >     for every address not known to OVN when I run `fping -g
> >     192.168.0.0/16`:
> >     > >
> >     > > ---8<---8<---8<---
> >     > >
> >     2020-09-28T22:26:40.967Z|21996|ofproto_dpif_xlate(handler6)|WARN|over 4096
> >     resubmit actions on bridge br-int while processing
> >     arp,in_port=1,vlan_tci=0x0000,dl_src=fa:16:3e:75:38:be,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.0.1,arp_tpa=192.168.0.35,arp_op=1,arp_sha=fa:16:3e:75:38:be,arp_tha=00:00:00:00:00:00
> >     > > ---8<---8<---8<---
> >     > >
> >     > > This is even a larger concern for me, as some of our clusters
> >     would be exposed to the internet where we can't easily prevent
> >     scanning of an entire IP range.
> >     > >
> >     > > Perhaps this is something that should be handled differently for
> >     traffic coming from external network? Is there any reason why OVN is
> >     not dropping ARP requests and IPv6 ND for IP addresses it knows
> >     nothing about? Or maybe OVN should drop most of BUM traffic on
> >     external network in general? I think all this network is used for is
> >     SNAT and/or SNAT+DNAT for overlay networks.
> >     > >
> >     >
> >     > Ok, so I guess we need a combination of the existing broadcast domain
> >     > limiting options:
> >     >
> >     > 1. send ARP/NS packets only to router ports that own the target IP
> >     address.
> >     > 2. flood IPv6 ND RS packets only to router ports with IPv6 addresses
> >     > configured and ipv6_ra_configs.address_mode set.
> >     > 3. according to the logical switch multicast configuration either
> >     flood
> >     > unknown IP multicast or forward it only to hosts that registered
> >     for the
> >     > IP multicast group.
> >     > 4. drop all other BUM traffic.
> >     >
> >     > From the above, 1 and 3 are already implemented. 2 is what I suggested
> >     > earlier. 4 would probably turn out to be a configuration option that
> >     needs
> >     > to be explicitly enabled on the logical switch connected to the
> >     external
> >     > network.
> >     >
> >     > Would this work for you?
> > 
> >     I believe it would work for me, although it may be a good idea to
> >     consult with neutron developers and see if they have any input on that.
> > 
> > 
> > I think that's a good plan. Implementing 4) via a configuration option
> > sounds smart. From an OpenStack point of view, I think that as all the
> > ports are known, we can just have it on by default.
> > We need to make sure it works for 'edge' cases like virtual ports, load
> > balancers and subports (ports with a parent port and a tag) but the idea
> > sounds great to me.
> > 
> > Thanks folks for the discussion! 
> 
> Thinking more about it, it's probably not OK to drop all other BUM
> traffic. Instead we should just flood it on all logical ports of a
> logical switch _except_ router ports.
> 
> Otherwise we'll be breaking E-W traffic between VIFs connected to the
> same logical switch. E.g., VM1 and VM2 connected to the same LS and VM1
> sending ARP request for VM2's IP.

Does this also matter for the LS that OpenStack uses for external networks? We don't usually connect VMs directly to that network; instead we use FIPs for some VMs and SNAT traffic from other VMs on the router. Or is it unrelated to how the VM is connected to the network, and would it break, for example, FIP<->FIP traffic?
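
To make the question concrete, here is a minimal Python sketch of the policy as I understand it from this thread: a broadcast ARP request goes only to the router port that owns the target IP, and everything else is flooded to all ports except router ports. This is a toy model, not OVN's actual implementation; `Port`, `arp_request_targets`, `EXAMPLE_PORTS` and the port names are hypothetical, and it assumes a FIP address counts as owned by the router port it is NATed on.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Port:
    """Hypothetical model of a logical switch port (not an OVN type)."""
    name: str
    is_router: bool = False
    ips: frozenset = frozenset()  # addresses the port answers ARP for


def arp_request_targets(ports, in_port, target_ip):
    """Return the sorted names of ports that receive a broadcast ARP request.

    Assumed policy:
      * if a router port owns target_ip, deliver only to that port;
      * otherwise flood to every non-router port except the ingress port.
    """
    target = ip_address(target_ip)
    owners = [p for p in ports
              if p.is_router and target in p.ips and p.name != in_port]
    if owners:
        return sorted(p.name for p in owners)
    return sorted(p.name for p in ports
                  if not p.is_router and p.name != in_port)


# Example topology: one provider (localnet-like) port, one VIF, two routers.
EXAMPLE_PORTS = [
    Port("provnet"),
    Port("vm1", ips=frozenset({ip_address("192.168.0.50")})),
    Port("lrp-r1", is_router=True, ips=frozenset({ip_address("192.168.0.10")})),
    Port("lrp-r2", is_router=True, ips=frozenset({ip_address("192.168.0.11")})),
]

if __name__ == "__main__":
    # ARP for an address owned by router r1.
    print(arp_request_targets(EXAMPLE_PORTS, "provnet", "192.168.0.10"))
    # ARP for an address nobody owns (the fping scan case).
    print(arp_request_targets(EXAMPLE_PORTS, "provnet", "192.168.0.35"))
```

In this model a scan of unknown addresses never reaches a router port, no matter how many routers are attached, while a request for an address owned by a router port still reaches exactly that port. Whether FIP<->FIP traffic behaves the same way depends on the ownership assumption above, which is the part I'm unsure about.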

> 
> > 
> > 
> >     >
> >     > Thanks,
> >     > Dumitru
> >     >
> >     > > --
> >     > > Krzysztof Klimonda
> >     > > kklimonda at syntaxhighlighted.com
> >     > >
> >     > > On Mon, Sep 28, 2020, at 21:14, Dumitru Ceara wrote:
> >     > >> On 9/28/20 5:33 PM, Krzysztof Klimonda wrote:
> >     > >>> Hi,
> >     > >>>
> >     > >> Hi Krzysztof,
> >     > >>
> >     > >>> We're still doing some scale tests of OpenStack Ussuri with
> >     the ml2/ovn driver. We've deployed 140 virtualized compute nodes, and
> >     started creating routers that share a single external network between
> >     them. Additionally, each router is connected to a private network.
> >     > >>> Previously[1] we hit a problem of too many logical flows being
> >     generated per router connected to the same "external" network - this
> >     put too much stress on ovn-controller and ovs-vswitchd on compute
> >     nodes, and we've applied a patch[2] to limit the number of logical
> >     flows created per router.
> >     > >>> After we dealt with that, we did more testing and created
> >     200 routers connected to a single external network. After that we've
> >     noticed the following logs in ovs-vswitchd.log:
> >     > >>>
> >     > >>> ---8<---8<---8<---
> >     > >>>
> >     2020-09-28T11:10:18.938Z|18401|ofproto_dpif_xlate(handler9)|WARN|over 4096
> >     resubmit actions on bridge br-int while processing
> >     icmp6,in_port=1,vlan_tci=0x0000,dl_src=fa:16:3e:9b:77:c3,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::f816:3eff:fe9b:77c3,ipv6_dst=ff02::2,ipv6_label=0x2564e,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0
> >     > >>> ---8<---8<---8<---
> >     > >>>
> >     > >>> That starts happening after I create ~178 routers connected to
> >     the same external network.
> >     > >>>
> >     > >>> IPv6 RS ICMP packets are coming from the external network -
> >     that's because all virtual compute nodes have an IPv6
> >     address on their interface used for the external network and are
> >     trying to discover a gateway. That's by accident, and we can remove
> >     the IPv6 address from that interface; however, I'm worried that it would
> >     just hide some bigger issue with flows generated by OVN.
> >     > >>>
> >     > >> Is this an IPv4 cluster; are there IPv6 addresses configured on the
> >     > >> logical router ports connected to the external network?
> >     > >>
> >     > >> If there are IPv6 addresses, do the logical router ports
> >     connected to
> >     > >> the external network have
> >     > >> Logical_Router_Port.ipv6_ra_configs.address_mode set?
> >     > >>
> >     > >> If not, we could try to enhance the broadcast domain limiting
> >     code in
> >     > >> OVN [3] to also limit sending router solicitations only to
> >     router ports
> >     > >> with address_mode configured.
> >     > >>
> >     > >> Regards,
> >     > >> Dumitru
> >     > >>
> >     > >> [3]
> >     > >>
> >     https://github.com/ovn-org/ovn/blob/20a20439219493f27eb222617f045ba54c95ebfc/northd/ovn-northd.c#L6424
> >     > >>
> >     > >>> software stack:
> >     > >>>
> >     > >>> ovn: 20.06.2
> >     > >>> ovs: 2.13.1
> >     > >>> neutron: 16.1.0
> >     > >>>
> >     > >>> [1]
> >     http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html
> >     > >>> [2] https://review.opendev.org/#/c/752678/
> >     > >>>
> >     > >>
> >     >
> >     >
> > 
> 
>

