[ovs-dev] [PATCH ovn v3 2/2] northd: Remove "reachable" functions and users of them.

Mark Michelson mmichels at redhat.com
Fri Mar 26 13:52:05 UTC 2021

On 3/26/21 7:35 AM, Dumitru Ceara wrote:
> On 3/26/21 12:32 PM, Dumitru Ceara wrote:
>> On 3/24/21 12:44 AM, Mark Michelson wrote:
>>> On 3/23/21 6:55 PM, Dumitru Ceara wrote:
>>>> On 3/23/21 10:02 PM, Mark Michelson wrote:
>>>>> On 3/23/21 12:13 PM, Dumitru Ceara wrote:
>>>>>> On 3/23/21 5:05 PM, Numan Siddique wrote:
>>>>>>> On Fri, Mar 19, 2021 at 2:20 AM Mark Michelson <mmichels at redhat.com>
>>>>>>> wrote:
>>>>>>>> Self-originated ARPs are intended to be sent to the "owning"
>>>>>>>> router for
>>>>>>>> a given IP address, as well as flooded to non-router ports on a
>>>>>>>> logical
>>>>>>>> switch.
>>>>>>>> When trying to determine the owning router for an IP address, we
>>>>>>>> would
>>>>>>>> iterate through load balancer and NAT addresses, and check if
>>>>>>>> these IP
>>>>>>>> addresses were "reachable" on this particular router. Reachable here
>>>>>>>> means that the NAT external IP or load balancer VIP falls within
>>>>>>>> any of
>>>>>>>> the networks served by this router. If an IP address were determined
>>>>>>>> not
>>>>>>>> to be reachable, then we would not try to send ARPs for that
>>>>>>>> particular
>>>>>>>> address to the router.
>>>>>>>> However, it is possible (and sometimes desired) to configure NAT
>>>>>>>> floating
>>>>>>>> IPs on a router that are not in any subnet handled by that router.
>>>>>>>> In this case, OVN refuses to send ARP requests to the router on
>>>>>>>> which the
>>>>>>>> floating IP has been configured. The result is that
>>>>>>>> internally-generated
>>>>>>>> traffic that targets the floating IP cannot reach its destination,
>>>>>>>> since the router on which the floating IP is configured never
>>>>>>>> receives ARPs
>>>>>>>> for the floating IP.
>>>>>>>> This patch fixes the issue by removing the reachability checks
>>>>>>>> altogether. If a router has a NAT external IP or load balancer VIP
>>>>>>>> that
>>>>>>>> is outside the range of any of its configured subnets, we still
>>>>>>>> should
>>>>>>>> send ARPs to that router for those requested addresses.
>>>>>>>> Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1929901
>>>>>>>> Signed-off-by: Mark Michelson <mmichels at redhat.com>
>>>>>>> Thanks for addressing this.
>>>>>>> Acked-by: Numan Siddique <numans at ovn.org>
>>>>>>> @Dumitru - Since you had added the code to limit ARPs.  Can you please
>>>>>>> also take a look at this patch ?
>>>>>> Hi Mark, Numan,
>>>>>> I've been thinking about this for a while.  I think this needs at
>>>>>> least:
>>>>>> Fixes: 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS
>>>>>> broadcast domain.")
>>>>> OK, I can either just add that to the commit when I merge, or if I need
>>>>> to put a new version up for review, I can add it to the new version
>>>>> then.
>>>> Sounds good.
>> I investigated some more and the problem is actually because of the
>> combination of:
>> 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS broadcast domain.")
>> and
>> 8c6a5bc21847 ("ovn-northd: Limit self originated ARP/ND broadcast domain.")
>> Without 8c6a5bc21847, the "reachable" functions just optimize ARP
>> request flooding but when the flows using them are not hit we always
>> fall back to flooding in the MC_FLOOD group on LS-PUB.
>> With 8c6a5bc21847, we instead flood all logical router self originated
>> ARP requests in the MC_FLOOD_L2 group.
>>>>>> With the latest changes in ovn-kubernetes I think that the above was
>>>>>> not
>>>>>> needed anyway.  Mark, do you have more details about this by any
>>>>>> chance?
>>>>> I'm not sure which changes in ovn-kubernetes you're referring to here.
>>>> I was thinking of this:
>>>> https://github.com/ovn-org/ovn-kubernetes/pull/2055
>>>> Specifically:
>>>> https://github.com/ovn-org/ovn-kubernetes/commit/ce2d312c7266c8d55173c9660002c67715a5444b
>>>> However, after a closer look, I'm not sure if dnat_and_snat usage is
>>>> completely removed in ovn-kubernetes.  This makes me wonder if reverting
>>>> 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS broadcast
>>>> domain.") will not cause a regression in the ovn-kubernetes use case.
>>>>>> Initial bug report was here:
>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050287.html
>>>>> To quote from that e-mail:
>>>>> "Question though is why any Pod on the logical switch would send an ARP
>>>>> for an IP that is not in its subnet. A packet from a Pod towards a non
>>>>> subnet IP should ARP only for the default gateway IP."
>>>>> The above holds true for the scenario I'm working here. The key is that
>>>>> this is a self-originated ARP from OVN, not an ARP from a pod/VM.
>>>>> The use case I'm solving here is OpenStack floating IPs. It's not
>>>>> common, but it is expected that a user is able to assign a floating IP
>>>>> to a router that does not service that floating IP's subnet. They also
>>>>> expect all other VMs in the cluster to be able to reach the target VM
>>>>> via its floating IP.
>>>>> So let's say you have a setup like:
>>>>>                         EXT-ROUTER
>>>>>                             ^
>>>>>                             |
>>>>>                             v
>>>>> VM1 <-> LS1 <-> LR1 <-> LS-PUB <-> LR2 <-> LS2 <-> VM2
>>>>> In this case, LR1 and LR2 both have distributed gateway router ports on
>>>>> them (the ports going to LS-PUB). They also have default static routes
>>>>> that point to the EXT-ROUTER. LR2's gateway router port IP address is
>>>>> An OpenStack user configures a floating IP for VM2 of
>>>>> This translates to a dnat_and_snat entry on LR2 for that
>>>>> address, which is NOT in the subnet configured on that router. Now VM1
>>>>> wants to ping In this case, when the ping reaches LR1, LR1
>>>>> needs to ARP to find where is. The ARP needs to target LR2
>>>>> since that's where the floating IP is configured. If the ARP only goes
>>>>> to EXT-ROUTER, EXT-ROUTER won't be able to respond to the ARP properly,
>>>>> and connectivity cannot be established.
>>>> I might be wrong but I'm still not sure how valid it is to define a
>>>> floating IP ( on a router that doesn't have network that
>>>> includes
>>> Unfortunately, this battle has been fought a couple of times already,
>>> and in the case of OpenStack, this is something they expect to work and
>>> currently does work with ml2-ovs.
>>>> Would it work if neutron-ovn configured a network on the
>>>> LR2-to-LS-PUB router port?
>>> Yes, that would make the router appear to be "reachable" with the
>>> current OVN code.
>>>> Otherwise, I think we need to clarify if ovn-kubernetes would still have
>>>> a scale issue or not.
>>>> A less pretty alternative is to allow such floating IPs (without a
>>>> network that includes that IP) only if a new config knob is set.
>>> I think there are a couple of valid options here:
>>> 1) A configuration option. The option could be configured on the NAT
>>> itself to allow for that address to be considered "reachable" on the
>>> router even if it lies outside the router subnet.
>>> 2) Automatically allow this behavior for "floating IP" NATs
>>> (dnat_and_snat, external_mac set, and port set).
>>> (2) Has the advantage of "just working" for the OpenStack FIP case while
>>> also not causing a flow explosion for ovn-kubernetes. However, (1) is
>>> more flexible, allowing for the option to be set on other DNATs as well.
>> Maybe an option (3) that would make all CMSs happy could be:
>> For all dnat_and_snat logical IPs that are not part of router subnets
>> and a single logical flow in table in_l2_lkup on LS-PUB flooding ARP
> s/and a single/add a single/
>> requests that target them in the MC_FLOOD group.  This shouldn't create
>> a scale issue because it's a single flow for all such IPs. If this
>> flow has higher priority than the flow restricting router self
>> originated ARPs to MC_FLOOD_L2 then it should also work fine.

OK, the idea here is that you put the known "unreachable" IP 
destinations in a higher priority than the known "reachable" ones, so 
that if one of the unreachable ones is targeted, we flood out all ports 
instead of just L2. That should work well for self-originated ARPs or 
ARPs from outside.

I think this would work. But also see my comments below.

>> The downside is that we *might* hit the OVS resubmit limit for ARPs
>> targeting such FIPs on scaled setups but, I guess, that's a different
>> story.

One thing Terry brought up with me yesterday is that it's odd we have to 
self-originate ARPs at all for NAT addresses that are configured on the 
logical network. We should be able to put add MAC_Binding entries for 
those addresses without actually sending an ARP. You told me yesterday 
in chat that in pinctrl, we have send_garp_locally(), which is supposed 
to do this. In my setup (and presumably with OpenStack), this is not 
happening, hence the need for ARPs.

If we could get the NAT addresses added to MAC_Binding without sending 
any ARPs, I think this would be for the best. And for deployments 
worried about the size of the MAC_Binding table, Han added the 
"always_learn_from_arp_request" option to force the need for an ARP in 
those cases. And in those cases, your suggestion about adding the new 
flow in ls_in_l2_lkup should work.

>>> I'm just bracing for when someone wants the same behavior for load
>>> balancers...
>> Also, I think the system test you're adding needs a small change as it
>> currently passes both with and without your patch.  We need to remove:
>> ovn-nbctl lr-route-add lr1 lr1-ls-pub
>> What do you think?

In this particular case, I can remove that static route. The only reason 
I can remove it in this case is because the floating IP happens to be in 
lr1's subnet. Otherwise, lr_in_ip_routing would drop the ping request. 
We would not even reach lr_in_arp_request. In the generalized case where 
a FIP could be outside the subnet of EITHER of the logical routers, then 
the static route would be necessary.

In the reference network I used from Terry, a difference was that there 
was an external router connected to ls-pub via a localnet port. lr1 and 
lr2 had default static routes added. Essentially:

ovn-nbctl lr-route-add lr1 <ext_router_ip> lr1-ls-pub
ovn-nbctl lr-route-add lr2 <ext_router_ip> lr2-ls-pub

This way, the pings (both request and response) would always make it 
past lr_in_ip_routing.

In my test, I was trying for a more minimal setup, with no external 
router. In order for the test to work, I needed to add static routes.

I think I need to just add a fake "external router" VM to the test and 
include the same default routes on lr1 and lr2 in order to simulate the 
OpenStack scenario better. Having no external router means I have to add 
static routes on each router (for the general case), which trivializes 
the scenario. I also will change the test so that FIP so that it is not 
in either logical router's subnet.

>> Regards,
>> Dumitru

More information about the dev mailing list