[ovs-dev] [PATCH ovn v3 2/2] northd: Remove "reachable" functions and users of them.

Dumitru Ceara dceara at redhat.com
Tue Mar 23 22:55:27 UTC 2021


On 3/23/21 10:02 PM, Mark Michelson wrote:
> On 3/23/21 12:13 PM, Dumitru Ceara wrote:
>> On 3/23/21 5:05 PM, Numan Siddique wrote:
>>> On Fri, Mar 19, 2021 at 2:20 AM Mark Michelson <mmichels at redhat.com>
>>> wrote:
>>>>
>>>> Self-originated ARPs are intended to be sent to the "owning" router for
>>>> a given IP address, as well as flooded to non-router ports on a logical
>>>> switch.
>>>>
>>>> When trying to determine the owning router for an IP address, we would
>>>> iterate through load balancer and NAT addresses, and check if these IP
>>>> addresses were "reachable" on this particular router. Reachable here
>>>> means that the NAT external IP or load balancer VIP falls within any of
>>>> the networks served by this router. If an IP address were determined not
>>>> to be reachable, then we would not try to send ARPs for that particular
>>>> address to the router.
>>>>
>>>> However, it is possible (and sometimes desired) to configure NAT floating
>>>> IPs on a router that are not in any subnet handled by that router.
>>>> In this case, OVN refuses to send ARP requests to the router on which the
>>>> floating IP has been configured. The result is that internally-generated
>>>> traffic that targets the floating IP cannot reach its destination,
>>>> since the router on which the floating IP is configured never receives
>>>> ARPs for the floating IP.
>>>>
>>>> This patch fixes the issue by removing the reachability checks
>>>> altogether. If a router has a NAT external IP or load balancer VIP that
>>>> is outside the range of any of its configured subnets, we still should
>>>> send ARPs to that router for those requested addresses.
>>>>
>>>> Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1929901
>>>>
>>>> Signed-off-by: Mark Michelson <mmichels at redhat.com>
>>>
>>> Thanks for addressing this.
>>>
>>> Acked-by: Numan Siddique <numans at ovn.org>
>>>
>>> @Dumitru - Since you added the code to limit ARPs, can you please
>>> also take a look at this patch?
>>>
>>
>> Hi Mark, Numan,
>>
>> I've been thinking about this for a while.  I think this needs at least:
>>
>> Fixes: 1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS
>> broadcast domain.")
> 
> OK, I can either just add that to the commit when I merge, or if I need
> to put a new version up for review, I can add it to the new version then.

Sounds good.

> 
>>
>> With the latest changes in ovn-kubernetes I think that the above was not
>> needed anyway.  Mark, do you have more details about this by any chance?
> 
> I'm not sure which changes in ovn-kubernetes you're referring to here.

I was thinking of this:

https://github.com/ovn-org/ovn-kubernetes/pull/2055

Specifically:

https://github.com/ovn-org/ovn-kubernetes/commit/ce2d312c7266c8d55173c9660002c67715a5444b

However, after a closer look, I'm not sure whether dnat_and_snat usage is
completely removed in ovn-kubernetes.  This makes me wonder whether reverting
1e07781310d8 ("ovn-northd: Fix logical flows to limit ARP/NS broadcast
domain.") would cause a regression in the ovn-kubernetes use case.

> 
>>
>> Initial bug report was here:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-June/050287.html
> 
> To quote from that e-mail:
> 
> "Question though is why any Pod on the logical switch would send an ARP
> for an IP that is not in its subnet. A packet from a Pod towards a non
> subnet IP should ARP only for the default gateway IP."
> 
> The above holds true for the scenario I'm working on here. The key is that
> this is a self-originated ARP from OVN, not an ARP from a pod/VM.
> 
> The use case I'm solving here is OpenStack floating IPs. It's not
> common, but it is expected that a user is able to assign a floating IP
> to a router that does not service that floating IP's subnet. They also
> expect all other VMs in the cluster to be able to reach the target VM
> via its floating IP.
> 
> So let's say you have a setup like:
> 
> 
>                       EXT-ROUTER
>                           ^
>                           |
>                           v
> VM1 <-> LS1 <-> LR1 <-> LS-PUB <-> LR2 <-> LS2 <-> VM2
> 
> In this case, LR1 and LR2 both have distributed gateway router ports on
> them (the ports going to LS-PUB). They also have default static routes
> that point to the EXT-ROUTER. LR2's gateway router port IP address is
> 172.18.1.1/24. An OpenStack user configures a floating IP for VM2 of
> 172.18.2.100. This translates to a dnat_and_snat entry on LR2 for that
> address, which is NOT in the subnet configured on that router. Now VM1
> wants to ping 172.18.2.100. In this case, when the ping reaches LR1, LR1
> needs to ARP to find where 172.18.2.100 is. The ARP needs to target LR2
> since that's where the floating IP is configured. If the ARP only goes
> to EXT-ROUTER, EXT-ROUTER won't be able to respond to the ARP properly,
> and connectivity cannot be established.

I might be wrong, but I'm still not sure how valid it is to define a
floating IP (172.18.2.100) on a router that doesn't have a network that
includes 172.18.2.100.

Would it work if neutron-ovn configured a 172.18.2.100/32 network on the
LR2-to-LS-PUB router port?
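
To make the question concrete, here is a minimal Python sketch of the
subnet-membership check as the commit message describes it ("falls within
any of the networks served by this router"). It is not the northd code:
ip_is_reachable is a hypothetical name, and I'm assuming the only relevant
network on LR2 is the 172.18.1.1/24 gateway port address from the example
above.

```python
import ipaddress


def ip_is_reachable(ip, router_networks):
    """Return True if ip falls within any network configured on the router.

    Hypothetical helper for illustration only; router_networks is a list of
    "address/prefix" strings as they would appear on router ports.
    """
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_interface(net).network
        for net in router_networks
    )


# Assumption: LR2's gateway port network from the example in this thread.
lr2_networks = ["172.18.1.1/24"]
floating_ip = "172.18.2.100"

# Floating IP is outside 172.18.1.0/24, so the check fails.
print(ip_is_reachable(floating_ip, lr2_networks))  # False

# With an extra /32 on the LR2-to-LS-PUB port, the same check passes.
print(ip_is_reachable(floating_ip, lr2_networks + ["172.18.2.100/32"]))  # True
```

In this sketch the extra /32 is enough to satisfy a pure subnet-membership
test. Whether neutron-ovn can configure it, and whether northd would treat
such a network the same way, is what I'm asking.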

Otherwise, I think we need to clarify if ovn-kubernetes would still have
a scale issue or not.

A less pretty alternative is to allow such floating IPs (without a
network that includes that IP) only if a new config knob is set.

Regards,
Dumitru


