[ovs-dev] [PATCH ovn v6 0/5] ARP and Floating IP Fixes

Mark Michelson mmichels at redhat.com
Fri Apr 30 19:29:54 UTC 2021


On 4/27/21 3:40 PM, Numan Siddique wrote:
> On Fri, Apr 23, 2021 at 3:17 PM Mark Michelson <mmichels at redhat.com> wrote:
>>
>> This patch series aims to fix issues seen in OpenStack deployments when
>> floating IPs were assigned to routers, and those floating IPs were not
>> part of any subnet configured on that router.
>>
>> Originally, this was a two patch series but it has bloomed into a 5
>> patch series.
>>
>> Patch 1 fixes the scenario where a VM attempts to reach a floating IP on
>> the directly connected router. This has been part of this patch series
>> since v1.
>>
>> Patch 2 is an incidental fix that removes a redundant paragraph from
>> documentation.
>>
>> Patches 3 and 4 work towards pre-allocating MAC_Bindings for known
>> router addresses. Patch 3 is the northd side, placing all
>> router_addresses in the connected logical switch port's Port_Binding
>> record. Patch 4 is the ovn-controller side, adding the MAC_Bindings
>> based on the Port_Binding's router_addresses.
>>
>> And Patch 5 addresses the situation for when the pre-allocated
>> MAC_Bindings cannot be used. For this situation, we will flood the ARP
>> request if the TPA is for a configured IP address that is outside the
>> connected routers' subnets.
>> ---
>> v5 -> v6:
>> * Patch 3 now only saves gateway router addresses to the connected
>>    switch's router_addresses column. Previous versions saved all router
>>    addresses to all connected switches' columns.
>> * Patch 5 has two new tests added. One ensures that the priority 90
>>    flows that flood ARP for unreachable addresses are present. The other
>>    is a restored system test that ensures that a ping to a floating IP
>>    outside of the router's subnet succeeds.
>> * Patch 4 has a small change of types from int to size_t for a loop
>>    index.
>>
>> v4 -> v5:
>> Fixed memory leaks in patch 3 and patch 4. Patches 1, 2,  and 5 are the
>> same as in v4.
>> ---
>>
>> Mark Michelson (5):
>>    northd: Swap src and dst eth addresses in router egress loop.
>>    ovn-sb: Remove redundant "nat-addresses" information from
>>      Port_Binding.
>>    northd: Save all router addresses in Port_Bindings
>>    pinctrl: Add Chassis MAC_Bindings for all router addresses.
>>    northd: Flood ARPs to routers for "unreachable" addresses.
>>
>>   controller/ovn-controller.c |   4 +
>>   controller/pinctrl.c        | 300 +++++++++++++++++++++-------
>>   controller/pinctrl.h        |   1 +
>>   northd/ovn-northd.8.xml     |   8 +
>>   northd/ovn-northd.c         | 377 ++++++++++++++++++++++++------------
>>   northd/ovn_northd.dl        | 146 ++++++++++----
>>   ovn-sb.ovsschema            |   8 +-
>>   ovn-sb.xml                  |  37 +++-
>>   tests/ovn-controller.at     | 179 +++++++++++++++++
>>   tests/ovn-northd.at         | 305 +++++++++++++++++++++++++++++
>>   tests/system-ovn.at         | 218 +++++++++++++++++++++
>>   11 files changed, 1348 insertions(+), 235 deletions(-)
> 
> Hi Mark,
> 
> Thanks for the v6.
> 
> I see 2 issues with this version.
> 
> 1. I still see that the port_Binding (of a logical switch port has
> router_addresses) set even if its peer is not
>      a gateway router port.  Is it required that we need to store this
> information ?
> 
>      Please check this out - http://paste.openstack.org/show/804814/
>      I was expecting that router_addressses need not to be set for
> port_bindings sw0-lr0 and sw1-lr0.
> 
> 2. There are many system tests which are failing.  I see the below
> warning message in the test suite logs
> 
> --- /dev/null   2021-04-26 15:23:24.901985828 -0400
> +++ /home/nusiddiq/workspace_cpp/ovn-org/ovn/_gcc_system/tests/system-kmod-testsuite.dir/at-groups/1/stdout
>      2021-04-27 15:11:09.277572760 -0400
> @@ -0,0 +1 @@
> +2021-04-27T19:11:06.035Z|00020|ovsdb_idl|WARN|transaction error:
> {"details":"Transaction causes multiple rows in \"MAC_Binding\" table
> to have identical values (R1_join and \"20.0.0.2\") for index on
> columns \"logical_port\" and \"ip\".  First row, with UUID
> a426a9eb-121b-4b1b-91d1-8f2ba095e3b7, existed in the database before
> this transaction and was not modified by the transaction.  Second row,
> with UUID 92af4be3-1cb6-4287-bae7-b823479476ed, was inserted by this
> transaction.","error":"constraint violation"}
> ovsdb-server.log:
> 
> Looks like all the system tests are running as expected, but in the
> end the test fails because ovn-controller.log has the above warning
> message.
> 
> It needs to be seen if this warning message can be whitelisted or not.

I figured out what's happening here, but I'm not sure the proper way to 
fix it.

pinctrl uses an OVSDB IDL index to check if a MAC Binding we want to 
insert is already in the SBDB. If it exists already, we instead just 
update the existing MAC Binding. If it doesn't exist yet, we insert the 
new MAC Binding.

In loop 1, we check the index, see that there is no MAC Binding, and 
insert a new one. At the end of loop 1, we call 
ovsdb_idl_loop_commit_and_wait(), which returns -1. This indicates the 
transaction is still in progress.

In loop 2, we check the index, and we still see that there is no MAC 
Binding, so we attempt to insert a new one. But on the server, when we 
attempt to insert this one, it reports an error since the MAC Binding we 
inserted in loop 1 is present.

Somehow in loop 2, we need to be able to detect that the MAC Binding we 
want to add already exists in the SB DB, even though presumably we 
haven't received the update notification that the MAC Binding is 
present. Any ideas?

I don't think this should be whitelisted. I think this indicates an 
error somewhere in the chain and we need to fix it properly.

> 
> Thanks
> Numan
> 
> 
>>
>> --
>> 2.29.2
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> 



More information about the dev mailing list