[ovs-discuss] [ovn] Broken ovs localport flow for ovnmeta namespaces created by neutron

Krzysztof Klimonda kklimonda at syntaxhighlighted.com
Tue Dec 15 10:38:26 UTC 2020


Hi,

Just a quick update: I've updated our OVN version to a 20.12.0 snapshot (d8bc0377c), and so far the problem has not recurred after more than 24 hours of tempest testing.

Best Regards,
-Chris


On Tue, Dec 15, 2020, at 11:13, Daniel Alvarez Sanchez wrote:
> Hey Krzysztof,
> 
> On Fri, Nov 20, 2020 at 1:17 PM Krzysztof Klimonda <kklimonda at syntaxhighlighted.com> wrote:
>> Hi,
>> 
>> While doing some tempest runs on our pre-prod environment (stable/ussuri with the OVN 20.06.2 release), I noticed that some network connectivity tests were failing randomly. I've reproduced this by continuously rescuing and unrescuing an instance: network connectivity to and from the VM works in general (DHCP is fine, access from outside is fine), but the VM has no access to its metadata server (at 169.254.169.254). Tracing a packet from the VM to the metadata port via:
>> 
>> ----8<----8<----8<----
>> ovs-appctl ofproto/trace br-int in_port=tapa489d406-91,dl_src=fa:16:3e:2c:b0:fd,dl_dst=fa:16:3e:8b:b5:39
>> ----8<----8<----8<----
>> 
>> ends with
>> 
>> ----8<----8<----8<----
>> 65. reg15=0x1,metadata=0x97e, priority 100, cookie 0x15ec4875
>>     output:1187
>>      >> Nonexistent output port
>> ----8<----8<----8<----
>> 
>> And I can verify that there is no flow for the actual ovnmeta tap interface (tap67731b0a-c0):
>> 
>> ----8<----8<----8<----
>> # docker exec -it openvswitch_vswitchd ovs-ofctl dump-flows br-int |grep -E output:'("tap67731b0a-c0"|1187)'
>>  cookie=0x15ec4875, duration=1868.378s, table=65, n_packets=524, n_bytes=40856, priority=100,reg15=0x1,metadata=0x97e actions=output:1187
>> #
>> ----8<----8<----8<----
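The check above can be scripted. A minimal sketch, assuming the text of `ovs-ofctl dump-flows br-int` has been captured and that `live_ofports` (a hypothetical input) holds the ofports currently present on br-int, for example as read from the `ofport` column of the OVS Interface table. The function name is illustrative; it flags table 65 flows whose `output:` port no longer exists:

```python
import re

TABLE_RE = re.compile(r"\btable=(\d+)")
OUTPUT_RE = re.compile(r"\boutput:(\d+)")


def stale_outputs(dump_flows_text, live_ofports, table=65):
    """Return (ofport, flow) pairs for flows in `table` that output to a missing ofport."""
    stale = []
    for line in dump_flows_text.splitlines():
        m = TABLE_RE.search(line)
        # dump-flows output may omit "table=0"; treat a missing field as table 0.
        flow_table = int(m.group(1)) if m else 0
        if flow_table != table:
            continue
        # A flow can have several output actions; check each one.
        for port in OUTPUT_RE.findall(line):
            if int(port) not in live_ofports:
                stale.append((int(port), line.strip()))
    return stale
```

Fed the table 65 flow above with `live_ofports={1188, 1189}` (the ports the two taps were re-added on), it reports port 1187 as stale.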
>> 
>> From ovs-vswitchd.log, it seems the interface tap67731b0a-c0 was added as port 1187, then deleted, and re-added as port 1189. That is probably because this is the only VM in that network and I'm constantly hard-rebooting it via rescue/unrescue:
>> 
>> ----8<----8<----8<----
>> 2020-11-20T11:41:18.347Z|08043|bridge|INFO|bridge br-int: added interface tap67731b0a-c0 on port 1187
>> 2020-11-20T11:41:30.813Z|08044|bridge|INFO|bridge br-int: deleted interface tapa489d406-91 on port 1186
>> 2020-11-20T11:41:30.816Z|08045|bridge|WARN|could not open network device tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.040Z|08046|bridge|INFO|bridge br-int: deleted interface tap67731b0a-c0 on port 1187
>> 2020-11-20T11:41:31.044Z|08047|bridge|WARN|could not open network device tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.050Z|08048|bridge|WARN|could not open network device tapa489d406-91 (No such device)
>> 2020-11-20T11:41:31.235Z|08049|connmgr|INFO|br-int<->unix#31: 2069 flow_mods in the last 43 s (858 adds, 814 deletes, 397 modifications)
>> 2020-11-20T11:41:33.057Z|08050|bridge|INFO|bridge br-int: added interface tapa489d406-91 on port 1188
>> 2020-11-20T11:41:33.582Z|08051|bridge|INFO|bridge br-int: added interface tap67731b0a-c0 on port 1189
>> 2020-11-20T11:42:31.235Z|08052|connmgr|INFO|br-int<->unix#31: 168 flow_mods in the 2 s starting 59 s ago (114 adds, 10 deletes, 44 modifications) 
>> ----8<----8<----8<----
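The add/delete sequence can be reconstructed from the log mechanically. A minimal sketch, assuming only the `bridge|INFO` message format shown above (`bridge <bridge>: added|deleted interface <name> on port <n>`); the function names are hypothetical:

```python
import re
from collections import defaultdict

# Matches ovs-vswitchd bridge messages such as
# "bridge br-int: added interface tap67731b0a-c0 on port 1187".
EVENT_RE = re.compile(
    r"bridge (?P<bridge>\S+): (?P<action>added|deleted) interface "
    r"(?P<iface>\S+) on port (?P<port>\d+)"
)


def ofport_history(log_text):
    """Map each interface name to its ordered (action, ofport) events."""
    history = defaultdict(list)
    for line in log_text.splitlines():
        m = EVENT_RE.search(line)
        if m:
            history[m.group("iface")].append((m.group("action"), int(m.group("port"))))
    return dict(history)


def current_ofport(events):
    """Return the ofport from the last event if it was an add, else None."""
    if events and events[-1][0] == "added":
        return events[-1][1]
    return None
```

For the excerpt above, tap67731b0a-c0 yields added 1187, deleted 1187, added 1189, so its current ofport is 1189 and any flow still outputting to 1187 is stale.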
>> 
>> Once I restart ovn-controller, it recalculates the local OVS flows and the problem is fixed, so I'm assuming it's a local problem and not related to the NB and SB databases.
>> 
> 
> I have seen exactly the same issue with 20.09: for the same port, the input and output ofports do not match:
> 
> bash-4.4# ovs-ofctl dump-flows br-int table=0 | grep 745
>  cookie=0x38937d8e, duration=40387.372s, table=0, n_packets=1863, n_bytes=111678, idle_age=1, priority=100,in_port=745 actions=load:0x4b->NXM_NX_REG13[],load:0x6a->NXM_NX_REG11[],load:0x69->NXM_NX_REG12[],load:0x18d->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
> 
> 
> bash-4.4# ovs-ofctl dump-flows br-int table=65 | grep 8937d8e
>  cookie=0x38937d8e, duration=40593.699s, table=65, n_packets=1848, n_bytes=98960, idle_age=2599, priority=100,reg15=0x1,metadata=0x18d actions=output:737
> 
> 
> In table=0, the ofport is correct (745), but in the output stage (table=65) it uses a different one (737).
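Because both flows carry the same cookie (0x38937d8e), the comparison can be automated across a whole dump. A minimal sketch, assuming (as in the two dumps above) that the table=0 ingress flow and the table=65 output flow for a port share a cookie; lines without an explicit `table=` field are treated as table 0, and the function name is hypothetical:

```python
import re

COOKIE_RE = re.compile(r"\bcookie=(0x[0-9a-f]+)")
TABLE_RE = re.compile(r"\btable=(\d+)")
IN_PORT_RE = re.compile(r"\bin_port=(\d+)")
OUTPUT_RE = re.compile(r"\boutput:(\d+)")


def mismatched_ports(dump_flows_text):
    """Return {cookie: (in_port, output_port)} where table 0 and table 65 disagree."""
    ingress, egress = {}, {}
    for line in dump_flows_text.splitlines():
        cookie = COOKIE_RE.search(line)
        if not cookie:
            continue  # skip headers such as "NXST_FLOW reply"
        t = TABLE_RE.search(line)
        table = int(t.group(1)) if t else 0
        if table == 0:
            m = IN_PORT_RE.search(line)
            if m:
                ingress[cookie.group(1)] = int(m.group(1))
        elif table == 65:
            m = OUTPUT_RE.search(line)
            if m:
                egress[cookie.group(1)] = int(m.group(1))
    # Only cookies seen in both tables can be compared.
    return {
        c: (ingress[c], egress[c])
        for c in ingress.keys() & egress.keys()
        if ingress[c] != egress[c]
    }
```

Fed the two flow lines above, it returns `{'0x38937d8e': (745, 737)}`.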
> 
> Checking the OVS database transaction history shows that this port did, at some point, have ofport 737:
> 
> record 6516: 2020-12-14 22:22:54.184
> 
>   table Interface row "tap71a5dfc1-10" (073801e2):
>     ofport=737
>   table Open_vSwitch row 1d9566c8 (1d9566c8):
>     cur_cfg=2023
> 
> So it looks like ovn-controller is not updating the ofport in the physical flows for the output stage.
> 
> We'll try to figure out whether this also happens on master.
> 
> Thanks,
> daniel
>  
>> -- 
>>   Krzysztof Klimonda
>>   kklimonda at syntaxhighlighted.com
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>> 