[ovs-dev] [ovn] problem: long tcp session instantiation with stateful ACLs

Vladislav Odintsov odivlad at gmail.com
Fri Nov 26 21:05:22 UTC 2021


Hi Dumitru, Numan,

I’ve sent a corresponding patch to openvswitch with my findings.
It’d be great if you could take a look at it.
Thanks.

https://patchwork.ozlabs.org/project/openvswitch/patch/20211126205942.9354-1-odivlad@gmail.com/

Regards,
Vladislav Odintsov

> On 21 Sep 2021, at 14:43, Dumitru Ceara <dceara at redhat.com> wrote:
> 
> On 9/21/21 1:33 PM, Vladislav Odintsov wrote:
>> Hi Dumitru,
> 
> Hi Vladislav,
> 
>> 
>> Are you talking about any specific _missing_ patch?
> 
> No, sorry for the confusion.  I just meant there's a bug in the OOT
> module that was probably already fixed in the in-tree one so, likely,
> one would have to figure out the patch that fixed it.
> 
>> 
>> Regards,
>> Vladislav Odintsov
> 
> Regards,
> Dumitru
> 
>> 
>>> On 16 Sep 2021, at 19:09, Dumitru Ceara <dceara at redhat.com> wrote:
>>> 
>>> On 9/16/21 4:18 PM, Vladislav Odintsov wrote:
>>>> Sorry, by OOT I meant the out-of-tree (non-in-box) kmod.
>>>> I’ve tried the in-box kernel module (shipped with the kernel package) and the problem is resolved.
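One way to check which openvswitch module a host would load is to look at the path reported by `modinfo -F filename openvswitch`. The helper below is a hypothetical sketch, not part of OVS. It assumes the in-tree module is installed under `kernel/net/openvswitch/` inside `/lib/modules/<version>/` and treats any other location (for example `extra/` or `weak-updates/` from an `openvswitch-kmod` package) as out-of-tree.

```shell
#!/bin/sh
# kmod_kind PATH: classify an openvswitch.ko path (hypothetical helper).
# Assumption: the in-tree module lives under .../kernel/net/openvswitch/;
# any other location (extra/, weak-updates/, ...) is treated as out-of-tree.
kmod_kind() {
  case "$1" in
    "")                         echo "not found" ;;
    */kernel/net/openvswitch/*) echo "in-tree" ;;
    *)                          echo "out-of-tree" ;;
  esac
}

# On a hypervisor, pass it the path modprobe would use:
#   kmod_kind "$(modinfo -F filename openvswitch 2>/dev/null)"
```

Note that `modinfo` reports the file modprobe would pick now, which can differ from the module already loaded if packages were changed without reloading it.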
>>>> 
>>>> Regards,
>>>> Vladislav Odintsov
>>>> 
>>>>> On 16 Sep 2021, at 17:17, Vladislav Odintsov <odivlad at gmail.com> wrote:
>>>>> 
>>>>> Hi Dumitru,
>>>>> 
>>>>> I’ve tried excluding the OOT OVS kernel module.
>>>>> With OVN 20.06.3 + OVS 2.13.4 the problem is solved.
>>>>> 
>>>>> Could you please try with the OOT kmod? To me it looks like a bug in the OOT OVS kernel module code.
>>> 
>>> You're right, this seems to be a missing patch in the OOT openvswitch
>>> module.  I could replicate the problem you reported with the OOT module.
>>> 
>>> Regards,
>>> Dumitru
>>> 
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>> 
>>>>>> On 16 Sep 2021, at 11:02, Dumitru Ceara <dceara at redhat.com> wrote:
>>>>>> 
>>>>>> On 9/16/21 2:50 AM, Vladislav Odintsov wrote:
>>>>>>> Hi Dumitru,
>>>>>>> 
>>>>>>> thanks for your reply.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Vladislav Odintsov
>>>>>>> 
>>>>>>>> On 15 Sep 2021, at 11:24, Dumitru Ceara <dceara at redhat.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Vladislav,
>>>>>>>> 
>>>>>>>> On 9/13/21 6:14 PM, Vladislav Odintsov wrote:
>>>>>>>>> Hi Numan,
>>>>>>>>> 
>>>>>>>>> I’ve checked with OVS 2.16.0 and OVN master. The problem persists.
>>>>>>>>> Symptoms are the same.
>>>>>>>>> 
>>>>>>>>> # grep ct_zero_snat /var/log/openvswitch/ovs-vswitchd.log
>>>>>>>>> 2021-09-13T16:10:01.792Z|00019|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_zero_snat
>>>>>>>> 
>>>>>>>> This shouldn't be related to the problem we fixed with ct_zero_snat.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Vladislav Odintsov
>>>>>>>>> 
>>>>>>>>>> On 13 Sep 2021, at 17:54, Numan Siddique <numans at ovn.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Mon, Sep 13, 2021 at 8:10 AM Vladislav Odintsov <odivlad at gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> we’ve encountered another problem with stateful ACLs.
>>>>>>>>>>> 
>>>>>>>>>>> Suppose we have one logical switch (ls1) with two VIF-type logical ports (lsp1, lsp2) attached to it.
>>>>>>>>>>> Each logical port has a Linux VM behind it.
>>>>>>>>>>> 
>>>>>>>>>>> The logical ports are members of a port group (pg1), and two ACLs are created for this PG:
>>>>>>>>>>> to-lport outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0 allow-related
>>>>>>>>>>> from-lport inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0 allow-related
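For reference, a topology like the one described can be created with ovn-nbctl roughly as follows. This is a sketch under assumptions: the ACL priorities (1002/1001) are arbitrary, port addresses and chassis binding are omitted, and the from-lport ACL matches on `inport` because `outport` is only available to to-lport ACLs.

```shell
#!/bin/sh
# Sketch of the reported setup; priorities 1002/1001 are arbitrary choices.
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 lsp1
ovn-nbctl lsp-add ls1 lsp2
ovn-nbctl pg-add pg1 lsp1 lsp2

# Stateful (allow-related) ACLs applied to the port group.
ovn-nbctl --type=port-group acl-add pg1 to-lport 1002 \
    'outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0' allow-related
ovn-nbctl --type=port-group acl-add pg1 from-lport 1001 \
    'inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0' allow-related
```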
>>>>>>>>>>> 
>>>>>>>>>>> When a service between VMs has a high connection rate, TCP source/destination ports may be reused before the previous connection is removed from the LSPs’ conntrack zones on the host.
>>>>>>>>>>> To reproduce this, let’s use curl with the --local-port argument so that every run uses the same source port.
>>>>>>>>>>> 
>>>>>>>>>>> Run it from one VM to the other (172.31.0.18 -> 172.31.0.17):
>>>>>>>>>>> curl --local-port 44444 http://172.31.0.17/
>>>>>>>>>>> 
>>>>>>>>>>> Check the connections in the client’s and server’s VIF zones (client: zone=20, server: zone=1)
>>>>>>>>>>> with a loop that prints the conntrack state five times per second while a new connection with the same 5-tuple is opened:
>>>>>>>>>>> 
>>>>>>>>>>> while true; do date; grep -e 'zone=1 ' -e zone=20 /proc/net/nf_conntrack; sleep 0.2; done
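The grep patterns above are loose (`zone=20` also matches `zone=200`). A stricter variant, sketched here as a hypothetical helper, assumes the /proc/net/nf_conntrack field layout visible in the output below (L3 protocol, L3 number, L4 protocol, L4 number, timeout, TCP state, ...) and prints only the zone and TCP state of entries whose zone matches exactly.

```shell
#!/bin/sh
# ct_summary ZONE...: read /proc/net/nf_conntrack-formatted lines on stdin and
# print "zone=<N> <TCP state>" for TCP entries whose zone is exactly one of ZONE.
ct_summary() {
  awk -v zones="$*" '
    BEGIN { n = split(zones, z, " "); for (i = 1; i <= n; i++) want["zone=" z[i]] = 1 }
    $3 == "tcp" { for (i = 7; i <= NF; i++) if ($i in want) print $i, $6 }'
}

# Usage on the hypervisor:
#   while true; do date; ct_summary 1 20 < /proc/net/nf_conntrack; sleep 0.2; done
```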
>>>>>>>>>>> 
>>>>>>>>>>> Right after curl completes successfully, the connection moves to CLOSE_WAIT and then to TIME_WAIT:
>>>>>>>>>>> 
>>>>>>>>>>> Mon Sep 13 14:34:39 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> Mon Sep 13 14:34:39 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> 
>>>>>>>>>>> It then remains in TIME_WAIT for net.netfilter.nf_conntrack_tcp_timeout_time_wait seconds (120 by default on CentOS 7).
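To confirm this timeout on a given hypervisor, the sysctl can be read directly. This assumes the nf_conntrack module is loaded; the 120-second value above is the CentOS 7 default and may differ on other systems.

```shell
#!/bin/sh
# Conntrack TIME_WAIT timeout in seconds (requires nf_conntrack to be loaded).
sysctl -n net.netfilter.nf_conntrack_tcp_timeout_time_wait
# Equivalent read through procfs:
cat /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait
```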
>>>>>>>>>>> 
>>>>>>>>>>> Everything is fine so far.
>>>>>>>>>>> While the TIME_WAIT entries are still present in zones 1 and 20, let’s run the same curl (source port 44444) again.
>>>>>>>>>>> The first SYN is lost: it never reaches the destination VM. Conntrack shows:
>>>>>>>>>>> 
>>>>>>>>>>> Mon Sep 13 14:34:41 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 118 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> 
>>>>>>>>>>> The TIME_WAIT entry has been removed from the source VIF’s zone (20).
>>>>>>>>>>> 
>>>>>>>>>>> Next, after one second, TCP retransmits the SYN: the entry in the destination (server’s) zone is removed and a new entry is created in the source (client’s) zone:
>>>>>>>>>>> 
>>>>>>>>>>> Mon Sep 13 14:34:41 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 120 SYN_SENT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 [UNREPLIED] src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 mark=0 zone=20 use=2
>>>>>>>>>>> 
>>>>>>>>>>> The server VM still doesn’t receive this SYN; it is dropped.
>>>>>>>>>>> 
>>>>>>>>>>> Then, after 2 more seconds, TCP retransmits the SYN again and the connection works fine:
>>>>>>>>>>> 
>>>>>>>>>>> Mon Sep 13 14:34:44 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> Mon Sep 13 14:34:44 MSK 2021
>>>>>>>>>>> ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> 
>>>>>>>>>>> My guess at what happens:
>>>>>>>>>>> 1. Run curl with empty conntrack zones. Everything is fine: we get the HTTP response and the connection is closed. There is one TW entry in the client’s and one in the server’s conntrack zone.
>>>>>>>>>>> 2. Run curl again with the same source port within nf_conntrack_tcp_timeout_time_wait seconds.
>>>>>>>>>>> 2.1. OVS receives the packet from the VM and sends it to the client’s conntrack zone=20. It matches the pre-existing TW entry from the previous curl run, and that TW entry is deleted. A copy of the packet is returned to OVS, and the recirculated packet has ct.inv (?) and !ct.trk states and is dropped (I’m NOT sure, it’s just an assumption!)
>>>>>>>>>>> 3. After one second, the client VM resends the TCP SYN.
>>>>>>>>>>> 3.1. OVS receives the packet and sends it through the client’s conntrack zone=20; a new entry is added and the packet has the ct.trk and ct.new states set. The packet is recirculated.
>>>>>>>>>>> 3.2. OVS sends the packet to the server’s conntrack zone=1. It matches the pre-existing TW entry from the previous run, and conntrack removes this entry. The packet is returned to OVS with ct.inv (?) and !ct.trk and is dropped.
>>>>>>>>>>> 4. The client VM sends the TCP SYN again, 2 seconds later.
>>>>>>>>>>> 4.1 OVS receives the packet from the client’s VIF and sends it to the client’s conntrack zone=20; it matches the pre-existing SYN_SENT entry, and the packet is returned to OVS with the ct.new and ct.trk flags set.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 4.2 OVS sends the packet to the server’s conntrack zone=1. The conntrack table for zone=1 is empty, so conntrack adds a new entry and returns the packet to OVS with the ct.trk and ct.new flags set.
>>>>>>>>>>> 4.3 OVS sends the packet to the server’s VIF, and subsequent traffic flows normally.
>>>>>>>>>>> 
>>>>>>>>>>> So, with this behaviour, connection establishment sometimes takes up to three seconds (2 TCP SYN retransmissions), which causes trouble for overlay services (application timeouts and service outages).
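The hypothesis in steps 2 to 4 can be checked against the observed timing with a toy model. This is only a sketch of the reporter's assumption, not OVS or conntrack code: a SYN that hits a TIME_WAIT entry deletes the entry and is then dropped instead of being looked up again. It also assumes the usual Linux SYN retransmission schedule (1 s initial RTO, doubling after each loss).

```shell
#!/bin/sh
# Toy model (assumption, not OVS code): a SYN hitting a TIME_WAIT entry
# deletes the entry and is dropped instead of being looked up again.

# pass_zone STATE -> prints "<verdict> <new state>"
pass_zone() {
  case "$1" in
    TIME_WAIT) echo "drop NONE" ;;      # entry deleted, packet lost
    NONE)      echo "pass SYN_SENT" ;;  # no entry: a new one is created
    *)         echo "pass $1" ;;        # existing entry matches
  esac
}

# simulate: each SYN goes through the client zone (20), then the server zone (1);
# both start in TIME_WAIT. Retransmissions use a 1 s RTO that doubles per loss.
simulate() {
  zone20=TIME_WAIT zone1=TIME_WAIT
  attempt=0 elapsed=0 rto=1
  while :; do
    attempt=$((attempt + 1))
    set -- $(pass_zone "$zone20"); verdict=$1; zone20=$2
    if [ "$verdict" = pass ]; then
      set -- $(pass_zone "$zone1"); verdict=$1; zone1=$2
    fi
    if [ "$verdict" = pass ]; then
      echo "SYN delivered on attempt $attempt after ${elapsed}s"
      return 0
    fi
    elapsed=$((elapsed + rto))
    rto=$((rto * 2))
  done
}
```

Running `simulate` prints `SYN delivered on attempt 3 after 3s`, which is consistent with the two lost SYNs and the roughly three-second delay reported above.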
>>>>>>>>>>> 
>>>>>>>>>>> I’ve checked how conntrack behaves inside the VMs with such traffic: when conntrack gets a new SYN matching a TW connection, it recreates the conntrack entry. No tuning was performed inside the VMs. As the server I used Apache with the default config from the CentOS distribution.
>>>>>>>> 
>>>>>>>> I don't have a CentOS 7 at hand but I do have a RHEL 7
>>>>>>>> (3.10.0-1160.36.2.el7.x86_64) and I didn't manage to hit the issue you
>>>>>>>> reported here (using OVS and OVN upstream master).  The SYN matching the
>>>>>>>> conntrack entry in state TIME_WAIT moves the entry to NEW and seems to
>>>>>>>> be forwarded just fine; the session afterwards goes to ESTABLISHED.
>>>>>>>> 
>>>>>>>> Wed Sep 15 04:18:35 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp      6 431930 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp      6 431930 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>> Wed Sep 15 04:18:36 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp      6 119 TIME_WAIT src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp      6 119 TIME_WAIT src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>> Wed Sep 15 04:18:38 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp      6 431999 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp      6 431999 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141
>>>>>>>> dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED]
>>>>>>>> mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>> 
>>>>>>>> DP flows just after the second session is initiated also seem to confirm
>>>>>>>> that everything is fine:
>>>>>>>> 
>>>>>>>> # ovs-appctl dpctl/dump-flows | grep -oE "ct_state(.*),ct_label"
>>>>>>>> ct_state(+new-est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel+rpl-inv+trk),ct_label
>>>>>>>> ct_state(+new-est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel+rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel-rpl-inv+trk),ct_label
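In these datapath flow dumps, each flag inside `ct_state()` is prefixed with `+` (must be set) or `-` (must be clear); flags that are not listed are not matched on. The first line, for example, matches tracked packets of a new, non-invalid, non-reply connection. A hypothetical awk-based helper that expands the notation, offered only as a reading aid:

```shell
#!/bin/sh
# ct_flags STRING: expand a ct_state() value from ovs-appctl dpctl/dump-flows,
# e.g. "+new-est-rel-rpl-inv+trk", into "flag=1" (set) / "flag=0" (clear) pairs.
ct_flags() {
  printf '%s\n' "$1" | awk '{
    s = $0
    while (match(s, /^[+-][a-z]+/)) {
      tok = substr(s, 1, RLENGTH)
      printf "%s%s=%d", sep, substr(tok, 2), (substr(tok, 1, 1) == "+")
      sep = " "
      s = substr(s, RLENGTH + 1)
    }
    print ""
  }'
}
```

For example, `ct_flags '+new-est-rel-rpl-inv+trk'` prints `new=1 est=0 rel=0 rpl=0 inv=0 trk=1`.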
>>>>>>>> 
>>>>>>>> I also tried it out on Fedora 34 with kernel 5.13.14-200.fc34.x86_64; it still
>>>>>>>> works fine.
>>>>>>>> 
>>>>>>>> What kernel and openvswitch module versions do you use?
>>>>>>>> 
>>>>>>> My box runs CentOS 7.5 with kernel 3.10.0-862.14.4.el7 and the OOT kernel module.
>>>>>>> I’ve tested two versions and hit the problem with both:
>>>>>>> openvswitch-kmod-2.13.4-1.el7_5.x86_64
>>>>>>> openvswitch-kmod-2.16.0-1.el7_5.x86_64
>>>>>>> 
>>>>>>> Do you think the problem could be related to the kernel (conntrack), so that the kernel must be upgraded?
>>>>>>> Or maybe I should try OVS master, as you did?
>>>>>> 
>>>>>> I just tried with OVS v2.13.4, OVN master and it all worked fine (both
>>>>>> on Fedora 34 and rhel 7).  I don't think the problem is in user space.
>>>>>> 
>>>>>> Regards,
>>>>>> Dumitru
>>>>>> 
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev at openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>> 
>>> 
>> 
>> 
> 


