[ovs-dev] [PATCH v2] ipf: fix only nat the first fragment in the reass process

Aaron Conole aconole at redhat.com
Tue Aug 24 17:43:57 UTC 2021


Aaron Conole <aconole at redhat.com> writes:

> Ilya Maximets <i.maximets at ovn.org> writes:
>
>> On 8/12/21 6:17 PM, Aaron Conole wrote:
>>> wenxu at ucloud.cn writes:
>>> 
>>>> From: wenxu <wenxu at ucloud.cn>
>>>>
>>>> The ipf module collects the original fragment packets and
>>>> reassembles a new pkt to do the conntrack logic. After conntrack
>>>> finishes, it copies the ct meta info to each original packet and
>>>> modifies the l4 header in the first fragment. It should also
>>>> modify the ip src/dst info for all the fragments.
>>>>
>>>> Signed-off-by: wenxu <wenxu at ucloud.cn>
>>>> Co-authored-by: luke.li <luke.li at ucloud.cn>
>>>> Signed-off-by: luke.li <luke.li at ucloud.cn>
>>>> ---
>>> 
>>> Acked-by: Aaron Conole <aconole at redhat.com>
>>> 
>>> Thanks for the fix.  I see it can work for any l3 protocol.
>>> 
>>> Based on the comments you supplied, I wrote the following test case.  It
>>> can either be folded in by you (or Ilya on apply), or I can submit as a
>>> separate patch (in case you are worried about having my sign-off /
>>> coauthor on this patch).
>>> 
>>> When testing 'make check-system-userspace' before this patch, I see a
>>> failure and get the following tcpdump logged:
>>> 
>>>   12:15:31 aconole at RHTPC1VM0NT {master} ~/git/ovs$ sudo tcpdump -r tests/system-userspace-testsuite.dir/078/p1.pcap
>>>   reading from file tests/system-userspace-testsuite.dir/078/p1.pcap, link-type EN10MB (Ethernet), snapshot length 262144
>>>   dropped privs to tcpdump
>>>   12:07:21.364925 ARP, Request who-has 10.2.1.2 tell 10.2.1.1, length 28
>>>   12:07:21.364928 ARP, Reply 10.2.1.2 is-at e6:45:4a:80:7c:61 (oui Unknown), length 28
>>>   12:07:21.365095 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 40165, seq 1, length 1480
>>>   12:07:21.365099 IP 10.2.1.1 > 10.1.1.2: icmp
>>>   12:07:21.365101 IP 10.2.1.1 > 10.1.1.2: icmp
>>>   12:07:21.365102 IP 10.2.1.1 > 10.1.1.2: icmp
>>> 
>>> We see the first frag correct, but subsequent frags are broken.
>>> 
>>> This test worked both for userspace and kernel datapath on my local
>>> system.
>>
>> Hmm.  This test fails for me for both kernel and userspace:
>
> Okay, I'll try it again on my system.  For reference, I was on F34,
> kernel 5.12.12-300.fc34.x86_64
>
>> tcpdump -r tests/system-userspace-testsuite.dir/078/p0.pcap
>> 15:17:12.832383 ARP, Request who-has 10.2.1.2 tell 10.2.1.1, length 28
>> 15:17:12.834317 ARP, Reply 10.2.1.2 is-at 46:0c:83:aa:6e:b0 (oui Unknown), length 28
>> 15:17:12.834327 IP 10.2.1.1 > 10.1.1.2: ICMP echo request, id 27759, seq 1, length 1480
>> 15:17:12.834329 IP 10.2.1.1 > 10.1.1.2: icmp
>> 15:17:12.834330 IP 10.2.1.1 > 10.1.1.2: icmp
>> 15:17:12.834332 IP 10.2.1.1 > 10.1.1.2: icmp
>>
>> tcpdump -r tests/system-userspace-testsuite.dir/078/p1.pcap
>> 15:17:12.833542 ARP, Request who-has 10.2.1.2 tell 10.2.1.1, length 28
>> 15:17:12.834994 IP 10.1.1.1 > 10.1.1.2: ICMP echo request, id 27759, seq 1, length 1480
>> 15:17:12.834999 IP 10.1.1.1 > 10.1.1.2: icmp
>> 15:17:12.835002 IP 10.1.1.1 > 10.1.1.2: icmp
>> 15:17:12.835004 IP 10.1.1.1 > 10.1.1.2: icmp
>>
>> ping -c 1 10.1.1.2 -M dont -s 4500 | grep "transmitted" | sed 's/time.*ms$/time 0ms/'
>> NS_EXEC_HEREDOC
>> --- -   2021-08-16 15:17:22.844535052 -0400
>> +++ /root/ovs/tests/system-userspace-testsuite.dir/at-groups/78/stdout
>> @@ -1,2 +1,2 @@
>> -1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> +1 packets transmitted, 0 received, 100% packet loss, time 0ms
>>
>> # uname -a
>> Linux rhel8 4.18.0-305.3.1.el8_4.x86_64
>>
>> I'm not sure what is going on.  Could you, please, re-check?
>
> I'll boot an rhel8.4 instance and try it out.
>
>> I will not apply this patch for now until we figure out how to test it.
>
> Okay.
>

I hope this change works.  I altered the test environment to use a
single port: instead of attaching a dummy to the bridge, I now use lo
as the receiver.

  [ 'client' ] < --- > [ bridge ] < --- > [ 'server' ][ lo ]

Before it was

  [ 'client' ] < --- > [ bridge ] < --- > [ 'server' ]
                            ^ --------- > [  'dummy' ]

And 'dummy' provided the l3 routing path.  That doesn't seem to work in
all environments, so I now use 'server' to provide the routing path and
'lo' as the actual ping endpoint.

I tested this on RHEL8 (with the latest z-stream kernel), and Fedora
34.
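
For anyone reading the captures above, here is a rough sketch of why a
single 'ping -s 4500' shows up as four packets (one "ICMP echo request"
line plus three bare "icmp" lines).  It assumes the veth ports use a
1500-byte MTU and that the IPv4 header is 20 bytes with no options;
neither is stated in the test itself, and fragment_sizes() is only an
illustrative helper, not anything from the OVS tree.

```python
def fragment_sizes(icmp_payload, mtu=1500, ip_header=20, icmp_header=8):
    """Return the IP payload size of each fragment for one ICMP echo.

    Assumptions (not taken from the test): 1500-byte MTU and a 20-byte
    IPv4 header without options.
    """
    # The IP payload is the ICMP header plus the data passed via 'ping -s'.
    remaining = icmp_payload + icmp_header
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


if __name__ == "__main__":
    # Matches the "length 1480" first fragment seen in the tcpdump output.
    print(fragment_sizes(4500))
```

With the fix applied, all four of those fragments should carry the
NATed source (10.1.1.1) on p1, not just the first one.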

---
 tests/system-traffic.at | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index f400cfabc9..de9108ac20 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -3305,6 +3305,47 @@ NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP
 
+AT_SETUP([conntrack - IPv4 Fragmentation + NAT])
+AT_SKIP_IF([test $HAVE_TCPDUMP = no])
+CHECK_CONNTRACK()
+
+OVS_TRAFFIC_VSWITCHD_START(
+   [set-fail-mode br0 secure -- ])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.2.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.2.1.2/24")
+
+# create a dummy route for NAT
+NS_CHECK_EXEC([at_ns1], [ip addr add 10.1.1.2/32 dev lo])
+NS_CHECK_EXEC([at_ns0], [ip route add 10.1.1.0/24 via 10.2.1.2])
+NS_CHECK_EXEC([at_ns1], [ip route add 10.1.1.0/24 via 10.2.1.1])
+
+# solely for debugging when things go wrong
+NS_EXEC([at_ns0], [tcpdump -i p0 -w p0.pcap -xx >tcpdump.out &])
+NS_EXEC([at_ns1], [tcpdump -i p1 -w p1.pcap -xx >tcpdump.out &])
+
+AT_DATA([flows.txt], [dnl
+table=0,arp,actions=normal
+table=0,ct_state=-trk,ip,in_port=ovs-p0, actions=ct(table=1, nat)
+table=0,ct_state=-trk,ip,in_port=ovs-p1, actions=ct(table=1, nat)
+table=1,ct_state=+trk+new,ip,in_port=ovs-p0, actions=ct(commit, nat(src=10.1.1.1)),ovs-p1
+table=1,ct_state=+trk+est,ip,in_port=ovs-p0, actions=ovs-p1
+table=1,ct_state=+trk+est,ip,in_port=ovs-p1, actions=ovs-p0
+])
+
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+#check connectivity
+NS_CHECK_EXEC([at_ns0], [ping -c 1 10.1.1.2 -M dont -s 4500 | FORMAT_PING], [0], [dnl
+1 packets transmitted, 1 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
 AT_SETUP([conntrack - resubmit to ct multiple times])
 CHECK_CONNTRACK()
 
-- 
2.31.1


