[ovs-dev] [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Darrell Ball dball at vmware.com
Sun Nov 19 21:27:22 UTC 2017


Hi Jan

As discussed and agreed at OVSCON, I submitted a patch to bring the userspace connection tracker established state
in line with that of the kernel.
The patch is similar to the one I suggested earlier in this thread; I added a test and made some documentation updates.

Some of the discussion in this thread was somewhat orthogonal to bringing userspace ‘established’ in line with kernel
‘established’, but it appears to have been useful, as some new recommended practices for conntrack pipeline design
may come out of it.

Thanks Darrell


From: Jan Scheurich <jan.scheurich at ericsson.com>
Date: Saturday, November 4, 2017 at 4:54 AM
To: Darrel Ball <dball at vmware.com>, Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Cc: "dev at openvswitch.org" <dev at openvswitch.org>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrel,

The example pipeline I crafted was not meant to be a realistic conntrack application but to demonstrate the semantic differences between userspace and kernel implementation and to discuss our problems with the current documentation.

I fully agree with your proposal for a proper ICMP set of rules. It would work equally for the kernel and userspace datapaths. But there are other rule sets where they behave differently, and we believe this is not good.

The original pipeline brought up by Rohith in August is the implementation of OpenStack Security Groups in OpenDaylight. In general ODL does not commit connections in the untrusted direction. However, in the problematic scenario (two Neutron ports in the same Neutron Network but in different Security Groups, co-located on the same OVS instance) the connection was committed (as trusted) on the sending side. The packet should have been dropped on the receiving side but the ct() lookup for the first packet on egress hits the committed connection and passes because ODL uses one conntrack zone per Neutron Network rather than per Security Group. I think this is wrong and using one zone per Security Group would probably solve this specific issue.

But with the kernel datapath this issue never surfaced because the connection is not considered established prior to the first reply packet so that the second lookup of the first packet on egress still yields +new-est. So the ODL developers testing with kernel datapath assumed their design was suitable. You can argue that was a misunderstanding of the function but the discrepancy between documentation and kernel behavior certainly didn’t help.
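
To illustrate the zone argument above, here is a minimal Python sketch, assuming conntrack entries are keyed by (zone, connection tuple). The zone numbers 5001 and 5002, the tuple layout and the helper names are made up for the illustration and are not ODL's actual allocation.

```python
# Toy model: a conntrack table keyed by (zone, connection tuple).
# Zone ids and helper names are illustrative only.

def commit(table, zone, conn):
    """Persist the connection in the given zone."""
    table.add((zone, conn))


def lookup(table, zone, conn):
    """Return True if a committed entry exists for conn in this zone."""
    return (zone, conn) in table


conn = ("icmp", "192.168.10.10", "192.168.10.20")

# Case 1: one zone per Neutron network, so both ports share zone 5000.
shared = set()
commit(shared, 5000, conn)                # sending side commits as trusted
hit_shared = lookup(shared, 5000, conn)   # receiving-side lookup, same zone

# Case 2: one zone per security group (hypothetical zones 5001 and 5002).
per_sg = set()
commit(per_sg, 5001, conn)
hit_per_sg = lookup(per_sg, 5002, conn)

print(hit_shared, hit_per_sg)  # True False
```

With a shared zone the receiving-side lookup finds the entry committed on the sending side; with one zone per security group it would not.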

Perhaps it is better we continue this discussion in person during the OVS conference?

Regards, Jan

From: Darrell Ball [mailto:dball at vmware.com]
Sent: Saturday, 04 November, 2017 01:47
To: Jan Scheurich <jan.scheurich at ericsson.com>; Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Cc: dev at openvswitch.org
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK



From: Jan Scheurich <jan.scheurich at ericsson.com>
Date: Friday, November 3, 2017 at 6:22 AM
To: Darrel Ball <dball at vmware.com>, Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Cc: "dev at openvswitch.org" <dev at openvswitch.org>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrel,

I have now been able to actually test the example pipelines I provided earlier. Turns out that the first one I sent was correct.

[Darrell] Sure, let us discuss the first example; let me know if you want to discuss the second example you gave as well.


Please note that it was not meant as a realistic conntrack pipeline

[Darrell] Again, I can agree your example is not realistic or recommended. No one would write rules like this.
              The rules would certainly be written properly so that the trusted direction (the one that does the commit) allows the first packet through;
              this is a fundamental principle of conntrack. There are an infinite number of ways to misuse conntrack rules and no one can prevent misuse.
              On a similar topic, another fundamental problem I saw with the original discussion (from Aug) is creating a conntrack pipeline that commits
              a connection in the untrusted direction. That is also not something we do or recommend others do. This ‘suboptimal design approach’ brought
              us to the question on when a packet gets labelled as ESTABLISHED.  Normally, the difference would not be noticed, since a connection would not be
              committed in the untrusted direction and hence EST would not be possible unless another rule correctly commits in the trusted direction.
              I’ll add more comments below.


but just to demonstrate the misalignment between userspace and kernel conntrack and the conflict of both with the documentation.

The following pipeline is now tested:

ovs-ofctl add-flow br0 "table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)"
ovs-ofctl add-flow br0 "table=0,priority=10,in_port=1,arp actions=output:2"
ovs-ofctl add-flow br0 "table=0,priority=10,in_port=2 actions=output:1"
ovs-ofctl add-flow br0 "table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2"
ovs-ofctl add-flow br0 "table=1,priority=10,in_port=1,ip,ct_state=+new+trk actions=ct(commit,zone=5000),goto_table:2"
ovs-ofctl add-flow br0 "table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2"

The ct(commit) action in table 1 commits a new connection entry, but the subsequent match in table 2 proves that the ct_state of the packet is still not EST despite the commit.

>>>>>>>>>>>>>

[Darrell] I assumed you meant “lack of match in table 2” per your following test result.
               You use zoning in your rules without effect and you even split the pipeline with goto_table – we would not do this.
               Out of 6 rules, probably at least 4 of them are not what you want.
               I think there is a big disconnect here and I feel we are wasting time discussing such a contrived pipeline.

Here is a simplified set of rules that might be reasonably used for just icmp:

priority=1,action=drop
priority=10,arp,action=normal
table=0,priority=10,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
table=0,priority=10,in_port=2,icmp,ct_state=+trk+est actions=output:1
table=0,priority=10,in_port=1,icmp actions=ct(commit,table=1)
table=1,priority=10,in_port=1,icmp,ct_state=+trk+est actions=output:2
table=1,priority=10,in_port=1,icmp,ct_state=+trk+new actions=output:2

<<<<<<<<<<<<<


This contradicts the statement in man ovs-fields: “est (0x02)  Part of an existing connection. Set to 1 if this is a committed connection.”

>>>>>>>>>
[Darrell]
No, it does not.
Same answer as earlier

[Darrell]  Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set for a later packet seen. A ‘packet’ (not a connection) that hits an
                 existing conntrack entry is marked as established and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make
                 the state intuitive.

“est (0x02)  Part of an existing connection. Set to 1 if this is a committed connection.”

ESTABLISHED is an attribute of a packet hitting an existing conntrack entry (“Part of an existing connection”), not the conntrack entry itself.
So, a packet that hits an ‘existing’ entry (which is a committed connection, of course) gets its state set to EST.
I agree this is subtle, because the reader has to know that EST is a state of the packet not the connection entry itself; the wording could have been better.
<<<<<<<<<


Consequently the userspace datapath drops the first ICMP packet:

root@ubuntu:~# ip netns exec ns1 ping -c1 192.168.10.20
PING 192.168.10.20 (192.168.10.20) 56(84) bytes of data.
--- 192.168.10.20 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

root@ubuntu:/opt/ovs# ovs-ofctl -Oopenflow13 dump-flows br0
cookie=0x0, duration=30.885s, table=0, n_packets=1, n_bytes=98, reset_counts priority=10,icmp,in_port="br0-ns1" actions=ct(table=1,zone=5000)
cookie=0x0, duration=30.848s, table=0, n_packets=0, n_bytes=0, reset_counts priority=10,arp,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=30.815s, table=0, n_packets=0, n_bytes=0, reset_counts priority=10,in_port="br0-ns2" actions=output:"br0-ns1"
cookie=0x0, duration=30.783s, table=1, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=30.746s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=+new+trk,ip,in_port="br0-ns1" actions=ct(commit,zone=5000),resubmit(,2)
cookie=0x0, duration=30.712s, table=2, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
ct_state(+new-est-rel-inv+trk),recirc_id(0x2),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5000)
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=1,frag=no), packets:0, bytes:0, used:never, actions:ct(zone=5000),recirc(0x2)

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-conntrack
icmp,orig=(src=192.168.10.10,dst=192.168.10.20,id=20697,type=8,code=0),reply=(src=192.168.10.20,dst=192.168.10.10,id=20697,type=0,code=0),zone=5000

But when I send two ICMP packets in a row, the second packet hits the connection entry committed by the first dropped packet and goes through:

root@ubuntu:~# ip netns exec ns1 ping -c2 192.168.10.20
PING 192.168.10.20 (192.168.10.20) 56(84) bytes of data.
64 bytes from 192.168.10.20: icmp_seq=2 ttl=64 time=1.87 ms
--- 192.168.10.20 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1006ms
rtt min/avg/max/mdev = 1.874/1.874/1.874/0.000 ms

root@ubuntu:/opt/ovs# ovs-ofctl -Oopenflow13 dump-flows br0
cookie=0x0, duration=40.727s, table=0, n_packets=2, n_bytes=196, reset_counts priority=10,icmp,in_port="br0-ns1" actions=ct(table=1,zone=5000)
cookie=0x0, duration=40.696s, table=0, n_packets=1, n_bytes=42, reset_counts priority=10,arp,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=40.667s, table=0, n_packets=2, n_bytes=140, reset_counts priority=10,in_port="br0-ns2" actions=output:"br0-ns1"
cookie=0x0, duration=40.631s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"
cookie=0x0, duration=40.602s, table=1, n_packets=1, n_bytes=98, reset_counts priority=10,ct_state=+new+trk,ip,in_port="br0-ns1" actions=ct(commit,zone=5000),resubmit(,2)
cookie=0x0, duration=40.566s, table=2, n_packets=0, n_bytes=0, reset_counts priority=10,ct_state=-new+est-rel-inv+trk,in_port="br0-ns1" actions=output:"br0-ns2"

root@ubuntu:/opt/ovs# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0806), packets:0, bytes:0, used:never, actions:2
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(proto=1,frag=no), packets:1, bytes:98, used:4.149s, actions:ct(zone=5000),recirc(0x4)
ct_state(+new-est-rel-inv+trk),recirc_id(0x4),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:ct(commit,zone=5000)
ct_state(-new+est-rel-inv+trk),recirc_id(0x4),in_port(2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:2

The ct() lookup in table 0 for subsequent packets sets the packet’s ct_state to EST no matter if conntrack has seen reply packets or not.
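
The behavior reported above can be reproduced with a toy model of the six-flow pipeline. This is only a sketch: `send_ping`, the `est_requires_reply` switch and the dictionary standing in for the conntrack table are hypothetical. The model also assumes that ct(commit) without recirculation leaves the current packet without +est, as the flow counters above suggest. Replies from port 2 never pass through ct() in this pipeline, so no reply is ever recorded.

```python
def send_ping(table, est_requires_reply):
    """Walk one ICMP request from port 1 through the six-flow pipeline.

    table maps a connection key to {"reply_seen": bool} and stands in for
    the committed conntrack entries in zone 5000.
    Returns "output:2" or "drop".
    """
    conn = "ns1->ns2"
    entry = table.get(conn)

    # Table 0: ct(table=1,zone=5000) looks up the connection.
    if entry is None:
        state = {"trk", "new"}
    elif est_requires_reply and not entry["reply_seen"]:
        state = {"trk", "new"}      # kernel-like: still new until a reply
    else:
        state = {"trk", "est"}      # userspace behavior shown in the test

    # Table 1, first rule: ct_state=-new+est-rel-inv+trk -> output:2
    if "est" in state and "new" not in state:
        return "output:2"

    # Table 1, second rule: ct_state=+new+trk -> ct(commit), goto_table:2
    table.setdefault(conn, {"reply_seen": False})
    # Table 2 requires +est, which this packet does not carry.
    return "drop"


userspace = {}
userspace_result = [send_ping(userspace, False) for _ in range(2)]
kernel = {}
kernel_result = [send_ping(kernel, True) for _ in range(2)]

print(userspace_result)  # ['drop', 'output:2']
print(kernel_result)     # ['drop', 'drop']
```

The first list matches the ping -c2 result above (one packet lost, one forwarded); the second matches the kernel behavior described in this thread, where no request is forwarded because conntrack never sees a reply.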

[Darrell]  Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set for a later packet seen. A ‘packet’ (not a connection) that hits an
                existing conntrack entry is marked as established and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make
                the state intuitive.

Well, I would question whether the current definition of the EST state in the OVS documentation is intuitive. It certainly has fooled us ;-) and also the ODL developers, who based their Security Group pipeline design on the exhibited behavior of the kernel datapath.

I’d find it much more intuitive if the ct_state of a packet reflected the state of the tracked connection at the end of the last ct() action. Directly after a commit of a new connection it should still be NEW. Only when a connection is really ‘established’ it should change to EST. The definition of when a connection is established actually depends on the protocol type. For icmp and udp (other) it is indeed the lookup of the first reply packet when the corresponding entries enter states ICMPS_REPLY and OTHERS_BIDIR, respectively. For tcp the trigger should be the lookup of a valid SYN/ACK packet from the remote side.
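
The protocol-dependent rule proposed above can be sketched as follows. This is a hedged illustration, not the conntrack code: `is_established` and the packet representation are hypothetical, and TCP validity checks (sequence and ack tracking) are deliberately left out.

```python
def is_established(proto, packets):
    """Decide whether a committed connection counts as established.

    packets is the list of (direction, tcp_flags) pairs the tracker has
    seen so far; direction is "orig" or "reply", tcp_flags is a set that
    is empty for non-TCP protocols.
    """
    for direction, flags in packets:
        if direction != "reply":
            continue
        if proto == "tcp":
            # TCP: wait for a SYN/ACK from the remote side.
            if {"SYN", "ACK"} <= flags:
                return True
        else:
            # ICMP, UDP and other protocols: the first reply is enough.
            return True
    return False


print(is_established("icmp", [("orig", set())]))                      # False
print(is_established("icmp", [("orig", set()), ("reply", set())]))    # True
print(is_established("tcp", [("orig", {"SYN"}), ("reply", {"RST"})])) # False
print(is_established("tcp", [("orig", {"SYN"}),
                             ("reply", {"SYN", "ACK"})]))             # True
```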

[Darrell] Bringing userspace ESTABLISHED in line with the kernel ESTABLISHED is trivial; for 2.8/master, it is below (for 2.6 it is similar, but a few lines shorter)

I’m not sure that your simple patch, which just checks the reply direction as a prerequisite for all protocols, is sufficient. We’d rather suggest basing the EST ct state on the actual connection state resulting from xxx_conn_update().

[Darrell]  1/ As I mentioned before, EST is a packet state; a prerequisite for EST is a committed connection.
                2/ TCP conn update code understands what valid is and tracks acks in this regard.
                3/ For UDP and ICMP, I also don’t intend to conflate ESTABLISHED with VALID.


That may require a new return value (CT_UPDATE_UNCHANGED) in these functions that leaves the ct state of the packet unchanged. The conntrack modules (_tcp, _icmp, _other) would only return CT_UPDATE_VALID when the connection is established.

[Darrell] No, I disagree; VALID and ESTABLISHED are very different; I will not conflate them.
                Packets can be valid without being marked as established.


[Darrell]  The OVS documented definition of ESTABLISHED is actually better, but I don’t think that is very important and I think most users will not care or even notice the difference.

We find the documented OVS definition rather confusing. Now, after all our discussions and tests, I tend to agree that the current implementation of userspace conntrack is actually very straightforward and could be described in simple terms, but since it does not match the kernel datapath behavior, which in our view sets the reference, that won’t help.

[Darrell] Actually, the contrived example (raison d'etre) you first provided in Aug is based on committing a connection in the untrusted direction; as mentioned earlier,
               we don’t do that and we don’t recommend others do it either.


Here’s a proposal for an improved description of ct_state in ovs-fields:

       Connection Tracking State Field

       Name:            ct_state
       Width:           32 bits

       Format:          ct state
       Masking:         arbitrary bitwise masks
       Prerequisites:   none

       Access:          read-only
       OpenFlow 1.0:    not supported
       OpenFlow 1.1:    not supported

       OXM:             none
       NXM:             NXM_NX_CT_STATE (105) since Open vSwitch 2.5

       This field holds several flags that can be used to determine the state of the
       connection to which the packet belongs. It is initially zero and updated every
       time a ct() action is executed. It reflects the state of the packet and of its
       associated connection, if any, at completion of the ct() action. Only committed
       connections are being tracked.

       Matches  on  this  field  are most conveniently written in terms of symbolic names
       (listed below), each preceded by either + for a flag that must be set, or - for  a
       flag that must be unset, without any other delimiters between the flags. Flags not
       mentioned are wildcarded. For example, tcp,ct_state=+trk-new matches  TCP  packets
       that  have been run through the connection tracker and do not establish a new con‐
       nection. Matches can also be written as  flags/mask,  where  flags  and  mask  are
       32-bit numbers in decimal or in hexadecimal prefixed by 0x.

       The following flags are defined:

              new (0x01)
                     A new connection. Set to 1 if there exists no committed connection
                     for the packet yet, or if the committed connection is not yet fully
                     established.

              est (0x02)
                     Part of an established connection. Set to 1 if there is a committed
                     connection for the packet and the connection is fully established.

                     A TCP connection is established when the connection tracker has seen
                     the SYN-ACK from the destination. For UDP and ICMP the connection is
                     established when the connection tracker has seen the first reply
                     packet.

              rel (0x04)
                     Related to an existing connection, e.g. an ICMP "destination
                     unreachable" message or an FTP data connection. This flag will
                     only be 1 if the connection to which this one is related is
                     committed.

                     Connections identified as rel are separate from the originating con‐
                     nection and must be committed separately. All packets for a  related
                     connection will have the rel flag set, not just the initial packet.

              rpl (0x08)
                     This  packet  is  in  the reply direction, meaning that it is in the
                     opposite direction from the packet that  initiated  the  connection.
                     This flag will only be 1 if the connection is committed.

              inv (0x10)
                     The  state  is invalid, meaning that the connection tracker couldn’t
                     identify the connection. This flag is a catch-all  for  problems  in
                     the connection or the connection tracker, such as:

                     ·      L3/L4 protocol handler is not loaded/unavailable. With the
                            Linux kernel datapath, this may mean that the
                            nf_conntrack_ipv4 or nf_conntrack_ipv6 modules are not loaded.

                     ·      L3/L4  protocol  handler  determines  that the packet is mal‐
                            formed.

                     ·      Packets are unexpected length for protocol.

              trk (0x20)
                     This packet is tracked, meaning that it has previously traversed the
                     connection  tracker.  If  this  flag is not set, then no other flags
                     will be set. If this flag is set, then the  packet  is  tracked  and
                     other flags may also be set.

              snat (0x40)
                     This  packet was transformed by source address/port translation by a
                     preceding ct action. Open vSwitch 2.6 added this flag.

              dnat (0x80)
                     This packet was transformed by destination address/port  translation
                     by a preceding ct action. Open vSwitch 2.6 added this flag.

       There  are  additional  constraints  on these flags, listed in decreasing order of
       precedence below:

              1.
                If trk is unset, no other flags are set.

              2.
                If trk is set, one or more other flags may be set.

              3.
                If inv is set, only the trk flag is also set.

              4.
                new and est are mutually exclusive.

              5.
                new and rpl are mutually exclusive.

              6.
                rel may be set in conjunction with any other flags.

       Future versions of Open vSwitch may define new flags.
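
The two match notations described above (symbolic +/- names and flags/mask) are equivalent, which a short Python sketch can show. The flag values are the ones listed in the description; `parse_ct_state` and `matches` are hypothetical helper names, not part of any OVS tool.

```python
import re

# Flag values as listed in the ct_state description.
CT_FLAGS = {"new": 0x01, "est": 0x02, "rel": 0x04, "rpl": 0x08,
            "inv": 0x10, "trk": 0x20, "snat": 0x40, "dnat": 0x80}


def parse_ct_state(spec):
    """Turn a symbolic match such as '+trk-new' into a (flags, mask) pair."""
    tokens = re.findall(r"([+-])([a-z]+)", spec)
    if "".join(sign + name for sign, name in tokens) != spec:
        raise ValueError("malformed ct_state match: %r" % spec)
    flags = mask = 0
    for sign, name in tokens:
        bit = CT_FLAGS[name]    # KeyError for an unknown flag name
        mask |= bit             # every mentioned flag is significant
        if sign == "+":
            flags |= bit        # '+' means the flag must be set
    return flags, mask


def matches(ct_state, spec):
    """Check a packet's ct_state value against a symbolic match."""
    flags, mask = parse_ct_state(spec)
    return (ct_state & mask) == flags


print([hex(v) for v in parse_ct_state("+trk-new")])              # ['0x20', '0x21']
print([hex(v) for v in parse_ct_state("-new+est-rel-inv+trk")])  # ['0x22', '0x37']
print(matches(0x21, "+new+trk"))                                 # True
print(matches(0x21, "-new+est-rel-inv+trk"))                     # False
```

Flags that are not mentioned stay out of the mask, which is what "wildcarded" means in the text above.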

What do you think?

BR, Jan


From: Darrell Ball [mailto:dball at vmware.com]
Sent: Friday, 03 November, 2017 06:26
To: Jan Scheurich <jan.scheurich at ericsson.com>; Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

One update inline regarding the kernel/userspace ESTABLISHED definition syncing.
However, we still need to resolve the other discussion points.

From: Darrel Ball <dball at vmware.com>
Date: Thursday, November 2, 2017 at 7:46 PM
To: Jan Scheurich <jan.scheurich at ericsson.com>, Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

I am checking a few things so I’ll get back to you, but I have a couple comments inline.

From: Jan Scheurich <jan.scheurich at ericsson.com>
Date: Thursday, November 2, 2017 at 11:41 AM
To: Darrel Ball <dball at vmware.com>, Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Sorry for the confusion. One of our concerns actually *is* that the userspace conntrack sets the connection to ESTABLISHED without the commit.

[Darrell] I am still interested to know about this case; pls provide a test.

But, more importantly, that the kernel conntrack does not set the connection to ESTABLISHED at all through a second ct() lookup for a packet in the same direction, with or without commit.

My example was trying to demonstrate that ct(commit) does not move the new connection to ESTABLISHED, but I didn’t test it and I think now that it might, but not because of the commit but just because of the second ct() action for the same packet.

[Darrell] Yes, your previous example is not correct for your purpose.
                Let me clear up some misconceptions: ct(commit) is a prerequisite for EST being set for a later packet seen. A ‘packet’ (not a connection) that hits an
                existing conntrack entry is marked as established and that’s what the documentation says; the idea behind the ‘OVS conntrack EST state’ is to make
                the state intuitive.


So a better example to demonstrate the misbehavior of userspace datapath would be as follows:

table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)
table=0,priority=10,in_port=1,arp actions=output:2
table=0,priority=10,in_port=2 actions=output:1

table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2
table=1,priority=10,in_port=1,ct_state=+new+trk actions=ct(zone=5000),goto_table:2

table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2

This should now move the new connection to ESTABLISHED state in table 1 so that table 2 will hit and forward the icmp packet to port 2. The Ping should go through with userspace datapath.

[Darrell] This test does not make sense either. It does not even send the reply packet entering port 2 through conntrack, so it is not properly testing conntrack here.
               There are other problems with this test as well; pls check it.

With the kernel datapath the icmp packets would still not pass.

The problem is really that we today have entirely different conntrack semantics in kernel and userspace datapath. Since the kernel conntrack semantics are outside the scope of OVS we should take them as given, align the userspace conntrack accordingly, and update the OVS documentation to reflect the real semantics, i.e. that a connection only moves to established state when a conntrack lookup is done for a reply packet.

[Darrell] Bringing userspace ESTABLISHED in line with the kernel ESTABLISHED is trivial; for 2.8/master, it is below (for 2.6 it is similar, but a few lines shorter)
               None of the 60 or so existing conntrack system tests are affected. However, I added a test and I’ll submit a patch once we resolve our discussion points.
               The OVS documented definition of ESTABLISHED is actually better, but I don’t think that is very important and I think most users will not care or even notice the difference.

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index ac0198f..1f6a107 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -107,6 +107,7 @@ struct conn {
     uint8_t seq_skew_dir;
     /* True if alg data connection. */
     uint8_t alg_related;
+    uint8_t reply_seen;
 };

 enum ct_update_res {
diff --git a/lib/conntrack.c b/lib/conntrack.c
index e555b55..69061fc 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -912,10 +912,13 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,

         switch (res) {
         case CT_UPDATE_VALID:
-            pkt->md.ct_state |= CS_ESTABLISHED;
-            pkt->md.ct_state &= ~CS_NEW;
             if (ctx->reply) {
                 pkt->md.ct_state |= CS_REPLY_DIR;
+                (*conn)->reply_seen = true;
+            }
+            if ((*conn)->reply_seen) {
+                pkt->md.ct_state |= CS_ESTABLISHED;
+                pkt->md.ct_state &= ~CS_NEW;
             }
             break;
         case CT_UPDATE_INVALID:
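
For readers who prefer to see the effect of the diff without building OVS, the CT_UPDATE_VALID branch before and after the change can be modelled in a few lines of Python. This is a sketch under assumptions: `update_valid` and the `conn` dictionary are hypothetical stand-ins for the C code, the CS_* values follow the ct_state flag values quoted earlier in the thread, and the ct_state passed in (tracked only) is an assumption, since the diff does not show how the caller initializes it.

```python
# Bit values matching the documented ct_state flags.
CS_NEW, CS_ESTABLISHED, CS_REPLY_DIR, CS_TRACKED = 0x01, 0x02, 0x08, 0x20


def update_valid(conn, ct_state, is_reply, patched):
    """Model the CT_UPDATE_VALID branch for a packet hitting a committed entry.

    conn is a dict with a "reply_seen" key (the field added by the diff).
    Returns the packet's new ct_state.
    """
    if not patched:
        # Old behavior: any valid packet on a committed entry becomes EST.
        ct_state |= CS_ESTABLISHED
        ct_state &= ~CS_NEW
        if is_reply:
            ct_state |= CS_REPLY_DIR
        return ct_state

    # Patched behavior: EST only once a reply has been seen.
    if is_reply:
        ct_state |= CS_REPLY_DIR
        conn["reply_seen"] = True
    if conn["reply_seen"]:
        ct_state |= CS_ESTABLISHED
        ct_state &= ~CS_NEW
    return ct_state


old = update_valid({"reply_seen": False}, CS_TRACKED, False, patched=False)

conn = {"reply_seen": False}
second_request = update_valid(conn, CS_TRACKED, False, patched=True)
first_reply = update_valid(conn, CS_TRACKED, True, patched=True)
later_request = update_valid(conn, CS_TRACKED, False, patched=True)

print(bool(old & CS_ESTABLISHED))             # True
print(bool(second_request & CS_ESTABLISHED))  # False
print(bool(first_reply & CS_ESTABLISHED))     # True
print(bool(later_request & CS_ESTABLISHED))   # True
```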


The commit itself does not change the conntrack state. It only persists the initially temporary conntrack entry in the database.

We have created a simple downstream patch to align the userspace conntrack behavior to the kernel but we are not allowed to publish it on the mailing list for IPR licensing reasons. The needed code changes are quite straightforward, though.

BR, Jan

From: Darrell Ball [mailto:dball at vmware.com]
Sent: Thursday, 02 November, 2017 18:53
To: Jan Scheurich <jan.scheurich at ericsson.com>; Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK



From: Jan Scheurich <jan.scheurich at ericsson.com>
Date: Thursday, November 2, 2017 at 10:14 AM
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Cc: Darrel Ball <dball at vmware.com>
Subject: RE: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith,

To illustrate that ct(commit) does not move a new connection to the established state as stated in [2], it should be enough to check that the initial icmp packet is dropped in table 2 of the simplistic conntrack pipeline:

[Darrell] I am confused; the problem statement from Rohith was and is:
               “The userspace datapath flags a packet as ESTABLISHED ‘without’ a ct(commit)” ?




table=0,priority=10,in_port=1,icmp actions=ct(table=1,zone=5000)
table=0,priority=10,in_port=1,arp actions=output:2
table=0,priority=10,in_port=2 actions=output:1

table=1,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2
table=1,priority=10,in_port=1,ct_state=+new+trk actions=ct(commit,zone=5000),goto_table:2

table=2,priority=10,in_port=1,ct_state=-new+est-rel-inv+trk actions=output:2

In addition we should dump the conntrack state after the initial packet has passed.

The second icmp packet in the same direction should then move the packet to established state in table 0 and immediately send it to port 2 in table 1.

In contrast, the behavior of the kernel datapath would be to drop all icmp packets sent from port 1 to port 2 as the return packet is never seen by conntrack.

BR, Jan


From: Darrell Ball [mailto:dball at vmware.com]
Sent: Thursday, 02 November, 2017 17:31
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>; ovs-discuss at openvswitch.org
Cc: Jan Scheurich <jan.scheurich at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK



From: <ovs-discuss-bounces at openvswitch.org> on behalf of Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Wednesday, November 1, 2017 at 10:28 PM
To: "ovs-discuss at openvswitch.org" <ovs-discuss at openvswitch.org>
Cc: Jan Scheurich <jan.scheurich at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi,

It has been quite some time since I raised this issue, so I thought I would update the thread with our findings.
Following is the summary of our findings and analysis. We think the OVS userspace datapath conntrack implementation
needs to be fixed; otherwise some of the security group deployments mentioned below might fail.

Analysis/Findings
===============
Currently the OVS kernel datapath implementation has the ct_state (conntrack state) semantics described
in the following document:
http://www.iptables.info/en/connection-state.html [1]

The OVS userspace datapath doesn’t follow the above semantics, and the ct_state description in the OVS specification
(http://openvswitch.org/support/dist-docs/ovs-fields.7.pdf) [2] is also not correct, as explained below.

The main issue is when the conntrack state “CS_ESTABLISHED” is set for a tracked flow. In the kernel datapath and iptables a tracked
flow moves to established state only once it sees a reply packet in the reverse direction.

The user-space conntrack, in contrast, moves a tracked connection to established state as soon as a newly tracked connection is
looked up the first time, irrespective of the direction of the packet.

[Darrell] Are you sure?
               The expectation is that the Userspace Datapath 2.6 behavior adheres to the OVS specification below.
               Please provide a test case that shows this is not the case; I would be interested.



Finally, OVS specification [2] defines the “est” state as “est (0x02) Part of an existing connection. Set to 1 if this is a committed
connection”. This means that the tracked connection would move to established state when the ct(commit) action is executed and the
semantics don’t match either the kernel or user-space behaviour.


Because of the above difference, some Security Group (SG) use cases are failing. For example,
the isolation between VMs whose SGs shall not allow communication among them does not work when the VMs are on the same compute node.

[Darrell] We had a lengthy offline email discussion about this from 8/23 to 8/30. The last few exchanges are below.


///////////////////////////////////////////////



From: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Wednesday, August 30, 2017 at 9:05 AM
To: Darrel Ball <dball at vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Thanks a lot for the help and for sharing the useful information.

Thanks
Rohith

From: Darrell Ball <dball at vmware.com>
Date: Wednesday, 30 August 2017 at 9:18 PM
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith



From: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Wednesday, August 30, 2017 at 3:46 AM
To: Darrel Ball <dball at vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

For the userspace datapath, do we have any tools to dump conntrack entries
other than *ovs-appctl dpctl/dump-conntrack*?

[Darrell]
Right now, we have

           "  dump-conntrack [DP] [zone=ZONE]  " \
               "display conntrack entries for ZONE\n"
           "  flush-conntrack [DP] [zone=ZONE] " \
               "delete all conntrack entries in ZONE\n"
           "  ct-stats-show [DP] [zone=ZONE] [verbose] " \
               "CT connections grouped by protocol\n"
           "  ct-bkts [DP] [gt=N] display connections per CT bucket\n"

This is from ./utilities/ovs-dpctl.c


For the kernel datapath I see that we can use conntrack -L to dump the entries.
Is the conntrack tool only for the kernel datapath?

[Darrell]
Yes


In general, are any other conntrack commands or tools available for the userspace datapath?

[Darrell]
Right now, only the ones mentioned above.
Of course, the well-known commands also indirectly tell a lot about what is happening in conntrack:
sudo ovs-ofctl dump-flows br0
sudo ovs-appctl dpif/dump-flows br0

Sorry for so many queries. Please let me know if it’s bothering you.

[Darrell]
No problem at all.

Thanks
Rohith



From: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Wednesday, 30 August 2017 at 1:57 PM
To: Darrell Ball <dball at vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Thanks for the suggestions.

Thanks
Rohith

From: Darrell Ball <dball at vmware.com>
Date: Wednesday, 30 August 2017 at 12:17 PM
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

So the previous answer is the solution then; elaborating:

You want a committed connection VM2 -> VM1 (i.e. originated from VM2); this allows VM1 to send replies to VM2.

You want to prevent creating a committed connection from VM1 -> VM2
This can be done in various ways by using in_port, zones (per logical ports), dl_src, dl_dst etc

So traffic originated from VM1 -> VM2 will always be new
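
As a rough illustration of the zone-based variant of this suggestion, the sketch below models conntrack as a set of entries keyed by zone and endpoint pair. It is a toy model, not OVS code: the helper names and the per-port zone numbers are hypothetical, and a lookup that hits a committed entry is assumed to return "est" immediately, which is the userspace behaviour reported in this thread.

```python
def ct_key(zone, src, dst):
    # Direction-agnostic key: both directions of a connection share one entry.
    return (zone, frozenset((src, dst)))


def ct_lookup(table, zone, src, dst):
    # Assumption: any hit on a committed entry is reported as established.
    return "est" if ct_key(zone, src, dst) in table else "new"


def ct_commit(table, zone, src, dst):
    table.add(ct_key(zone, src, dst))


def first_packet_state_at_receiver(zone_of, sender, receiver):
    """State seen by the receiver-side ct() lookup for the sender's first packet."""
    table = set()
    # Sender-side pipeline: look up in the sender's zone, commit if new.
    if ct_lookup(table, zone_of[sender], sender, receiver) == "new":
        ct_commit(table, zone_of[sender], sender, receiver)
    # Receiver-side pipeline: look up in the receiver's zone.
    return ct_lookup(table, zone_of[receiver], sender, receiver)


shared_zone = {"VM1": 5021, "VM2": 5021}   # one zone for both ports
per_port_zones = {"VM1": 1, "VM2": 2}      # hypothetical per-port zone ids

shared_result = first_packet_state_at_receiver(shared_zone, "VM1", "VM2")
per_port_result = first_packet_state_at_receiver(per_port_zones, "VM1", "VM2")
```

With a shared zone the receiver-side lookup hits the entry committed on the sender side; with per-port zones it misses, so the packet stays "new" and a +new drop rule on the receiver side can still apply.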

Thanks Darrell

From: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Tuesday, August 29, 2017 at 11:19 PM
To: Darrel Ball <dball at vmware.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Just to clarify, the following is the use case.


  1.  VM1 can originate/initiate traffic to VM2
  2.  VM1 can receive traffic from VM2
  3.  VM2 should not receive any new connection from VM1
  4.  VM2 can originate/initiate traffic to VM1

Thanks
Rohith

From: Darrell Ball <dball at vmware.com>
Date: Wednesday, 30 August 2017 at 11:37 AM
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

Just to confirm:
1/ VM1 can never send traffic to VM2 (originate or reply) ?
OR
2/ VM1 cannot originate traffic to VM2 but VM1 can send reply traffic to VM2. ?

I have now been assuming ‘2’ ?

Thanks Darrell


///////////////////////////////////////////////






Thanks
Rohith

From: Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Thursday, 24 August 2017 at 12:43 AM
To: Darrell Ball <dball at vmware.com>, "ovs-discuss at openvswitch.org" <ovs-discuss at openvswitch.org>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Darrell,

Yes, the expected outcome is to drop new or non-related connections and allow only related or established connections.

For clarity, I am adding the details of the topology and pipeline rules.

Description about the topology
=========================

VM1 and VM4 are on the same compute node but in different SGs.

For VM4, security rules configured are as below:
Egress/Ingress  Allow all

For VM1,
Egress  Allow all
Ingress  Allow only from VMs which are in same security group.

For the above combination, all required conntrack flows (in tables 213 and 214 on the VM egress side, and 243 and 244 on the VM ingress side) are properly programmed in OVS.

For traffic sent from VM4 to VM1, conntrack is allowing traffic that should have been dropped, since ingress to the latter is to be allowed only from VMs in the same SG. For VM1, conntrack is sending traffic directly to the "ct_state=-new+est-rel-inv+trk" flow, bypassing the "ct_state=+new+trk" flow in the ingress direction.

Pipeline rule details
=====================

VM4 is on 112/15: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, mtu=2140, requested_rx_queues=1, requested_tx_queues=1)

VM1 is on 108/11: (dpdkvhostuser: configured_rx_queues=1, configured_tx_queues=1, mtu=2140, requested_rx_queues=1, requested_tx_queues=1)

VM4 IP: 172.20.1.113  MAC: fa:16:3e:55:9e:33

VM1 IP: 172.20.1.117 MAC : fa:16:3e:f7:72:d3

I am pinging from VM4 (172.20.1.113) to VM1 (172.20.1.117).

cookie=0x8000000, duration=2809.367s, table=0, n_packets=74, n_bytes=10426, priority=4,in_port=112,vlan_tci=0x0000/0x1fff actions=write_metadata:0x19f40000000000/0xffffff0000000001,goto_table:17


cookie=0x6900000, duration=2809.343s, table=17, n_packets=74, n_bytes=10426, priority=10,metadata=0x19f40000000000/0xffffff0000000000 actions=write_metadata:0x4019f40000000000/0xfffffffffffffffe,goto_table:211

cookie=0x6900000, duration=2809.313s, table=211, n_packets=54, n_bytes=8674, priority=61010,ip,metadata=0x19f40000000000/0x1fffff0000000000,dl_src=fa:16:3e:55:9e:33,nw_src=172.20.1.113 actions=goto_table:212

cookie=0x6900000, duration=15546.529s, table=212, n_packets=3669, n_bytes=361562, priority=61010,icmp actions=goto_table:213

cookie=0x6900000, duration=2809.308s, table=213, n_packets=54, n_bytes=8674, priority=61010,ip,metadata=0x19f40000000000/0x1fffff0000000000 actions=ct(table=214,zone=5021)

cookie=0x6900000, duration=15546.544s, table=214, n_packets=3660, n_bytes=367508, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,17)
cookie=0x6900001, duration=2809.304s, table=214, n_packets=2, n_bytes=180, priority=62015,ct_state=+inv+trk,metadata=0x19f40000000000/0x1fffff0000000000 actions=drop
cookie=0x6900000, duration=2809.295s, table=214, n_packets=10, n_bytes=908, priority=1000,ct_state=+new+trk,ip,metadata=0x19f40000000000/0x1fffff0000000000 actions=ct(commit,zone=5021),resubmit(,17)

cookie=0x6800001, duration=2807.340s, table=17, n_packets=72, n_bytes=10246, priority=10,metadata=0x4019f40000000000/0xffffff0000000000 actions=write_metadata:0xc019f40000000000/0xfffffffffffffffe,goto_table:60

cookie=0x6800000, duration=15546.596s, table=60, n_packets=262265, n_bytes=19604926, priority=0 actions=resubmit(,17)

cookie=0x8040000, duration=2807.338s, table=17, n_packets=70, n_bytes=9562, priority=10,metadata=0xc019f40000000000/0xffffff0000000000 actions=write_metadata:0xe019f4139d000000/0xfffffffffffffffe,goto_table:48

cookie=0x8500000, duration=15546.528s, table=48, n_packets=302458, n_bytes=34190652, priority=0 actions=resubmit(,49),resubmit(,50)

cookie=0x805139d, duration=2808.378s, table=50, n_packets=70, n_bytes=9562, priority=20,metadata=0x19f4139d000000/0x1fffffffff000000,dl_src=fa:16:3e:55:9e:33 actions=goto_table:51


cookie=0x803139d, duration=2818.232s, table=51, n_packets=34, n_bytes=4613, priority=20,metadata=0x139d000000/0xffff000000,dl_dst=fa:16:3e:f7:72:d3 actions=load:0x1a5300->NXM_NX_REG6[],resubmit(,220)

cookie=0x6900000, duration=2819.193s, table=220, n_packets=3455, n_bytes=207541, priority=6,reg6=0x1a5300 actions=load:0xe01a5300->NXM_NX_REG6[],write_metadata:0xe01a530000000000/0xfffffffffffffffe,goto_table:241


cookie=0x6900000, duration=2819.237s, table=241, n_packets=32, n_bytes=4851, priority=61010,ip,metadata=0x1a530000000000/0x1fffff0000000000,dl_dst=fa:16:3e:f7:72:d3,nw_dst=172.20.1.117 actions=goto_table:242


cookie=0x6900000, duration=15546.579s, table=242, n_packets=3738, n_bytes=368372, priority=61010,icmp actions=goto_table:243

cookie=0x6900000, duration=2819.235s, table=243, n_packets=32, n_bytes=4851, priority=61010,ip,metadata=0x1a530000000000/0x1fffff0000000000 actions=ct(table=244,zone=5021)


cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50,ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop


cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)
cookie=0x6900000, duration=15546.552s, table=244, n_packets=3819, n_bytes=431050, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)

cookie=0x8000007, duration=2819.193s, table=220, n_packets=107, n_bytes=9431, priority=7,reg6=0xe01a5300 actions=output:108
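
To show how the table 244 rules above select an action, here is a small sketch that evaluates the three ct_state matches against a set of state flags, highest priority first. It is a simplification for illustration only: the metadata match on the drop rule is ignored, flags not named in a match are treated as wildcarded, and the function names are hypothetical.

```python
import re


def parse_ct_state(spec):
    """Split e.g. "-new+est+trk" into (required, forbidden) flag sets."""
    required, forbidden = set(), set()
    for sign, flag in re.findall(r"([+-])([a-z]+)", spec):
        (required if sign == "+" else forbidden).add(flag)
    return required, forbidden


def ct_state_matches(spec, flags):
    required, forbidden = parse_ct_state(spec)
    # All "+" flags must be set and no "-" flag may be set.
    return required <= flags and not (forbidden & flags)


# (priority, ct_state match, action) copied from the table 244 dump above.
TABLE_244 = [
    (50, "+new+trk", "drop"),
    (62020, "-new-est+rel-inv+trk", "resubmit(,220)"),
    (62020, "-new+est-rel-inv+trk", "resubmit(,220)"),
]


def classify(flags):
    """Return the action of the highest-priority matching rule, or None on a miss."""
    for _priority, spec, action in sorted(TABLE_244, key=lambda rule: -rule[0]):
        if ct_state_matches(spec, flags):
            return action
    return None


action_if_new = classify({"trk", "new"})
action_if_est = classify({"trk", "est"})
```

If the ct() lookup in table 243 reports the first packet as +est rather than +new, classify returns resubmit(,220) and the drop rule is never reached, which is consistent with the n_packets counters in the dump.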


Thanks
Rohith




From: Darrell Ball <dball at vmware.com>
Date: Thursday, 24 August 2017 at 12:20 AM
To: Rohith Basavaraja <rohith.basavaraja at ericsson.com>, "ovs-discuss at openvswitch.org" <ovs-discuss at openvswitch.org>
Subject: Re: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi Rohith

I might have missed the alias earlier.

From the output below, I see the rule
cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)
not being hit.

I also see the rule
cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50, ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop
having a drop action.

What is the expectation of the test?
Is table 244 intended to drop non-related and non-established packets?

Thanks Darrell

From: <ovs-discuss-bounces at openvswitch.org> on behalf of Rohith Basavaraja <rohith.basavaraja at ericsson.com>
Date: Wednesday, August 23, 2017 at 3:03 AM
To: "ovs-discuss at openvswitch.org" <ovs-discuss at openvswitch.org>
Subject: [ovs-discuss] Conntrack issue in OVS (2.6)+DPDK

Hi,

I have the following rules, which do not allow any new connections and allow only established and related flows:

cookie=0x6900001, duration=2819.230s, table=244, n_packets=2, n_bytes=196, priority=50, ct_state=+new+trk,metadata=0x1a530000000000/0x1fffff0000000000 actions=drop
cookie=0x6900000, duration=15546.577s, table=244, n_packets=0, n_bytes=0, priority=62020,ct_state=-new-est+rel-inv+trk actions=resubmit(,220)
cookie=0x6900000, duration=15546.552s, table=244, n_packets=3819, n_bytes=431050, priority=62020,ct_state=-new+est-rel-inv+trk actions=resubmit(,220)

We are still seeing that new connections are allowed. We see this issue only with OVS + DPDK and not in OVS kernel mode.

I wanted to check whether this issue has already been reported elsewhere or is a new issue.

Thanks
Rohith
