[ovs-discuss] Issue with connection tracking for packets modified in pipeline

Numan Siddique nusiddiq at redhat.com
Thu Jul 6 08:55:13 UTC 2017


On Wed, Jun 28, 2017 at 7:06 AM, Numan Siddique <nusiddiq at redhat.com> wrote:

>
>
> On Jun 23, 2017 2:25 PM, "Joe Stringer" <joe at ovn.org> wrote:
>
> On 22 June 2017 at 16:08, Numan Siddique <nusiddiq at redhat.com> wrote:
> >
> >
> > On Jun 23, 2017 1:31 AM, "Joe Stringer" <joe at ovn.org> wrote:
> >
> > On 22 June 2017 at 04:16, Numan Siddique <nusiddiq at redhat.com> wrote:
> >>
> >>
> >> On Thu, Jun 22, 2017 at 5:45 AM, Joe Stringer <joe at ovn.org> wrote:
> >>>
> >>> On 21 June 2017 at 04:19, Numan Siddique <nusiddiq at redhat.com> wrote:
> >>> >
> >>> >
> >>> > On Tue, Jun 20, 2017 at 3:11 AM, Joe Stringer <joe at ovn.org> wrote:
> >>> >>
> >>> >> On 19 June 2017 at 00:37, Numan Siddique <nusiddiq at redhat.com> wrote:
> >>> >> >
> >>> >> >
> >>> >> > On Fri, Jun 16, 2017 at 11:22 PM, Joe Stringer <joe at ovn.org> wrote:
> >>> >> >>
> >>> >> >> On 15 June 2017 at 22:20, Numan Siddique <nusiddiq at redhat.com> wrote:
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > On Thu, Jun 15, 2017 at 5:06 PM, Aswin S <aswinsuryan at gmail.com> wrote:
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> Adding some more info here. Thanks, Numan, for pointing to this.
> >>> >> >> >>
> >>> >> >> >> The issue I am facing looks similar to the one described in [1] and [2],
> >>> >> >> >> but it seems the issue is not yet fixed. Is there a plan to fix this soon?
> >>> >> >> >> In OpenDaylight, security groups are implemented using ovs-conntrack, so
> >>> >> >> >> the flow-based router ping responder and floating IP translations hit
> >>> >> >> >> this issue.
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/329542.html
> >>> >> >> >> [2] https://patchwork.ozlabs.org/patch/739796/
> >>> >> >> >>
> >>> >> >> >
> >>> >> >> > The same issue is also seen in OVN, as pointed out by Aswin.
> >>> >> >> >
> >>> >> >> > Joe - If you remember, we had a chat about this same issue during
> >>> >> >> > the OpenStack Boston summit.
> >>> >> >>
> >>> >> >> Hi Numan, yeah I recall we had this discussion. I didn't have much
> >>> >> >> clarity on where we're at with this.  Looking at patchwork, I provided
> >>> >> >> some feedback on the RFC. The most straightforward approach seems to
> >>> >> >> be adding a nf_ct_set(skb, NULL, 0); call for each of the 5tuple "set"
> >>> >> >> actions in the datapath.
> >>> >> >
> >>> >> >
> >>> >> > Thanks. I will try it out and let you know how it went.
> >>> >> > I remember I was supposed to provide more clarity after our
> >>> >> > discussion. My apologies, it slipped my mind.
> >>> >>
> >>> >> No worries, let me know how you go.
> >>> >
> >>> >
> >>> > I tried this and it didn't work. In fact the function set_ipv4 (in
> >>> > datapath/actions.c) is not even called.
> >>> >
> >>> > Below is the flow which responds to ICMP request packet
> >>> >
> >>> > cookie=0x64913aa, duration=566.801s, table=17, n_packets=3, n_bytes=294, idle_age=144, priority=90,icmp,metadata=0x3,nw_dst=192.168.0.1,icmp_type=8,icmp_code=0 actions=push:NXM_OF_IP_SRC[],push:NXM_OF_IP_DST[],pop:NXM_OF_IP_SRC[],pop:NXM_OF_IP_DST[],load:0xff->NXM_NX_IP_TTL[],load:0->NXM_OF_ICMP_TYPE[],load:0x1->NXM_NX_REG10[0],resubmit(,18)
> >>> >
> >>> > Thanks
> >>> > Numan
> >>>
> >>> Hi Numan,
> >>>
> >>> How are you going about making these changes and testing them? Could
> >>> you double-check that the correct module was loaded when you ran the
> >>> test? Given that the IP src and dst are being modified from the flow
> >>> you described above, I think that the set_ipv4 function should be
> >>> called for such flows.
> >>>
> >>> Some sanity checks:
> >>> # modinfo openvswitch
> >>> # find /lib/modules -name openvswitch.ko* | xargs ls -l
> >>>
> >>> Might want to double-check that your depmod.d settings are set
> >>> correctly so it loads the new module instead of the one that comes
> >>> with your kernel.
> >>> # man depmod.d
> >>>
> >>> Of course, the above doesn't necessarily apply if you're making
> >>> changes directly in your kernel tree and loading the module from there
> >>> (for example, using insmod, or make modules_install into the original
> >>> module path).
> >>>
> >>
> >> Hi Joe,
> >>
> >> I verified that the loaded openvswitch module is indeed the one I
> >> modified. I also put some printks in functions like
> >> "ovs_packet_cmd_execute" to verify.
> >>
> >> I created my testing scenario as per the commands here [1]. There are 2
> >> logical ports with IPs 192.168.0.2 and 192.168.0.3 associated with 2
> >> namespaces, ns1 and ns2. The logical switch is also connected to a
> >> logical router.
> >>
> >> I pinged from 192.168.0.2 to 192.168.0.3 continuously and monitored the
> >> kernel flows with the command -
> >>
> >> $ watch -n1 -d "sudo ovs-dpctl dump-flows system@ovs-system"
> >>
> >> recirc_id(0),in_port(3),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=50:54:00:00:00:01),eth_type(0x0800),ipv4(dst=192.168.0.2/255.255.255.254,frag=no), packets:28, bytes:2744, used:0.323s, actions:2
> >>
> >> recirc_id(0),in_port(2),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=50:54:00:00:00:02),eth_type(0x0800),ipv4(dst=192.168.0.2/255.255.255.254,frag=no), packets:28, bytes:2744, used:0.323s, actions:3
> >>
> >>
> >> I pinged from 192.168.0.2 to 192.168.0.1 (without any ACLs, so the ping
> >> would be successful), I observed that the action is always userspace and
> >> I could see that the function "odp_execute_masked_set_action" in
> >> lib/odp-execute.c is called in vswitchd.
> >>
> >> $ watch -n1 -d "sudo ovs-dpctl dump-flows system@ovs-system"
> >>
> >> recirc_id(0),in_port(2),eth(src=50:54:00:00:00:01,dst=00:00:00:00:ff:01),eth_type(0x0806),arp(sip=192.168.0.2,tip=192.168.0.1,op=1/0xff,sha=50:54:00:00:00:01,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, actions:userspace(pid=4294958020,slow_path(action))
> >>
> >> recirc_id(0),in_port(2),eth(src=50:54:00:00:00:01,dst=00:00:00:00:ff:01),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=1,ttl=64,frag=no),icmp(type=8,code=0), packets:9, bytes:882, used:0.937s, actions:userspace(pid=4294958021,slow_path(action))
> >>
> >> In this case, the ICMP reply is framed by the OVS flow and there is a
> >> "clone" action involved for the packet to go to and from the logical
> >> switch to the logical router pipeline.
> >>
> >> To avoid the clone action, I added some code in ovn-northd to respond
> >> with the ICMP reply if ip4.dst = 192.168.0.1, which translated to the
> >> below OF flow:
> >>
> >> table=19, n_packets=619, n_bytes=60662, idle_age=1, priority=90,icmp,metadata=0x1,nw_dst=192.168.0.1,icmp_type=8,icmp_code=0 actions=move:NXM_OF_IP_SRC[]->NXM_OF_IP_DST[],mod_nw_src:192.168.0.1,push:NXM_OF_ETH_SRC[],push:NXM_OF_ETH_DST[],pop:NXM_OF_ETH_SRC[],pop:NXM_OF_ETH_DST[],load:0xff->NXM_NX_IP_TTL[],load:0->NXM_OF_ICMP_TYPE[],load:0x1->NXM_NX_REG10[0],resubmit(,20)
> >>
> >> And in both the cases I see that there is an upcall for each packet and
> >> odp_execute_masked_set_action is called.
> >
> > OK, I think that my suggestion for that patch (patchwork 739796) was
> > actually addressing a subtly different issue.
> >
> > With regards to this issue, as far as I understand back to the
> > original report, connection with tuple A is committed to the
> > connection tracker. A is then statelessly modified to tuple B, then a
> > lookup with B is performed. Typically if you have tuple A or tuple A'
> > (ie, the reversed tuple) in the packet headers then looking up with
> > either of these headers will find the same connection. If you then
> > perform a lookup with tuple B, then it can only look up using B or B';
> > no state was kept about the translation from A->B, so there's no way
> > for the connection tracker to associate tuple B back to tuple A.
> > Lookup using B and B' cannot find a connection because it was never
> > committed like that. Therefore it would be new. However, since B is a
> > SYN-ACK packet, the Linux connection tracker considers that it is
> > invalid rather than new. For it to work, the tuple B', ie the original
> > SYN, should be committed first.
> >
> >
> > Thanks for the explanation. The issue we are seeing is for ICMP packets
> > and looking into the connection tracking entries I see the packet is in
> > UNREPLIED state. When the ICMP reply is framed by the ovs flows, the
> > tuple would still remain the same right ? Only ip4.src is swapped with
> > ip4.dst and ICMP code is changed.
>
> Right, so for ICMP I think the problem is different. Yes, by the looks
> only src/dst are swapped and code changed, which should produce a
> tuple that can look up and find the original connection. Given that
> the execution is happening in userspace, that would be one path to
> follow: exactly what is executed upon the packet in terms of datapath
> actions after the kernel runs userspace(...,slow_path(action))? Where
> is the conntrack call from that path, and how does it try to get the
> ct_state from the kernel?
>
> I wonder if the ICMP issue is related to the patch here:
> http://patchwork.ozlabs.org/patch/775756/
>
>
> Thanks. I will test some more and get back on this.
>

Hi Joe,

I tested the scenario again by adding the below ACLs. All my testing is
using the script here - [1]

# ACLs for sw0-port1
#  - allow all outgoing traffic and related reply traffic
#  - deny all incoming traffic not a part of an existing connection
sudo ovn-nbctl --wait=hv acl-add sw0 from-lport 1001 'inport == "sw0-port1" && ip' allow-related
sudo ovn-nbctl --wait=hv acl-add sw0 to-lport 1001 'outport == "sw0-port1" && ip' drop
sudo ovn-nbctl acl-list sw0


With the above ACLs, the ping to the router IP is dropped. Below is the
output of ovs-dpctl dump-flows:

sudo ovs-dpctl dump-flows system@ovs-system
recirc_id(0x5),in_port(2),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=50:54:00:00:00:01,dst=00:00:00:00:ff:01),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=1,ttl=64,frag=no),icmp(type=8,code=0), packets:119, bytes:11662, used:0.637s, actions:userspace(pid=4294963061,slow_path(action))
recirc_id(0),in_port(2),eth(src=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:119, bytes:11662, used:0.637s, actions:ct(zone=1),recirc(0x5)
recirc_id(0x6),in_port(2),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth_type(0x0800),ipv4(frag=no), packets:119, bytes:11662, used:0.637s, actions:drop
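
As a reading aid for the ct_state match in the flows above: each flag is
prefixed with "+" (must be set) or "-" (must be clear). The snippet below is
only a small sketch for decoding that notation; the helper name
parse_ct_state is hypothetical and is not part of OVS.

```python
import re


def parse_ct_state(spec):
    """Split e.g. '+new-est+trk' into (flags that must be set, flags that must be clear).

    Hypothetical helper for reading flow dumps; not an OVS API.
    """
    must_be_set, must_be_clear = [], []
    # Each flag is a sign followed by a lowercase name, e.g. '+trk' or '-est'.
    for sign, name in re.findall(r"([+-])([a-z]+)", spec):
        (must_be_set if sign == "+" else must_be_clear).append(name)
    return must_be_set, must_be_clear


must_be_set, must_be_clear = parse_ct_state("+new-est-rel-rpl-inv+trk")
print("set:  ", must_be_set)
print("clear:", must_be_clear)
```

Read this way, both recirculated flows above match packets that are tracked
and new, and that are neither established nor reply direction.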


I also tested by applying the patch http://patchwork.ozlabs.org/patch/775756/
and I could still see the issue; the datapath flows were the same in both
cases.

[1] - https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e



This is what I could understand on how ct_state is set and passed between
the datapath and userspace

 - During the upcall, the connection tracking state is passed in the packet
metadata in the nl attributes -
https://github.com/openvswitch/ovs/blob/master/datapath/conntrack.c#L268

 - When vswitchd sends the packet back to the kernel datapath, it
stores the connection tracking state back here -
https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-upcall.c#L1418
https://github.com/openvswitch/ovs/blob/master/lib/odp-util.c#L4722
   It looks like even if vswitchd clears the ct state (using the ct_clear
action), the cleared state would not be passed back to the kernel datapath.

 - I think even if we implement the ct_clear action in the datapath, it would
not solve the problem. If there is an upcall after the ct_clear action but
before ct commit, the datapath would send the latest ct state back to
userspace, making the previous ct_clear action ineffective.
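
The second point above can be pictured with a toy model. This is only a
sketch under assumptions: the class and function names (ToyDatapath,
userspace_ct_clear, simulate) are made up for illustration, nothing here is
OVS code, and it simply assumes that a userspace-side clear touches only the
copy of the metadata that vswitchd holds.

```python
class ToyDatapath:
    """Stands in for the kernel datapath, which owns the packet's ct state."""

    def __init__(self):
        self.ct_state = None  # None models an untracked packet

    def ct(self):
        # A conntrack lookup without commit: the packet becomes tracked and new.
        self.ct_state = "+new+trk"

    def upcall(self):
        # The datapath reports whatever ct state it currently holds.
        return {"ct_state": self.ct_state}


def userspace_ct_clear(metadata):
    # Assumption: the clear only affects the userspace copy of the metadata.
    cleared = dict(metadata)
    cleared["ct_state"] = None
    return cleared


def simulate():
    dp = ToyDatapath()
    dp.ct()
    first = dp.upcall()
    after_clear = userspace_ct_clear(first)
    # A later upcall for the same packet, before any ct(commit).
    second = dp.upcall()
    return after_clear, second


after_clear, second = simulate()
print("userspace view after ct_clear:", after_clear["ct_state"])
print("state reported by next upcall:", second["ct_state"])
```

In this model the userspace view is cleared, but the next upcall reports the
datapath's state again, which is the behavior suspected above. Whether the
real datapath behaves this way is the open question.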

Thanks
Numan


