[ovs-discuss] [ovs-dev] ovn: unsnat handling error for Distributed Gateway

Mickey Spiegel mickeys.dev at gmail.com
Sun Apr 9 22:23:25 UTC 2017


On Thu, Apr 6, 2017 at 7:34 AM, Guoshuai Li <ligs at dtdream.com> wrote:

>
> Revised topology:
>
>                              +---------+--------+
>                              |  VM  172.16.1.7  |
>                              +---------+--------+
>                                        |
>                              +---------+--------+
>                              |  Logical Switch  |
>                              +---------+--------+
>                                        |172.16.1.254
>               10.157.142.3 +-----------+--------+
>           +----------------+  Logical Router 1  +
>           |                +--------------------+
> +---------+--------+
> |  Logical Switch  |
> +------------------+
>           |                +--------------------+
>           +----------------+  Logical Router 2  |
>               10.157.142.1 +--------------------+
>
>
> Hi All, I am having a problem with OVN and need help, thanks.
>>
>>
>> I created two logical routers and connected them through an external
>> LogicalSwitch (connected to the external network).
>>
>> LogicalRouter-1 is then connected to the VM through the internal
>> LogicalSwitch.
>>
>> my topology:
>>
>>                  10.157.142.3          172.16.1.254
>>                +--------------------+    +------------------+    +------------------+
>>      +---------+  Logical Router 1  +----+  Logical Switch  +----+  VM 172.16.1.7   |
>>      |         +--------------------+    +------------------+    +------------------+
>> +----+-------------+
>> |  Logical Switch  |
>> +------------------+
>>      |         +--------------------+
>>      +---------+  Logical Router 2  |
>>                +--------------------+
>>                  10.157.142.1
>>
>> I tested master and branch-2.7. A ping from the VM (172.16.1.7) to
>> LogicalRouter-2's port (10.157.142.
>>
>     Sorry, the destination address is 10.157.142.1, and the SNAT/unSNAT
> address is 10.157.142.3.
>
>> ) does not get through.
>> My logical routers are distributed gateways, and the two logical router
>> ports that connect to the external LogicalSwitch are on the same chassis.
>> If the two logical router ports are not on the same chassis, ping works.
>> Ping from the VM (172.16.1.7) to the external network also works.
>>
>> I looked at the OpenFlow tables on the gateway chassis, and I suspect an
>> unSNAT handling error in the Router1 ingress pipeline for the ICMP reply.
>> I think table 19 should replace the destination address 10.157.142.3 with
>> 172.16.1.7, and table 21 should then route on 172.16.1.7, but the route
>> actually matched is 10.157.142.0/24.
>>
>> cookie=0x92bd0055, duration=68.468s, table=16, n_packets=1, n_bytes=98,
>> idle_age=36, priority=50,reg14=0x4,metadata=0x7,dl_dst=fa:16:3e:58:1c:8a
>> actions=resubmit(,17)
>> cookie=0x45765344, duration=68.467s, table=17, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,18)
>> cookie=0xaeaaed29, duration=68.479s, table=18, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,19)
>> cookie=0xce785d51, duration=68.479s, table=19, n_packets=1, n_bytes=98,
>> idle_age=36, priority=100,ip,reg14=0x4,metadata=0x7,nw_dst=10.157.142.3
>> actions=ct(table=20,zone=NXM_NX_REG12[0..15],nat)
>> cookie=0xbd994421, duration=68.481s, table=20, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,metadata=0x7 actions=resubmit(,21)
>> cookie=0xaea3a6ae, duration=68.479s, table=21, n_packets=1, n_bytes=98,
>> idle_age=36, priority=49,ip,metadata=0x7,nw_dst=10.157.142.0/24
>> actions=dec_ttl(),move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..127],
>> load:0xa9d8e03->NXM_NX_XXREG0[64..95],mod_dl_src:fa:16:3e:58:1c:8a,
>> load:0x4->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,22)
>> cookie=0xce6e8d4e, duration=68.482s, table=22, n_packets=1, n_bytes=98,
>> idle_age=36, priority=0,ip,metadata=0x7 actions=push:NXM_NX_REG0[],
>> push:NXM_NX_XXREG0[96..127],pop:NXM_NX_REG0[],mod_dl_dst:00:00:00:00:00:00,
>> resubmit(,66),pop:NXM_NX_REG0[],resubmit(,23)
>> cookie=0xce89c4ed, duration=68.481s, table=23, n_packets=1, n_bytes=98,
>> idle_age=36, priority=150,reg15=0x4,metadata=0x7,dl_dst=00:00:00:00:00:00
>> actions=load:0x5->NXM_NX_REG15[],resubmit(,24)
>> cookie=0xb2d84350, duration=68.469s, table=24, n_packets=1, n_bytes=98,
>> idle_age=36, priority=100,ip,metadata=0x7,dl_dst=00:00:00:00:00:00
>>
>> I do not know why and need help, thanks.
>>
>
I was able to reproduce this. I agree with your analysis. Looking at
ovs-ofctl dump-flows, the packet counts indicate that the packet is subject
to ct(...,nat), but the routing table match is as if NAT never occurred.
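The packet-count reasoning can be checked mechanically. What follows is a minimal Python sketch, not part of OVS or OVN: it takes three entries copied from the quoted ovs-ofctl dump-flows output (with the mail-wrapped lines rejoined), keeps the ones with a nonzero n_packets, and prints them in table order. The names FLOWS, FLOW_RE and hit_flows are made up for illustration.

```python
import re

# Three entries copied from the quoted dump (line wraps rejoined).
FLOWS = [
    "cookie=0xce785d51, duration=68.479s, table=19, n_packets=1, n_bytes=98, "
    "idle_age=36, priority=100,ip,reg14=0x4,metadata=0x7,nw_dst=10.157.142.3 "
    "actions=ct(table=20,zone=NXM_NX_REG12[0..15],nat)",
    "cookie=0xbd994421, duration=68.481s, table=20, n_packets=1, n_bytes=98, "
    "idle_age=36, priority=0,metadata=0x7 actions=resubmit(,21)",
    "cookie=0xaea3a6ae, duration=68.479s, table=21, n_packets=1, n_bytes=98, "
    "idle_age=36, priority=49,ip,metadata=0x7,nw_dst=10.157.142.0/24 "
    "actions=dec_ttl(),move:NXM_OF_IP_DST[]->NXM_NX_XXREG0[96..127],"
    "load:0xa9d8e03->NXM_NX_XXREG0[64..95],mod_dl_src:fa:16:3e:58:1c:8a,"
    "load:0x4->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,22)",
]

# Hypothetical pattern for this dump format: table, packet count, the
# priority-plus-match field, and the action list.
FLOW_RE = re.compile(
    r"table=(?P<table>\d+), n_packets=(?P<packets>\d+),"
    r".*?priority=(?P<match>\S+) actions=(?P<actions>\S+)"
)


def hit_flows(lines):
    """Return (table, match, actions) for every entry that saw packets."""
    hits = []
    for line in lines:
        m = FLOW_RE.search(line)
        if m and int(m.group("packets")) > 0:
            hits.append((int(m.group("table")), m.group("match"), m.group("actions")))
    return sorted(hits)


if __name__ == "__main__":
    for table, match, actions in hit_flows(FLOWS):
        note = ""
        if re.search(r"\bct\(.*\bnat\b", actions):
            note = "  <-- packet is sent through ct(...,nat) here"
        elif "nw_dst=" in match:
            note = "  <-- route lookup matches this nw_dst"
        print("table %d: priority=%s%s" % (table, match, note))
```

Run against the full dump, this makes the sequence easy to see: table 19 hands the packet to ct(...,nat), yet the table 21 entry that counts the packet still matches the 10.157.142.0/24 prefix rather than a route covering 172.16.1.7.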

I tried with gateway routers and it worked. There are some differences in
ovs-dpctl dump-flows.

For the case of gateway routers:

vagrant@compute2:~$ sudo ovs-dpctl dump-flows

recirc_id(0x14),tunnel(tun_id=0x6,src=192.168.33.31,dst=192.168.33.32,geneve({}{}),flags(-df+csum+key)),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:02:02:03:04),eth_type(0x0800),ipv4(src=172.16.1.3,dst=172.16.1.10,proto=1,ttl=62,frag=no),icmp(type=8,code=0),
packets:3, bytes:294, used:1.981s,
actions:userspace(pid=2658598031,slow_path(action))

recirc_id(0x16),tunnel(tun_id=0x6,src=192.168.33.31,dst=192.168.33.32,geneve({class=0x102,type=0x80,len=4,0x10002}),flags(-df+csum+key)),in_port(4),eth(src=00:00:02:02:03:04,dst=00:00:02:01:02:03),eth_type(0x0800),ipv4(src=172.16.1.10,dst=192.168.1.2,tos=0/0x3,ttl=254,frag=no),
packets:3, bytes:294, used:1.981s,
actions:set(tunnel(tun_id=0x3,dst=192.168.33.31,ttl=64,tp_src=24284,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x10002}),flags(df|csum|key))),set(eth(src=00:00:01:01:02:03,dst=f0:00:00:01:02:03)),set(ipv4(src=172.16.1.10,dst=192.168.1.2,tos=0/0x3,ttl=252)),4

recirc_id(0),tunnel(tun_id=0x6,src=192.168.33.31,dst=192.168.33.32,geneve({class=0x102,type=0x80,len=4,0x10002/0x7fffffff}),flags(-df+csum+key)),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:04:01:02:04),eth_type(0x0800),ipv4(src=192.168.1.2/255.255.255.254,dst=172.16.1.10,proto=1,ttl=63,frag=no),
packets:3, bytes:294, used:1.981s, actions:ct(zone=1,nat),recirc(0x13)

recirc_id(0x15),tunnel(tun_id=0x6,src=192.168.33.31,dst=192.168.33.32,geneve({}{}),flags(-df+csum+key)),in_port(4),eth(src=00:00:02:01:02:03,dst=00:00:02:02:03:04),eth_type(0x0800),ipv4(src=172.16.1.10,dst=172.16.1.3,proto=1,ttl=255,frag=no),
packets:3, bytes:294, used:1.981s,
actions:set(eth(src=00:00:02:02:03:04,dst=00:00:02:01:02:03)),set(ipv4(src=172.16.1.10,dst=172.16.1.3,ttl=254)),ct(zone=2,nat),ct(commit,zone=1,nat(dst=192.168.1.2)),recirc(0x16)

recirc_id(0x13),tunnel(tun_id=0x6,src=192.168.33.31,dst=192.168.33.32,geneve({}{}),flags(-df+csum+key)),in_port(4),eth(src=00:00:04:01:02:03,dst=00:00:04:01:02:04),eth_type(0x0800),ipv4(src=192.168.1.2,dst=172.16.1.10,ttl=63,frag=no),
packets:3, bytes:294, used:1.981s,
actions:set(eth(src=00:00:02:01:02:03,dst=00:00:02:02:03:04)),set(ipv4(src=192.168.1.2,dst=172.16.1.10,ttl=62)),ct(commit,zone=2,nat(src=172.16.1.3)),recirc(0x14)

Note that recirc_id(0x15) goes to ct() actions after the ICMP response is
generated in the slow path, which is required for ICMP type change.


With distributed routers and distributed gateway ports:

vagrant@compute2:~$ sudo ovs-dpctl dump-flows

recirc_id(0),tunnel(tun_id=0x1,src=192.168.33.31,dst=192.168.33.32,geneve({class=0x102,type=0x80,len=4,0x10004/0x7fffffff}),flags(-df+csum+key)),in_port(4),eth_type(0x0800),ipv4(src=192.168.1.3,frag=no),
packets:3, bytes:294, used:2.388s,
actions:ct(commit,zone=3,nat(src=172.16.1.1)),recirc(0x3)

recirc_id(0x3),tunnel(tun_id=0x1,src=192.168.33.31,dst=192.168.33.32,geneve({}{}),flags(-df+csum+key)),in_port(4),eth(src=00:00:02:01:02:03,dst=00:00:02:01:02:04),eth_type(0x0800),ipv4(src=172.16.1.1,dst=172.16.1.10,proto=1,ttl=63,frag=no),icmp(type=8,code=0),
packets:3, bytes:294, used:2.389s,
actions:userspace(pid=2248102802,slow_path(action))

recirc_id(0x4),tunnel(tun_id=0x1,src=192.168.33.31,dst=192.168.33.32,geneve({}{}),flags(-df+csum+key)),in_port(4),eth(src=00:00:02:01:02:04,dst=00:00:02:01:02:03),eth_type(0x0800),ipv4(dst=172.16.1.1,ttl=254,frag=no),
packets:3, bytes:294, used:2.389s,
actions:userspace(pid=2248102802,slow_path(controller))

The recirc_id(0x4) entry ends up with the slow_path(controller) action
resulting from table 24. I do not yet know why the earlier ct() actions
from table 19 were skipped.
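
To make the difference between the two datapath dumps easier to see, here is a rough Python sketch that tags each entry by whether its actions contain ct() or a slow_path() reason. It is only an illustration: the dictionaries hold the recirc_id and action strings copied from the listings above (the gateway router recirc_id(0x16) entry is left out because it has neither), and the names GATEWAY_ROUTER, DISTRIBUTED and classify are invented for this example.

```python
import re

# recirc_id -> actions, copied from the gateway router listing above.
GATEWAY_ROUTER = {
    "0": "ct(zone=1,nat),recirc(0x13)",
    "0x13": "set(eth(src=00:00:02:01:02:03,dst=00:00:02:02:03:04)),"
            "set(ipv4(src=192.168.1.2,dst=172.16.1.10,ttl=62)),"
            "ct(commit,zone=2,nat(src=172.16.1.3)),recirc(0x14)",
    "0x14": "userspace(pid=2658598031,slow_path(action))",
    "0x15": "set(eth(src=00:00:02:02:03:04,dst=00:00:02:01:02:03)),"
            "set(ipv4(src=172.16.1.10,dst=172.16.1.3,ttl=254)),"
            "ct(zone=2,nat),ct(commit,zone=1,nat(dst=192.168.1.2)),recirc(0x16)",
}

# recirc_id -> actions, copied from the distributed gateway port listing above.
DISTRIBUTED = {
    "0": "ct(commit,zone=3,nat(src=172.16.1.1)),recirc(0x3)",
    "0x3": "userspace(pid=2248102802,slow_path(action))",
    "0x4": "userspace(pid=2248102802,slow_path(controller))",
}


def classify(actions):
    """Tag an action string with 'ct' and any slow_path(<reason>) it contains."""
    tags = []
    if re.search(r"\bct\(", actions):
        tags.append("ct")
    tags += ["slow_path(%s)" % reason
             for reason in re.findall(r"slow_path\((\w+)\)", actions)]
    return tags


if __name__ == "__main__":
    for name, flows in (("gateway router", GATEWAY_ROUTER),
                        ("distributed gateway port", DISTRIBUTED)):
        print(name)
        for recirc_id, actions in flows.items():
            print("  recirc_id(%s): %s" % (recirc_id, ", ".join(classify(actions)) or "-"))
```

In the gateway router case the entry that follows the slow-path ICMP response, recirc_id(0x15), carries ct() actions. In the distributed case the corresponding entry, recirc_id(0x4), carries only slow_path(controller).
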
Mickey