[ovs-discuss] external port range on internal logical ip seems weird

Flavio Fernandes flaviof at redhat.com
Wed Apr 22 14:23:40 UTC 2020


[inline]

On Tue, Apr 21, 2020 at 10:44 PM Ankur Sharma <ankur.sharma at nutanix.com>
wrote:

> Hi Flavio,
>
> Glad to see your feedback, please find my replies inline.
>

[flaviof] Heh, my pleasure, really.


>
> Regards,
> Ankur
>
> ------------------------------
> *From:* Flavio Fernandes <flaviof at redhat.com>
> *Sent:* Tuesday, April 21, 2020 6:59 AM
> *To:* Ankur Sharma <ankur.sharma at nutanix.com>
> *Cc:* Numan Siddique <nusiddiq at redhat.com>; Mark Michelson <
> mmichels at redhat.com>; Terry Wilson <twilson at redhat.com>;
> ovs-discuss at openvswitch.org <ovs-discuss at openvswitch.org>
> *Subject:* external port range on internal logical ip seems weird
>
> [cc Numan, Mark, Terry, ovs-discuss]
>
> Hi Ankur,
>
> I'm taking a deeper look at the changes for external port range [0] and
> scratching
> my head a little bit about a particular behavior.
>
> Let me start by describing the basic setup I'm using:
>
> 1 internal switch with 1 logical port to represent a vm (10.0.0.3/24)
>
> 1 public switch (172.16.0.0/24)
>
> 1 rtr that connects both logical switches (10.0.0.1, 172.16.0.100)
> 1 snat_and_dnat rule for translating the ip, using port range
>
> NOTE: The exact script is in this gist [1].
> ovn-nbctl lsp-add sw0 sw0-port1
> ovn-nbctl ls-add public
> ...
> ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:03 10.0.0.3"
> ovn-nbctl lr-add lr0
> ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
> ...
> ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.16.0.100/24
> ...
> ovn-nbctl --portrange lr-nat-add lr0 dnat_and_snat 172.16.0.110 10.0.0.3
> sw0-port1 30:54:00:00:00:03 8080-8082
>
> And this is what the logical flow looks like regarding NAT:
> [root at ovn-central /]# ovn-sbctl dump-flows lr0 | grep -i -e 'ct_' -e
> 'nat'
>   table=5 (lr_in_unsnat       ), priority=100  , match=(ip && ip4.dst ==
> 172.16.0.110 && inport == "lr0-public"), action=(ct_snat;)
>   table=5 (lr_in_unsnat       ), priority=0    , match=(1), action=(next;)
>   table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst ==
> 172.16.0.110 && inport == "lr0-public"),
> action=(ct_dnat(10.0.0.3,8080-8082);)
>   table=6 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)
>   table=0 (lr_out_undnat      ), priority=100  , match=(ip && ip4.src ==
> 10.0.0.3 && outport == "lr0-public"), action=(eth.src = 30:54:00:00:00:03;
> ct_dnat;)
>   table=0 (lr_out_undnat      ), priority=0    , match=(1), action=(next;)
>   table=1 (lr_out_snat        ), priority=120  , match=(nd_ns),
> action=(next;)
>   table=1 (lr_out_snat        ), priority=33   , match=(ip && ip4.src ==
> 10.0.0.3 && outport == "lr0-public"), action=(eth.src = 30:54:00:00:00:03;
> ct_snat(172.16.0.110,8080-8082);)
>   table=1 (lr_out_snat        ), priority=0    , match=(1), action=(next;)
>   table=2 (lr_out_egr_loop    ), priority=100  , match=(ip4.dst ==
> 172.16.0.110 && outport == "lr0-public" &&
> is_chassis_resident("sw0-port1")), action=(clone { ct_clear; inport =
> outport; outport = ""; flags = 0; flags.loopback = 1; reg0 = 0; reg1 = 0;
> reg2 = 0; reg3 = 0; reg4 = 0; reg5 = 0; reg6 = 0; reg7 = 0; reg8 = 0; reg9
> = 0; reg9[0] = 1; next(pipeline=ingress, table=0); };)
>
> Out of that:
> [root at ovn-central /]# ovn-sbctl dump-flows lr0 | grep 8080
>   table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst ==
> 172.16.0.110 && inport == "lr0-public"),
> action=(ct_dnat(10.0.0.3,8080-8082);)
>   table=1 (lr_out_snat        ), priority=33   , match=(ip && ip4.src ==
> 10.0.0.3 && outport == "lr0-public"), action=(eth.src = 30:54:00:00:00:03;
> ct_snat(172.16.0.110,8080-8082);)
>
> The rule "ct_dnat(10.0.0.3,8080-8082)" -- line 40 in gist [1] -- seems
> wrong to me, because the external port range should, as the name suggests,
> be applied only to the external ip [2]. Am I missing something? That
> particular code lives here [3][4].
>
> What do you think? Maybe we also need "internal_port_range" semantics?
>
> [ANKUR]: The idea behind the port range is to specify the range for port
> address translation (PAT). Netfilter also allows the port to be translated
> while doing (src/dest) IP translation. Now, this PAT happens in either
> direction (based on SNAT or DNAT), and that's probably why the phrase
> "external" is causing confusion. We don't need separate semantics; we can
> just move from "external_port_range" to a generic "port_range".
>

[flaviof]  I understand it better now. I was hung up on the word "external",
reading it as referring to the external "ip". I see now that "external" in
this context is actually the port that the "other side (i.e. external)" of
the connection sees as the source port. And that is indeed irrelevant to
whether we are talking about dnat, snat, or both. So I think we are good;
just be ready to emphasize that "external" is not about the ip; it is
specifically about the source port of the remote side of the nat connection.
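
To make the "PAT in either direction" point concrete, the roughly equivalent
netfilter rules would look something like this (just an illustration of the
semantics, not how OVN actually programs it):

  # DNAT on the way in: traffic to the external ip is translated to the
  # logical ip, with the destination port mapped into the 8080-8082 range.
  iptables -t nat -A PREROUTING  -d 172.16.0.110 -p tcp \
      -j DNAT --to-destination 10.0.0.3:8080-8082

  # SNAT on the way out: traffic from the logical ip is translated to the
  # external ip, with the source port mapped into the same range.
  iptables -t nat -A POSTROUTING -s 10.0.0.3 -p tcp \
      -j SNAT --to-source 172.16.0.110:8080-8082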

While getting this straight in my head, I realized that since port_range
needs nat, we had better not allow folks to use it together with the
--stateless flag. Thus, I submitted a follow-up patch for this. Take a
look, please:


https://patchwork.ozlabs.org/project/openvswitch/patch/20200422100746.31008-1-flavio@flaviof.com/
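
For context, this is the kind of invocation the follow-up patch is meant to
reject (a hypothetical command, reusing the topology from the gist):

  # port_range needs nat, so combining it with --stateless cannot work;
  # with the patch above this combination should be refused up front.
  ovn-nbctl --stateless --portrange lr-nat-add lr0 dnat_and_snat \
      172.16.0.110 10.0.0.3 sw0-port1 30:54:00:00:00:03 8080-8082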


>
> Not sure if you agree, but it could be easier to understand if we documented
> how the port range relates to the expected source and destination ports, as
> well as to the logical and external ips.
> [ANKUR]: Sorry, not clear on this point.
>

[flaviof]  Heh, my bad. Indeed not very clear and I think not relevant
given your answer above. Please ignore. ;)




>
> And while we are on this topic, I would like to ask whether you would prefer
> using a more explicit min/max integer tuple instead of a plain string. For
> cases when only one port is used, we could have min and max with the same
> value, which is not valid right now.
>
>
> [ANKUR]: Not highly opinionated on it. However, not clear on the one port
> scenario.
>

[flaviof]   If we changed the string into a [min,max] tuple, we could make
parsing and overlap detection easier. Since the tuple would always require
two values, though, we would lose the ability to use a single integer to
indicate one port. The implementation right now does not allow something
like "22-22"; it wants "22" for single-port cases. I actually like that
better than having two variations for the same outcome. Anyway, more food
for thought, and maybe not worth changing at this point.
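
For reference, these are the two argument forms the current string accepts
(the second command uses made-up names -- 172.16.0.111, 10.0.0.4, sw0-port2
-- purely for illustration):

  # a range, written as "min-max":
  ovn-nbctl --portrange lr-nat-add lr0 dnat_and_snat 172.16.0.110 10.0.0.3 \
      sw0-port1 30:54:00:00:00:03 8080-8082
  # a single port, written as a plain integer ("22-22" is rejected):
  ovn-nbctl --portrange lr-nat-add lr0 dnat_and_snat 172.16.0.111 10.0.0.4 \
      sw0-port2 30:54:00:00:00:04 22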


[flaviof]   So... there is functionality in OpenStack called port
forwarding:

https://specs.openstack.org/openstack/neutron-specs/specs/rocky/port-forwarding.html

My initial thinking was that I could leverage your work on port-range to
implement port forwarding for OpenStack. But the more I look at it, the more
I realize these are apples and oranges. The way port-range hooks into NAT is
simply an extension to the "action" portion of the rule, whereas port
forwarding is all about matching on the port. The load-balancing hooks in
OVN look like a much better fit for that. I'm taking that approach unless I
hear screaming and kicking. ;^)
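
Roughly the shape I have in mind with the load-balancer approach, sketched
against the gist's topology (lb-pf and the 2222->22 mapping are made-up
names/values, not a finished design):

  # Match on the destination port: forward external 172.16.0.110:2222 to
  # the vm at 10.0.0.3:22.
  ovn-nbctl lb-add lb-pf "172.16.0.110:2222" "10.0.0.3:22" tcp
  # Attach the load balancer to the logical router so the VIP is handled
  # on the public side.
  ovn-nbctl lr-lb-add lr0 lb-pf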

Best,

-- flaviof





> For a single port, portrange is specified as a single value.
> Something like:
> ovn-nbctl --portrange lr-nat-add router dnat_and_snat 10.15.24.136
> 50.0.0.10 10000
>  table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst ==
> 10.15.24.136 && inport == "router-to-underlay" &&
> is_chassis_resident("cr-router-to-underlay")),
> action=(ct_dnat(50.0.0.10,10000);)
>     cookie=0xef4fded0, duration=14.025s, table=14, n_packets=0, n_bytes=0,
> priority=100,ip,reg14=0x3,metadata=0x5,nw_dst=10.15.24.136
> actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=50.0.0.10:10000))
> table=1 (lr_out_snat        ), priority=161  , match=(ip && ip4.src ==
> 50.0.0.10 && outport == "router-to-underlay" &&
> is_chassis_resident("cr-router-to-underlay")),
> action=(ct_snat(10.15.24.136,10000);)
>    cookie=0x77727ea, duration=14.034s, table=41, n_packets=0, n_bytes=0,
> priority=161,ip,reg15=0x3,metadata=0x5,nw_src=50.0.0.10
> actions=ct(commit,table=42,zone=NXM_NX_REG12[0..15],nat(src=10.15.24.136:10000))
>
>
> Thanks,
>
> -- flaviof
>
> [0]: https://github.com/ovn-org/ovn/commit/509733cb1e95357072e14715bf2645c88f6c935e
> [1]: https://gist.github.com/flavio-fernandes/b3511cad133d9ea9c44276eb7b670f18
> [2]: https://github.com/ovn-org/ovn/blob/9287f425e8bc5781728b2ff1c60413d3c39c33a8/ovn-nb.xml#L2576-L2595
> [3]: https://github.com/ovn-org/ovn/blob/509733cb1e95357072e14715bf2645c88f6c935e/northd/ovn-northd.c#L8851-L8858
> [4]: https://github.com/ovn-org/ovn/blob/509733cb1e95357072e14715bf2645c88f6c935e/northd/ovn-northd.c#L8886-L8891
>
>