[ovs-dev] [PATCH ovn v2] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline
Mark Michelson
mmichels at redhat.com
Thu Mar 26 14:52:18 UTC 2020
I think you may have accidentally re-sent v1 patches with v2's commit
message. The LB protocol is still not in the new flow's match, and the
`ovn-sbctl dump-flows` command is still in the test.
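For concreteness, the match in the patch below still reads `ip4 && ip4.dst == <VIP> && tcp.dst == 8080`; a fixed flow would presumably also carry the bare protocol predicate, along the lines of (a sketch only, for the author to confirm against the actual northd output):

```
table=5 (lr_in_unsnat ), priority=120 , match=(ip4 && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(next;)
```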
On 3/26/20 10:35 AM, numans at ovn.org wrote:
> From: Numan Siddique <numans at ovn.org>
>
> Suppose there is below NAT entry with external_ip = 172.168.0.100
>
> nat <UUID>
> external ip: "172.168.0.100"
> logical ip: "10.0.0.0/24"
> type: "snat"
>
> And a load balancer with the VIP - 172.168.0.100
>
> _uuid : <UUID>
> external_ids : {}
> name : lb1
> protocol : tcp
> vips : {"172.168.0.100:8080"="10.0.0.4:8080"}
>
> And if these are associated to a gateway logical router
>
> Then we will see the below lflows in the router pipeline
>
> ...
> table=5 (lr_in_unsnat ), priority=90 , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> ...
> table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
>
> When a new connection packet destined for the lb vip 172.168.0.100 with tcp.dst = 8080
> is received, the ct.new flow in the lr_in_dnat is hit and the packet's ip4.dst is
> dnatted to 10.0.0.4 in the dnat conntrack zone.
>
> But for the subsequent packet destined to the vip, the ct.est lflow in the lr_in_dnat
> stage doesn't get hit. In this case, the packet first hits the lr_in_unsnat pri 90 flow
> as mentioned above with the action ct_snat. Even though ct_snat should have no effect,
> it appears to reset the ct flags.
>
> In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp, neither of the
> above lflows in the lr_in_dnat stage is hit.
>
> This needs further investigation, but we can avoid this scenario in OVN
> by adding the below lflow.
>
> table=5 (lr_in_unsnat ), priority=120 , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
>
> This patch adds the above lflow if the lb vip also has an entry in the NAT table.
>
> This patch is also required to support sctp load balancers in OVN.
>
> Reported-by: Tim Rozet <trozet at redhat.com>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> Signed-off-by: Numan Siddique <numans at ovn.org>
> ---
> v1 -> v2
> ------
> * Addressed the review comments from Mark - Added the protocol in the
> match.
> * I'm not sure if this is an OVS issue, so I updated the commit
> message accordingly.
>
> northd/ovn-northd.8.xml | 14 ++++++++++
> northd/ovn-northd.c | 58 +++++++++++++++++++++++++++++++----------
> tests/ovn-northd.at | 40 ++++++++++++++++++++++++++++
> tests/system-ovn.at | 51 +++++++++++++++++++++++++++++++++++-
> 4 files changed, 148 insertions(+), 15 deletions(-)
>
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 1e0993e07..0682b8b79 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -2138,6 +2138,20 @@ icmp6 {
> <code>redirect-chassis</code>.
> </p>
>
> + <p>
> + For each IPv4 address <var>A</var> defined as a load balancer
> + VIP (associated with the logical router) with the protocol
> + <var>P</var> that is also present as an
> + <code>external_ip</code> in the NAT table, a priority-120
> + logical flow is added with the match
> + <code>ip4 && ip4.dst == <var>A</var></code> and the
> + action <code>next;</code> to advance the packet to the next
> + table. If the load balancer also has a protocol port
> + <var>B</var> defined, then the match also includes
> + <code><var>P</var>.dst == <var>B</var></code>.
> + The same flows are added for IPv6 load balancers.
> + </p>
> +
> <p>
> A priority-0 logical flow with match <code>1</code> has actions
> <code>next;</code>.
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index b76df0554..18fbbc7f1 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> struct ds *match, struct ds *actions, int priority,
> const char *lb_force_snat_ip, struct lb_vip *lb_vip,
> bool is_udp, struct nbrec_load_balancer *lb,
> - struct shash *meter_groups)
> + struct shash *meter_groups, struct sset *nat_entries)
> {
> build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
> meter_groups);
> @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> free(new_match);
> free(est_match);
>
> + const char *ip_match = NULL;
> + if (lb_vip->addr_family == AF_INET) {
> + ip_match = "ip4";
> + } else {
> + ip_match = "ip6";
> + }
> +
> + if (sset_contains(nat_entries, lb_vip->vip)) {
> + /* The load balancer vip is also present in the NAT entries.
> + * So add a high-priority lflow to advance the packet
> + * destined for the vip (and the vip port if defined)
> + * in the S_ROUTER_IN_UNSNAT stage.
> + * There seems to be an issue with ovs-vswitchd. When the new
> + * connection packet destined for the lb vip is received,
> + * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> + * conntrack zone. For the next packet, if it goes through
> + * unsnat stage, the conntrack flags are not set properly, and
> + * it doesn't hit the established state flows in
> + * S_ROUTER_IN_DNAT stage. */
> + struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> + ds_put_format(&unsnat_match, "%s && %s.dst == %s",
> + ip_match, ip_match, lb_vip->vip);
> + if (lb_vip->vip_port) {
> + ds_put_format(&unsnat_match, " && %s.dst == %d",
> + is_udp ? "udp" : "tcp", lb_vip->vip_port);
> + }
> +
> + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> + ds_cstr(&unsnat_match), "next;", &lb->header_);
> +
> + ds_destroy(&unsnat_match);
> + }
> +
> if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
> return;
> }
> @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> * router has a gateway router port associated.
> */
> struct ds undnat_match = DS_EMPTY_INITIALIZER;
> - if (lb_vip->addr_family == AF_INET) {
> - ds_put_cstr(&undnat_match, "ip4 && (");
> - } else {
> - ds_put_cstr(&undnat_match, "ip6 && (");
> - }
> + ds_put_format(&undnat_match, "%s && (", ip_match);
>
> for (size_t i = 0; i < lb_vip->n_backends; i++) {
> struct lb_vip_backend *backend = &lb_vip->backends[i];
> - if (backend->addr_family == AF_INET) {
> - ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> - } else {
> - ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> - }
> + ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
>
> if (backend->port) {
> ds_put_format(&undnat_match, " && %s.src == %d) || ",
> @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> &nat->header_);
> sset_add(&nat_entries, nat->external_ip);
> }
> + } else {
> + /* Add the NAT external_ip to the nat_entries even for
> + * gateway routers. This is required for adding load balancer
> + * flows. */
> + sset_add(&nat_entries, nat->external_ip);
> }
>
> /* Egress UNDNAT table: It is for already established connections'
> @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> }
> }
>
> - sset_destroy(&nat_entries);
> -
> /* Handle force SNAT options set in the gateway router. */
> if (dnat_force_snat_ip && !od->l3dgw_port) {
> /* If a packet with destination IP address as that of the
> @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> /* Load balancing and packet defrag are only valid on
> * Gateway routers or router with gateway port. */
> if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> + sset_destroy(&nat_entries);
> continue;
> }
>
> @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> }
> add_router_lb_flow(lflows, od, &match, &actions, prio,
> lb_force_snat_ip, lb_vip, is_udp,
> - nb_lb, meter_groups);
> + nb_lb, meter_groups, &nat_entries);
> }
> }
> sset_destroy(&all_ips);
> + sset_destroy(&nat_entries);
> }
>
> /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index a2989e78e..c783ea2ac 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
> OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`])
>
> AT_CLEANUP
> +
> +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +ovn-nbctl lr-add lr0
> +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> +
> +ovn-nbctl set logical_router lr0 options:chassis=ch1
> +
> +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
> +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> +
> +ovn-nbctl lr-lb-add lr0 lb1
> +ovn-nbctl lr-lb-add lr0 lb2
> +ovn-nbctl lr-lb-add lr0 lb3
> +ovn-nbctl lr-lb-add lr0 lb4
> +
> +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> +
> +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
> +
> +ovn-sbctl dump-flows lr0
> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
> +
> +AT_CLEANUP
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 3b3379840..f1ae69b20 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
> ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> external_ip=30.0.0.2 -- add logical_router R2 nat @nat
>
> -
> # Wait for ovn-controller to catch up.
> ovn-nbctl --wait=hv sync
> OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
> tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> ])
>
> +check_est_flows () {
> + n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> +
> + echo "n_packets=$n"
> + test "$n" != 0
> +}
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
> +
> +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> +
> +# Destroy the load balancer and create it again. ovn-controller will
> +# clear the OF flows and re-add them, resetting the n_packets
> +# counters for these flows.
> +ovn-nbctl destroy load_balancer $uuid
> +uuid=`ovn-nbctl create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> +ovn-nbctl set logical_router R2 load_balancer=$uuid
> +
> +# Config OVN load-balancer with another VIP (this time with ports).
> +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> +
> +ovn-nbctl list load_balancer
> +ovn-sbctl dump-flows R2
> +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> +grep 'nat(src=20.0.0.2)'])
> +
> +dnl Test load-balancing that includes L4 ports in NAT.
> +for i in `seq 1 20`; do
> + echo Request $i
> + NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> +done
> +
> +dnl Each server should have at least one connection.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
> OVS_APP_EXIT_AND_WAIT([ovn-controller])
>
> as ovn-sb
>
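A quick aside for reviewers: the string the patch assembles for the new lr_in_unsnat flow can be modeled outside northd. The sketch below is a standalone Python rendering of that construction only (build_unsnat_match is a name invented here, not an OVN function), useful for eyeballing the resulting matches:

```python
# Rough model of the unsnat-skip match built in add_router_lb_flow().
# build_unsnat_match() is a hypothetical helper, not actual OVN code.
def build_unsnat_match(vip, vip_port=None, is_udp=False, ipv6=False):
    ip_match = "ip6" if ipv6 else "ip4"
    match = "%s && %s.dst == %s" % (ip_match, ip_match, vip)
    if vip_port:
        # Note: per the review above, the bare protocol predicate
        # (e.g. " && tcp") is still missing from the patch's match.
        match += " && %s.dst == %d" % ("udp" if is_udp else "tcp", vip_port)
    return match

print(build_unsnat_match("172.168.0.100", 8080))
# -> ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080
```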