[ovs-dev] [PATCH ovn] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline
Numan Siddique
numans at ovn.org
Thu Mar 26 14:17:29 UTC 2020
On Thu, Mar 26, 2020 at 6:06 PM Mark Michelson <mmichels at redhat.com> wrote:
>
> The gist of this change is that the configured SNAT external address and
> the load balancer VIP are the same. This means that we end up with both
> SNAT and DNAT occurring for the same external address. Would we see a
> similar problem if there were no load balancer, but we had configured
> the nat type to be "dnat_and_snat"? In this case, we wouldn't have the
> same conntrack-based lb flows in table 6. However, would we still run
> into the same issue with the conntrack state not being "est" because of
> the table 5 ct_snat flow? Or is there some safeguard in place to ensure
> this does not happen?
I don't think it's required in the case of dnat_and_snat. That's
because we don't check ct flags for
dnat_and_snat entries in lr_in_dnat.
Let's say we have the below dnat_and_snat entry:
***
nat c34ff396-80d2-42aa-a4cd-747150e4b1ab
external ip: "172.168.0.110"
logical ip: "10.0.0.3"
type: "dnat_and_snat"
***
Then the flow is
table=6 (lr_in_dnat ), priority=100 , match=(ip && ip4.dst
== 172.168.0.110 && inport == "lr0-public"),
action=(ct_dnat(10.0.0.3);)
And the corresponding OF flow is
table=14, n_packets=0, n_bytes=0, idle_age=523,
priority=100,ip,reg14=0x3,metadata=0x3,nw_dst=172.168.0.110
actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=10.0.0.3))
We may have to revisit the logical flows and see why ct
flags are not used in the case of dnat_and_snat entries.
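To make the contrast concrete, here is a minimal standalone sketch (not actual ovn-northd code; the helper names and buffer sizes are made up) of the two match shapes discussed above. The load balancer dnat flows are split on ct.new/ct.est, while the dnat_and_snat flow carries no ct-state match at all, so misreported ct flags can only break the load balancer case:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers modeling the lr_in_dnat matches discussed above;
 * "vip", "port", "ext_ip" and "inport" stand in for fields of the real
 * NB records. */

/* LB flows are gated on conntrack state, so a wrong ct flag means the
 * established-connection flow is never hit. */
static void
lb_dnat_matches(char new_m[128], char est_m[128], const char *vip, int port)
{
    snprintf(new_m, 128,
             "ct.new && ip && ip4.dst == %s && tcp && tcp.dst == %d",
             vip, port);
    snprintf(est_m, 128,
             "ct.est && ip && ip4.dst == %s && tcp && tcp.dst == %d",
             vip, port);
}

/* dnat_and_snat installs a single unconditional flow: no ct.new/ct.est
 * in the match, so misreported ct flags cannot skip it. */
static void
dnat_and_snat_match(char m[128], const char *ext_ip, const char *inport)
{
    snprintf(m, 128, "ip && ip4.dst == %s && inport == \"%s\"",
             ext_ip, inport);
}
```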
>
> See inline below for more
>
> On 3/26/20 7:32 AM, numans at ovn.org wrote:
> > From: Numan Siddique <numans at ovn.org>
> >
> > Suppose there is the below NAT entry with external_ip = 172.168.0.100
> >
> > nat <UUID>
> > external ip: "172.168.0.100"
> > logical ip: "10.0.0.0/24"
> > type: "snat"
> >
> > And a load balancer with the VIP - 172.168.0.100
> >
> > _uuid : <UUID>
> > external_ids : {}
> > name : lb1
> > protocol : tcp
> > vips : {"172.168.0.100:8080"="10.0.0.4:8080"}
> >
> > And if these are associated to a gateway logical router
> >
> > Then we will see the below lflows in the router pipeline
> >
> > ...
> > table=5 (lr_in_unsnat ), priority=90 , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> > ...
> > table=6 (lr_in_dnat ), priority=120 , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> > table=6 (lr_in_dnat ), priority=120 , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
> >
> > When a new connection packet destined for the lb vip 172.168.0.100 and tcp.dst = 8080
> > is received, the ct.new flow in lr_in_dnat is hit and the packet's ip4.dst is
> > dnatted to 10.0.0.4 in the dnat conntrack zone.
> >
> > But for the subsequent packets destined to the vip, the ct.est lflow in the lr_in_dnat
> > stage doesn't get hit. In this case, the packet first hits the lr_in_unsnat pri 90 flow
> > mentioned above with the action ct_snat. Although ct_snat should have no effect,
> > it looks like there is some issue in either ovs-vswitchd or in the datapath, and the
> > ct flags are not set properly.
> >
> > In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp,
> > neither of the above lflows in the lr_in_dnat stage is hit.
> >
> > This needs to be investigated further. But we can avoid this scenario in OVN
> > by adding the below lflow.
> >
> > table=5 (lr_in_unsnat ), priority=120 , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
> >
> > This patch adds the above lflow if the lb vip also has an entry in the NAT table.
> >
> > This patch is also required to support sctp load balancers in OVN.
> >
> > Reported-by: Tim Rozet <trozet at redhat.com>
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> > Signed-off-by: Numan Siddique <numans at ovn.org>
> > ---
> > northd/ovn-northd.8.xml | 14 ++++++++++
> > northd/ovn-northd.c | 58 +++++++++++++++++++++++++++++++----------
> > tests/ovn-northd.at | 40 ++++++++++++++++++++++++++++
> > tests/system-ovn.at | 51 +++++++++++++++++++++++++++++++++++-
> > 4 files changed, 148 insertions(+), 15 deletions(-)
> >
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 1e0993e07..0682b8b79 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -2138,6 +2138,20 @@ icmp6 {
> > <code>redirect-chassis</code>.
> > </p>
> >
> > + <p>
> > + For each IPv4 address <var>A</var> defined as a load balancer
> > + VIP (associated with the logical router) with the protocol
> > + <var>P</var> that is also present as an
> > + <code>external_ip</code> in the NAT table, a priority-120
> > + logical flow is added with the match
> > + <code>ip4 && ip4.dst == <var>A</var></code> and the actions
> > + <code>next;</code> to advance the packet to the next table.
> > + If the load balancer also has a protocol port <var>B</var>
> > + defined, then the match also includes
> > + <code><var>P</var>.dst == <var>B</var></code>.
> > + The same flows are added for IPv6 load balancers too.
> > + </p>
> > +
> > <p>
> > A priority-0 logical flow with match <code>1</code> has actions
> > <code>next;</code>.
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index b76df0554..18fbbc7f1 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> > struct ds *match, struct ds *actions, int priority,
> > const char *lb_force_snat_ip, struct lb_vip *lb_vip,
> > bool is_udp, struct nbrec_load_balancer *lb,
> > - struct shash *meter_groups)
> > + struct shash *meter_groups, struct sset *nat_entries)
> > {
> > build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
> > meter_groups);
> > @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> > free(new_match);
> > free(est_match);
> >
> > + const char *ip_match = NULL;
> > + if (lb_vip->addr_family == AF_INET) {
> > + ip_match = "ip4";
> > + } else {
> > + ip_match = "ip6";
> > + }
> > +
> > + if (sset_contains(nat_entries, lb_vip->vip)) {
> > + /* The load balancer vip is also present in the NAT entries.
> > + * So add a high priority lflow to advance the packet
> > + * destined to the vip (and the vip port if defined)
> > + * in the S_ROUTER_IN_UNSNAT stage.
> > + * There seems to be an issue with ovs-vswitchd. When the new
> > + * connection packet destined for the lb vip is received,
> > + * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> > + * conntrack zone. For the next packet, if it goes through
> > + * unsnat stage, the conntrack flags are not set properly, and
> > + * it doesn't hit the established state flows in
> > + * S_ROUTER_IN_DNAT stage. */
> > + struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> > + ds_put_format(&unsnat_match, "%s && %s.dst == %s",
> > + ip_match, ip_match, lb_vip->vip);
>
> I think you should add the load balancer l4 protocol here. Consider that
> you have the described NAT setup with a UDP load balancer with no VIP
> port. TCP traffic would hit this flow intended for load balancer traffic
> instead of the lower priority flow intended to UNSNAT.
Agree. I'll do in v2.
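A standalone sketch of what the v2 match construction might look like (hypothetical helper, not the actual ovn-northd code): the L4 protocol is matched unconditionally, so that e.g. a UDP load balancer with no VIP port no longer swallows TCP traffic headed for the same external IP, and the port comparison is added only when the VIP defines one:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical v2 lr_in_unsnat match builder: the protocol name is
 * always part of the match; the port term is optional. */
static void
build_v2_unsnat_match(char *buf, size_t len, int is_v6, const char *vip,
                      int vip_port, int is_udp)
{
    const char *ip_match = is_v6 ? "ip6" : "ip4";
    const char *proto = is_udp ? "udp" : "tcp";
    int n = snprintf(buf, len, "%s && %s.dst == %s && %s",
                     ip_match, ip_match, vip, proto);
    if (vip_port > 0) {
        snprintf(buf + n, len - n, " && %s.dst == %d", proto, vip_port);
    }
}
```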
>
> > + if (lb_vip->vip_port) {
> > + ds_put_format(&unsnat_match, " && %s.dst == %d",
> > + is_udp ? "udp" : "tcp", lb_vip->vip_port);
> > + }
> > +
> > + ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> > + ds_cstr(&unsnat_match), "next;", &lb->header_);
> > +
> > + ds_destroy(&unsnat_match);
> > + }
> > +
> > if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
> > return;
> > }
> > @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> > * router has a gateway router port associated.
> > */
> > struct ds undnat_match = DS_EMPTY_INITIALIZER;
> > - if (lb_vip->addr_family == AF_INET) {
> > - ds_put_cstr(&undnat_match, "ip4 && (");
> > - } else {
> > - ds_put_cstr(&undnat_match, "ip6 && (");
> > - }
> > + ds_put_format(&undnat_match, "%s && (", ip_match);
> >
> > for (size_t i = 0; i < lb_vip->n_backends; i++) {
> > struct lb_vip_backend *backend = &lb_vip->backends[i];
> > - if (backend->addr_family == AF_INET) {
> > - ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> > - } else {
> > - ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> > - }
> > + ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
> >
> > if (backend->port) {
> > ds_put_format(&undnat_match, " && %s.src == %d) || ",
> > @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> > &nat->header_);
> > sset_add(&nat_entries, nat->external_ip);
> > }
> > + } else {
> > + /* Add the NAT external_ip to the nat_entries even for
> > + * gateway routers. This is required for adding load balancer
> > + * flows.*/
> > + sset_add(&nat_entries, nat->external_ip);
> > }
> >
> > /* Egress UNDNAT table: It is for already established connections'
> > @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> > }
> > }
> >
> > - sset_destroy(&nat_entries);
> > -
> > /* Handle force SNAT options set in the gateway router. */
> > if (dnat_force_snat_ip && !od->l3dgw_port) {
> > /* If a packet with destination IP address as that of the
> > @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> > /* Load balancing and packet defrag are only valid on
> > * Gateway routers or router with gateway port. */
> > if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> > + sset_destroy(&nat_entries);
> > continue;
> > }
> >
> > @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> > }
> > add_router_lb_flow(lflows, od, &match, &actions, prio,
> > lb_force_snat_ip, lb_vip, is_udp,
> > - nb_lb, meter_groups);
> > + nb_lb, meter_groups, &nat_entries);
> > }
> > }
> > sset_destroy(&all_ips);
> > + sset_destroy(&nat_entries);
> > }
> >
> > /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index a2989e78e..c783ea2ac 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
> > OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor | wc -l`])
> >
> > AT_CLEANUP
> > +
> > +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +ovn-nbctl lr-add lr0
> > +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> > +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> > +
> > +ovn-nbctl set logical_router lr0 options:chassis=ch1
> > +
> > +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> > +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
> > +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> > +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> > +
> > +ovn-nbctl lr-lb-add lr0 lb1
> > +ovn-nbctl lr-lb-add lr0 lb2
> > +ovn-nbctl lr-lb-add lr0 lb3
> > +ovn-nbctl lr-lb-add lr0 lb4
> > +
> > +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> > +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> > +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> > +
> > +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
> > +
> > +ovn-sbctl dump-flows lr0
> Is this left over from debugging?
Yes. I'll clean it in v2.
Thanks
Numan
>
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CLEANUP
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index 3b3379840..f1ae69b20 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
> > ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> > external_ip=30.0.0.2 -- add logical_router R2 nat @nat
> >
> > -
> > # Wait for ovn-controller to catch up.
> > ovn-nbctl --wait=hv sync
> > OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> > @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
> > tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > ])
> >
> > +check_est_flows () {
> > + n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> > +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> > +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> > +
> > + echo "n_packets=$n"
> > + test "$n" != 0
> > +}
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> > +
> > +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> > +
> > +# Destroy the load balancer and create it again. ovn-controller will
> > +# clear the OF flows and re-add them, which clears the n_packets
> > +# for these flows.
> > +ovn-nbctl destroy load_balancer $uuid
> > +uuid=`ovn-nbctl create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> > +ovn-nbctl set logical_router R2 load_balancer=$uuid
> > +
> > +# Config OVN load-balancer with another VIP (this time with ports).
> > +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> > +
> > +ovn-nbctl list load_balancer
> > +ovn-sbctl dump-flows R2
> > +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> > +grep 'nat(src=20.0.0.2)'])
> > +
> > +dnl Test load-balancing that includes L4 ports in NAT.
> > +for i in `seq 1 20`; do
> > + echo Request $i
> > + NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> > +done
> > +
> > +dnl Each server should have at least one connection.
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> > OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >
> > as ovn-sb
> >
>