[ovs-dev] [PATCH ovn] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline

Numan Siddique numans at ovn.org
Thu Mar 26 14:17:29 UTC 2020


On Thu, Mar 26, 2020 at 6:06 PM Mark Michelson <mmichels at redhat.com> wrote:
>
> The gist of this change is that the configured SNAT external address and
> the load balancer VIP are the same. This means that we end up with both
> SNAT and DNAT occurring for the same external address. Would we see a
> similar problem if there were no load balancer, but we had configured
> the nat type to be "dnat_and_snat"? In this case, we wouldn't have the
> same conntrack-based lb flows in table 6. However, would we still run
> into the same issue with the conntrack state not being "est" because of
> the table 5 ct_snat flow? Or is there some safeguard in place to ensure
> this does not happen?

I don't think it's required in the case of dnat_and_snat, because we
don't check ct flags for dnat_and_snat entries in lr_in_dnat.

Let's say we have the below dnat_and_snat entry:

***
nat c34ff396-80d2-42aa-a4cd-747150e4b1ab
        external ip: "172.168.0.110"
        logical ip: "10.0.0.3"
        type: "dnat_and_snat"
***
Then the flow is
 table=6 (lr_in_dnat         ), priority=100  , match=(ip && ip4.dst == 172.168.0.110 && inport == "lr0-public"), action=(ct_dnat(10.0.0.3);)

And the corresponding OF flow is

table=14, n_packets=0, n_bytes=0, idle_age=523, priority=100,ip,reg14=0x3,metadata=0x3,nw_dst=172.168.0.110 actions=ct(commit,table=15,zone=NXM_NX_REG11[0..15],nat(dst=10.0.0.3))

We probably have to revisit the logical flows and see why ct flags
are not used in the case of dnat_and_snat entries.
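
If we did want dnat_and_snat to track conntrack state the way the
load balancer flows do, the lflows might look something like the
below (purely illustrative; this is not what ovn-northd generates
today):

 table=6 (lr_in_dnat         ), priority=100  , match=(ct.new && ip && ip4.dst == 172.168.0.110 && inport == "lr0-public"), action=(ct_dnat(10.0.0.3);)
 table=6 (lr_in_dnat         ), priority=100  , match=(ct.est && ip && ip4.dst == 172.168.0.110 && inport == "lr0-public"), action=(ct_dnat;)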

>
> See inline below for more
>
> On 3/26/20 7:32 AM, numans at ovn.org wrote:
> > From: Numan Siddique <numans at ovn.org>
> >
> > Suppose there is the below NAT entry with external_ip = 172.168.0.100
> >
> > nat <UUID>
> >      external ip: "172.168.0.100"
> >      logical ip: "10.0.0.0/24"
> >      type: "snat"
> >
> > And a load balancer with the VIP - 172.168.0.100
> >
> > _uuid               : <UUID>
> > external_ids        : {}
> > name                : lb1
> > protocol            : tcp
> > vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}
> >
> > And if these are associated with a gateway logical router
> >
> > Then we will see the below lflows in the router pipeline
> >
> > ...
> > table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> > ...
> > table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> > table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
> >
> > When a new connection packet destined for the lb vip 172.168.0.100 with tcp.dst == 8080
> > is received, the ct.new flow in lr_in_dnat is hit and the packet's ip4.dst is
> > dnatted to 10.0.0.4 in the dnat conntrack zone.
> >
> > But for a subsequent packet destined to the vip, the ct.est lflow in the lr_in_dnat
> > stage doesn't get hit. Instead, the packet first hits the lr_in_unsnat priority-90 flow
> > mentioned above with the action ct_snat. Even though ct_snat should have no effect,
> > there appears to be an issue in either ovs-vswitchd or the datapath, and the ct flags
> > are not set properly.
> >
> > In the case of tcp, the ct.new flow is hit instead of ct.est. In the case of sctp, neither of the above
> > lflows in the lr_in_dnat stage is hit.
> >
> > This needs to be investigated further, but we can avoid this scenario in OVN
> > by adding the below lflow:
> >
> > table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
> >
> > This patch adds the above lflow if the lb vip also has an entry in the NAT table.
> >
> > This patch is also required to support sctp load balancers in OVN.
> >
> > Reported-by: Tim Rozet <trozet at redhat.com>
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> > Signed-off-by: Numan Siddique <numans at ovn.org>
> > ---
> >   northd/ovn-northd.8.xml | 14 ++++++++++
> >   northd/ovn-northd.c     | 58 +++++++++++++++++++++++++++++++----------
> >   tests/ovn-northd.at     | 40 ++++++++++++++++++++++++++++
> >   tests/system-ovn.at     | 51 +++++++++++++++++++++++++++++++++++-
> >   4 files changed, 148 insertions(+), 15 deletions(-)
> >
> > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> > index 1e0993e07..0682b8b79 100644
> > --- a/northd/ovn-northd.8.xml
> > +++ b/northd/ovn-northd.8.xml
> > @@ -2138,6 +2138,20 @@ icmp6 {
> >             <code>redirect-chassis</code>.
> >           </p>
> >
> > +        <p>
> > +          For each IPv4 address <var>A</var> defined as a load balancer
> > +          VIP (associated with the logical router) with the protocol
> > +          <var>P</var> (and the protocol port <var>T</var> if defined) that
> > +          is also present as an <code>external_ip</code> in the NAT table,
> > +          a priority-120 logical flow is added with the match
> > +          <code>ip4 && ip4.dst == <var>A</var></code> and
> > +          the action <code>next;</code> to advance the packet to the next
> > +          table. If the load balancer also has a protocol port <var>B</var>
> > +          defined, then the match also includes
> > +          <code><var>P</var>.dst == <var>B</var></code>.
> > +          The same flows are added for IPv6 load balancers too.
> > +        </p>
> > +
> >           <p>
> >             A priority-0 logical flow with match <code>1</code> has actions
> >             <code>next;</code>.
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index b76df0554..18fbbc7f1 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >                      struct ds *match, struct ds *actions, int priority,
> >                      const char *lb_force_snat_ip, struct lb_vip *lb_vip,
> >                      bool is_udp, struct nbrec_load_balancer *lb,
> > -                   struct shash *meter_groups)
> > +                   struct shash *meter_groups, struct sset *nat_entries)
> >   {
> >       build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
> >                                 meter_groups);
> > @@ -7607,6 +7607,39 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >       free(new_match);
> >       free(est_match);
> >
> > +    const char *ip_match = NULL;
> > +    if (lb_vip->addr_family == AF_INET) {
> > +        ip_match = "ip4";
> > +    } else {
> > +        ip_match = "ip6";
> > +    }
> > +
> > +    if (sset_contains(nat_entries, lb_vip->vip)) {
> > +        /* The load balancer vip is also present in the NAT entries.
> > +         * So add a high priority lflow to advance the packet
> > +         * destined to the vip (and the vip port if defined)
> > +         * in the S_ROUTER_IN_UNSNAT stage.
> > +         * There seems to be an issue with ovs-vswitchd. When the new
> > +         * connection packet destined for the lb vip is received,
> > +         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> > +         * conntrack zone. For the next packet, if it goes through
> > +         * unsnat stage, the conntrack flags are not set properly, and
> > +         * it doesn't hit the established state flows in
> > +         * S_ROUTER_IN_DNAT stage. */
> > +        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> > +        ds_put_format(&unsnat_match, "%s && %s.dst == %s",
> > +                      ip_match, ip_match, lb_vip->vip);
>
> I think you should add the load balancer L4 protocol here. Consider that
> you have the described NAT setup with a UDP load balancer with no VIP
> port. TCP traffic would then hit this flow intended for load balancer
> traffic instead of the lower-priority flow intended to UNSNAT it.

Agreed. I'll do that in v2.
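
Something along these lines, perhaps (a rough sketch of what v2 could
do, assuming the same variables as in add_router_lb_flow() above; the
exact wording may differ):

    /* Hypothetical v2 sketch: always qualify the unsnat skip flow with
     * the load balancer's L4 protocol, so that e.g. TCP traffic cannot
     * match a flow meant for a UDP VIP that has no port defined. */
    struct ds unsnat_match = DS_EMPTY_INITIALIZER;
    ds_put_format(&unsnat_match, "%s && %s.dst == %s && %s",
                  ip_match, ip_match, lb_vip->vip,
                  is_udp ? "udp" : "tcp");
    if (lb_vip->vip_port) {
        ds_put_format(&unsnat_match, " && %s.dst == %d",
                      is_udp ? "udp" : "tcp", lb_vip->vip_port);
    }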


>
> > +        if (lb_vip->vip_port) {
> > +            ds_put_format(&unsnat_match, " && %s.dst == %d",
> > +                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
> > +        }
> > +
> > +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> > +                                ds_cstr(&unsnat_match), "next;", &lb->header_);
> > +
> > +        ds_destroy(&unsnat_match);
> > +    }
> > +
> >       if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
> >           return;
> >       }
> > @@ -7616,19 +7649,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
> >        * router has a gateway router port associated.
> >        */
> >       struct ds undnat_match = DS_EMPTY_INITIALIZER;
> > -    if (lb_vip->addr_family == AF_INET) {
> > -        ds_put_cstr(&undnat_match, "ip4 && (");
> > -    } else {
> > -        ds_put_cstr(&undnat_match, "ip6 && (");
> > -    }
> > +    ds_put_format(&undnat_match, "%s && (", ip_match);
> >
> >       for (size_t i = 0; i < lb_vip->n_backends; i++) {
> >           struct lb_vip_backend *backend = &lb_vip->backends[i];
> > -        if (backend->addr_family == AF_INET) {
> > -            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> > -        } else {
> > -            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> > -        }
> > +        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
> >
> >           if (backend->port) {
> >               ds_put_format(&undnat_match, " && %s.src == %d) || ",
> > @@ -8879,6 +8904,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                                               &nat->header_);
> >                       sset_add(&nat_entries, nat->external_ip);
> >                   }
> > +            } else {
> > +                /* Add the NAT external_ip to the nat_entries even for
> > +                 * gateway routers. This is required for adding load balancer
> > +                 * flows. */
> > +                sset_add(&nat_entries, nat->external_ip);
> >               }
> >
> >               /* Egress UNDNAT table: It is for already established connections'
> > @@ -9059,8 +9089,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >               }
> >           }
> >
> > -        sset_destroy(&nat_entries);
> > -
> >           /* Handle force SNAT options set in the gateway router. */
> >           if (dnat_force_snat_ip && !od->l3dgw_port) {
> >               /* If a packet with destination IP address as that of the
> > @@ -9121,6 +9149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >           /* Load balancing and packet defrag are only valid on
> >            * Gateway routers or router with gateway port. */
> >           if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> > +            sset_destroy(&nat_entries);
> >               continue;
> >           }
> >
> > @@ -9195,10 +9224,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                   }
> >                   add_router_lb_flow(lflows, od, &match, &actions, prio,
> >                                      lb_force_snat_ip, lb_vip, is_udp,
> > -                                   nb_lb, meter_groups);
> > +                                   nb_lb, meter_groups, &nat_entries);
> >               }
> >           }
> >           sset_destroy(&all_ips);
> > +        sset_destroy(&nat_entries);
> >       }
> >
> >       /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index a2989e78e..c783ea2ac 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -1288,3 +1288,43 @@ ovn-nbctl --wait=sb lb-del lb2
> >   OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
> >
> >   AT_CLEANUP
> > +
> > +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +ovn-nbctl lr-add lr0
> > +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> > +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> > +
> > +ovn-nbctl set logical_router lr0 options:chassis=ch1
> > +
> > +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> > +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080"
> > +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> > +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> > +
> > +ovn-nbctl lr-lb-add lr0 lb1
> > +ovn-nbctl lr-lb-add lr0 lb2
> > +ovn-nbctl lr-lb-add lr0 lb3
> > +ovn-nbctl lr-lb-add lr0 lb4
> > +
> > +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> > +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> > +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> > +
> > +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.1 && tcp.dst == 8080" -c) ])
> > +
> > +ovn-sbctl dump-flows lr0
> Is this left over from debugging?

Yes. I'll clean it up in v2.

Thanks
Numan

>
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.4 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.5 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> > +grep "ip4 && ip4.dst == 192.168.2.6 && tcp.dst == 8080" -c) ])
> > +
> > +AT_CLEANUP
> > diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> > index 3b3379840..f1ae69b20 100644
> > --- a/tests/system-ovn.at
> > +++ b/tests/system-ovn.at
> > @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
> >   ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> >       external_ip=30.0.0.2 -- add logical_router R2 nat @nat
> >
> > -
> >   # Wait for ovn-controller to catch up.
> >   ovn-nbctl --wait=hv sync
> >   OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> > @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
> >   tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> >   ])
> >
> > +check_est_flows () {
> > +    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> > +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> > +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> > +
> > +    echo "n_packets=$n"
> > +    test "$n" != 0
> > +}
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> > +
> > +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> > +
> > +# Destroy the load balancer and create it again. ovn-controller will
> > +# clear the OF flows and re-add them, which clears the n_packets
> > +# counters for these flows.
> > +ovn-nbctl destroy load_balancer $uuid
> > +uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> > +ovn-nbctl set logical_router R2 load_balancer=$uuid
> > +
> > +# Config OVN load-balancer with another VIP (this time with ports).
> > +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> > +
> > +ovn-nbctl list load_balancer
> > +ovn-sbctl dump-flows R2
> > +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> > +grep 'nat(src=20.0.0.2)'])
> > +
> > +dnl Test load-balancing that includes L4 ports in NAT.
> > +for i in `seq 1 20`; do
> > +    echo Request $i
> > +    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> > +done
> > +
> > +dnl Each server should have at least one connection.
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> > +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> > +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> > +])
> > +
> > +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> > +
> >   OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >
> >   as ovn-sb
> >
>