[ovs-dev] [PATCH ovn v3] ovn-northd: Skip unsnat flows for load balancer vips in router ingress pipeline

Mark Michelson mmichels at redhat.com
Thu Mar 26 14:58:54 UTC 2020


Acked-by: Mark Michelson <mmichels at redhat.com>

On 3/26/20 10:49 AM, numans at ovn.org wrote:
> From: Numan Siddique <numans at ovn.org>
> 
> Suppose there is below NAT entry with external_ip = 172.168.0.100
> 
> nat <UUID>
>      external ip: "172.168.0.100"
>      logical ip: "10.0.0.0/24"
>      type: "snat"
> 
> And a load balancer with the VIP - 172.168.0.100
> 
> _uuid               : <UUID>
> external_ids        : {}
> name                : lb1
> protocol            : tcp
> vips                : {"172.168.0.100:8080"="10.0.0.4:8080"}
> 
> And if these are associated to a gateway logical router
> 
> Then we will see the below lflows in the router pipeline
> 
> ...
> table=5 (lr_in_unsnat       ), priority=90   , match=(ip && ip4.dst == 172.168.0.100), action=(ct_snat;)
> ...
> table=6 (lr_in_dnat         ), priority=120  , match=(ct.new && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_lb(10.0.0.4:8080);)
> table=6 (lr_in_dnat         ), priority=120  , match=(ct.est && ip && ip4.dst == 172.168.0.100 && tcp && tcp.dst == 8080), action=(ct_dnat;)
> 
> When a new connection packet destinated for the lb vip 172.168.0.100 and tcp.dst = 8080
> is received, the ct.new flow in the lr_in_dnat is hit and the packet's ip4.dst is
> dnatted to 10.0.0.4 in the dnat conntrack zone.
> 
> But for the subsequent packet destined to the vip, the ct.est lflow in the lr_in_dnat
> stage doesn't get hit. In this case, the packet first hits the lr_in_unsnat pri 90 flow
> as mentioned above with the action ct_snat. Even though ct_snat should have no effect,
> looks like it is resetting the ct flags.
> 
> In the case of tcp, the ct.new flow is hit instead of ct.est. In the the case of sctp, neither of the above
> lflows in lr_in_dnat stage hit.
> 
> This needs to be investigated further. But we can avoid this scenario in OVN
> by adding the below lflow.
> 
> table=5 (lr_in_unsnat       ), priority=120  , match=(ip4 && ip4.dst == 172.168.0.100 && tcp.dst == 8080), action=(next;)
> 
> This patch adds the above lflow if the lb vip also has an entry in the NAT table.
> 
> This patch is also required to support sctp load balancers in OVN.
> 
> Reported-by: Tim Rozet <trozet at redhat.com>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1815217
> Signed-off-by: Numan Siddique <numans at ovn.org>
> ---
> v2 -> v3
> -----
>    * v2 was nothing but v1 as I forgot to commit the modified files.
>    * Also updated the ovn-northd.8.xml.
> 
> v1 -> v2
> ------
>    * Addressed the review comments from Mark - Added the protocol in the
>      match.
>    * I'm not sure if this is an ovs issue. So updated the commit message
>      accordingly.
> 
> 
>   northd/ovn-northd.8.xml | 27 +++++++++++++++++++
>   northd/ovn-northd.c     | 59 +++++++++++++++++++++++++++++++----------
>   tests/ovn-northd.at     | 38 ++++++++++++++++++++++++++
>   tests/system-ovn.at     | 51 ++++++++++++++++++++++++++++++++++-
>   4 files changed, 160 insertions(+), 15 deletions(-)
> 
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 1e0993e07..b5e4d6d84 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -2075,6 +2075,33 @@ icmp6 {
>         unSNATted here.
>       </p>
>   
> +    <p>Ingress Table 5: UNSNAT on Gateway and Distributed Routers</p>
> +    <ul>
> +      <li>
> +        <p>
> +          If the Router (Gateway or Distributed) is configured with
> +          load balancers, then below lflows are added:
> +        </p>
> +
> +        <p>
> +          For each IPv4 address <var>A</var> defined as load balancer
> +          VIP with the protocol <var>P</var> (and the protocol port
> +          <var>T</var> if defined) is also present as an
> +          <code>external_ip</code> in the NAT table,
> +          a priority-120 logical flow is added with the match
> +          <code>ip4 && ip4.dst == <var>A</var> &&
> +          <var>P</var></code> with the action <code>next;</code> to
> +          advance the packet to the next table. If the load balancer
> +          has protocol port <code>B</code> defined, then the match also has
> +          <code><var>P</var>.dst == <var>B</var></code>.
> +        </p>
> +
> +        <p>
> +          The above flows are also added for IPv6 load balancers.
> +        </p>
> +      </li>
> +    </ul>
> +
>       <p>Ingress Table 5: UNSNAT on Gateway Routers</p>
>   
>       <ul>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index b76df0554..7ef049310 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -7574,7 +7574,7 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>                      struct ds *match, struct ds *actions, int priority,
>                      const char *lb_force_snat_ip, struct lb_vip *lb_vip,
>                      bool is_udp, struct nbrec_load_balancer *lb,
> -                   struct shash *meter_groups)
> +                   struct shash *meter_groups, struct sset *nat_entries)
>   {
>       build_empty_lb_event_flow(od, lflows, lb_vip, lb, S_ROUTER_IN_DNAT,
>                                 meter_groups);
> @@ -7607,6 +7607,40 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>       free(new_match);
>       free(est_match);
>   
> +    const char *ip_match = NULL;
> +    if (lb_vip->addr_family == AF_INET) {
> +        ip_match = "ip4";
> +    } else {
> +        ip_match = "ip6";
> +    }
> +
> +    if (sset_contains(nat_entries, lb_vip->vip)) {
> +        /* The load balancer vip is also present in the NAT entries.
> +         * So add a high priority lflow to advance the the packet
> +         * destined to the vip (and the vip port if defined)
> +         * in the S_ROUTER_IN_UNSNAT stage.
> +         * There seems to be an issue with ovs-vswitchd. When the new
> +         * connection packet destined for the lb vip is received,
> +         * it is dnat'ed in the S_ROUTER_IN_DNAT stage in the dnat
> +         * conntrack zone. For the next packet, if it goes through
> +         * unsnat stage, the conntrack flags are not set properly, and
> +         * it doesn't hit the established state flows in
> +         * S_ROUTER_IN_DNAT stage. */
> +        struct ds unsnat_match = DS_EMPTY_INITIALIZER;
> +        ds_put_format(&unsnat_match, "%s && %s.dst == %s && %s",
> +                      ip_match, ip_match, lb_vip->vip,
> +                      is_udp ? "udp" : "tcp");
> +        if (lb_vip->vip_port) {
> +            ds_put_format(&unsnat_match, " && %s.dst == %d",
> +                          is_udp ? "udp" : "tcp", lb_vip->vip_port);
> +        }
> +
> +        ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_UNSNAT, 120,
> +                                ds_cstr(&unsnat_match), "next;", &lb->header_);
> +
> +        ds_destroy(&unsnat_match);
> +    }
> +
>       if (!od->l3dgw_port || !od->l3redirect_port || !lb_vip->n_backends) {
>           return;
>       }
> @@ -7616,19 +7650,11 @@ add_router_lb_flow(struct hmap *lflows, struct ovn_datapath *od,
>        * router has a gateway router port associated.
>        */
>       struct ds undnat_match = DS_EMPTY_INITIALIZER;
> -    if (lb_vip->addr_family == AF_INET) {
> -        ds_put_cstr(&undnat_match, "ip4 && (");
> -    } else {
> -        ds_put_cstr(&undnat_match, "ip6 && (");
> -    }
> +    ds_put_format(&undnat_match, "%s && (", ip_match);
>   
>       for (size_t i = 0; i < lb_vip->n_backends; i++) {
>           struct lb_vip_backend *backend = &lb_vip->backends[i];
> -        if (backend->addr_family == AF_INET) {
> -            ds_put_format(&undnat_match, "(ip4.src == %s", backend->ip);
> -        } else {
> -            ds_put_format(&undnat_match, "(ip6.src == %s", backend->ip);
> -        }
> +        ds_put_format(&undnat_match, "(%s.src == %s", ip_match, backend->ip);
>   
>           if (backend->port) {
>               ds_put_format(&undnat_match, " && %s.src == %d) || ",
> @@ -8879,6 +8905,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                                               &nat->header_);
>                       sset_add(&nat_entries, nat->external_ip);
>                   }
> +            } else {
> +                /* Add the NAT external_ip to the nat_entries even for
> +                 * gateway routers. This is required for adding load balancer
> +                 * flows.*/
> +                sset_add(&nat_entries, nat->external_ip);
>               }
>   
>               /* Egress UNDNAT table: It is for already established connections'
> @@ -9059,8 +9090,6 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>               }
>           }
>   
> -        sset_destroy(&nat_entries);
> -
>           /* Handle force SNAT options set in the gateway router. */
>           if (dnat_force_snat_ip && !od->l3dgw_port) {
>               /* If a packet with destination IP address as that of the
> @@ -9121,6 +9150,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>           /* Load balancing and packet defrag are only valid on
>            * Gateway routers or router with gateway port. */
>           if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
> +            sset_destroy(&nat_entries);
>               continue;
>           }
>   
> @@ -9195,10 +9225,11 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                   }
>                   add_router_lb_flow(lflows, od, &match, &actions, prio,
>                                      lb_force_snat_ip, lb_vip, is_udp,
> -                                   nb_lb, meter_groups);
> +                                   nb_lb, meter_groups, &nat_entries);
>               }
>           }
>           sset_destroy(&all_ips);
> +        sset_destroy(&nat_entries);
>       }
>   
>       /* Logical router ingress table ND_RA_OPTIONS & ND_RA_RESPONSE: IPv6 Router
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index a2989e78e..d127152f5 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -1288,3 +1288,41 @@ ovn-nbctl --wait=sb lb-del lb2
>   OVS_WAIT_UNTIL([test 0 = `ovn-sbctl list service_monitor |  wc -l`])
>   
>   AT_CLEANUP
> +
> +AT_SETUP([ovn -- Load balancer VIP in NAT entries])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +ovn-nbctl lr-add lr0
> +ovn-nbctl lrp-add lr0 lr0-public 00:00:01:01:02:04 192.168.2.1/24
> +ovn-nbctl lrp-add lr0 lr0-join 00:00:01:01:02:04 10.10.0.1/24
> +
> +ovn-nbctl set logical_router lr0 options:chassis=ch1
> +
> +ovn-nbctl lb-add lb1 "192.168.2.1:8080" "10.0.0.4:8080"
> +ovn-nbctl lb-add lb2 "192.168.2.4:8080" "10.0.0.5:8080" udp
> +ovn-nbctl lb-add lb3 "192.168.2.5:8080" "10.0.0.6:8080"
> +ovn-nbctl lb-add lb4 "192.168.2.6:8080" "10.0.0.7:8080"
> +
> +ovn-nbctl lr-lb-add lr0 lb1
> +ovn-nbctl lr-lb-add lr0 lb2
> +ovn-nbctl lr-lb-add lr0 lb3
> +ovn-nbctl lr-lb-add lr0 lb4
> +
> +ovn-nbctl lr-nat-add lr0 snat 192.168.2.1 10.0.0.0/24
> +ovn-nbctl lr-nat-add lr0 dnat_and_snat 192.168.2.4 10.0.0.4
> +ovn-nbctl lr-nat-add lr0 dnat 192.168.2.5 10.0.0.5
> +
> +OVS_WAIT_UNTIL([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.1 && tcp && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.4 && udp && udp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 1 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.5 && tcp && tcp.dst == 8080" -c) ])
> +
> +AT_CHECK([test 0 = $(ovn-sbctl dump-flows lr0 | grep lr_in_unsnat | \
> +grep "ip4 && ip4.dst == 192.168.2.6 && tcp && tcp.dst == 8080" -c) ])
> +
> +AT_CLEANUP
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 3b3379840..f1ae69b20 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -1635,7 +1635,6 @@ ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.16
>   ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
>       external_ip=30.0.0.2 -- add logical_router R2 nat @nat
>   
> -
>   # Wait for ovn-controller to catch up.
>   ovn-nbctl --wait=hv sync
>   OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | \
> @@ -1671,6 +1670,56 @@ tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(sr
>   tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
>   ])
>   
> +check_est_flows () {
> +    n=$(ovs-ofctl dump-flows br-int table=14 | grep \
> +"priority=120,ct_state=+est+trk,tcp,metadata=0x2,nw_dst=30.0.0.2,tp_dst=8000" \
> +| grep nat | sed -n 's/.*n_packets=\([[0-9]]\{1,\}\).*/\1/p')
> +
> +    echo "n_packets=$n"
> +    test "$n" != 0
> +}
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
> +
> +ovn-nbctl set logical_router R2 options:lb_force_snat_ip="20.0.0.2"
> +
> +# Destroy the load balancer and create again. ovn-controller will
> +# clear the OF flows and re add again and clears the n_packets
> +# for these flows.
> +ovn-nbctl destroy load_balancer $uuid
> +uuid=`ovn-nbctl  create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
> +ovn-nbctl set logical_router R2 load_balancer=$uuid
> +
> +# Config OVN load-balancer with another VIP (this time with ports).
> +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'
> +
> +ovn-nbctl list load_balancer
> +ovn-sbctl dump-flows R2
> +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-flows br-int table=41 | \
> +grep 'nat(src=20.0.0.2)'])
> +
> +dnl Test load-balancing that includes L4 ports in NAT.
> +for i in `seq 1 20`; do
> +    echo Request $i
> +    NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log])
> +done
> +
> +dnl Each server should have at least one connection.
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(20.0.0.2) |
> +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl
> +tcp,orig=(src=172.16.1.2,dst=192.168.1.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=192.168.2.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=20.0.0.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +])
> +
> +OVS_WAIT_UNTIL([check_est_flows], [check established flows])
> +
>   OVS_APP_EXIT_AND_WAIT([ovn-controller])
>   
>   as ovn-sb
> 



More information about the dev mailing list