[ovs-dev] [PATCH ovn v2] northd: Add lflows to send all pkts to conntrack if LB is configured on a lswitch.
Mark Michelson
mmichels at redhat.com
Fri Sep 11 14:43:46 UTC 2020
Acked-by: Mark Michelson <mmichels at redhat.com>
Thanks Numan!
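
For anyone who wants to see the behavior change locally, a quick sanity check
along these lines should work (a sketch only: the switch and load balancer
names are made up, and this assumes a running OVN sandbox, e.g. "make
sandbox"):

```shell
# Create a switch with a TCP load balancer (names are illustrative).
ovn-nbctl ls-add sw0
ovn-nbctl lb-add lb0 10.0.0.10:80 10.0.0.3:80 tcp
ovn-nbctl ls-lb-add sw0 lb0

# With this patch, ls_in_pre_lb should carry a single priority-100
# "ip" match instead of one flow per VIP.
ovn-sbctl lflow-list sw0 | grep ls_in_pre_lb
```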
On 9/11/20 5:10 AM, numans at ovn.org wrote:
> From: Numan Siddique <numans at ovn.org>
>
> Prior to this patch, if a load balancer is configured on a logical switch but
> no ACLs with allow-related are configured, then in the ingress pipeline only
> packets with ip.dst == VIP are sent to conntrack, using the zone id of the
> source logical port.
>
> If the backend of the load balancer sends an invalid packet (for example, one
> with an invalid tcp sequence number), then such packets are delivered to the
> source logical port VIF without being unDNATted. This causes the source to
> reset the connection.
>
> This patch fixes this issue by sending all packets to conntrack if a load
> balancer is configured on the logical switch. As a result, any invalid
> (ct.inv) packets are dropped in the ingress pipeline itself.
>
> Unfortunately this impacts performance, as there will now be extra
> recirculations because of the ct and ct_commit (for new connections) actions.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1870359
> Reported-by: Tim Rozet (trozet at redhat.com)
> Signed-off-by: Numan Siddique <numans at ovn.org>
> ---
> v1 -> v2
> ----
> * Addressed the review comments from Dumitru and Mark.
>
> northd/ovn-northd.8.xml | 41 +++++++++++++++++-----
> northd/ovn-northd.c | 77 +++++++++++++++++++++++++----------------
> 2 files changed, 80 insertions(+), 38 deletions(-)
>
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index a275442a82..a9826e8e9e 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -340,15 +340,12 @@
> it contains a priority-110 flow to move IPv6 Neighbor Discovery traffic
> to the next table. If load balancing rules with virtual IP addresses
> (and ports) are configured in <code>OVN_Northbound</code> database for a
> - logical switch datapath, a priority-100 flow is added for each configured
> - virtual IP address <var>VIP</var>. For IPv4 <var>VIPs</var>, the match is
> - <code>ip && ip4.dst == <var>VIP</var></code>. For IPv6
> - <var>VIPs</var>, the match is <code>ip &&
> - ip6.dst == <var>VIP</var></code>. The flow sets an action
> + logical switch datapath, a priority-100 flow is added that matches on
> + all IP packets (<code>ip</code>) and sets the action
> <code>reg0[0] = 1; next;</code> to act as a hint for table
> <code>Pre-stateful</code> to send IP packets to the connection tracker
> - for packet de-fragmentation before eventually advancing to ingress table
> - <code>LB</code>.
> + for packet de-fragmentation before eventually advancing to ingress
> + table <code>LB</code>.
> If controller_event has been enabled and load balancing rules with
> empty backends have been added in <code>OVN_Northbound</code>, a 130 flow
> is added to trigger ovn-controller events whenever the chassis receives a
> @@ -356,6 +353,32 @@
> previously created, it will be associated to the empty_lb logical flow
> </p>
>
> + <p>
> + Prior to <code>OVN 20.09</code>, <code>reg0[0] = 1</code> was set only
> + if the IP destination matched a load balancer VIP. However, this caused
> + issues when a logical switch had no ACLs with the
> + <code>allow-related</code> action.
> + To understand the issue, take a TCP load balancer -
> + <code>10.0.0.10:80=10.0.0.3:80</code>. If a logical port p1 with
> + IP 10.0.0.5 opens a TCP connection to the VIP 10.0.0.10, then the
> + packet in the ingress pipeline of p1 is sent to p1's conntrack zone
> + and the packet is load balanced to the backend 10.0.0.3. The reply
> + packet from the backend lport, however, is not sent to conntrack in
> + the backend lport's zone. This is fine as long as the packet is valid.
> + If the backend lport sends an invalid TCP packet (e.g., one with an
> + incorrect sequence number), the packet is delivered to lport p1
> + without being unDNATted to the VIP 10.0.0.10, which causes the
> + connection to be reset by lport p1's VIF.
> + </p>
> +
> + <p>
> + This cannot be fixed by adding a logical flow to drop ct.inv packets
> + in the egress pipeline, since that would also drop connections not
> + destined to the load balancers. Instead, all packets are sent to
> + conntrack in the ingress pipeline if a load balancer is configured,
> + and an lflow can then drop ct.inv packets.
> + </p>
> +
> <p>
> This table also has a priority-110 flow with the match
> <code>eth.dst == <var>E</var></code> for all logical switch
> @@ -430,8 +453,8 @@
> <p>
> This table also contains a priority 0 flow with action
> <code>next;</code>, so that ACLs allow packets by default. If the
> - logical datapath has a statetful ACL, the following flows will
> - also be added:
> + logical datapath has a stateful ACL or a load balancer with a VIP
> + configured, the following flows will also be added:
> </p>
>
> <ul>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index b95d6cd8a1..3967bae569 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -5034,6 +5034,19 @@ build_empty_lb_event_flow(struct ovn_datapath *od, struct hmap *lflows,
> free(action);
> }
>
> +static bool
> +has_lb_vip(struct ovn_datapath *od)
> +{
> + for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> + struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> + if (!smap_is_empty(&nb_lb->vips)) {
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> static void
> build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> struct shash *meter_groups, struct hmap *lbs)
> @@ -5072,8 +5085,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> 110, lflows);
> }
>
> - struct sset all_ips_v4 = SSET_INITIALIZER(&all_ips_v4);
> - struct sset all_ips_v6 = SSET_INITIALIZER(&all_ips_v6);
> bool vip_configured = false;
> for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> struct nbrec_load_balancer *nb_lb = od->nbs->load_balancer[i];
> @@ -5083,12 +5094,6 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
>
> for (size_t j = 0; j < lb->n_vips; j++) {
> struct lb_vip *lb_vip = &lb->vips[j];
> - if (lb_vip->addr_family == AF_INET) {
> - sset_add(&all_ips_v4, lb_vip->vip);
> - } else {
> - sset_add(&all_ips_v6, lb_vip->vip);
> - }
> -
> build_empty_lb_event_flow(od, lflows, lb_vip, nb_lb,
> S_SWITCH_IN_PRE_LB, meter_groups);
>
> @@ -5102,26 +5107,37 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
> }
>
> /* 'REGBIT_CONNTRACK_DEFRAG' is set to let the pre-stateful table send
> - * packet to conntrack for defragmentation. */
> - const char *ip_address;
> - SSET_FOR_EACH (ip_address, &all_ips_v4) {
> - char *match = xasprintf("ip && ip4.dst == %s", ip_address);
> - ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> - 100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> - free(match);
> - }
> -
> - SSET_FOR_EACH (ip_address, &all_ips_v6) {
> - char *match = xasprintf("ip && ip6.dst == %s", ip_address);
> - ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> - 100, match, REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> - free(match);
> - }
> -
> - sset_destroy(&all_ips_v4);
> - sset_destroy(&all_ips_v6);
> -
> + * packet to conntrack for defragmentation.
> + *
> + * Send all the packets to conntrack in the ingress pipeline if the
> + * logical switch has a load balancer with a VIP configured. Earlier,
> + * the REGBIT_CONNTRACK_DEFRAG flag was set in the ingress pipeline only
> + * if the IP destination matched a VIP. But this caused issues when a
> + * logical switch had no ACLs configured with allow-related.
> + * To understand the issue, take a TCP load balancer -
> + * 10.0.0.10:80=10.0.0.3:80.
> + * If a logical port p1 with IP 10.0.0.5 opens a TCP connection to the
> + * VIP 10.0.0.10, then the packet in the ingress pipeline of p1 is sent
> + * to p1's conntrack zone and the packet is load balanced to the backend
> + * 10.0.0.3. The reply packet from the backend lport, however, is not
> + * sent to conntrack in the backend lport's zone. This is fine as long
> + * as the packet is valid. If the backend lport sends an invalid TCP
> + * packet (e.g., one with an incorrect sequence number), the packet is
> + * delivered to lport p1 without being unDNATted to the VIP 10.0.0.10,
> + * which causes the connection to be reset by lport p1's VIF.
> + *
> + * This cannot be fixed by adding a logical flow to drop ct.inv packets
> + * in the egress pipeline, since that would also drop connections not
> + * destined to the load balancers.
> + *
> + * To fix this, all packets are sent to conntrack in the ingress
> + * pipeline if a load balancer is configured, and an lflow can then
> + * drop ct.inv packets.
> + */
> if (vip_configured) {
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB,
> + 100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB,
> 100, "ip", REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> }
> @@ -5482,7 +5498,7 @@ static void
> build_acls(struct ovn_datapath *od, struct hmap *lflows,
> struct hmap *port_groups)
> {
> - bool has_stateful = has_stateful_acl(od);
> + bool has_stateful = (has_stateful_acl(od) || has_lb_vip(od));
>
> /* Ingress and Egress ACL Table (Priority 0): Packets are allowed by
> * default. A related rule at priority 1 is added below if there
> @@ -5746,12 +5762,15 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows)
> ovn_lflow_add(lflows, od, S_SWITCH_IN_LB, 0, "1", "next;");
> ovn_lflow_add(lflows, od, S_SWITCH_OUT_LB, 0, "1", "next;");
>
> - if (od->nbs->load_balancer) {
> + if (od->nbs->n_load_balancer) {
> for (size_t i = 0; i < od->n_router_ports; i++) {
> skip_port_from_conntrack(od, od->router_ports[i],
> S_SWITCH_IN_LB, S_SWITCH_OUT_LB,
> UINT16_MAX, lflows);
> }
> + }
> +
> + if (has_lb_vip(od)) {
> /* Ingress and Egress LB Table (Priority 65534).
> *
> * Send established traffic through conntrack for just NAT. */
>