[ovs-dev] [PATCH ovn 2/2] ovn-northd: Support hairpinning for logical switch load balancing.
Numan Siddique
numans at ovn.org
Mon Jan 20 13:25:50 UTC 2020
On Thu, Jan 16, 2020 at 9:08 PM Dumitru Ceara <dceara at redhat.com> wrote:
>
> When a VIF connects to a load balancer VIP whose backends include the
> VIF itself, traffic gets DNAT-ed, ct_lb(VIP), but when it reaches the
> VIF, the VIF replies locally because the source IP is known to be
> local. For this kind of hairpinning to work properly, reply traffic
> must be sent back through OVN, and the way to enforce that is to
> perform SNAT (VIF source IP -> VIP) on hairpinned packets.
>
> For load balancers configured on gateway logical routers we already have the
> possibility of using 'lb_force_snat_ip' but for load balancers configured
> on logical switches there's no such configuration.
>
> For this second case we take an automatic approach that determines
> whether load balanced traffic needs to be hairpinned and executes the
> SNAT. To achieve
> this, two new stages are added to the logical switch ingress pipeline:
> - Ingress Table 11: Pre-Hairpin: which matches on load balanced traffic
> coming from VIFs that needs to be hairpinned and sets REGBIT_HAIRPIN
> (reg0[6]) to 1. If the traffic is in the direction that initiated the
> connection then 'ct_snat(VIP)' is performed, otherwise 'ct_snat' is
> used to unSNAT replies.
> - Ingress Table 12: Hairpin: which hairpins packets at L2 (swaps Ethernet
> addresses and loops traffic back on the ingress port) if REGBIT_HAIRPIN
> is 1.
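The SNAT decision the two stages implement can be sketched in standalone C; the function names and sample addresses below are illustrative, not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* After ct_lb() picks a backend, hairpinning is needed exactly when the
 * chosen backend is the sender itself (post-DNAT dst == src). */
static bool
needs_hairpin(uint32_t pkt_src_ip, uint32_t backend_ip)
{
    return pkt_src_ip == backend_ip;
}

/* Source address a hairpinned packet should leave with: the VIP, so
 * that the backend's reply is forced back through OVN and can be
 * unSNAT-ed there; non-hairpinned traffic keeps its source. */
static uint32_t
snat_src_ip(uint32_t pkt_src_ip, uint32_t backend_ip, uint32_t vip)
{
    return needs_hairpin(pkt_src_ip, backend_ip) ? vip : pkt_src_ip;
}
```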
>
> Also, update all references to logical switch ingress pipeline tables to use
> the correct indices.
>
> Reported-at: https://github.com/ovn-org/ovn-kubernetes/issues/817
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> ---
> northd/ovn-northd.8.xml | 57 ++++++++--
> northd/ovn-northd.c | 260 ++++++++++++++++++++++++++++++---------------
> tests/ovn.at | 209 ++++++++++++++++++++++++++++++++----
> utilities/ovn-trace.8.xml | 4 -
> 4 files changed, 406 insertions(+), 124 deletions(-)
Hi Dumitru,
The patch LGTM. I have a small comment below; please take a look.
Can you please add or enhance the system tests in system-ovn.at to
handle this scenario?
Thanks
Numan
>
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 4b227ca..3e7315c 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -527,7 +527,40 @@
> </li>
> </ul>
>
> - <h3>Ingress Table 11: ARP/ND responder</h3>
> + <h3>Ingress Table 11: Pre-Hairpin</h3>
> + <ul>
> + <li>
> + For each configured load balancer backend, a priority-2 flow that
> + matches traffic that needs to be hairpinned, i.e., traffic whose
> + destination IP after load balancing equals its source IP. The flow
> + sets <code>reg0[6] = 1</code> and executes <code>ct_snat(VIP)</code>
> + to force replies to these packets to come back through OVN.
> + </li>
> + <li>
> + For all configured load balancer backends a priority-1 flow that
> + matches on replies to hairpinned traffic, i.e., destination IP is VIP,
> + source IP is the backend IP and source L4 port is backend port, which
> + sets <code>reg0[6] = 1 </code> and executes <code>ct_snat;</code>.
> + </li>
> + <li>
> + A priority-0 flow that simply moves traffic to the next table.
> + </li>
> + </ul>
> +
> + <h3>Ingress Table 12: Hairpin</h3>
> + <ul>
> + <li>
> + A priority-1 flow that hairpins traffic matched by non-default
> + flows in the Pre-Hairpin table. Hairpinning is done at L2: Ethernet
> + addresses are swapped and the packets are looped back on the input
> + port.
> + </li>
> + <li>
> + A priority-0 flow that simply moves traffic to the next table.
> + </li>
> + </ul>
> +
> + <h3>Ingress Table 13: ARP/ND responder</h3>
>
> <p>
> This table implements ARP/ND responder in a logical switch for known
> @@ -772,7 +805,7 @@ output;
> </li>
> </ul>
>
> - <h3>Ingress Table 12: DHCP option processing</h3>
> + <h3>Ingress Table 14: DHCP option processing</h3>
>
> <p>
> This table adds the DHCPv4 options to a DHCPv4 packet from the
> @@ -829,11 +862,11 @@ next;
> </li>
>
> <li>
> - A priority-0 flow that matches all packets to advances to table 11.
> + A priority-0 flow that matches all packets and advances to table 15.
> </li>
> </ul>
>
> - <h3>Ingress Table 13: DHCP responses</h3>
> + <h3>Ingress Table 15: DHCP responses</h3>
>
> <p>
> This table implements DHCP responder for the DHCP replies generated by
> @@ -911,11 +944,11 @@ output;
> </li>
>
> <li>
> - A priority-0 flow that matches all packets to advances to table 12.
> + A priority-0 flow that matches all packets and advances to table 16.
> </li>
> </ul>
>
> - <h3>Ingress Table 14 DNS Lookup</h3>
> + <h3>Ingress Table 16: DNS Lookup</h3>
>
> <p>
> This table looks up and resolves the DNS names to the corresponding
> @@ -944,7 +977,7 @@ reg0[4] = dns_lookup(); next;
> </li>
> </ul>
>
> - <h3>Ingress Table 15 DNS Responses</h3>
> + <h3>Ingress Table 17: DNS Responses</h3>
>
> <p>
> This table implements DNS responder for the DNS replies generated by
> @@ -979,7 +1012,7 @@ output;
> </li>
> </ul>
>
> - <h3>Ingress table 16 External ports</h3>
> + <h3>Ingress table 18: External ports</h3>
>
> <p>
> Traffic from the <code>external</code> logical ports enter the ingress
> @@ -1007,11 +1040,11 @@ output;
> </li>
>
> <li>
> - A priority-0 flow that matches all packets to advances to table 17.
> + A priority-0 flow that matches all packets and advances to table 19.
> </li>
> </ul>
>
> - <h3>Ingress Table 17 Destination Lookup</h3>
> + <h3>Ingress Table 19 Destination Lookup</h3>
>
> <p>
> This table implements switching behavior. It contains these logical
> @@ -1221,14 +1254,14 @@ output;
> A priority 34000 logical flow is added for each logical port which
> has DHCPv4 options defined to allow the DHCPv4 reply packet and which has
> DHCPv6 options defined to allow the DHCPv6 reply packet from the
> - <code>Ingress Table 13: DHCP responses</code>.
> + <code>Ingress Table 15: DHCP responses</code>.
> </li>
>
> <li>
> A priority 34000 logical flow is added for each logical switch datapath
> configured with DNS records with the match <code>udp.dst = 53</code>
> to allow the DNS reply packet from the
> - <code>Ingress Table 15:DNS responses</code>.
> + <code>Ingress Table 17: DNS responses</code>.
> </li>
> </ul>
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 4ac9668..0a729aa 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -145,13 +145,15 @@ enum ovn_stage {
> PIPELINE_STAGE(SWITCH, IN, QOS_METER, 8, "ls_in_qos_meter") \
> PIPELINE_STAGE(SWITCH, IN, LB, 9, "ls_in_lb") \
> PIPELINE_STAGE(SWITCH, IN, STATEFUL, 10, "ls_in_stateful") \
> - PIPELINE_STAGE(SWITCH, IN, ARP_ND_RSP, 11, "ls_in_arp_rsp") \
> - PIPELINE_STAGE(SWITCH, IN, DHCP_OPTIONS, 12, "ls_in_dhcp_options") \
> - PIPELINE_STAGE(SWITCH, IN, DHCP_RESPONSE, 13, "ls_in_dhcp_response") \
> - PIPELINE_STAGE(SWITCH, IN, DNS_LOOKUP, 14, "ls_in_dns_lookup") \
> - PIPELINE_STAGE(SWITCH, IN, DNS_RESPONSE, 15, "ls_in_dns_response") \
> - PIPELINE_STAGE(SWITCH, IN, EXTERNAL_PORT, 16, "ls_in_external_port") \
> - PIPELINE_STAGE(SWITCH, IN, L2_LKUP, 17, "ls_in_l2_lkup") \
> + PIPELINE_STAGE(SWITCH, IN, PRE_HAIRPIN, 11, "ls_in_pre_hairpin") \
> + PIPELINE_STAGE(SWITCH, IN, HAIRPIN, 12, "ls_in_hairpin") \
> + PIPELINE_STAGE(SWITCH, IN, ARP_ND_RSP, 13, "ls_in_arp_rsp") \
> + PIPELINE_STAGE(SWITCH, IN, DHCP_OPTIONS, 14, "ls_in_dhcp_options") \
> + PIPELINE_STAGE(SWITCH, IN, DHCP_RESPONSE, 15, "ls_in_dhcp_response") \
> + PIPELINE_STAGE(SWITCH, IN, DNS_LOOKUP, 16, "ls_in_dns_lookup") \
> + PIPELINE_STAGE(SWITCH, IN, DNS_RESPONSE, 17, "ls_in_dns_response") \
> + PIPELINE_STAGE(SWITCH, IN, EXTERNAL_PORT, 18, "ls_in_external_port") \
> + PIPELINE_STAGE(SWITCH, IN, L2_LKUP, 19, "ls_in_l2_lkup") \
> \
> /* Logical switch egress stages. */ \
> PIPELINE_STAGE(SWITCH, OUT, PRE_LB, 0, "ls_out_pre_lb") \
> @@ -209,6 +211,7 @@ enum ovn_stage {
> #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"
> #define REGBIT_DNS_LOOKUP_RESULT "reg0[4]"
> #define REGBIT_ND_RA_OPTS_RESULT "reg0[5]"
> +#define REGBIT_HAIRPIN "reg0[6]"
>
> /* Register definitions for switches and routers. */
> #define REGBIT_NAT_REDIRECT "reg9[0]"
> @@ -5323,6 +5326,133 @@ build_lb(struct ovn_datapath *od, struct hmap *lflows)
> }
>
> static void
> +build_lb_hairpin_rules(struct ovn_datapath *od, struct hmap *lflows,
> + struct lb_vip *lb_vip, const char *ip_match,
> + const char *proto)
> +{
> + /* Ingress Pre-Hairpin table.
> + * - Priority 2: SNAT load balanced traffic that needs to be hairpinned.
> + * - Priority 1: unSNAT replies to hairpinned load balanced traffic.
> + */
> + for (size_t i = 0; i < lb_vip->n_backends; i++) {
> + struct lb_vip_backend *backend = &lb_vip->backends[i];
> + struct ds action = DS_EMPTY_INITIALIZER;
> + struct ds match = DS_EMPTY_INITIALIZER;
> + struct ds proto_match = DS_EMPTY_INITIALIZER;
> +
> + /* Packets that after load balancing have equal source and
> + * destination IPs should be hairpinned. SNAT them so that the reply
> + * traffic is directed also through OVN.
> + */
> + if (lb_vip->vip_port) {
> + ds_put_format(&proto_match, " && %s && %s.dst == %"PRIu16,
> + proto, proto, backend->port);
> + }
> + ds_put_format(&match, "%s.src == %s && %s.dst == %s%s",
> + ip_match, backend->ip, ip_match, backend->ip,
> + ds_cstr(&proto_match));
> + ds_put_format(&action, REGBIT_HAIRPIN " = 1;ct_snat(%s);",
Can you please add a space after ";" just for readability?
> + lb_vip->vip);
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_HAIRPIN, 2, ds_cstr(&match),
> + ds_cstr(&action));
> +
> + /* If the packets are replies for hairpinned traffic, UNSNAT them. */
> + ds_clear(&proto_match);
> + ds_clear(&match);
> + if (lb_vip->vip_port) {
> + ds_put_format(&proto_match, " && %s && %s.src == %"PRIu16,
> + proto, proto, backend->port);
> + }
> + ds_put_format(&match, "%s.src == %s && %s.dst == %s%s",
> + ip_match, backend->ip, ip_match, lb_vip->vip,
> + ds_cstr(&proto_match));
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_HAIRPIN, 1, ds_cstr(&match),
> + REGBIT_HAIRPIN " = 1;ct_snat;");
Same as above.
> +
> + ds_destroy(&action);
> + ds_destroy(&match);
> + ds_destroy(&proto_match);
> + }
> +}
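As a rough standalone illustration of the priority-2 match string the function above builds (using snprintf in place of OVS dynamic strings; the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirror of the Pre-Hairpin priority-2 match: source and post-DNAT
 * destination are both the backend IP, on the backend's L4 port. */
static void
hairpin_match(char *buf, size_t n, const char *ip_match,
              const char *backend_ip, const char *proto, uint16_t port)
{
    snprintf(buf, n, "%s.src == %s && %s.dst == %s && %s && %s.dst == %u",
             ip_match, backend_ip, ip_match, backend_ip,
             proto, proto, port);
}
```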
> +
> +static void
> +build_lb_rules(struct ovn_datapath *od, struct hmap *lflows, struct ovn_lb *lb)
> +{
> + for (size_t i = 0; i < lb->n_vips; i++) {
> + struct lb_vip *lb_vip = &lb->vips[i];
> +
> + const char *ip_match = NULL;
> + if (lb_vip->addr_family == AF_INET) {
> + ip_match = "ip4";
> + } else {
> + ip_match = "ip6";
> + }
> +
> + const char *proto = NULL;
> + if (lb_vip->vip_port) {
> + if (lb->nlb->protocol && !strcmp(lb->nlb->protocol, "udp")) {
> + proto = "udp";
> + } else {
> + proto = "tcp";
> + }
> + }
> +
> + /* New connections in Ingress table. */
> + struct ds action = DS_EMPTY_INITIALIZER;
> + if (lb_vip->health_check) {
> + ds_put_cstr(&action, "ct_lb(");
> +
> + size_t n_active_backends = 0;
> + for (size_t j = 0; j < lb_vip->n_backends; j++) {
> + struct lb_vip_backend *backend = &lb_vip->backends[j];
> + bool is_up = true;
> + if (backend->health_check && backend->sbrec_monitor &&
> + backend->sbrec_monitor->status &&
> + strcmp(backend->sbrec_monitor->status, "online")) {
> + is_up = false;
> + }
> +
> + if (is_up) {
> + n_active_backends++;
> + ds_put_format(&action, "%s:%"PRIu16",",
> + backend->ip, backend->port);
> + }
> + }
> +
> + if (!n_active_backends) {
> + ds_clear(&action);
> + ds_put_cstr(&action, "drop;");
> + } else {
> + ds_chomp(&action, ',');
> + ds_put_cstr(&action, ");");
> + }
> + } else {
> + ds_put_format(&action, "ct_lb(%s);", lb_vip->backend_ips);
> + }
> +
> + struct ds match = DS_EMPTY_INITIALIZER;
> + ds_put_format(&match, "ct.new && %s.dst == %s", ip_match, lb_vip->vip);
> + if (lb_vip->vip_port) {
> + ds_put_format(&match, " && %s.dst == %d", proto, lb_vip->vip_port);
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, 120,
> + ds_cstr(&match), ds_cstr(&action));
> + } else {
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL, 110,
> + ds_cstr(&match), ds_cstr(&action));
> + }
> +
> + ds_destroy(&match);
> + ds_destroy(&action);
> +
> + /* Also install flows that allow hairpinning of traffic (i.e., if
> + * a load balancer VIP is DNAT-ed to a backend that happens to be
> + * the source of the traffic).
> + */
> + build_lb_hairpin_rules(od, lflows, lb_vip, ip_match, proto);
> + }
> +}
> +
> +static void
> build_stateful(struct ovn_datapath *od, struct hmap *lflows, struct hmap *lbs)
> {
> /* Ingress and Egress stateful Table (Priority 0): Packets are
> @@ -5359,68 +5489,28 @@ build_stateful(struct ovn_datapath *od, struct hmap *lflows, struct hmap *lbs)
> for (int i = 0; i < od->nbs->n_load_balancer; i++) {
> struct ovn_lb *lb =
> ovn_lb_find(lbs, &od->nbs->load_balancer[i]->header_.uuid);
> - ovs_assert(lb);
> -
> - for (size_t j = 0; j < lb->n_vips; j++) {
> - struct lb_vip *lb_vip = &lb->vips[j];
> - /* New connections in Ingress table. */
> - struct ds action = DS_EMPTY_INITIALIZER;
> - if (lb_vip->health_check) {
> - ds_put_cstr(&action, "ct_lb(");
> -
> - size_t n_active_backends = 0;
> - for (size_t k = 0; k < lb_vip->n_backends; k++) {
> - struct lb_vip_backend *backend = &lb_vip->backends[k];
> - bool is_up = true;
> - if (backend->health_check && backend->sbrec_monitor &&
> - backend->sbrec_monitor->status &&
> - strcmp(backend->sbrec_monitor->status, "online")) {
> - is_up = false;
> - }
> -
> - if (is_up) {
> - n_active_backends++;
> - ds_put_format(&action, "%s:%"PRIu16",",
> - backend->ip, backend->port);
> - }
> - }
>
> - if (!n_active_backends) {
> - ds_clear(&action);
> - ds_put_cstr(&action, "drop;");
> - } else {
> - ds_chomp(&action, ',');
> - ds_put_cstr(&action, ");");
> - }
> - } else {
> - ds_put_format(&action, "ct_lb(%s);", lb_vip->backend_ips);
> - }
> + ovs_assert(lb);
> + build_lb_rules(od, lflows, lb);
> + }
>
> - struct ds match = DS_EMPTY_INITIALIZER;
> - if (lb_vip->addr_family == AF_INET) {
> - ds_put_format(&match, "ct.new && ip4.dst == %s", lb_vip->vip);
> - } else {
> - ds_put_format(&match, "ct.new && ip6.dst == %s", lb_vip->vip);
> - }
> - if (lb_vip->vip_port) {
> - if (lb->nlb->protocol && !strcmp(lb->nlb->protocol, "udp")) {
> - ds_put_format(&match, " && udp.dst == %d",
> - lb_vip->vip_port);
> - } else {
> - ds_put_format(&match, " && tcp.dst == %d",
> - lb_vip->vip_port);
> - }
> - ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL,
> - 120, ds_cstr(&match), ds_cstr(&action));
> - } else {
> - ovn_lflow_add(lflows, od, S_SWITCH_IN_STATEFUL,
> - 110, ds_cstr(&match), ds_cstr(&action));
> - }
> + /* Ingress Pre-Hairpin table (Priority 0). Packets that don't need
> + * hairpinning should continue processing.
> + */
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_HAIRPIN, 0, "1", "next;");
>
> - ds_destroy(&match);
> - ds_destroy(&action);
> - }
> - }
> + /* Ingress Hairpin table.
> + * - Priority 0: Packets that don't need hairpinning should continue
> + * processing.
> + * - Priority 1: Packets that were SNAT-ed for hairpinning should be
> + * looped back (i.e., swap ETH addresses and send back on inport).
> + */
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_HAIRPIN, 1, REGBIT_HAIRPIN " == 1",
> + "eth.dst <-> eth.src;"
> + "outport = inport;"
> + "flags.loopback = 1;"
> + "output;");
> + ovn_lflow_add(lflows, od, S_SWITCH_IN_HAIRPIN, 0, "1", "next;");
> }
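The Hairpin stage action added above ("eth.dst <-> eth.src; outport = inport; flags.loopback = 1; output;") amounts to the following, modeled on an illustrative (not OVN's actual) packet-metadata struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for OVN packet metadata. */
struct pkt_md {
    unsigned char eth_src[6];
    unsigned char eth_dst[6];
    int inport;
    int outport;
    bool loopback;
};

/* Swap Ethernet addresses and loop the packet back on its input port,
 * as the priority-1 Hairpin flow does. */
static void
hairpin_l2(struct pkt_md *p)
{
    unsigned char tmp[6];
    memcpy(tmp, p->eth_dst, sizeof tmp);
    memcpy(p->eth_dst, p->eth_src, sizeof tmp);
    memcpy(p->eth_src, tmp, sizeof tmp);
    p->outport = p->inport;
    p->loopback = true;
}
```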
>
> static void
> @@ -5526,7 +5616,7 @@ build_lrouter_groups(struct hmap *ports, struct ovs_list *lr_list)
> }
>
> /*
> - * Ingress table 17: Flows that flood self originated ARP/ND packets in the
> + * Ingress table 19: Flows that flood self originated ARP/ND packets in the
> * switching domain.
> */
> static void
> @@ -5569,7 +5659,7 @@ build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> }
>
> /*
> - * Ingress table 17: Flows that forward ARP/ND requests only to the routers
> + * Ingress table 19: Flows that forward ARP/ND requests only to the routers
> * that own the addresses. Other ARP/ND packets are still flooded in the
> * switching domain as regular broadcast.
> */
> @@ -5618,7 +5708,7 @@ build_lswitch_rport_arp_req_flow_for_ip(struct sset *ips,
> }
>
> /*
> - * Ingress table 17: Flows that forward ARP/ND requests only to the routers
> + * Ingress table 19: Flows that forward ARP/ND requests only to the routers
> * that own the addresses.
> * Priorities:
> * - 80: self originated GARPs that need to follow regular processing.
> @@ -5744,7 +5834,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>
> build_lswitch_input_port_sec(ports, datapaths, lflows);
>
> - /* Ingress table 11: ARP/ND responder, skip requests coming from localnet
> + /* Ingress table 13: ARP/ND responder, skip requests coming from localnet
> * and vtep ports. (priority 100); see ovn-northd.8.xml for the
> * rationale. */
> struct ovn_port *op;
> @@ -5762,7 +5852,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
> }
>
> - /* Ingress table 11: ARP/ND responder, reply for known IPs.
> + /* Ingress table 13: ARP/ND responder, reply for known IPs.
> * (priority 50). */
> HMAP_FOR_EACH (op, key_node, ports) {
> if (!op->nbsp) {
> @@ -5912,7 +6002,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
> }
>
> - /* Ingress table 11: ARP/ND responder, by default goto next.
> + /* Ingress table 13: ARP/ND responder, by default goto next.
> * (priority 0)*/
> HMAP_FOR_EACH (od, key_node, datapaths) {
> if (!od->nbs) {
> @@ -5922,7 +6012,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> ovn_lflow_add(lflows, od, S_SWITCH_IN_ARP_ND_RSP, 0, "1", "next;");
> }
>
> - /* Ingress table 11: ARP/ND responder for service monitor source ip.
> + /* Ingress table 13: ARP/ND responder for service monitor source ip.
> * (priority 110)*/
> struct ovn_lb *lb;
> HMAP_FOR_EACH (lb, hmap_node, lbs) {
> @@ -5962,8 +6052,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
>
>
> - /* Logical switch ingress table 12 and 13: DHCP options and response
> - * priority 100 flows. */
> + /* Logical switch ingress table 14 and 15: DHCP options and response
> + * priority 100 flows. */
> HMAP_FOR_EACH (op, key_node, ports) {
> if (!op->nbsp) {
> continue;
> @@ -6101,7 +6191,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
> }
>
> - /* Logical switch ingress table 14 and 15: DNS lookup and response
> + /* Logical switch ingress table 16 and 17: DNS lookup and response
> * priority 100 flows.
> */
> HMAP_FOR_EACH (od, key_node, datapaths) {
> @@ -6133,11 +6223,11 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> ds_destroy(&action);
> }
>
> - /* Ingress table 12 and 13: DHCP options and response, by default goto
> + /* Ingress table 14 and 15: DHCP options and response, by default goto
> * next. (priority 0).
> - * Ingress table 14 and 15: DNS lookup and response, by default goto next.
> + * Ingress table 16 and 17: DNS lookup and response, by default goto next.
> * (priority 0).
> - * Ingress table 16 - External port handling, by default goto next.
> + * Ingress table 18 - External port handling, by default goto next.
> * (priority 0). */
>
> HMAP_FOR_EACH (od, key_node, datapaths) {
> @@ -6158,7 +6248,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> continue;
> }
>
> - /* Table 16: External port. Drop ARP request for router ips from
> + /* Table 18: External port. Drop ARP request for router ips from
> * external ports on chassis not binding those ports.
> * This makes the router pipeline to be run only on the chassis
> * binding the external ports. */
> @@ -6204,7 +6294,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
>
> char *svc_check_match = xasprintf("eth.dst == %s", svc_monitor_mac);
> - /* Ingress table 17: Destination lookup, broadcast and multicast handling
> + /* Ingress table 19: Destination lookup, broadcast and multicast handling
> * (priority 70 - 100). */
> HMAP_FOR_EACH (od, key_node, datapaths) {
> if (!od->nbs) {
> @@ -6275,7 +6365,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
> free(svc_check_match);
>
> - /* Ingress table 17: Add IP multicast flows learnt from IGMP
> + /* Ingress table 19: Add IP multicast flows learnt from IGMP
> * (priority 90). */
> struct ovn_igmp_group *igmp_group;
>
> @@ -6320,14 +6410,14 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> ds_cstr(&match), ds_cstr(&actions));
> }
>
> - /* Ingress table 17: Destination lookup, unicast handling (priority 50), */
> + /* Ingress table 19: Destination lookup, unicast handling (priority 50), */
> HMAP_FOR_EACH (op, key_node, ports) {
> if (!op->nbsp || lsp_is_external(op->nbsp)) {
> continue;
> }
>
> /* For ports connected to logical routers add flows to bypass the
> - * broadcast flooding of ARP/ND requests in table 17. We direct the
> + * broadcast flooding of ARP/ND requests in table 19. We direct the
> * requests only to the router port that owns the IP address.
> */
> if (!strcmp(op->nbsp->type, "router")) {
> @@ -6447,7 +6537,7 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> }
> }
>
> - /* Ingress table 17: Destination lookup for unknown MACs (priority 0). */
> + /* Ingress table 19: Destination lookup for unknown MACs (priority 0). */
> HMAP_FOR_EACH (od, key_node, datapaths) {
> if (!od->nbs) {
> continue;
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 411b768..b70491e 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -13045,17 +13045,17 @@ ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> AT_CHECK([ovn-sbctl dump-flows ls1 | grep "offerip = 10.0.0.6" | \
> wc -l], [0], [0
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | wc -l], [0], [0
> ])
> @@ -13086,17 +13086,17 @@ port_binding logical_port=ls1-lp_ext1`
>
> # No DHCPv4/v6 flows for the external port - ls1-lp_ext1 - 10.0.0.6 in hv1 and hv2
> # as no localnet port added to ls1 yet.
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | wc -l], [0], [0
> ])
> @@ -13118,38 +13118,38 @@ logical_port=ls1-lp_ext1`
> test "$chassis" = "$hv1_uuid"])
>
> # There should be DHCPv4/v6 OF flows for the ls1-lp_ext1 port in hv1
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | grep reg14=0x$ln_public_key | \
> wc -l], [0], [3
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
> grep reg14=0x$ln_public_key | wc -l], [0], [1
> ])
>
> # There should be no DHCPv4/v6 flows for ls1-lp_ext1 on hv2
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | wc -l], [0], [0
> ])
>
> # No DHCPv4/v6 flows for the external port - ls1-lp_ext2 - 10.0.0.7 in hv1 and
> # hv2 as requested-chassis option is not set.
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.07" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.07" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.07" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.07" | wc -l], [0], [0
> ])
> @@ -13391,21 +13391,21 @@ logical_port=ls1-lp_ext1`
> test "$chassis" = "$hv2_uuid"])
>
> # There should be OF flows for DHCP4/v6 for the ls1-lp_ext1 port in hv2
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | grep reg14=0x$ln_public_key | \
> wc -l], [0], [3
> ])
> -AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
> grep reg14=0x$ln_public_key | wc -l], [0], [1
> ])
>
> # There should be no DHCPv4/v6 flows for ls1-lp_ext1 on hv1
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep "0a.00.00.06" | wc -l], [0], [0
> ])
> -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=20 | \
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep table=22 | \
> grep controller | grep tp_src=546 | grep \
> "ae.70.00.00.00.00.00.00.00.00.00.00.00.00.00.06" | \
> grep reg14=0x$ln_public_key | wc -l], [0], [0
> @@ -14839,9 +14839,9 @@ ovn-nbctl --wait=hv sync
> ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport > lflows.txt
>
> AT_CHECK([cat lflows.txt], [0], [dnl
> - table=11(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p1" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> - table=11(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p2" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> - table=11(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p3" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> + table=13(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p1" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> + table=13(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p2" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> + table=13(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p3" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> ])
>
> ovn-sbctl dump-flows lr0 | grep lr_in_arp_resolve | grep "reg0 == 10.0.0.10" \
> @@ -15018,8 +15018,8 @@ ovn-nbctl --wait=hv set logical_switch_port sw0-vir options:virtual-ip=10.0.0.10
> ovn-sbctl dump-flows sw0 | grep ls_in_arp_rsp | grep bind_vport > lflows.txt
>
> AT_CHECK([cat lflows.txt], [0], [dnl
> - table=11(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p1" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> - table=11(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p3" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> + table=13(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p1" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> + table=13(ls_in_arp_rsp ), priority=100 , match=(inport == "sw0-p3" && ((arp.op == 1 && arp.spa == 10.0.0.10 && arp.tpa == 10.0.0.10) || (arp.op == 2 && arp.spa == 10.0.0.10))), action=(bind_vport("sw0-vir", inport); next;)
> ])
>
> ovn-nbctl --wait=hv remove logical_switch_port sw0-vir options virtual-parents
> @@ -17338,3 +17338,162 @@ OVS_WAIT_UNTIL([
>
> OVN_CLEANUP([hv1])
> AT_CLEANUP
> +
> +AT_SETUP([ovn -- Load Balancer LS hairpin])
> +ovn_start
> +
> +reset_pcap_file() {
> + local iface=$1
> + local pcap_file=$2
> + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> + rm -f ${pcap_file}*.pcap
> + ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +ip_to_hex() {
> + printf "%02x%02x%02x%02x" "$@"
> +}
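The ip_to_hex shell helper above is equivalent to this C sketch (hex rendering of four IPv4 octets):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render four IPv4 octets as 8 lowercase hex digits,
 * e.g. 42.42.42.1 -> "2a2a2a01". */
static void
ip_to_hex(char buf[9], int a, int b, int c, int d)
{
    snprintf(buf, 9, "%02x%02x%02x%02x", a, b, c, d);
}
```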
> +
> +build_udp() {
> + local sport=$1 dport=$2 chksum=$3
> + local len=000a
> + echo ${sport}${dport}${len}${chksum}0000
> +}
> +
> +build_tcp_syn() {
> + local sport=$1 dport=$2 chksum=$3
> + local seq=00000001
> + local ack=00000000
> + local hlen_flags=5002
> + local win=00ff
> + local urg=0000
> + echo ${sport}${dport}${seq}${ack}${hlen_flags}${win}${chksum}${urg}
> +}
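build_tcp_syn above concatenates a fixed 20-byte TCP header: ports, seq=1, ack=0, data offset 5 with only SYN set (0x5002), window 0x00ff, a caller-supplied checksum, and urgent pointer 0. A C sketch of the same composition (the function name is reused only for illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compose the 40-hex-digit (20-byte) TCP SYN header used by the test:
 * sport | dport | seq=1 | ack=0 | 0x5002 (offset 5, SYN) | win 0x00ff |
 * checksum | urgent pointer 0. */
static void
build_tcp_syn(char *buf, size_t n, const char *sport, const char *dport,
              const char *chksum)
{
    snprintf(buf, n, "%s%s%s%s%s%s%s%s", sport, dport,
             "00000001", "00000000", "5002", "00ff", chksum, "0000");
}
```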
> +
> +send_ipv4_pkt() {
> + local hv=$1 inport=$2 eth_src=$3 eth_dst=$4
> + local ip_src=$5 ip_dst=$6 ip_proto=$7 ip_len=$8 ip_chksum=$9
> + local l4_payload=${10}
> + local hp_l4_payload=${11}
> + local outfile=${12}
> +
> + local ip_ttl=40
> +
> + local eth=${eth_dst}${eth_src}0800
> + local hp_eth=${eth_src}${eth_dst}0800
> + local ip=4500${ip_len}00004000${ip_ttl}${ip_proto}${ip_chksum}${ip_src}${ip_dst}
> + local hp_ip=4500${ip_len}00004000${ip_ttl}${ip_proto}${ip_chksum}${ip_dst}${ip_src}
> + local packet=${eth}${ip}${l4_payload}
> + local hp_packet=${hp_eth}${hp_ip}${hp_l4_payload}
> +
> + echo ${hp_packet} >> ${outfile}
> + as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
> +}
> +
> +send_ipv6_pkt() {
> + local hv=$1 inport=$2 eth_src=$3 eth_dst=$4
> + local ip_src=$5 ip_dst=$6 ip_proto=$7 ip_len=$8
> + local l4_payload=$9
> + local hp_l4_payload=${10}
> + local outfile=${11}
> +
> + local ip_ttl=40
> +
> + local eth=${eth_dst}${eth_src}86dd
> + local hp_eth=${eth_src}${eth_dst}86dd
> + local ip=60000000${ip_len}${ip_proto}${ip_ttl}${ip_src}${ip_dst}
> + local hp_ip=60000000${ip_len}${ip_proto}${ip_ttl}${ip_dst}${ip_src}
> + local packet=${eth}${ip}${l4_payload}
> + local hp_packet=${hp_eth}${hp_ip}${hp_l4_payload}
> +
> + echo ${hp_packet} >> ${outfile}
> + as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
> +}
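A reviewer-style aside (not part of the patch): the hex strings these helpers emit are just packed protocol headers, so they can be cross-checked against `struct.pack`. The sketch below mirrors `build_tcp_syn` in Python, using the field values from the first IPv4 TCP injection further down (sport 0x84d0, dport 0x1f90, checksum 0x05a7); the function name and values are taken from the patch, the Python mirror itself is my own.

```python
# Sketch: reproduce the shell helper build_tcp_syn as a packed TCP header
# and confirm the hex layout matches what the test injects.
import struct

def build_tcp_syn(sport, dport, chksum):
    # Mirrors the shell helper: seq=1, ack=0, data offset 5 with the SYN
    # flag set (0x5002), window 0x00ff, given checksum, urgent pointer 0.
    return struct.pack("!HHIIHHHH",
                       sport, dport, 1, 0,
                       0x5002, 0x00ff, chksum, 0).hex()

print(build_tcp_syn(0x84d0, 0x1f90, 0x05a7))
```

The 20-byte result is exactly the `${sport}${dport}${seq}${ack}${hlen_flags}${win}${chksum}${urg}` concatenation the shell helper produces.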
> +
> +net_add n1
> +sim_add hv1
> +as hv1
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> + set interface hv1-vif1 external-ids:iface-id=lsp \
> + options:tx_pcap=hv1/vif1-tx.pcap \
> + options:rxq_pcap=hv1/vif1-rx.pcap \
> + ofport-request=1
> +
> +# One logical switch with IPv4 and IPv6 load balancers that hairpin the
> +# traffic.
> +ovn-nbctl ls-add sw
> +ovn-nbctl lsp-add sw lsp -- lsp-set-addresses lsp 00:00:00:00:00:01
> +ovn-nbctl lb-add lb-ipv4-tcp 88.88.88.88:8080 42.42.42.1:4041 tcp
> +ovn-nbctl lb-add lb-ipv4-udp 88.88.88.88:4040 42.42.42.1:2021 udp
> +ovn-nbctl lb-add lb-ipv6-tcp [[8800::0088]]:8080 [[4200::1]]:4041 tcp
> +ovn-nbctl lb-add lb-ipv6-udp [[8800::0088]]:4040 [[4200::1]]:2021 udp
> +ovn-nbctl ls-lb-add sw lb-ipv4-tcp
> +ovn-nbctl ls-lb-add sw lb-ipv4-udp
> +ovn-nbctl ls-lb-add sw lb-ipv6-tcp
> +ovn-nbctl ls-lb-add sw lb-ipv6-udp
> +
> +ovn-nbctl lr-add rtr
> +ovn-nbctl lrp-add rtr rtr-sw 00:00:00:00:01:00 42.42.42.254/24 4200::00ff/64
> +ovn-nbctl lsp-add sw sw-rtr \
> + -- lsp-set-type sw-rtr router \
> + -- lsp-set-addresses sw-rtr 00:00:00:00:01:00 \
> + -- lsp-set-options sw-rtr router-port=rtr-sw
> +
> +ovn-nbctl --wait=hv sync
> +
> +# Inject IPv4 TCP packet from lsp.
> +> expected
> +tcp_payload=$(build_tcp_syn 84d0 1f90 05a7)
> +hp_tcp_payload=$(build_tcp_syn 84d0 0fc9 156e)
> +send_ipv4_pkt hv1 hv1-vif1 000000000001 000000000100 \
> + $(ip_to_hex 42 42 42 1) $(ip_to_hex 88 88 88 88) \
> + 06 0028 35f5 \
> + ${tcp_payload} ${hp_tcp_payload} \
> + expected
> +
> +# Check that traffic is hairpinned.
> +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
> +
> +# Inject IPv4 UDP packet from lsp.
> +udp_payload=$(build_udp 84d0 0fc8 6666)
> +hp_udp_payload=$(build_udp 84d0 07e5 6e49)
> +send_ipv4_pkt hv1 hv1-vif1 000000000001 000000000100 \
> + $(ip_to_hex 42 42 42 1) $(ip_to_hex 88 88 88 88) \
> + 11 001e 35f4 \
> + ${udp_payload} ${hp_udp_payload} \
> + expected
> +
> +# Check that traffic is hairpinned.
> +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
> +
> +# Inject IPv6 TCP packet from lsp.
> +tcp_payload=$(build_tcp_syn 84d0 1f90 3ff9)
> +hp_tcp_payload=$(build_tcp_syn 84d0 0fc9 4fc0)
> +send_ipv6_pkt hv1 hv1-vif1 000000000001 000000000100 \
> + 42000000000000000000000000000001 88000000000000000000000000000088 \
> + 06 0014 \
> + ${tcp_payload} ${hp_tcp_payload} \
> + expected
> +
> +# Check that traffic is hairpinned.
> +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
> +
> +# Inject IPv6 UDP packet from lsp.
> +udp_payload=$(build_udp 84d0 0fc8 a0b8)
> +hp_udp_payload=$(build_udp 84d0 07e5 a89b)
> +send_ipv6_pkt hv1 hv1-vif1 000000000001 000000000100 \
> + 42000000000000000000000000000001 88000000000000000000000000000088 \
> + 11 000a \
> + ${udp_payload} ${hp_udp_payload} \
> + expected
> +
> +# Check that traffic is hairpinned.
> +OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
> +
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
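A note for readers of the test above (not part of the patch): the hard-coded `ip_chksum` values in the IPv4 injections (`35f5` for the TCP packet, `35f4` for the UDP packet) are plain IPv4 header checksums over the header that `send_ipv4_pkt` builds (version/IHL 0x4500, DF set, TTL 0x40, checksum field zeroed). The sketch below recomputes them; the helper name `ipv4_header_checksum` is mine, the header fields come from the test.

```python
# Sketch: recompute the IPv4 header checksums used by the test's
# send_ipv4_pkt injections (42.42.42.1 -> 88.88.88.88).

def ipv4_header_checksum(header_hex):
    """One's-complement sum over the 16-bit words of a header hex string."""
    total = 0
    for i in range(0, len(header_hex), 4):
        total += int(header_hex[i:i + 4], 16)
    while total > 0xffff:                 # fold carries back in
        total = (total & 0xffff) + (total >> 16)
    return (~total) & 0xffff

src, dst = "2a2a2a01", "58585858"         # 42.42.42.1 -> 88.88.88.88
# version/IHL+TOS, total length, id+flags/frag, TTL, proto, zero checksum
tcp_hdr = "4500" + "0028" + "00004000" + "40" + "06" + "0000" + src + dst
udp_hdr = "4500" + "001e" + "00004000" + "40" + "11" + "0000" + src + dst
print(hex(ipv4_header_checksum(tcp_hdr)))   # expect 0x35f5
print(hex(ipv4_header_checksum(udp_hdr)))   # expect 0x35f4
```

Swapping the source and destination addresses leaves the one's-complement sum unchanged, which is why the test can reuse the same `ip_chksum` for the hairpinned `hp_ip` header.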
> diff --git a/utilities/ovn-trace.8.xml b/utilities/ovn-trace.8.xml
> index 01e7411..db25a27 100644
> --- a/utilities/ovn-trace.8.xml
> +++ b/utilities/ovn-trace.8.xml
> @@ -150,7 +150,7 @@
> packet matches a logical flow in table 0 (aka
> <code>ls_in_port_sec_l2</code>) with priority 50 and executes
> <code>next(1);</code> to pass to table 1. Tables 1 through 11 are
> - trivial and omitted. In table 12 (aka <code>ls_in_l2_lkup</code>), the
> + trivial and omitted. In table 19 (aka <code>ls_in_l2_lkup</code>), the
> packet matches a flow with priority 50 based on its Ethernet destination
> address and the flow's actions output the packet to the
> <code>lrp11-attachement</code> logical port.
> @@ -161,7 +161,7 @@
> ---------------------------------
> 0. ls_in_port_sec_l2: inport == "lp111", priority 50
> next(1);
> - 12. ls_in_l2_lkup: eth.dst == 00:00:00:00:ff:11, priority 50
> + 19. ls_in_l2_lkup: eth.dst == 00:00:00:00:ff:11, priority 50
> outport = "lrp11-attachment";
> output;
> </pre>
>