[ovs-dev] [PATCH v7 2/2] OVN: Enable N-S Traffic, Vlan backed DVR
Numan Siddique
nusiddiq at redhat.com
Fri May 10 12:44:06 UTC 2019
On Thu, May 9, 2019 at 4:29 AM Ankur Sharma <ankur.sharma at nutanix.com>
wrote:
> Background:
> [1]
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353066.html
> [2]
> https://docs.google.com/document/d/1uoQH478wM1OZ16HrxzbOUvk5LvFnfNEWbkPT6Zmm9OU/edit?usp=sharing
>
> This Series:
> Layer 2, Layer 3 E-W and Layer 3 N-S (NO NAT) changes for vlan
> backed distributed logical router.
>
> This patch:
> For North-South traffic, we need a chassis which will respond to
> ARP requests for the router port coming from outside. For this purpose,
> we will rely upon the gateway-chassis construct in OVN: on a logical
> router port, we will associate one or more chassis as gateway chassis.
>
> One of these chassis will be active at any point in time and will
> become the entry point for traffic coming from the outside network and
> bound for endpoints behind the logical router (North to South).
>
> This patch makes some enhancements to the gateway chassis
> implementation to handle the above use case.
>
> A.
> Do not replace router port mac with chassis mac on gateway
> chassis.
> This is done because:
> i. The chassisredirect port is NOT a distributed port, hence
> we need not replace its mac address
> (which is the same as the router port mac).
>
> ii. The ARP cache will be consistent everywhere, i.e. just as
> endpoints on an OVN chassis will see the configured router port
> mac as the resolved mac for the router port ip, outside endpoints
> will see that as well.
>
> iii. It is needed for implementing Network Address Translation.
> NAT is not part of this series, but a follow-up series will
> add it, and that approach will rely upon sending packets to
> the redirect chassis using the chassisredirect router port
> mac as the destination mac.
>
> B.
> Advertise router port GARP on gateway chassis.
> This is needed especially if a failover happens and the
> chassisredirect port moves to a new gateway chassis.
> Otherwise, there would be packet drops until the outside
> router ARPs for the router port ip again.
>
> The intention of this GARP is to update the top-of-rack (TOR)
> switch so that it directs the router port mac to the new hypervisor.
>
> We could have done the same using RARP as well, but because
> ovn-controller already has an implementation for GARP, it did
> not seem worthwhile to add a RARP implementation just for this.
>
> C.
> For South to North traffic, we need not pass through the gateway
> chassis if there is no address translation needed.
>
> For overlay networks, NATing is a must to talk to outside networks.
> However, for vlan backed networks, NATing is not a must, and hence
> in the absence of NATing configuration we need not redirect the packet
> to the gateway chassis.
>
> Signed-off-by: Ankur Sharma <ankur.sharma at nutanix.com>
> ---
>
Hi Ankur,
I am a little confused by the approach taken here. I ran the tests added in
this patch.
I see the logical flows below in the router pipeline:
***
table=1 (lr_in_ip_input ), priority=90 , match=(inport ==
"router-to-ls1" && arp.spa == 192.168.1.0/24 && arp.tpa == 192.168.1.3 &&
arp.op == 1), action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src;
eth.src = 00:00:01:01:02:03; arp.op = 2; /* ARP reply */ arp.tha = arp.sha;
arp.sha = 00:00:01:01:02:03; arp.tpa = arp.spa; arp.spa = 192.168.1.3;
outport = "router-to-ls1"; flags.loopback = 1; output;)
table=1 (lr_in_ip_input ), priority=90 , match=(inport ==
"router-to-ls1" && nd_ns && ip6.dst == {fe80::200:1ff:fe01:203,
ff02::1:ff01:203} && nd.target == fe80::200:1ff:fe01:203 &&
is_chassis_resident("cr-router-to-ls1")), action=(put_nd(inport, ip6.src,
nd.sll); nd_na_router { eth.src = 00:00:01:01:02:03; ip6.src =
fe80::200:1ff:fe01:203; nd.target = fe80::200:1ff:fe01:203; nd.tll =
00:00:01:01:02:03; outport = inport; flags.loopback = 1; output; };)
table=1 (lr_in_ip_input ), priority=90 , match=(inport ==
"router-to-ls2" && arp.spa == 192.168.2.0/24 && arp.tpa == 192.168.2.3 &&
arp.op == 1), action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src;
eth.src = 00:00:01:01:02:05; arp.op = 2; /* ARP reply */ arp.tha = arp.sha;
arp.sha = 00:00:01:01:02:05; arp.tpa = arp.spa; arp.spa = 192.168.2.3;
outport = "router-to-ls2"; flags.loopback = 1; output;)
table=1 (lr_in_ip_input ), priority=90 , match=(inport ==
"router-to-ls2" && nd_ns && ip6.dst == {fe80::200:1ff:fe01:205,
ff02::1:ff01:205} && nd.target == fe80::200:1ff:fe01:205 &&
is_chassis_resident("cr-router-to-ls2")), action=(put_nd(inport, ip6.src,
nd.sll); nd_na_router { eth.src = 00:00:01:01:02:05; ip6.src =
fe80::200:1ff:fe01:205; nd.target = fe80::200:1ff:fe01:205; nd.tll =
00:00:01:01:02:05; outport = inport; flags.loopback = 1; output; };)
***
But when I inspect the port_binding rows in the Southbound DB, I don't see
"cr-router-to-ls1" or "cr-router-to-ls2".
I don't understand the purpose of these logical flows.
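To illustrate the concern, here is a minimal sketch of a residency check, modelled on the early return in the quoted pinctrl_is_chassis_resident(). The struct and the lookup helper are hypothetical simplifications (the real code uses the Southbound IDL and, for chassisredirect ports, ha_chassis_group_is_active()). Assuming the flow-level is_chassis_resident() behaves the same way, a name with no Port_Binding row can never be resident, so a flow that depends on it would never match:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a Southbound Port_Binding row. */
struct port_binding {
    const char *logical_port;   /* e.g. "cr-router-to-underlay". */
    const char *chassis;        /* Bound chassis name, or NULL if unbound. */
};

/* Linear lookup by name; returns NULL when no such row exists. */
static const struct port_binding *
lookup_by_name(const struct port_binding *rows, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(rows[i].logical_port, name)) {
            return &rows[i];
        }
    }
    return NULL;
}

/* A missing or unbound port is never resident on any chassis. */
static bool
is_chassis_resident(const struct port_binding *rows, size_t n,
                    const char *local_chassis, const char *port_name)
{
    const struct port_binding *pb = lookup_by_name(rows, n, port_name);
    if (!pb || !pb->chassis) {
        return false;
    }
    return !strcmp(pb->chassis, local_chassis);
}
```

With only "cr-router-to-underlay" present in the table, a check for "cr-router-to-ls1" returns false on every chassis.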
From the patch I understand that you are trying to reply to the ARP
requests for the logical router ports on the gateway chassis (where the
chassisredirect port of the distributed router port resides).
I think there is no need to add code to handle that in pinctrl.c. We
already have the framework to do that.
See this patch which I proposed -
https://patchwork.ozlabs.org/patch/1093738/ - it sends GARPs for the
logical router ports connected to the localnet switches
on the gateway chassis if the option reside-on-redirect-chassis is set. I
suggest exploring that direction to handle the GARPs.
I think you can do something similar for networks of type bridged.
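For reference, the scheduling change that the quoted send_garp() hunk makes can be sketched in isolation as below. The struct is a hypothetical subset of garp_data, and the burst limit of 16 is an assumption, since the condition guarding the backoff branch is not visible in the quoted hunk:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical subset of the patch's struct garp_data. */
struct garp_sched {
    int backoff;                    /* Seconds until the next GARP in a burst. */
    bool is_repeat;                 /* Start a new burst after this one ends. */
    long long int repeat_interval;  /* Gap between bursts, in ms. */
    long long int announce_time;    /* Next send time, in ms. */
};

/* Assumed burst limit; the quoted hunk does not show the real condition. */
#define GARP_MAX_BACKOFF 16

/* Computes the next announce time after a GARP was sent at 'now' (ms). */
static long long int
garp_next_announce(struct garp_sched *g, long long int now)
{
    if (g->backoff < GARP_MAX_BACKOFF) {
        /* Still inside a burst: double the backoff. */
        g->backoff *= 2;
        g->announce_time = now + g->backoff * 1000LL;
    } else if (g->is_repeat) {
        /* Burst finished: restart it after the repeat interval. */
        g->backoff = 1;
        g->announce_time = now + g->repeat_interval;
    } else {
        /* One-shot GARP: never announce again. */
        g->announce_time = LLONG_MAX;
    }
    return g->announce_time;
}
```

Under these assumptions, a repeating entry sends a burst with growing gaps and then starts over after repeat_interval, whereas a non-repeating entry goes quiet after its first burst.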
The only advantage I see in this patch is that the S/N traffic is not
centralized, while N/S traffic is centralized.
But what about the case where S/N traffic can't be distributed, i.e. if the
bridged logical switch which provides external connectivity uses a
different ovn-bridge-mappings entry
(i.e. the network_name of the localnet port is different)?
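One way to express that condition is sketched below, on the assumption that a chassis can only send S/N traffic out locally if its ovn-bridge-mappings value has an entry for the external switch's network_name. The helper name is hypothetical and is not part of OVN:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if 'mappings' (format "net1:br1,net2:br2", as used by
 * external-ids:ovn-bridge-mappings) has an entry for 'network_name'.
 * Hypothetical helper, for illustration only. */
static bool
bridge_mappings_has_network(const char *mappings, const char *network_name)
{
    size_t name_len = strlen(network_name);
    const char *p = mappings;

    while (*p) {
        /* Length of the current "network:bridge" entry. */
        size_t entry_len = strcspn(p, ",");

        /* Match "network_name:" at the start of this entry. */
        if (entry_len > name_len && !strncmp(p, network_name, name_len)
            && p[name_len] == ':') {
            return true;
        }

        /* Advance to the next entry, skipping the separator. */
        p += entry_len;
        if (*p == ',') {
            p++;
        }
    }
    return false;
}
```

With the test setup's "phys:br-phys", a lookup for "phys" succeeds, but a localnet port with any other network_name would find no mapping on that chassis.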
Thanks
Numan
> ovn/controller/physical.c | 24 +++-
> ovn/controller/pinctrl.c | 205 ++++++++++++++++++++++++++----
> ovn/controller/pinctrl.h | 6 +
> ovn/lib/ovn-util.c | 31 +++++
> ovn/lib/ovn-util.h | 6 +
> ovn/northd/ovn-northd.c | 43 +++++--
> ovn/ovn-architecture.7.xml | 87 ++++++++++++-
> tests/ovn.at | 307
> ++++++++++++++++++++++++++++++++++++++++++++-
> 8 files changed, 665 insertions(+), 44 deletions(-)
>
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index d689e89..2069de1 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -21,6 +21,7 @@
> #include "lflow.h"
> #include "lport.h"
> #include "chassis.h"
> +#include "pinctrl.h"
> #include "lib/bundle.h"
> #include "openvswitch/poll-loop.h"
> #include "lib/uuid.h"
> @@ -235,9 +236,12 @@ get_zone_ids(const struct sbrec_port_binding *binding,
> }
>
> static void
> -put_replace_router_port_mac_flows(const struct
> +put_replace_router_port_mac_flows(struct ovsdb_idl_index
> + *sbrec_port_binding_by_name,
> + const struct
> sbrec_port_binding *localnet_port,
> const struct sbrec_chassis *chassis,
> + const struct sset *active_tunnels,
> const struct hmap *local_datapaths,
> struct ofpbuf *ofpacts_p,
> ofp_port_t ofport,
> @@ -278,8 +282,21 @@ put_replace_router_port_mac_flows(const struct
> char *err_str = NULL;
> struct match match;
> struct ofpact_mac *replace_mac;
> + char *cr_peer_name = xasprintf("cr-%s",
> rport_binding->logical_port);
>
> - /* Table 65, priority 150.
> +
> + if (pinctrl_is_chassis_resident(sbrec_port_binding_by_name,
> + chassis, active_tunnels,
> + cr_peer_name)) {
> + /* If a router port's chassisredirect port is
> + * resident on this chassis, then we need not do mac replace.
> */
> + free(cr_peer_name);
> + continue;
> + }
> +
> + free(cr_peer_name);
> +
> + /* Table 65, priority 150.
> * =======================
> *
> * Implements output to localnet port.
> @@ -792,7 +809,8 @@ consider_port_binding(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> &match, ofpacts_p);
>
> if (!strcmp(binding->type, "localnet")) {
> - put_replace_router_port_mac_flows(binding, chassis,
> + put_replace_router_port_mac_flows(sbrec_port_binding_by_name,
> + binding, chassis,
> active_tunnels,
> local_datapaths, ofpacts_p,
> ofport, flow_table);
> }
> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
> index 2ae79cf..d8da904 100644
> --- a/ovn/controller/pinctrl.c
> +++ b/ovn/controller/pinctrl.c
> @@ -221,6 +221,8 @@ static bool may_inject_pkts(void);
> COVERAGE_DEFINE(pinctrl_drop_put_mac_binding);
> COVERAGE_DEFINE(pinctrl_drop_buffered_packets_map);
>
> +#define GARP_DEF_REPEAT_INTERVAL_MS (3 * 60 * 1000) /* 3 minutes */
> +
> void
> pinctrl_init(void)
> {
> @@ -237,6 +239,25 @@ pinctrl_init(void)
> &pinctrl);
> }
>
> +bool
> +pinctrl_is_chassis_resident(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> + const struct sbrec_chassis *chassis,
> + const struct sset *active_tunnels,
> + const char *port_name)
> +{
> + const struct sbrec_port_binding *pb
> + = lport_lookup_by_name(sbrec_port_binding_by_name, port_name);
> + if (!pb || !pb->chassis) {
> + return false;
> + }
> + if (strcmp(pb->type, "chassisredirect")) {
> + return pb->chassis == chassis;
> + } else {
> + return ha_chassis_group_is_active(pb->ha_chassis_group,
> + active_tunnels, chassis);
> + }
> +}
> +
> static ovs_be32
> queue_msg(struct rconn *swconn, struct ofpbuf *msg)
> {
> @@ -2525,6 +2546,8 @@ struct garp_data {
> int backoff; /* Backoff for the next announcement. */
> uint32_t dp_key; /* Datapath used to output this GARP. */
> uint32_t port_key; /* Port to inject the GARP into. */
> + bool is_repeat; /* Send GARPs continuously */
> + long long int repeat_interval; /* Interval between GARP bursts in ms
> */
> };
>
> /* Contains GARPs to be sent. Protected by pinctrl_mutex*/
> @@ -2545,7 +2568,8 @@ destroy_send_garps(void)
> /* Runs with in the main ovn-controller thread context. */
> static void
> add_garp(const char *name, const struct eth_addr ea, ovs_be32 ip,
> - uint32_t dp_key, uint32_t port_key)
> + uint32_t dp_key, uint32_t port_key, bool is_repeat,
> + long long int repeat_interval)
> {
> struct garp_data *garp = xmalloc(sizeof *garp);
> garp->ea = ea;
> @@ -2554,6 +2578,8 @@ add_garp(const char *name, const struct eth_addr ea,
> ovs_be32 ip,
> garp->backoff = 1;
> garp->dp_key = dp_key;
> garp->port_key = port_key;
> + garp->is_repeat = is_repeat;
> + garp->repeat_interval = repeat_interval;
> shash_add(&send_garp_data, name, garp);
>
> /* Notify pinctrl_handler so that it can wakeup and process
> @@ -2563,7 +2589,8 @@ add_garp(const char *name, const struct eth_addr ea,
> ovs_be32 ip,
>
> /* Add or update a vif for which GARPs need to be announced. */
> static void
> -send_garp_update(const struct sbrec_port_binding *binding_rec,
> +send_garp_update(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> + const struct sbrec_port_binding *binding_rec,
> struct shash *nat_addresses)
> {
> volatile struct garp_data *garp = NULL;
> @@ -2588,7 +2615,7 @@ send_garp_update(const struct sbrec_port_binding
> *binding_rec,
> add_garp(name, laddrs->ea,
> laddrs->ipv4_addrs[i].addr,
> binding_rec->datapath->tunnel_key,
> - binding_rec->tunnel_key);
> + binding_rec->tunnel_key, false, 0);
> }
> free(name);
> }
> @@ -2598,6 +2625,64 @@ send_garp_update(const struct sbrec_port_binding
> *binding_rec,
> return;
> }
>
> + /* Update GARPs for local chassisredirect port, if the peer
> + * layer 2 switch is of type vlan.
> + */
> + if (!strcmp(binding_rec->type, "chassisredirect")) {
> + struct eth_addr mac;
> + ovs_be32 ip, mask;
> + uint32_t dp_key = 0;
> + uint32_t port_key = 0;
> + const struct sbrec_port_binding *peer_port = NULL;
> + const struct sbrec_port_binding *distributed_port = NULL;
> +
> + if (!ovn_sbrec_get_port_binding_ip_mac(binding_rec, &mac,
> + &ip, &mask)) {
> + /* Router Port binding without ip and mac configured. */
> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> + VLOG_WARN_RL(&rl, "cannot send garp, router port binding: %s,
> "
> + "does not have proper ip,mac values: %s",
> + binding_rec->logical_port, *binding_rec->mac);
> + return;
> + }
> +
> + const char *lrp_name = smap_get(&binding_rec->options,
> + "distributed-port");
> + ovs_assert(lrp_name);
> +
> + distributed_port =
> lport_lookup_by_name(sbrec_port_binding_by_name,
> + lrp_name);
> + ovs_assert(distributed_port);
> +
> + const char *peer_name = smap_get(&distributed_port->options,
> "peer");
> + ovs_assert(peer_name);
> +
> + peer_port = lport_lookup_by_name(sbrec_port_binding_by_name,
> + peer_name);
> + ovs_assert(peer_port);
> +
> + const char *network_type =
> smap_get(&peer_port->datapath->external_ids,
> + "network-type");
> +
> + /* Advertise GARP only if the logical switch is of type bridged. */
> + if (!network_type || strcmp(network_type, "bridged")) {
> + return;
> + }
> +
> + dp_key = peer_port->datapath->tunnel_key;
> + port_key = peer_port->tunnel_key;
> +
> + garp = shash_find_data(&send_garp_data,
> binding_rec->logical_port);
> + if (garp) {
> + garp->dp_key = dp_key;
> + garp->port_key = port_key;
> + } else {
> + add_garp(binding_rec->logical_port, mac, ip,
> + dp_key, port_key, true, GARP_DEF_REPEAT_INTERVAL_MS);
> + }
> + return;
> + }
> +
> /* Update GARP for vif if it exists. */
> garp = shash_find_data(&send_garp_data, binding_rec->logical_port);
> if (garp) {
> @@ -2617,7 +2702,8 @@ send_garp_update(const struct sbrec_port_binding
> *binding_rec,
>
> add_garp(binding_rec->logical_port,
> laddrs.ea, laddrs.ipv4_addrs[0].addr,
> - binding_rec->datapath->tunnel_key,
> binding_rec->tunnel_key);
> + binding_rec->datapath->tunnel_key,
> binding_rec->tunnel_key,
> + false, 0);
>
> destroy_lport_addresses(&laddrs);
> break;
> @@ -2679,7 +2765,12 @@ send_garp(struct rconn *swconn, struct garp_data
> *garp,
> garp->backoff *= 2;
> garp->announce_time = current_time + garp->backoff * 1000;
> } else {
> - garp->announce_time = LLONG_MAX;
> + if (garp->is_repeat) {
> + garp->backoff = 1;
> + garp->announce_time = current_time + garp->repeat_interval;
> + } else {
> + garp->announce_time = LLONG_MAX;
> + }
> }
> return garp->announce_time;
> }
> @@ -2763,25 +2854,6 @@ get_localnet_vifs_l3gwports(
> sbrec_port_binding_index_destroy_row(target);
> }
>
> -static bool
> -pinctrl_is_chassis_resident(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> - const struct sbrec_chassis *chassis,
> - const struct sset *active_tunnels,
> - const char *port_name)
> -{
> - const struct sbrec_port_binding *pb
> - = lport_lookup_by_name(sbrec_port_binding_by_name, port_name);
> - if (!pb || !pb->chassis) {
> - return false;
> - }
> - if (strcmp(pb->type, "chassisredirect")) {
> - return pb->chassis == chassis;
> - } else {
> - return ha_chassis_group_is_active(pb->ha_chassis_group,
> - active_tunnels, chassis);
> - }
> -}
> -
> /* Extracts the mac, IPv4 and IPv6 addresses, and logical port from
> * 'addresses' which should be of the format 'MAC [IP1 IP2 ..]
> * [is_chassis_resident("LPORT_NAME")]', where IPn should be a valid IPv4
> @@ -2923,6 +2995,67 @@ get_nat_addresses_and_keys(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> }
>
> static void
> +get_local_cr_ports(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> + struct sset *local_cr_ports,
> + struct sset *local_l3gw_ports,
> + const struct sbrec_chassis *chassis,
> + const struct sset *active_tunnels)
> +{
> + const char *gw_port;
> + SSET_FOR_EACH (gw_port, local_l3gw_ports) {
> + const struct sbrec_port_binding *binding_rec;
> +
> + binding_rec = lport_lookup_by_name(sbrec_port_binding_by_name,
> + gw_port);
> + if (!binding_rec) {
> + continue;
> + }
> +
> + /* For the patch port we will add send garp for peer's ip and
> mac. */
> + if (!strcmp(binding_rec->type, "patch")) {
> + const struct sbrec_port_binding *cr_port = NULL;
> +
> + bool is_cr_resident;
> + struct eth_addr mac;
> + ovs_be32 ip, mask;
> +
> + const char *peer_name = smap_get(&binding_rec->options,
> "peer");
> + ovs_assert(peer_name);
> +
> + char *cr_peer_name = xasprintf("cr-%s", peer_name);
> + cr_port = lport_lookup_by_name(sbrec_port_binding_by_name,
> + cr_peer_name);
> + free(cr_peer_name);
> +
> + if (!cr_port) {
> + continue;
> + }
> +
> + is_cr_resident = pinctrl_is_chassis_resident
> + (sbrec_port_binding_by_name,
> + chassis,
> + active_tunnels,
> + cr_port->logical_port);
> + if (!is_cr_resident) {
> + continue;
> + }
> +
> + if (!ovn_sbrec_get_port_binding_ip_mac(cr_port, &mac, &ip,
> + &mask)) {
> + /* Router Port binding without ip and mac configured. */
> + static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(1, 1);
> + VLOG_WARN_RL(&rl, "cannot send garp, router port binding:
> %s, "
> + "does not have proper ip,mac values: %s",
> + cr_port->logical_port, *cr_port->mac);
> + return;
> + }
> +
> + sset_add(local_cr_ports, cr_port->logical_port);
> + }
> + }
> +}
> +
> +static void
> send_garp_wait(long long int send_garp_time)
> {
> /* Set the poll timer for next garp only if there is garp data to
> @@ -2967,6 +3100,8 @@ send_garp_prepare(struct ovsdb_idl_index
> *sbrec_port_binding_by_datapath,
> {
> struct sset localnet_vifs = SSET_INITIALIZER(&localnet_vifs);
> struct sset local_l3gw_ports = SSET_INITIALIZER(&local_l3gw_ports);
> + struct sset local_cr_ports = SSET_INITIALIZER(&local_cr_ports);
> +
> struct sset nat_ip_keys = SSET_INITIALIZER(&nat_ip_keys);
> struct shash nat_addresses;
>
> @@ -2981,11 +3116,17 @@ send_garp_prepare(struct ovsdb_idl_index
> *sbrec_port_binding_by_datapath,
> &nat_ip_keys, &local_l3gw_ports,
> chassis, active_tunnels,
> &nat_addresses);
> +
> + get_local_cr_ports(sbrec_port_binding_by_name,
> + &local_cr_ports, &local_l3gw_ports,
> + chassis, active_tunnels);
> +
> /* For deleted ports and deleted nat ips, remove from send_garp_data.
> */
> struct shash_node *iter, *next;
> SHASH_FOR_EACH_SAFE (iter, next, &send_garp_data) {
> if (!sset_contains(&localnet_vifs, iter->name) &&
> - !sset_contains(&nat_ip_keys, iter->name)) {
> + !sset_contains(&nat_ip_keys, iter->name) &&
> + !sset_contains(&local_cr_ports, iter->name)) {
> send_garp_delete(iter->name);
> }
> }
> @@ -2996,7 +3137,7 @@ send_garp_prepare(struct ovsdb_idl_index
> *sbrec_port_binding_by_datapath,
> const struct sbrec_port_binding *pb = lport_lookup_by_name(
> sbrec_port_binding_by_name, iface_id);
> if (pb) {
> - send_garp_update(pb, &nat_addresses);
> + send_garp_update(sbrec_port_binding_by_name, pb,
> &nat_addresses);
> }
> }
>
> @@ -3006,7 +3147,17 @@ send_garp_prepare(struct ovsdb_idl_index
> *sbrec_port_binding_by_datapath,
> const struct sbrec_port_binding *pb
> = lport_lookup_by_name(sbrec_port_binding_by_name, gw_port);
> if (pb) {
> - send_garp_update(pb, &nat_addresses);
> + send_garp_update(sbrec_port_binding_by_name, pb,
> &nat_addresses);
> + }
> + }
> +
> + /* Update send_garp_data for chassisredirect router ports. */
> + const char *cr_port;
> + SSET_FOR_EACH (cr_port, &local_cr_ports) {
> + const struct sbrec_port_binding *pb
> + = lport_lookup_by_name(sbrec_port_binding_by_name, cr_port);
> + if (pb) {
> + send_garp_update(sbrec_port_binding_by_name, pb,
> &nat_addresses);
> }
> }
>
> diff --git a/ovn/controller/pinctrl.h b/ovn/controller/pinctrl.h
> index f61d705..92f704e 100644
> --- a/ovn/controller/pinctrl.h
> +++ b/ovn/controller/pinctrl.h
> @@ -44,4 +44,10 @@ void pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> void pinctrl_wait(struct ovsdb_idl_txn *ovnsb_idl_txn);
> void pinctrl_destroy(void);
>
> +bool
> +pinctrl_is_chassis_resident(struct ovsdb_idl_index
> *sbrec_port_binding_by_name,
> + const struct sbrec_chassis *chassis,
> + const struct sset *active_tunnels,
> + const char *port_name);
> +
> #endif /* ovn/pinctrl.h */
> diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c
> index 0f07d80..3d0ad8e 100644
> --- a/ovn/lib/ovn-util.c
> +++ b/ovn/lib/ovn-util.c
> @@ -16,6 +16,7 @@
> #include "ovn-util.h"
> #include "dirs.h"
> #include "openvswitch/vlog.h"
> +#include "openvswitch/ofp-parse.h"
> #include "ovn/lib/ovn-nb-idl.h"
> #include "ovn/lib/ovn-sb-idl.h"
>
> @@ -371,3 +372,33 @@ ovn_logical_flow_hash(const struct uuid
> *logical_datapath,
> hash = hash_string(match, hash);
> return hash_string(actions, hash);
> }
> +
> +/* Extracts the mac, ip and mask for a sbrec_port_binding.
> + *
> + * Expects following format:
> + * "MAC_ADDRESS IP/MASK"
> + *
> + * Return true if MAC, IP and MASK are found, false otherwise.
> + */
> +bool
> +ovn_sbrec_get_port_binding_ip_mac(const struct sbrec_port_binding
> *binding,
> + struct eth_addr *mac,
> + ovs_be32 *ip, ovs_be32 *mask)
> +{
> + char *err_str = NULL;
> +
> + err_str = str_to_mac(binding->mac[0], mac);
> + if (err_str) {
> + free(err_str);
> + return false;
> + }
> +
> + err_str = ip_parse_masked(binding->mac[0] + ETH_ADDR_STRLEN + 1,
> + ip, mask);
> + if (err_str) {
> + free(err_str);
> + return false;
> + }
> +
> + return true;
> +}
> diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h
> index 6d5e1df..c01595a 100644
> --- a/ovn/lib/ovn-util.h
> +++ b/ovn/lib/ovn-util.h
> @@ -19,6 +19,7 @@
> #include "lib/packets.h"
>
> struct nbrec_logical_router_port;
> +struct sbrec_port_binding;
> struct sbrec_logical_flow;
> struct uuid;
>
> @@ -81,4 +82,9 @@ uint32_t ovn_logical_flow_hash(const struct uuid
> *logical_datapath,
> uint16_t priority,
> const char *match, const char *actions);
>
> +bool
> +ovn_sbrec_get_port_binding_ip_mac(const struct sbrec_port_binding
> *binding,
> + struct eth_addr *mac, ovs_be32 *ip,
> + ovs_be32 *mask);
> +
> #endif
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 74d3692..2b30bee 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -6264,6 +6264,20 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
> * from different chassis. */
> ds_put_format(&match, " && is_chassis_resident(%s)",
> op->od->l3redirect_port->json_key);
> + } else if (op->peer &&
> + op->peer->od->network_type == DP_NETWORK_BRIDGED) {
> +
> + /* For a vlan backed router port, we will always have the
> + * is_chassis_resident check. This is because there could
> be
> + * vm/server on vlan network, but not on OVN chassis and
> could
> + * end up arping for router port ip.
> + *
> + * This check works on the assumption that for OVN
> chassis VMs,
> + * logical switch ARP responder will respond to ARP
> requests
> + * for router port IP.
> + */
> + ds_put_format(&match, " &&
> is_chassis_resident(\"cr-%s\")",
> + op->key);
> }
>
> ds_clear(&actions);
> @@ -7365,18 +7379,23 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
> ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 300,
> REGBIT_DISTRIBUTED_NAT" == 1", "next;");
>
> - /* For traffic with outport == l3dgw_port, if the
> - * packet did not match any higher priority redirect
> - * rule, then the traffic is redirected to the central
> - * instance of the l3dgw_port. */
> - ds_clear(&match);
> - ds_put_format(&match, "outport == %s",
> - od->l3dgw_port->json_key);
> - ds_clear(&actions);
> - ds_put_format(&actions, "outport = %s; next;",
> - od->l3redirect_port->json_key);
> - ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
> - ds_cstr(&match), ds_cstr(&actions));
> + /* For VLAN backed networks, default match will not redirect
> to
> + * chassis redirect port. */
> + if (od->l3dgw_port->peer &&
> + od->l3dgw_port->peer->od->network_type ==
> DP_NETWORK_OVERLAY) {
> + /* For traffic with outport == l3dgw_port, if the
> + * packet did not match any higher priority redirect
> + * rule, then the traffic is redirected to the central
> + * instance of the l3dgw_port. */
> + ds_clear(&match);
> + ds_put_format(&match, "outport == %s",
> + od->l3dgw_port->json_key);
> + ds_clear(&actions);
> + ds_put_format(&actions, "outport = %s; next;",
> + od->l3redirect_port->json_key);
> + ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 50,
> + ds_cstr(&match), ds_cstr(&actions));
> + }
>
> /* If the Ethernet destination has not been resolved,
> * redirect to the central instance of the l3dgw_port.
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 6275db1..6df711e 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1441,7 +1441,7 @@
> </li>
> </ol>
>
> - <h3>External traffic</h3>
> + <h3>External traffic (NAT)</h3>
>
> <p>
> The following happens when a VM sends an external traffic (which
> requires
> @@ -1607,6 +1607,91 @@
> </li>
> </ol>
>
> + <h3>External traffic (NO NAT)</h3>
> + <p>
> + The following happens when a VM sends external traffic (i.e. to a
> + network not connected to the logical router), but there is no need for NATing.
> + </p>
> +
> + <p>
> + Since, there is no NATing required, hence we need not redirect the
> packet
> + to a gateway chassis. As a result, this packet flow is same as
> East-West.
> + In order to ensure that OVN will not redirect the packet over a tunnel
> + to gateway-chassis, "network_type" of destination localnet logical
> switch,
> + should be set as "bridged". A "bridged" logical switch ensures that
> there
> + is no tunnel encapsulation done while forwarding the packet on it.
> + Please refer to <code>ovn-nb</code>(5) for more details.
> + </p>
> +
> + <ol>
> + <li>
> + It first enters the ingress pipeline, and then egress pipeline of
> the
> + source localnet logical switch datapath. It then enters the ingress
> + pipeline of the logical router datapath via the logical router port
> in
> + the source chassis.
> + </li>
> +
> + <li>
> + Routing decision is taken. Since, destination network is NOT
> directly
> + connected to logical router, hence a static route is expected, which
> will
> + provide next hop ip.
> + </li>
> +
> + <li>
> + From the router datapath, packet enters the ingress pipeline and
> then
> + egress pipeline of the destination localnet logical switch datapath
> + (it is of type "bridged" and this is where the next hop is present)
> + and goes out of the integration bridge to the provider bridge (
> + belonging to the destination logical switch) via the localnet port.
> + Same as East-West, source mac will be replaced with chassis mac.
> + </li>
> + </ol>
> +
> + <p>
> + The following happens for the reverse external traffic.
> + </p>
> +
> + <ol>
> + <li>
> + The gateway chassis receives the packet from the localnet port of
> + the logical switch (bridged type) which provides external
> connectivity.
> + The packet then enters the ingress pipeline and then egress
> pipeline of
> + the localnet logical switch (which provides external connectivity).
> + The packet then enters the ingress pipeline of the logical router
> + datapath.
> + </li>
> +
> + <li>
> + Routing decision is taken and logical switch of destination VM is
> + identified.
> + </li>
> +
> + <li>
> + The packet then enters the ingress pipeline and then egress
> + pipeline of VM's localnet logical switch. Since the source VM
> + doesn't reside in the gateway chassis, the packet is sent out via
> the
> + localnet port of the VM's logical switch. Source mac of this packet
> + will be replaced with chassis unique mac.
> + </li>
> +
> + <li>
> + VM's chassis receives the packet via the localnet port and
> + sends it to the integration bridge. The packet enters the
> + ingress pipeline and then egress pipeline of the localnet
> + logical switch and finally gets delivered to the VM port.
> + </li>
> + </ol>
> +
> + <p>
> + One thing to note here is that, while VM to External traffic did not
> + require redirection to gateway chassis, the reverse traffic is through
> + gateway chassis only. This is because, for external router, OVN
> logical
> + router port IP will be the next hop to reach the endpoints behind it.
> + As a result, we need a centralized chassis, which will respond to ARP
> + requests coming from external network. This centralized chassis, is
> the
> + gateway chassis which is attached to corresponding router port.
> + </p>
> +
> <h2>Life Cycle of a VTEP gateway</h2>
>
> <p>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index e1b757f..b1ff172 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -14021,7 +14021,7 @@ ovn-hv4-0
> OVN_CLEANUP([hv1], [hv2], [hv3])
> AT_CLEANUP
>
> -AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR chassis mac])
> +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR E-W chassis mac])
> ovn_start
>
>
> @@ -14031,6 +14031,8 @@ ovn_start
> # of VIF port name indicates the hypervisor it is bound to, e.g.
> # lp23 means VIF 3 on hv2.
> #
> +# Both the switches are connected to a logical router "router".
> +#
> # Each switch's VLAN tag and their logical switch ports are:
> # - ls1:
> # - tagged with VLAN 101
> @@ -14188,6 +14190,7 @@ test_ip() {
> echo "------ OVN dump ------"
> ovn-nbctl show
> ovn-sbctl show
> +ovn-sbctl list port_binding
>
> echo "------ hv1 dump ------"
> as hv1 ovs-vsctl show
> @@ -14214,6 +14217,308 @@ as hv2 ovs-appctl fdb/show br-phys
>
> OVN_CHECK_PACKETS([hv2/vif22-tx.pcap], [vif22.expected])
>
> +
> +# Associate a chassis as gateway chassis and validate garp.
> +
Looks like you wanted to add the above comment in the next test case?
>
> +OVN_CLEANUP([hv1],[hv2])
> +
> +AT_CLEANUP
> +
> +
> +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S GARP])
> +ovn_start
> +
> +
> +# In this test case we create 2 switches, all connected to same
> +# physical network (through br-phys on each HV). Each switch has
> +# 1 VIF. Each HV has 1 VIF port. The first digit
> +# of VIF port name indicates the hypervisor it is bound to, e.g.
> +# lp23 means VIF 3 on hv2.
> +#
> +# Both the switches are connected to a logical router "router".
> +#
> +# Additionally, we create a logical switch (ls-underlay) for N-S traffic.
> +#
> +# Each switch's VLAN tag and their logical switch ports are:
> +# - ls1:
> +# - tagged with VLAN 101
> +# - ports: lp11
> +# - ls2:
> +# - tagged with VLAN 201
> +# - ports: lp22
> +# - ls-underlay:
> +# - tagged with VLAN 1000
> +#
> +# Note: a localnet port is created for each switch to connect to
> +# physical network.
> +# lsp_to_ls LSP
> +#
> +# Prints the name of the logical switch that contains LSP.
> +
> +net_add n1
> +for i in 1 2; do
> + sim_add hv$i
> + as hv$i
> + ovs-vsctl add-br br-phys
> + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> + ovs-vsctl set open .
> external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i"
> + ovs-vsctl set open . external-ids:system-id="HV$i"
> + ovn_attach n1 br-phys 192.168.0.$i
> + ovs-vsctl set-controller br-int ptcp:
> +done
> +
> +ovn-nbctl ls-add ls-underlay bridged
> +ovn-nbctl lsp-add ls-underlay ln3 "" 1000
> +ovn-nbctl lsp-set-addresses ln3 unknown
> +ovn-nbctl lsp-set-type ln3 localnet
> +ovn-nbctl lsp-set-options ln3 network_name=phys
> +
> +ovn-nbctl lr-add router
> +ovn-nbctl lrp-add router router-to-underlay 00:00:01:01:02:07
> 172.31.0.1/24
> +
> +ovn-nbctl lsp-add ls-underlay underlay-to-router -- set
> Logical_Switch_Port \
> + underlay-to-router type=router \
> + options:router-port=router-to-underlay \
> + -- lsp-set-addresses underlay-to-router
> router
> +
> +ovn-nbctl --wait=sb sync
> +
> +# Associate hv2 as gateway chassis
> +ovn-nbctl lrp-set-gateway-chassis router-to-underlay hv2
> +
> +ovn-nbctl show
> +ovn-sbctl show
> +
> +# Dump a bunch of info helpful for debugging if there's a failure.
> +
> +echo "------ OVN dump ------"
> +ovn-nbctl show
> +ovn-sbctl show
> +
> +echo "------ hv1 dump ------"
> +as hv1 ovs-vsctl show
> +as hv1 ovs-vsctl list Open_Vswitch
> +
> +echo "------ hv2 dump ------"
> +as hv2 ovs-vsctl show
> +as hv2 ovs-vsctl list Open_Vswitch
> +
> +sleep 1
>
You can delete the above sleep and use OVS_WAIT_UNTIL instead of AT_CHECK
below.
> +
> +echo "----------- Post Traffic hv1 dump -----------"
> +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv1 ovs-appctl fdb/show br-phys
> +
> +echo "----------- Post Traffic hv2 dump -----------"
> +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv2 ovs-appctl fdb/show br-phys
> +
> +AT_CHECK([as hv2 ovs-appctl fdb/show br-phys | grep 00:00:01:01:02:07 |
> grep 1000 | wc -l], [0], [[1
> +]])
> +
> OVN_CLEANUP([hv1],[hv2])
>
> AT_CLEANUP
> +
> +
> +AT_SETUP([ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S Ping])
> +ovn_start
> +
> +# In this test cases we create 3 switches, all connected to same
> +# physical network (through br-phys on each HV). LS1 and LS2 have
> +# 1 VIF each. Each HV has 1 VIF port. The first digit
> +# of VIF port name indicates the hypervisor it is bound to, e.g.
> +# lp23 means VIF 3 on hv2.
> +#
> +# All the switches are connected to a logical router "router".
> +#
> +# Each switch's VLAN tag and their logical switch ports are:
> +# - ls1:
> +# - tagged with VLAN 101
> +# - ports: lp11
> +# - ls2:
> +# - tagged with VLAN 201
> +# - ports: lp22
> +# - ls-underlay:
> +# - tagged with VLAN 1000
> +# Note: a localnet port is created for each switch to connect to
> +# physical network.
> +
> +for i in 1 2; do
> + ls_name=ls$i
> + ovn-nbctl ls-add $ls_name bridged
> + ln_port_name=ln$i
> + if test $i -eq 1; then
> + ovn-nbctl lsp-add $ls_name $ln_port_name "" 101
> + elif test $i -eq 2; then
> + ovn-nbctl lsp-add $ls_name $ln_port_name "" 201
> + fi
> + ovn-nbctl lsp-set-addresses $ln_port_name unknown
> + ovn-nbctl lsp-set-type $ln_port_name localnet
> + ovn-nbctl lsp-set-options $ln_port_name network_name=phys
> +done
> +
> +# lsp_to_ls LSP
> +#
> +# Prints the name of the logical switch that contains LSP.
> +lsp_to_ls () {
> + case $1 in dnl (
> + lp?[[11]]) echo ls1 ;; dnl (
> + lp?[[12]]) echo ls2 ;; dnl (
> + *) AT_FAIL_IF([:]) ;;
> + esac
> +}
> +
> +vif_to_hv () {
> + case $1 in dnl (
> + vif[[1]]?) echo hv1 ;; dnl (
> + vif[[2]]?) echo hv2 ;; dnl (
> + vif?[[north]]?) echo hv4 ;; dnl (
> + *) AT_FAIL_IF([:]) ;;
> + esac
> +}
> +
> +ip_to_hex() {
> + printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +net_add n1
> +for i in 1 2; do
> + sim_add hv$i
> + as hv$i
> + ovs-vsctl add-br br-phys
> + ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> + ovs-vsctl set open .
> external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:$i$i"
> + ovn_attach n1 br-phys 192.168.0.$i
> +
> + ovs-vsctl add-port br-int vif$i$i -- \
> + set Interface vif$i$i external-ids:iface-id=lp$i$i \
> + options:tx_pcap=hv$i/vif$i$i-tx.pcap \
> + options:rxq_pcap=hv$i/vif$i$i-rx.pcap \
> + ofport-request=$i$i
> +
> + lsp_name=lp$i$i
> + ls_name=$(lsp_to_ls $lsp_name)
> +
> + ovn-nbctl lsp-add $ls_name $lsp_name
> + ovn-nbctl lsp-set-addresses $lsp_name "f0:00:00:00:00:$i$i
> 192.168.$i.$i"
> + ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$i
> +
> + OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
> +
> +done
> +
> +ovn-nbctl ls-add ls-underlay bridged
> +ovn-nbctl lsp-add ls-underlay ln3 "" 1000
> +ovn-nbctl lsp-set-addresses ln3 unknown
> +ovn-nbctl lsp-set-type ln3 localnet
> +ovn-nbctl lsp-set-options ln3 network_name=phys
> +
> +ovn-nbctl ls-add ls-north bridged
> +ovn-nbctl lsp-add ls-north ln4 "" 1000
> +ovn-nbctl lsp-set-addresses ln4 unknown
> +ovn-nbctl lsp-set-type ln4 localnet
> +ovn-nbctl lsp-set-options ln4 network_name=phys
> +
> +# Add a VM on ls-north
> +ovn-nbctl lsp-add ls-north lp-north
> +ovn-nbctl lsp-set-addresses lp-north "f0:f0:00:00:00:11 172.31.0.10"
> +ovn-nbctl lsp-set-port-security lp-north f0:f0:00:00:00:11
> +
> +# Add 3rd hypervisor
> +sim_add hv3
> +as hv3 ovs-vsctl add-br br-phys
> +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +as hv3 ovs-vsctl set open .
> external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:33"
> +as hv3 ovn_attach n1 br-phys 192.168.0.3
> +
> +# Add 4th hypervisor
> +sim_add hv4
> +as hv4 ovs-vsctl add-br br-phys
> +as hv4 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +as hv4 ovs-vsctl set open .
> external-ids:ovn-chassis-mac-mappings="phys:aa:bb:cc:dd:ee:44"
> +as hv4 ovn_attach n1 br-phys 192.168.0.4
> +
> +as hv4 ovs-vsctl add-port br-int vif-north -- \
> + set Interface vif-north external-ids:iface-id=lp-north \
> + options:tx_pcap=hv4/vif-north-tx.pcap \
> + options:rxq_pcap=hv4/vif-north-rx.pcap \
> + ofport-request=44
> +
> +ovn-nbctl lr-add router
> +ovn-nbctl lrp-add router router-to-ls1 00:00:01:01:02:03 192.168.1.3/24
> +ovn-nbctl lrp-add router router-to-ls2
> 00:00:01:01:02:05 192.168.2.3/24
> +ovn-nbctl lrp-add router
> router-to-underlay 00:00:01:01:02:07 172.31.0.1/24
> +
> +ovn-nbctl lsp-add ls1 ls1-to-router -- set Logical_Switch_Port
> ls1-to-router type=router \
> + options:router-port=router-to-ls1 -- lsp-set-addresses
> ls1-to-router router
> +ovn-nbctl lsp-add ls2 ls2-to-router -- set Logical_Switch_Port
> ls2-to-router type=router \
> + options:router-port=router-to-ls2 -- lsp-set-addresses
> ls2-to-router router
> +ovn-nbctl lsp-add ls-underlay underlay-to-router -- set
> Logical_Switch_Port \
> + underlay-to-router type=router \
> + options:router-port=router-to-underlay \
> + -- lsp-set-addresses underlay-to-router
> router
> +
> +ovn-nbctl lrp-set-gateway-chassis router-to-underlay hv3
> +
> +ovn-nbctl --wait=sb sync
> +
> +sleep 2
> +
> +OVN_POPULATE_ARP
> +
> +test_ip() {
> + # This packet has bad checksums but logical L3 routing doesn't check.
> + local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
> + local
> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> + shift; shift; shift; shift; shift
> + hv=`vif_to_hv $inport`
> + as $hv ovs-appctl netdev-dummy/receive $inport $packet
> +}
> +
> +# Dump a bunch of info helpful for debugging if there's a failure.
> +
> +echo "------ OVN dump ------"
> +ovn-nbctl show
> +ovn-sbctl show
> +ovn-sbctl list port_binding
> +ovn-sbctl list mac_binding
> +
> +echo "------ hv1 dump ------"
> +as hv1 ovs-vsctl show
> +as hv1 ovs-vsctl list Open_Vswitch
> +
> +echo "------ hv2 dump ------"
> +as hv2 ovs-vsctl show
> +as hv2 ovs-vsctl list Open_Vswitch
> +
> +echo "Send traffic"
> +sip=`ip_to_hex 192 168 1 1`
> +dip=`ip_to_hex 172 31 0 10`
> +test_ip vif11 f00000000011 000001010203 $sip $dip vif-north
> +
> +sleep 1
> +
> +echo "----------- Post Traffic hv1 dump -----------"
> +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv1 ovs-appctl fdb/show br-phys
> +
> +echo "----------- Post Traffic hv2 dump -----------"
> +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv2 ovs-appctl fdb/show br-phys
> +
> +echo "----------- Post Traffic hv3 dump -----------"
> +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv3 ovs-appctl fdb/show br-phys
> +
> +echo "----------- Post Traffic hv4 dump -----------"
> +as hv4 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +as hv4 ovs-appctl fdb/show br-phys
> +
> +# Confirm that HV1 chassis mac is never seen on Gateway chassis, i.e HV3
> +AT_CHECK([as hv3 ovs-appctl fdb/show br-phys | grep aa:bb:cc:dd:ee:11 |
> wc -l], [0], [[0
> +]])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3],[hv4])
> +
> +AT_CLEANUP
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>