[ovs-dev] [PATCH ovn v2] Fix the routing for external logical ports of bridged logical switches.

Ankur Sharma ankur.sharma at nutanix.com
Mon Jul 13 05:43:41 UTC 2020


Hi Numan,

Thank you so much for the details.

Following is my analysis on the feature:
a. Port of type EXTERNAL means that we create a logical switch port in OVN without a VIF backing.
b. i.e. the physical port corresponding to the external port is NOT behind an OVN-managed vswitch (in the SRIOV-specific case it is not behind any vswitch, since the PNIC sends the packet directly to the guest driver).
    Just for the sake of further discussion, we will refer to the PHYSICAL PORT/VM corresponding to the external port as the SRIOV PORT/VM.
c. Now from OVN perspective, packets from SRIOV VM will enter the OVN flow pipeline via localnet port (on the Active HA Chassis).
d. For DHCP requests, the logical switch pipeline responds.
e. Now, we were trying to get the routing working.

Based on the understanding mentioned above, I tried the following scenario and observed the following:
a. SRIOV VM could talk to endpoints on other LS attached to LR via Active gateway/HA chassis.

OVN EXTERNAL PORT ROUTING:
https://docs.google.com/document/d/117yskeP1S3qHmkNrBrZ0PxXCvMJCCVJhcPG4phw1Qls/edit?usp=sharing

Explanation:
a. Since packets arrive through the localnet port, from the router's perspective the SRIOV VM is an external endpoint, i.e. NS.
b. For such cases, the router is designed to respond to ARP requests ONLY on the gateway chassis, and since NS traffic is now centralized on one chassis, there is no MAC replacement on the gateway chassis.
c. Based on a. and b. above, I attached the same gateway chassis to the LRP (Logical Router Port) connecting to the SRIOV PORT's LS.
d. With that, routing across the LR-connected switches worked fine.
e. The MAC table on the TOR and the ARP table on the SRIOV VM were also fine, i.e. the ARP cache had <router_port_ip, router_port_mac> and the MAC table had an entry for the router port mac.


Inference:
a. For routing, traffic from localnet ports has to enter via the gateway node.
b. Hence, the router port which connects to the SRIOV VM's logical switch has to be on the same gateway node as the corresponding external port (which can be achieved easily by attaching the same HA chassis group to both).
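
For illustration, a minimal shell sketch of attaching one HA chassis group to both ports. It assumes the group already exists and that both the Logical_Switch_Port and Logical_Router_Port tables expose a ha_chassis_group column that can be set through the generic ovn-nbctl database commands. The NBCTL override and the port names in the usage line are hypothetical.

```shell
# Command used to talk to the NB database; override NBCTL to test without OVN.
NBCTL=${NBCTL:-ovn-nbctl}

# Attach one existing HA chassis group to both the external logical switch
# port and the logical router port, so the same chassis claims both.
# Args: group name, external LSP name, LRP name (caller-supplied).
attach_same_ha_group() {
    group=$1
    ext_lsp=$2
    lrp=$3

    # Look up the UUID of the HA chassis group by name.
    group_uuid=$($NBCTL --bare --columns _uuid find ha_chassis_group name="$group")
    if [ -z "$group_uuid" ]; then
        echo "HA chassis group $group not found" >&2
        return 1
    fi

    $NBCTL set logical_switch_port "$ext_lsp" ha_chassis_group="$group_uuid"
    $NBCTL set logical_router_port "$lrp" ha_chassis_group="$group_uuid"
}
```

Usage, with hypothetical names: attach_same_ha_group hagrp1 sw0-ext1 lr0-sw0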


Improvements:
a. The current state is slightly restrictive, because we support only ONE l3gateway port per router. This means that the SRIOV logical switch has to be connected to the physical network gateway as well,
i.e. SRIOV VMs have to be on the same logical switch which has the gateway for all the external traffic.

b. A more generic and complete implementation would be to enhance OVN to support multiple gateway ports in a distributed router.
https://patchwork.ozlabs.org/project/openvswitch/patch/20180312090911.9608-1-ligs@dtdream.com/

c. I did observe 1 bug, where we are NOT blocking ARP requests for the router port via the localnet port in the absence of a gateway port configuration. The absence of this guard leads to multiple ARP responses and duplicate ICMP reply packets. I will submit a fix for this.


Please let me know your thoughts on the same and please feel free to call out if I missed something.

Thanks

Regards,
Ankur

________________________________
From: Numan Siddique <numans at ovn.org>
Sent: Friday, July 10, 2020 6:18 AM
To: Ankur Sharma <ankur.sharma at nutanix.com>
Cc: dev at openvswitch.org <dev at openvswitch.org>
Subject: Re: [ovs-dev] [PATCH ovn v2] Fix the routing for external logical ports of bridged logical switches.



On Fri, Jul 10, 2020 at 4:41 PM Numan Siddique <numans at ovn.org> wrote:


On Fri, Jul 10, 2020 at 12:45 AM Ankur Sharma <ankur.sharma at nutanix.com> wrote:
Hi Numan, Daniel,

I have not looked at the patch yet. But replacing arp.sha with the chassis mac is not the correct approach from a networking perspective.
The chassis mac is NOT meant to replace the IP-MAC binding of the router port; it is ONLY meant to ensure that for EW traffic a distributed router port mac does not show up on multiple TOR ports.
Both for NS and EW, ARP resolution for the router port ip should be answered with the router port mac ONLY.

I am trying to understand the use case and we can discuss an alternative in this thread.
Can you share the repro steps? I can try the same and will try to come up with an alternative.


Hi Ankur,

In this particular case, the originator of the traffic is from a logical port of type 'external'.

One example of using external ports is for SRIOV VMs. The traffic from these VMs is not seen
by the local ovn-controller. And we want to provide E-W routing and other OVN services like DHCP, DNS etc.
to these VMs.

So one of the controller nodes (which can receive the traffic sent by these SRIOV VMs) binds these external ports
and it responds to the ARP requests and does the routing for it.

To reproduce the issue, can you please use the ovn-fake-multinode setup from here? - https://github.com/numansiddique/ovn-fake-multinode/tree/vlan_chassis_mac_issue

The steps are:
1. Build OVN containers.
    ./ovn_cluster.sh build


Please note, before the 'start', you need to start openvswitch on the host.

Thanks
Numan

2. ./ovn_cluster.sh start

Run
3. sudo ip netns exec sw0-ext1 ping -c3 20.0.0.3
PING 20.0.0.3 (20.0.0.3) 56(84) bytes of data.
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.074 ms
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.086 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=1 ttl=63 time=0.089 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.105 ms
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.120 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=2 ttl=63 time=0.124 ms (DUP!)
64 bytes from 20.0.0.3: icmp_seq=3 ttl=63 time=0.145 ms

--- 20.0.0.3 ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time 2036ms
rtt min/avg/max/mdev = 0.074/0.106/0.145/0.023 ms

You will see a few DUP packets.

$ sudo ip netns exec sw0-ext1 ping -c3 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.298 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.358 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=1 ttl=254 time=0.384 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.598 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.594 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=2 ttl=254 time=0.656 ms (DUP!)
64 bytes from 10.0.0.1: icmp_seq=3 ttl=254 time=0.715 ms

--- 10.0.0.1 ping statistics ---
3 packets transmitted, 3 received, +4 duplicates, 0% packet loss, time 2088ms
rtt min/avg/max/mdev = 0.298/0.514/0.715/0.152 ms
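
To check for this symptom in a script, the duplicate replies can be counted from the ping output. This is only a sketch: the helper name count_dup_replies is made up here, and it relies on ping marking duplicates with "(DUP!)" as in the output above.

```shell
# Count duplicate ICMP replies in ping output read from stdin.
# Each duplicate reply line carries a "(DUP!)" marker.
count_dup_replies() {
    # grep -c prints 0 but exits non-zero when nothing matches, so mask that.
    grep -c '(DUP!)' || true
}
```

Usage: sudo ip netns exec sw0-ext1 ping -c3 10.0.0.1 | count_dup_replies
A non-zero count means more than one chassis answered the same echo request.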

In the setup, sw0-ext1 represents an external logical switch port. If you see the script here [1],
sw0-ext1 is claimed by ovn-chassis-1 node.

And when sw0-ext1 sends an ARP request for 10.0.0.1, the request is handled by ovn-chassis-1
and the reply has arp.sha = router mac and eth.src = chassis mac of ovn-chassis-1.

Hence sw0-ext1 sends ping packets with the router port mac (for router port IP 10.0.0.1) as the destination mac.
And all the 3 nodes reply - ovn-chassis-1, ovn-chassis-2 and ovn-gw-1.
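
To make the two fields concrete, here is a small shell sketch that assembles such an ARP reply as a hex string, in the same style as the send_arp_request helper in the patch's test. The router MAC is a made-up value; the chassis MAC, external port MAC and IPs are borrowed from the test purely as examples.

```shell
# Hex-encode an IPv4 address given as four decimal octets, e.g. 10 0 0 1.
ip_to_hex() {
    printf '%02x%02x%02x%02x' "$1" "$2" "$3" "$4"
}

# Build an Ethernet + ARP reply frame as a hex string.
# Args: eth_src eth_dst arp_sha arp_spa arp_tha arp_tpa (hex, no separators).
build_arp_reply() {
    # 0806 = EtherType ARP; 0001 0800 06 04 = Ethernet/IPv4 address types
    # and lengths; 0002 = ARP opcode "reply".
    printf '%s%s0806%s%s%s%s%s\n' "$2" "$1" 0001080006040002 "$3" "$4" "$5" "$6"
}

router_mac=000000000f01    # hypothetical router port MAC
chassis_mac=1e02adaabb01   # example chassis MAC (value taken from the test)
ext_mac=f00000000003       # example MAC of the external port
router_ip=$(ip_to_hex 10 0 0 1)
ext_ip=$(ip_to_hex 10 0 0 6)

# The reply described above: eth.src is the chassis MAC while arp.sha is
# still the router MAC, so the sender caches the router MAC for 10.0.0.1.
reply=$(build_arp_reply "$chassis_mac" "$ext_mac" "$router_mac" \
    "$router_ip" "$ext_mac" "$ext_ip")

# eth.src occupies hex characters 13-24, arp.sha characters 45-56.
echo "eth.src = $(printf '%s' "$reply" | cut -c 13-24)"
echo "arp.sha = $(printf '%s' "$reply" | cut -c 45-56)"
```

Passing the chassis MAC as the third argument instead gives the reply shape that the patch proposes, where arp.sha also carries the chassis MAC.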

I'm not sure if you have played with ovn-fake-multinode before. If you run "docker ps", you will see a docker
container representing each chassis.

Please do "docker exec -it ovn-central bash" and run a few ovn-nbctl/ovn-sbctl commands to know more.

You can also see the script in [1] and reproduce the issue in your setup.

I didn't find any other way to solve this issue. Also, in normal situations where external ports are not used,
any arp request to the router IP from bridged logical switch ports doesn't leave the chassis, since the local
ovn-controller itself replies. This is for tenant bridged VLAN logical switches. I guess for provider VLAN networks
(which provide the N/S traffic), the arp request for the router port can come from the physical network.


[1] - https://github.com/numansiddique/ovn-fake-multinode/blob/vlan_chassis_mac_issue/ovn_cluster.sh#L501


Thanks
Numan



Regards,
Ankur
________________________________
From: numans at ovn.org <numans at ovn.org>
Sent: Thursday, July 9, 2020 2:11 AM
To: dev at openvswitch.org <dev at openvswitch.org>
Cc: Numan Siddique <numans at ovn.org>; Daniel Alvarez <dalvarez at redhat.com>; Ankur Sharma <ankur.sharma at nutanix.com>
Subject: [PATCH ovn v2] Fix the routing for external logical ports of bridged logical switches.

From: Numan Siddique <numans at ovn.org>

Routing for external logical ports is broken if these ports belonged
to bridged logical switches (with localnet port) and 'ovn-chassis-mac-mappings'
is configured. External logical ports are those which are external to OVN,
but there is a logical port for it and it is claimed by one of the HA chassis.
The claimed chassis provides routing and other native OVN services like dhcp and dns.

When the external port sends ARP request for the router IP, the claimed chassis
replies for the ARP request, but the arp.sha is set to the actual router mac instead
of the chassis mac. This causes the traffic from external port VM/container to be handled
incorrectly. A ping to the router ip, is replied by all the chassis which can see this
packet instead of just the claimed HA chassis.

To fix this, this patch does 2 things.

1. In the table - OFTABLE_LOG_TO_PHY (65), it adds a 160 priority flow to
   modify the ARP packets arp.sha to store the chassis mac.

2. And when the packet destined to the chassis mac is received, it replaces the
   chassis mac with the actual router mac in table 0.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1829762
Reported-by: Daniel Alvarez <dalvarez at redhat.com>
CC: Ankur Sharma <ankur.sharma at nutanix.com>
Signed-off-by: Numan Siddique <numans at ovn.org>
---

v1 -> v2
----
  * Rebased.

 controller/chassis.c  |  48 ++++++++------
 controller/chassis.h  |   2 +
 controller/physical.c | 145 +++++++++++++++++++++++++++++++++++++++---
 tests/ovn.at          | 131 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 299 insertions(+), 27 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index eec270ea39..25146d75f2 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -645,10 +645,11 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 }

 bool
-chassis_get_mac(const struct sbrec_chassis *chassis_rec,
-                const char *bridge_mapping,
-                struct eth_addr *chassis_mac)
+chassis_get_mac_mappings(const struct sbrec_chassis *chassis_rec,
+                         struct smap *chassis_mappings)
 {
+    smap_init(chassis_mappings);
+
     const char *tokens
         = get_chassis_mac_mappings(&chassis_rec->other_config);
     if (!tokens[0]) {
@@ -656,7 +657,6 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
     }

     char *save_ptr = NULL;
-    bool ret = false;
     char *tokstr = xstrdup(tokens);

     /* Format for a chassis mac configuration is:
@@ -669,24 +669,36 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
         char *chassis_mac_bridge = strtok_r(token, ":", &save_ptr2);
         char *chassis_mac_str = strtok_r(NULL, "", &save_ptr2);

-        if (!strcmp(chassis_mac_bridge, bridge_mapping)) {
-            struct eth_addr temp_mac;
+        smap_replace(chassis_mappings, chassis_mac_bridge, chassis_mac_str);
+    }

-            /* Return the first chassis mac. */
-            char *err_str = str_to_mac(chassis_mac_str, &temp_mac);
-            if (err_str) {
-                free(err_str);
-                continue;
-            }
+    free(tokstr);
+    return true;
+}

-            ret = true;
-            *chassis_mac = temp_mac;
-            break;
-        }
+bool
+chassis_get_mac(const struct sbrec_chassis *chassis_rec,
+                const char *bridge_mapping,
+                struct eth_addr *chassis_mac)
+{
+    struct smap chassis_mappings;
+
+    if (!chassis_get_mac_mappings(chassis_rec, &chassis_mappings)) {
+        return false;
     }

-    free(tokstr);
-    return ret;
+    const char *chassis_mac_str = smap_get_def(&chassis_mappings,
+                                               bridge_mapping, "");
+    struct eth_addr temp_mac;
+
+    char *err_str = str_to_mac(chassis_mac_str, &temp_mac);
+    if (err_str) {
+        free(err_str);
+        return false;
+    }
+
+    *chassis_mac = temp_mac;
+    return true;
 }

 /* Returns true if the database is all cleaned up, false if more work is
diff --git a/controller/chassis.h b/controller/chassis.h
index 178d2957e8..dae761312d 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -42,6 +42,8 @@ bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
 bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
+bool chassis_get_mac_mappings(const struct sbrec_chassis *,
+                              struct smap *chassis_mappings);
 const char *chassis_get_id(void);
 const char * get_chassis_mac_mappings(const struct smap *ext_ids);

diff --git a/controller/physical.c b/controller/physical.c
index 6d7d8e93bc..b43a157b94 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -62,7 +62,8 @@ load_logical_ingress_metadata(const struct sbrec_port_binding *binding,
 /* UUID to identify OF flows not associated with ovsdb rows. */
 static struct uuid *hc_uuid = NULL;

-#define CHASSIS_MAC_TO_ROUTER_MAC_CONJID        100
+#define CHASSIS_MAC_TO_ROUTER_SRC_MAC_CONJID        100
+#define CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID        101

 void
 physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
@@ -148,6 +149,18 @@ put_move(enum mf_field_id src, int src_ofs,
     move->dst.n_bits = n_bits;
 }

+static void
+put_value(const uint8_t *data, size_t len,
+          enum mf_field_id dst, int ofs, int n_bits,
+          struct ofpbuf *ofpacts)
+{
+    struct ofpact_set_field *sf = ofpact_put_set_field(ofpacts,
+                                                       mf_from_id(dst), NULL,
+                                                       NULL);
+    bitwise_copy(data, len, 0, sf->value, sf->field->n_bytes, ofs, n_bits);
+    bitwise_one(ofpact_set_field_mask(sf), sf->field->n_bytes, ofs, n_bits);
+}
+
 static void
 put_resubmit(uint8_t table_id, struct ofpbuf *ofpacts)
 {
@@ -494,11 +507,10 @@ put_chassis_mac_conj_id_flow(const struct sbrec_chassis_table *chassis_table,
         ofpbuf_clear(ofpacts_p);
         match_init_catchall(&match);

-
         match_set_dl_src(&match, chassis_mac);

         conj = ofpact_put_CONJUNCTION(ofpacts_p);
-        conj->id = CHASSIS_MAC_TO_ROUTER_MAC_CONJID;
+        conj->id = CHASSIS_MAC_TO_ROUTER_SRC_MAC_CONJID;
         conj->n_clauses = 2;
         conj->clause = 0;
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
@@ -507,6 +519,51 @@ put_chassis_mac_conj_id_flow(const struct sbrec_chassis_table *chassis_table,
     }

     free_remote_chassis_macs();
+
+    /* We need to replace the packet destined to the chassis mac (eth.dst)
+     * with the router mac. This is required to support external ports.
+     * These ports don't see the router mac at all since we send the
+     * chassis MAC in the ARP reply for any ARP requests to the router IPs.
+     * Without these flows, the packets will not enter the router pipeline
+     * if they need to be routed.
+     * Please see put_replace_chassis_mac_flows() for the 2nd clause of
+     * conj id - CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID.
+     * */
+    struct smap chassis_mac_mappings = SMAP_INITIALIZER(&chassis_mac_mappings);
+    if (chassis_get_mac_mappings(chassis, &chassis_mac_mappings)) {
+        struct smap_node *node;
+        struct sset macs = SSET_INITIALIZER(&macs);
+        SMAP_FOR_EACH (node, &chassis_mac_mappings) {
+            struct eth_addr chassis_mac;
+
+            char *err_str = str_to_mac(node->value, &chassis_mac);
+            if (err_str) {
+                free(err_str);
+                continue;
+            }
+
+            if (!sset_add(&macs, node->value)) {
+                /* The OF flow for the mac is already added. */
+                continue;
+            }
+
+            ofpbuf_clear(ofpacts_p);
+            match_init_catchall(&match);
+
+            match_set_dl_dst(&match, chassis_mac);
+
+            struct ofpact_conjunction *conj;
+            conj = ofpact_put_CONJUNCTION(ofpacts_p);
+            conj->id = CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID;
+            conj->n_clauses = 2;
+            conj->clause = 0;
+            ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
+                            0, &match, ofpacts_p, hc_uuid);
+        }
+        sset_destroy(&macs);
+    }
+
+    smap_destroy(&chassis_mac_mappings);
 }

 static void
@@ -555,7 +612,7 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,

         /* Match on ingress port, vlan_id and conjunction id */
         match_set_in_port(&match, ofport);
-        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_MAC_CONJID);
+        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_SRC_MAC_CONJID);

         if (tag) {
             match_set_dl_vlan(&match, htons(tag), 0);
@@ -572,6 +629,37 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
         replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
         replace_mac->mac = router_port_mac;

+        /* Resubmit to first logical ingress pipeline table. */
+        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
+                        rport_binding->header_.uuid.parts[0],
+                        &match, ofpacts_p, hc_uuid);
+
+        ofpbuf_clear(ofpacts_p);
+        match_init_catchall(&match);
+
+        /* Add flow, which will match on conjunction id and will
+         * replace destination mac with router port mac */
+
+        /* Match on ingress port, vlan_id and conjunction id */
+        match_set_in_port(&match, ofport);
+        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID);
+
+        if (tag) {
+            match_set_dl_vlan(&match, htons(tag), 0);
+        } else {
+            match_set_dl_tci_masked(&match, 0, htons(VLAN_CFI));
+        }
+
+        /* Actions */
+
+        if (tag) {
+            ofpact_put_STRIP_VLAN(ofpacts_p);
+        }
+        load_logical_ingress_metadata(localnet_port, &zone_ids, ofpacts_p);
+        replace_mac = ofpact_put_SET_ETH_DST(ofpacts_p);
+        replace_mac->mac = router_port_mac;
+
         /* Resubmit to first logical ingress pipeline table. */
         put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
@@ -579,7 +667,7 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
                         &match, ofpacts_p, hc_uuid);

         /* Provide second search criteria, i.e localnet port's
-         * vlan ID for conjunction flow */
+         * vlan ID for conjunction flows. */
         struct ofpact_conjunction *conj;
         ofpbuf_clear(ofpacts_p);
         match_init_catchall(&match);
@@ -591,12 +679,19 @@ put_replace_chassis_mac_flows(const struct simap *ct_zones,
         }

         conj = ofpact_put_CONJUNCTION(ofpacts_p);
-        conj->id = CHASSIS_MAC_TO_ROUTER_MAC_CONJID;
+        conj->id = CHASSIS_MAC_TO_ROUTER_SRC_MAC_CONJID;
+        conj->n_clauses = 2;
+        conj->clause = 1;
+
+        conj = ofpact_put_CONJUNCTION(ofpacts_p);
+        conj->id = CHASSIS_MAC_TO_ROUTER_DST_MAC_CONJID;
         conj->n_clauses = 2;
         conj->clause = 1;
+
         ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180,
                         rport_binding->header_.uuid.parts[0],
                         &match, ofpacts_p, hc_uuid);
+
     }
 }

@@ -665,9 +760,6 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index
          * a. Flow replaces ingress router port mac with a chassis mac.
          * b. Flow appends the vlan id localnet port is configured with.
          */
-        match_init_catchall(&match);
-        ofpbuf_clear(ofpacts_p);
-
         ovs_assert(rport_binding->n_mac == 1);
         char *err_str = str_to_mac(rport_binding->mac[0], &router_port_mac);
         if (err_str) {
@@ -679,6 +771,9 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index
         }

         /* Replace Router mac flow */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+
         match_set_metadata(&match, htonll(dp_key));
         match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
         match_set_dl_src(&match, router_port_mac);
@@ -698,6 +793,38 @@ put_replace_router_port_mac_flows(struct ovsdb_idl_index
         ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 150,
                         localnet_port->header_.uuid.parts[0],
                         &match, ofpacts_p, &localnet_port->header_.uuid);
+
+        /* Replace Router mac in the ARP packets (arp.sha) to the chassis MAC.
+         * This is very important and required for external logical ports and
+         * when these ports send ARP for their router IPs, the chassis mac
+         * should be sent which has claimed these external ports. */
+        match_init_catchall(&match);
+        ofpbuf_clear(ofpacts_p);
+
+        match_set_metadata(&match, htonll(dp_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+        match_set_dl_src(&match, router_port_mac);
+        match_set_dl_type(&match, htons(ETH_TYPE_ARP));
+        match_set_arp_sha(&match, router_port_mac);
+
+        replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
+        replace_mac->mac = chassis_mac;
+
+        if (tag) {
+            struct ofpact_vlan_vid *vlan_vid;
+            vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
+            vlan_vid->vlan_vid = tag;
+            vlan_vid->push_vlan_if_needed = true;
+        }
+
+        put_value(chassis_mac.ea, sizeof chassis_mac.ea, MFF_ARP_SHA,
+                  0, 48, ofpacts_p);
+
+        ofpact_put_OUTPUT(ofpacts_p)->port = ofport;
+
+        ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 160,
+                        localnet_port->header_.uuid.parts[0],
+                        &match, ofpacts_p, &localnet_port->header_.uuid);
     }
 }

diff --git a/tests/ovn.at b/tests/ovn.at
index 24d93bc245..f033401410 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -14748,6 +14748,137 @@ AT_CHECK([cat ext1_v6.packets | cut -c -120], [0], [expout])
 cat ext1_v6.expected | cut -c 125- > expout
 AT_CHECK([cat ext1_v6.packets | cut -c 125-], [0], [expout])

+# Configure ovn-chassis-mac-mappings on all the hypervisors.
+as hv1
+ovs-vsctl set open . external_ids:ovn-chassis-mac-mappings=phys:1e:02:ad:aa:bb:01
+
+as hv2
+ovs-vsctl set open . external_ids:ovn-chassis-mac-mappings=phys:1e:02:ad:aa:bb:02
+
+as hv3
+ovs-vsctl set open . external_ids:ovn-chassis-mac-mappings=phys:1e:02:ad:aa:bb:03
+
+OVS_WAIT_UNTIL([test 6 = $(as hv1 ovs-ofctl dump-flows br-int table=0 | grep conj -c)])
+OVS_WAIT_UNTIL([test 6 = $(as hv2 ovs-ofctl dump-flows br-int table=0 | grep conj -c)])
+OVS_WAIT_UNTIL([test 6 = $(as hv3 ovs-ofctl dump-flows br-int table=0 | grep conj -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv1 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:01" -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:02" -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=0 | \
+grep conj | grep "dl_dst=1e:02:ad:aa:bb:03" -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv1 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb01->NXM_NX_ARP_SHA" -c)])
+
+OVS_WAIT_UNTIL([test 0 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb01->NXM_NX_ARP_SHA" -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" -c)])
+
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" -c)])
+
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+
+send_arp_request() {
+    local inport=$1 eth_src=$2 eth_dst=$3 spa=$4 tpa=$5
+    local reply_src_mac=$6 reply_dst_mac=$7
+    local reply_sha=$8 reply_tha=$9
+
+    local eth_type=0806
+    local eth=${eth_dst}${eth_src}${eth_type}
+
+    local arp=0001080006040001${eth_src}${spa}${eth_dst}${tpa}
+
+    local request=${eth}${arp}
+    as hv1 ovs-appctl netdev-dummy/receive hv${inport}-ext${inport} $request
+
+    local reply=${reply_dst_mac}${reply_src_mac}${eth_type}
+    reply=${reply}0001080006040002${reply_sha}${tpa}${reply_tha}${spa}
+    echo $reply > hv1-ext${inport}.expected
+}
+
+src_mac=f00000000003
+dst_mac=ffffffffffff
+reply_src_mac=1e02adaabb03
+repl_dst_mac=f00000000003
+# Send ARP request to router ip - 10.0.0.1
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" | grep "n_packets=1" -c)])
+
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+
+# Send unicast ARP request destined to the chassis mac of hv3.
+src_mac=f00000000003
+dst_mac=1e02adaabb03
+reply_src_mac=1e02adaabb03
+repl_dst_mac=f00000000003
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+
+OVS_WAIT_UNTIL([test 1 = $(as hv3 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb03->NXM_NX_ARP_SHA" | grep "n_packets=2" -c)])
+
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+
+# Make hv2 active.
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv2 60
+
+OVS_WAIT_UNTIL(
+    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
+logical_port=ls1-lp_ext1`
+    test "$chassis" = "$hv2_uuid"])
+
+reset_pcap_file hv1-ext1 hv1/ext1
+
+src_mac=f00000000003
+dst_mac=ffffffffffff
+reply_src_mac=1e02adaabb02
+repl_dst_mac=f00000000003
+# Send ARP request to router ip - 10.0.0.1. Should be replied by hv2.
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" | grep "n_packets=1" -c)])
+
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+
+as hv1
+reset_pcap_file hv1-ext1 hv1/ext1
+
+# Send unicast ARP request destined to the chassis mac of hv2.
+src_mac=f00000000003
+dst_mac=1e02adaabb02
+reply_src_mac=1e02adaabb02
+repl_dst_mac=f00000000003
+send_arp_request 1 ${src_mac} ${dst_mac} $(ip_to_hex 10 0 0 6) $(ip_to_hex 10 0 0 1) \
+${reply_src_mac} ${repl_dst_mac} ${reply_src_mac} ${repl_dst_mac}
+
+OVS_WAIT_UNTIL([test 1 = $(as hv2 ovs-ofctl dump-flows br-int table=65,arp | \
+grep "load:0x1e02adaabb02->NXM_NX_ARP_SHA" | grep "n_packets=2" -c)])
+
+OVN_CHECK_PACKETS([hv1/ext1-tx.pcap], [hv1-ext1.expected])
+
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv3 70
+ovn-nbctl ha-chassis-group-add-chassis hagrp1 hv2 10
+OVS_WAIT_UNTIL(
+    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
+logical_port=ls1-lp_ext1`
+    test "$chassis" = "$hv3_uuid"])
+
 # disconnect hv3 from the network, hv1 should take over
 as hv3
 port=${sandbox}_br-phys
--
2.26.2

_______________________________________________
dev mailing list
dev at openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


