[ovs-dev] [PATCH v3 2/5] ovn: Introduce l3 gateway router.

Darrell Ball dlu998 at gmail.com
Tue May 24 22:24:59 UTC 2016


Correcting a typo below.

Also, supporting transit LS should "NOT" prevent other optimizations.



On Tue, May 24, 2016 at 3:17 PM, Darrell Ball <dlu998 at gmail.com> wrote:

>
>
> On Tue, May 24, 2016 at 7:41 AM, Guru Shetty <guru at ovn.org> wrote:
>
>>
>>
>> On 21 May 2016 at 11:48, Darrell Ball <dlu998 at gmail.com> wrote:
>>
>>> I made some modifications to the code in Patches 1 and 2 to remove the
>>> Transit LS requirements.
>>>
>>>
>>
>> These are the reasons why an LS needs to be able to connect to multiple
>> routers.
>> 1. I think it should be left to the upstream user how they want to
>> connect their DRs with GRs. On a 1000-node k8s cluster, using peering
>> would mean that I need to add 1000 DR router ports and manage 1000
>> subnets. If I use an LS in between, I need to add only one router port
>> for the DR and manage only one subnet.
>>
>
> I realize that part of the topology is not controllable for k8s, and for
> this case specifically.
>
> This is a case where 1000 HVs each have their own GR connected to a DR for
> east-west support.
> There is a tradeoff between distributing 1000 DR ports and the extra
> static-route flows for one DR datapath to each HV
>  vs
> one DR port, 1000 Transit LS ports in total, a Transit LS datapath required
> on 1000 HVs, distributing the Transit LS datapath flows to all 1000 HVs, and
> each Transit LS peer port requiring an extra ARP flow. That's 1000 extra ARP
> flows for each HV.
>
> For subnet management, I don't see much of an issue either way; /31 subnet
> management is trivial and easy to automate.
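>
> As a rough sketch of what I mean by automating it (router names, MACs and
> the address pool are illustrative, and the command syntax just mirrors the
> test further down), the i-th DR<->GR /31 peering link could be generated
> like this:
>
>     i=1
>     dr_ip=20.0.0.$((2 * i))
>     gr_ip=20.0.0.$((2 * i + 1))
>
>     # DR-side and GR-side router ports of the i-th /31 link; in practice
>     # the MACs would be derived from i just like the addresses.
>     dr_port=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=DR_GR$i \
>     network="$dr_ip/31" mac=\"00:00:00:02:03:04\" \
>     -- add Logical_Router DR ports @lrp`
>
>     gr_port=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=GR${i}_DR \
>     network="$gr_ip/31" mac=\"00:00:00:02:03:05\" \
>     -- add Logical_Router GR$i ports @lrp`
>
>     # Peer the two router ports with each other.
>     ovn-nbctl set logical_router_port $dr_port peer="GR${i}_DR"
>     ovn-nbctl set logical_router_port $gr_port peer="DR_GR$i"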
>
> For this specific k8s case, it's not clear whether using a Transit LS is
> worse overall, factoring in the data packet pipeline, the number of flows,
> the number of datapaths, and the extra complexity.
> In most cases, avoiding a Transit LS would be better.
>
>
Corrected typo here:


> Also, supporting transit LS should NOT prevent other optimizations.
>



>
>
>> 2. The ability to connect multiple routers to a switch is needed on the
>> north side of the GR, as we will need to connect multiple GRs to a switch
>> to be able to access the physical network for ARP resolution. This applies
>> to both north-south and east-west traffic.
>>
>
> This is not a transit LS case in OVN.
>
>
>
>> 3. A Transit LS is needed for the final patch in this series to work (i.e.
>> actual DNAT and SNAT). The final patch needs the packet to enter the
>> ingress pipeline of a router. The current implementation cannot handle
>> peering, as packets enter the egress pipeline of the peer router. To
>> support peering, it will need further enhancements.
>>
>
> The dependency of the final patch on Transit LS usage/topology is
> something that I wanted to make clear with
> this exchange, especially for folks not part of the discussion last week.
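>
> To restate the two paths for anyone who was not in that discussion (this is
> just my reading of the point above, not new behavior):
>
>     via a transit LS:  DR ingress -> DR egress -> LS ingress -> LS egress
>                        -> GR ingress (where the final patch's DNAT/SNAT
>                        would hook in) -> GR egress
>     via peering:       DR ingress -> DR egress -> GR egress
>
> so with peering the packet never reaches the GR ingress pipeline that the
> final patch depends on.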
>
>
>
>>
>>
>>
>>>
>>> In summary:
>>>
>>> I removed all changes to lflow.c thereby reinstating the previous
>>> optimization.
>>>
>>> I made some modifications to the ovn-northd.c changes to remove the
>>> Transit LS special handling and the additional ARP flows.
>>>
>>> I left the other code changes in Patches 1 and 2 as they were.
>>>
>>> The overall resulting diff to support both patches 1 and 2 is reduced in
>>> ovn-northd.c and becomes:
>>>
>>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>>> index b271f7f..2e2236b 100644
>>> --- a/ovn/northd/ovn-northd.c
>>> +++ b/ovn/northd/ovn-northd.c
>>> @@ -702,11 +702,25 @@ ovn_port_update_sbrec(const struct ovn_port *op)
>>>  {
>>>      sbrec_port_binding_set_datapath(op->sb, op->od->sb);
>>>      if (op->nbr) {
>>> -        sbrec_port_binding_set_type(op->sb, "patch");
>>> +        /* If the router is for l3 gateway, it resides on a chassis
>>> +         * and its port type is "gateway". */
>>> +        const char *chassis = smap_get(&op->od->nbr->options, "chassis");
>>> +
>>> +        if (chassis && op->peer && op->peer->od && op->peer->od->nbs){
>>> +            sbrec_port_binding_set_type(op->sb, "gateway");
>>> +        } else {
>>> +            sbrec_port_binding_set_type(op->sb, "patch");
>>> +        }
>>>
>>>          const char *peer = op->peer ? op->peer->key : "<error>";
>>> -        const struct smap ids = SMAP_CONST1(&ids, "peer", peer);
>>> -        sbrec_port_binding_set_options(op->sb, &ids);
>>> +        struct smap new;
>>> +        smap_init(&new);
>>> +        smap_add(&new, "peer", peer);
>>> +        if (chassis) {
>>> +            smap_add(&new, "gateway-chassis", chassis);
>>> +        }
>>> +        sbrec_port_binding_set_options(op->sb, &new);
>>> +        smap_destroy(&new);
>>>
>>>          sbrec_port_binding_set_parent_port(op->sb, NULL);
>>>          sbrec_port_binding_set_tag(op->sb, NULL, 0);
>>> @@ -716,15 +730,31 @@ ovn_port_update_sbrec(const struct ovn_port *op)
>>>              sbrec_port_binding_set_type(op->sb, op->nbs->type);
>>>              sbrec_port_binding_set_options(op->sb, &op->nbs->options);
>>>          } else {
>>> -            sbrec_port_binding_set_type(op->sb, "patch");
>>> +            const char *chassis = NULL;
>>> +            if (op->peer && op->peer->od && op->peer->od->nbr) {
>>> +                chassis = smap_get(&op->peer->od->nbr->options, "chassis");
>>> +            }
>>> +            /* A switch port connected to a gateway router is also of
>>> +             * type "gateway". */
>>> +            if (chassis) {
>>> +                sbrec_port_binding_set_type(op->sb, "gateway");
>>> +            } else {
>>> +                sbrec_port_binding_set_type(op->sb, "patch");
>>> +            }
>>>
>>>              const char *router_port = smap_get(&op->nbs->options,
>>>                                                 "router-port");
>>>              if (!router_port) {
>>>                  router_port = "<error>";
>>>              }
>>> -            const struct smap ids = SMAP_CONST1(&ids, "peer", router_port);
>>> -            sbrec_port_binding_set_options(op->sb, &ids);
>>> +            struct smap new;
>>> +            smap_init(&new);
>>> +            smap_add(&new, "peer", router_port);
>>> +            if (chassis) {
>>> +                smap_add(&new, "gateway-chassis", chassis);
>>> +            }
>>> +            sbrec_port_binding_set_options(op->sb, &new);
>>> +            smap_destroy(&new);
>>>          }
>>>          sbrec_port_binding_set_parent_port(op->sb, op->nbs->parent_name);
>>>          sbrec_port_binding_set_tag(op->sb, op->nbs->tag, op->nbs->n_tag);
>>>
>>>
>>> I added a new test to demonstrate direct DR<->GR connectivity.
>>>
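>>> With the ovn-northd.c changes above, and R2 and R3 pinned to hv2 as in
>>> this test, I would expect the relevant SB Port_Binding rows to come out
>>> roughly as follows (only the interesting columns shown, annotated; exact
>>> "ovn-sbctl list Port_Binding" formatting and values are illustrative):
>>>
>>>     logical_port : rp-alice    # switch side, in switch "alice"
>>>     type         : gateway
>>>     options      : {gateway-chassis="hv2", peer="alice"}
>>>
>>>     logical_port : alice       # router side, on gateway router R2
>>>     type         : gateway
>>>     options      : {gateway-chassis="hv2", peer="rp-alice"}
>>>
>>>     logical_port : R1_R2       # direct DR<->GR peering link stays "patch"
>>>     type         : patch
>>>     options      : {peer="R2_R1"}
>>>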
>>>
>>> AT_SETUP([ovn -- 2 HVs, 3 LRs, 1 DR directly connected to 2 gateway routers])
>>> AT_KEYWORDS([ovndirectlyconnectedrouters])
>>> AT_SKIP_IF([test $HAVE_PYTHON = no])
>>> ovn_start
>>>
>>> # Logical network:
>>> # Three LRs - R1, R2 and R3 that are connected to each other directly
>>> # in 20.0.0.2/31 and 21.0.0.2/31 networks. R1 has switch foo
>>> # (192.168.1.0/24) connected to it. R2 has alice (172.16.1.0/24) and
>>> # R3 has bob (10.32.1.0/24) connected to it.
>>>
>>> ovn-nbctl create Logical_Router name=R1
>>> ovn-nbctl create Logical_Router name=R2 options:chassis="hv2"
>>> ovn-nbctl create Logical_Router name=R3 options:chassis="hv2"
>>>
>>> ovn-nbctl lswitch-add foo
>>> ovn-nbctl lswitch-add alice
>>> ovn-nbctl lswitch-add bob
>>>
>>> # Connect foo to R1
>>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \
>>> network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add Logical_Router R1 \
>>> ports @lrp -- lport-add foo rp-foo
>>>
>>> ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo \
>>> addresses=\"00:00:01:01:02:03\"
>>>
>>> # Connect alice to R2
>>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \
>>> network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add Logical_Router R2 \
>>> ports @lrp -- lport-add alice rp-alice
>>>
>>> ovn-nbctl set Logical_port rp-alice type=router options:router-port=alice \
>>> addresses=\"00:00:02:01:02:03\"
>>>
>>> # Connect bob to R3
>>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=bob \
>>> network=10.32.1.1/24 mac=\"00:00:03:01:02:03\" -- add Logical_Router R3 \
>>> ports @lrp -- lport-add bob rp-bob
>>>
>>> ovn-nbctl set Logical_port rp-bob type=router options:router-port=bob \
>>> addresses=\"00:00:03:01:02:03\"
>>>
>>> # Interconnect R1 and R2
>>> lrp1_uuid_2_R2=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_R2 \
>>> network="20.0.0.2/31" mac=\"00:00:00:02:03:04\" \
>>> -- add Logical_Router R1 ports @lrp`
>>>
>>> lrp2_uuid_2_R1=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=R2_R1 \
>>> network="20.0.0.3/31" mac=\"00:00:00:02:03:05\" \
>>> -- add Logical_Router R2 ports @lrp`
>>>
>>> ovn-nbctl set logical_router_port $lrp1_uuid_2_R2 peer="R2_R1"
>>> ovn-nbctl set logical_router_port $lrp2_uuid_2_R1 peer="R1_R2"
>>>
>>> # Interconnect R1 and R3
>>> lrp1_uuid_2_R3=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_R3 \
>>> network="21.0.0.2/31" mac=\"00:00:21:02:03:04\" \
>>> -- add Logical_Router R1 ports @lrp`
>>>
>>> lrp3_uuid_2_R1=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=R3_R1 \
>>> network="21.0.0.3/31" mac=\"00:00:21:02:03:05\" \
>>> -- add Logical_Router R3 ports @lrp`
>>>
>>> ovn-nbctl set logical_router_port $lrp1_uuid_2_R3 peer="R3_R1"
>>> ovn-nbctl set logical_router_port $lrp3_uuid_2_R1 peer="R1_R3"
>>>
>>> #install static route in R1 to get to alice
>>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>>> ip_prefix=172.16.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \
>>> R1 static_routes @lrt
>>>
>>> #install static route in R1 to get to bob
>>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>>> ip_prefix=10.32.1.0/24 nexthop=21.0.0.3 -- add Logical_Router \
>>> R1 static_routes @lrt
>>>
>>> #install static route in R2 to get to foo
>>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>>> ip_prefix=192.168.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \
>>> R2 static_routes @lrt
>>>
>>> # Create terminal logical ports
>>> # Create logical port foo1 in foo
>>> ovn-nbctl lport-add foo foo1 \
>>> -- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>>>
>>> # Create logical port alice1 in alice
>>> ovn-nbctl lport-add alice alice1 \
>>> -- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2"
>>>
>>> # Create logical port bob1 in bob
>>> ovn-nbctl lport-add bob bob1 \
>>> -- lport-set-addresses bob1 "f0:00:00:01:02:05 10.32.1.2"
>>>
>>> # Create two hypervisors and create OVS ports corresponding to logical ports.
>>> net_add n1
>>>
>>> sim_add hv1
>>> as hv1
>>> ovs-vsctl add-br br-phys
>>> ovn_attach n1 br-phys 192.168.0.1
>>> ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>>     set interface hv1-vif1 external-ids:iface-id=foo1 \
>>>     options:tx_pcap=hv1/vif1-tx.pcap \
>>>     options:rxq_pcap=hv1/vif1-rx.pcap \
>>>     ofport-request=1
>>>
>>> sim_add hv2
>>> as hv2
>>> ovs-vsctl add-br br-phys
>>> ovn_attach n1 br-phys 192.168.0.2
>>> ovs-vsctl -- add-port br-int hv2-vif1 -- \
>>>     set interface hv2-vif1 external-ids:iface-id=bob1 \
>>>     options:tx_pcap=hv2/vif1-tx.pcap \
>>>     options:rxq_pcap=hv2/vif1-rx.pcap \
>>>     ofport-request=1
>>>
>>> ovs-vsctl -- add-port br-int hv2-vif2 -- \
>>>     set interface hv2-vif2 external-ids:iface-id=alice1 \
>>>     options:tx_pcap=hv2/vif2-tx.pcap \
>>>     options:rxq_pcap=hv2/vif2-rx.pcap \
>>>     ofport-request=2
>>>
>>> # Pre-populate the hypervisors' ARP tables so that we don't lose any
>>> # packets for ARP resolution (native tunneling doesn't queue packets
>>> # for ARP resolution).
>>> ovn_populate_arp
>>>
>>> # Allow some time for ovn-northd and ovn-controller to catch up.
>>> # XXX This should be more systematic.
>>> sleep 1
>>>
>>> ip_to_hex() {
>>>     printf "%02x%02x%02x%02x" "$@"
>>> }
>>> trim_zeros() {
>>>     sed 's/\(00\)\{1,\}$//'
>>> }
>>>
>>> # Send ip packets between foo1 and alice1
>>> src_mac="f00000010203"
>>> dst_mac="000001010203"
>>> src_ip=`ip_to_hex 192 168 1 2`
>>> dst_ip=`ip_to_hex 172 16 1 2`
>>>
>>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>> as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>> as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
>>>
>>> # Send ip packets between foo1 and bob1
>>> src_mac="f00000010203"
>>> dst_mac="000001010203"
>>> src_ip=`ip_to_hex 192 168 1 2`
>>> dst_ip=`ip_to_hex 10 32 1 2`
>>>
>>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>> as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>>
>>> # Send ip packets from alice1 to foo1
>>> src_mac="f00000010204"
>>> dst_mac="000002010203"
>>> src_ip=`ip_to_hex 172 16 1 2`
>>> dst_ip=`ip_to_hex 192 168 1 2`
>>>
>>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>> as hv2 ovs-appctl netdev-dummy/receive hv2-vif2 $packet
>>>
>>> echo "---------NB dump-----"
>>> ovn-nbctl show
>>> echo "---------------------"
>>> ovn-nbctl list logical_router
>>> echo "---------------------"
>>> ovn-nbctl list logical_router_port
>>> echo "---------------------"
>>>
>>> echo "---------SB dump-----"
>>> ovn-sbctl list datapath_binding
>>> echo "---------------------"
>>> ovn-sbctl list port_binding
>>> echo "---------------------"
>>> #ovn-sbctl dump-flows
>>> echo "---------------------"
>>>
>>> echo "------ hv1 dump ----------"
>>> as hv1 ovs-vsctl show
>>> as hv1 ovs-ofctl show br-int
>>> as hv1 ovs-ofctl dump-flows br-int
>>> echo "------ hv2 dump ----------"
>>> as hv2 ovs-vsctl show
>>> as hv2 ovs-ofctl show br-int
>>> as hv2 ovs-ofctl dump-flows br-int
>>> echo "----------------------------"
>>>
>>> # Packet to Expect at alice1
>>> src_mac="000002010203"
>>> dst_mac="f00000010204"
>>> src_ip=`ip_to_hex 192 168 1 2`
>>> dst_ip=`ip_to_hex 172 16 1 2`
>>>
>>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>>>
>>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif2-tx.pcap |
>>> trim_zeros >
>>> received.packets
>>> echo $expected | trim_zeros > expout
>>> AT_CHECK([cat received.packets], [0], [expout])
>>>
>>> # Packet to Expect at bob1
>>> src_mac="000003010203"
>>> dst_mac="f00000010205"
>>> src_ip=`ip_to_hex 192 168 1 2`
>>> dst_ip=`ip_to_hex 10 32 1 2`
>>>
>>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>>>
>>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap |
>>> trim_zeros >
>>> received1.packets
>>> echo $expected | trim_zeros > expout
>>> AT_CHECK([cat received1.packets], [0], [expout])
>>>
>>> # Packet to Expect at foo1
>>> src_mac="000001010203"
>>> dst_mac="f00000010203"
>>> src_ip=`ip_to_hex 172 16 1 2`
>>> dst_ip=`ip_to_hex 192 168 1 2`
>>>
>>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>>>
>>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap |
>>> trim_zeros >
>>> received2.packets
>>> echo $expected | trim_zeros > expout
>>> AT_CHECK([cat received2.packets], [0], [expout])
>>>
>>> for sim in hv1 hv2; do
>>>     as $sim
>>>     OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>>     OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>>>     OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> done
>>>
>>> as ovn-sb
>>> OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>
>>> as ovn-nb
>>> OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>
>>> as northd
>>> OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>>
>>> as main
>>> OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>>> OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>>
>>> AT_CLEANUP
>>>
>>> On Thu, May 19, 2016 at 1:02 PM, Gurucharan Shetty <guru at ovn.org> wrote:
>>>
>>> > Currently OVN has distributed switches and routers. When a packet
>>> > exits a container or a VM, the entire lifecycle of the packet
>>> > through multiple switches and routers is calculated in the source
>>> > chassis itself. When the destination endpoint resides on a different
>>> > chassis, the packet is sent to the other chassis and it only goes
>>> > through the egress pipeline of that chassis once and eventually to
>>> > the real destination.
>>> >
>>> > When the packet returns, the same thing happens. The return
>>> > packet leaves the VM/container on the chassis where it resides.
>>> > The packet goes through all the switches and routers in the logical
>>> > pipeline on that chassis and is then sent to the eventual destination
>>> > over the tunnel.
>>> >
>>> > The above makes the logical pipeline very flexible and easy. But it
>>> > creates a problem for cases where you need to add stateful services
>>> > (via conntrack) on switches and routers.
>>> >
>>> > For l3 gateways, we plan to leverage DNAT and SNAT functionality
>>> > and we want to apply DNAT and SNAT rules on a router. So we ideally need
>>> > the packet to go through that router in both directions in the same
>>> > chassis. To achieve this, this commit introduces a new gateway router
>>> > which is
>>> > static and can be connected to your distributed router via a switch.
>>> >
>>> > To make minimal changes in OVN's logical pipeline, this commit
>>> > tries to make the switch port connected to an l3 gateway router look like
>>> > a container/VM endpoint for every other chassis except the chassis
>>> > on which the l3 gateway router resides. On the chassis where the
>>> > gateway router resides, the connection looks just like a patch port.
>>> >
>>> > This is achieved by doing the following:
>>> > Introduces a new type of port_binding record called 'gateway'.
>>> > On the chassis where the gateway router resides, this port behaves just
>>> > like a port of type 'patch'. The ovn-controller on that chassis
>>> > populates the "chassis" column for this record as an indication for
>>> > other ovn-controllers of its physical location. Other ovn-controllers
>>> > treat this port as they would treat a VM/Container port on a different
>>> > chassis.
>>> >
>>> > Signed-off-by: Gurucharan Shetty <guru at ovn.org>
>>> > ---
>>> >  ovn/controller/binding.c        |   3 +-
>>> >  ovn/controller/ovn-controller.c |   5 +-
>>> >  ovn/controller/patch.c          |  29 ++++++-
>>> >  ovn/controller/patch.h          |   3 +-
>>> >  ovn/northd/ovn-northd.c         |  42 +++++++--
>>> >  ovn/ovn-nb.ovsschema            |   9 +-
>>> >  ovn/ovn-nb.xml                  |  15 ++++
>>> >  ovn/ovn-sb.xml                  |  35 +++++++-
>>> >  tests/ovn.at                    | 184 ++++++++++++++++++++++++++++++++++++++++
>>> >  9 files changed, 309 insertions(+), 16 deletions(-)
>>> >
>>> > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>>> > index a0d8b96..e5e55b1 100644
>>> > --- a/ovn/controller/binding.c
>>> > +++ b/ovn/controller/binding.c
>>> > @@ -200,7 +200,8 @@ binding_run(struct controller_ctx *ctx, const
>>> struct
>>> > ovsrec_bridge *br_int,
>>> >                  }
>>> >                  sbrec_port_binding_set_chassis(binding_rec,
>>> chassis_rec);
>>> >              }
>>> > -        } else if (chassis_rec && binding_rec->chassis ==
>>> chassis_rec) {
>>> > +        } else if (chassis_rec && binding_rec->chassis == chassis_rec
>>> > +                   && strcmp(binding_rec->type, "gateway")) {
>>> >              if (ctx->ovnsb_idl_txn) {
>>> >                  VLOG_INFO("Releasing lport %s from this chassis.",
>>> >                            binding_rec->logical_port);
>>> > diff --git a/ovn/controller/ovn-controller.c
>>> > b/ovn/controller/ovn-controller.c
>>> > index 511b184..bc4c24f 100644
>>> > --- a/ovn/controller/ovn-controller.c
>>> > +++ b/ovn/controller/ovn-controller.c
>>> > @@ -364,8 +364,9 @@ main(int argc, char *argv[])
>>> >                      &local_datapaths);
>>> >          }
>>> >
>>> > -        if (br_int) {
>>> > -            patch_run(&ctx, br_int, &local_datapaths,
>>> &patched_datapaths);
>>> > +        if (br_int && chassis_id) {
>>> > +            patch_run(&ctx, br_int, chassis_id, &local_datapaths,
>>> > +                      &patched_datapaths);
>>> >
>>> >              struct lport_index lports;
>>> >              struct mcgroup_index mcgroups;
>>> > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c
>>> > index 4808146..e8abe30 100644
>>> > --- a/ovn/controller/patch.c
>>> > +++ b/ovn/controller/patch.c
>>> > @@ -267,12 +267,28 @@ add_patched_datapath(struct hmap
>>> *patched_datapaths,
>>> >  static void
>>> >  add_logical_patch_ports(struct controller_ctx *ctx,
>>> >                          const struct ovsrec_bridge *br_int,
>>> > +                        const char *local_chassis_id,
>>> >                          struct shash *existing_ports,
>>> >                          struct hmap *patched_datapaths)
>>> >  {
>>> > +    const struct sbrec_chassis *chassis_rec;
>>> > +    chassis_rec = get_chassis(ctx->ovnsb_idl, local_chassis_id);
>>> > +    if (!chassis_rec) {
>>> > +        return;
>>> > +    }
>>> > +
>>> >      const struct sbrec_port_binding *binding;
>>> >      SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) {
>>> > -        if (!strcmp(binding->type, "patch")) {
>>> > +        bool local_port = false;
>>> > +        if (!strcmp(binding->type, "gateway")) {
>>> > +            const char *chassis = smap_get(&binding->options,
>>> > +                                           "gateway-chassis");
>>> > +            if (!strcmp(local_chassis_id, chassis)) {
>>> > +                local_port = true;
>>> > +            }
>>> > +        }
>>> > +
>>> > +        if (!strcmp(binding->type, "patch") || local_port) {
>>> >              const char *local = binding->logical_port;
>>> >              const char *peer = smap_get(&binding->options, "peer");
>>> >              if (!peer) {
>>> > @@ -287,13 +303,19 @@ add_logical_patch_ports(struct controller_ctx
>>> *ctx,
>>> >              free(dst_name);
>>> >              free(src_name);
>>> >              add_patched_datapath(patched_datapaths, binding);
>>> > +            if (local_port) {
>>> > +                if (binding->chassis != chassis_rec &&
>>> > ctx->ovnsb_idl_txn) {
>>> > +                    sbrec_port_binding_set_chassis(binding,
>>> chassis_rec);
>>> > +                }
>>> > +            }
>>> >          }
>>> >      }
>>> >  }
>>> >
>>> >  void
>>> >  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge
>>> *br_int,
>>> > -          struct hmap *local_datapaths, struct hmap
>>> *patched_datapaths)
>>> > +          const char *chassis_id, struct hmap *local_datapaths,
>>> > +          struct hmap *patched_datapaths)
>>> >  {
>>> >      if (!ctx->ovs_idl_txn) {
>>> >          return;
>>> > @@ -313,7 +335,8 @@ patch_run(struct controller_ctx *ctx, const struct
>>> > ovsrec_bridge *br_int,
>>> >       * 'existing_ports' any patch ports that do exist in the database
>>> and
>>> >       * should be there. */
>>> >      add_bridge_mappings(ctx, br_int, &existing_ports,
>>> local_datapaths);
>>> > -    add_logical_patch_ports(ctx, br_int, &existing_ports,
>>> > patched_datapaths);
>>> > +    add_logical_patch_ports(ctx, br_int, chassis_id, &existing_ports,
>>> > +                            patched_datapaths);
>>> >
>>> >      /* Now 'existing_ports' only still contains patch ports that
>>> exist in
>>> > the
>>> >       * database but shouldn't.  Delete them from the database. */
>>> > diff --git a/ovn/controller/patch.h b/ovn/controller/patch.h
>>> > index d5d842e..7920a48 100644
>>> > --- a/ovn/controller/patch.h
>>> > +++ b/ovn/controller/patch.h
>>> > @@ -27,6 +27,7 @@ struct hmap;
>>> >  struct ovsrec_bridge;
>>> >
>>> >  void patch_run(struct controller_ctx *, const struct ovsrec_bridge
>>> > *br_int,
>>> > -               struct hmap *local_datapaths, struct hmap
>>> > *patched_datapaths);
>>> > +               const char *chassis_id, struct hmap *local_datapaths,
>>> > +               struct hmap *patched_datapaths);
>>> >
>>> >  #endif /* ovn/patch.h */
>>> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>>> > index f469e89..7852d83 100644
>>> > --- a/ovn/northd/ovn-northd.c
>>> > +++ b/ovn/northd/ovn-northd.c
>>> > @@ -690,11 +690,24 @@ ovn_port_update_sbrec(const struct ovn_port *op)
>>> >  {
>>> >      sbrec_port_binding_set_datapath(op->sb, op->od->sb);
>>> >      if (op->nbr) {
>>> > -        sbrec_port_binding_set_type(op->sb, "patch");
>>> > +        /* If the router is for l3 gateway, it resides on a chassis
>>> > +         * and its port type is "gateway". */
>>> > +        const char *chassis = smap_get(&op->od->nbr->options,
>>> "chassis");
>>> > +        if (chassis) {
>>> > +            sbrec_port_binding_set_type(op->sb, "gateway");
>>> > +        } else {
>>> > +            sbrec_port_binding_set_type(op->sb, "patch");
>>> > +        }
>>> >
>>> >          const char *peer = op->peer ? op->peer->key : "<error>";
>>> > -        const struct smap ids = SMAP_CONST1(&ids, "peer", peer);
>>> > -        sbrec_port_binding_set_options(op->sb, &ids);
>>> > +        struct smap new;
>>> > +        smap_init(&new);
>>> > +        smap_add(&new, "peer", peer);
>>> > +        if (chassis) {
>>> > +            smap_add(&new, "gateway-chassis", chassis);
>>> > +        }
>>> > +        sbrec_port_binding_set_options(op->sb, &new);
>>> > +        smap_destroy(&new);
>>> >
>>> >          sbrec_port_binding_set_parent_port(op->sb, NULL);
>>> >          sbrec_port_binding_set_tag(op->sb, NULL, 0);
>>> > @@ -704,15 +717,32 @@ ovn_port_update_sbrec(const struct ovn_port *op)
>>> >              sbrec_port_binding_set_type(op->sb, op->nbs->type);
>>> >              sbrec_port_binding_set_options(op->sb, &op->nbs->options);
>>> >          } else {
>>> > -            sbrec_port_binding_set_type(op->sb, "patch");
>>> > +            const char *chassis = NULL;
>>> > +            if (op->peer && op->peer->od && op->peer->od->nbr) {
>>> > +                chassis = smap_get(&op->peer->od->nbr->options,
>>> > "chassis");
>>> > +            }
>>> > +
>>> > +            /* A switch port connected to a gateway router is also of
>>> > +             * type "gateway". */
>>> > +            if (chassis) {
>>> > +                sbrec_port_binding_set_type(op->sb, "gateway");
>>> > +            } else {
>>> > +                sbrec_port_binding_set_type(op->sb, "patch");
>>> > +            }
>>> >
>>> >              const char *router_port = smap_get(&op->nbs->options,
>>> >                                                 "router-port");
>>> >              if (!router_port) {
>>> >                  router_port = "<error>";
>>> >              }
>>> > -            const struct smap ids = SMAP_CONST1(&ids, "peer",
>>> > router_port);
>>> > -            sbrec_port_binding_set_options(op->sb, &ids);
>>> > +            struct smap new;
>>> > +            smap_init(&new);
>>> > +            smap_add(&new, "peer", router_port);
>>> > +            if (chassis) {
>>> > +                smap_add(&new, "gateway-chassis", chassis);
>>> > +            }
>>> > +            sbrec_port_binding_set_options(op->sb, &new);
>>> > +            smap_destroy(&new);
>>> >          }
>>> >          sbrec_port_binding_set_parent_port(op->sb,
>>> op->nbs->parent_name);
>>> >          sbrec_port_binding_set_tag(op->sb, op->nbs->tag,
>>> op->nbs->n_tag);
>>> > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
>>> > index 8163f6a..fa21b30 100644
>>> > --- a/ovn/ovn-nb.ovsschema
>>> > +++ b/ovn/ovn-nb.ovsschema
>>> > @@ -1,7 +1,7 @@
>>> >  {
>>> >      "name": "OVN_Northbound",
>>> > -    "version": "2.1.1",
>>> > -    "cksum": "2615511875 5108",
>>> > +    "version": "2.1.2",
>>> > +    "cksum": "429668869 5325",
>>> >      "tables": {
>>> >          "Logical_Switch": {
>>> >              "columns": {
>>> > @@ -78,6 +78,11 @@
>>> >                                     "max": "unlimited"}},
>>> >                  "default_gw": {"type": {"key": "string", "min": 0,
>>> "max":
>>> > 1}},
>>> >                  "enabled": {"type": {"key": "boolean", "min": 0,
>>> "max":
>>> > 1}},
>>> > +                "options": {
>>> > +                     "type": {"key": "string",
>>> > +                              "value": "string",
>>> > +                              "min": 0,
>>> > +                              "max": "unlimited"}},
>>> >                  "external_ids": {
>>> >                      "type": {"key": "string", "value": "string",
>>> >                               "min": 0, "max": "unlimited"}}},
>>> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>>> > index d7fd595..d239499 100644
>>> > --- a/ovn/ovn-nb.xml
>>> > +++ b/ovn/ovn-nb.xml
>>> > @@ -630,6 +630,21 @@
>>> >        column is set to <code>false</code>, the router is disabled.  A
>>> > disabled
>>> >        router has all ingress and egress traffic dropped.
>>> >      </column>
>>> > +
>>> > +    <group title="Options">
>>> > +      <p>
>>> > +        Additional options for the logical router.
>>> > +      </p>
>>> > +
>>> > +      <column name="options" key="chassis">
>>> > +        If set, indicates that the logical router in question is
>>> > +        non-distributed and resides in the set chassis. The same
>>> > +        value is also used by <code>ovn-controller</code> to
>>> > +        uniquely identify the chassis in the OVN deployment and
>>> > +        comes from <code>external_ids:system-id</code> in the
>>> > +        <code>Open_vSwitch</code> table of Open_vSwitch database.
>>> > +      </column>
>>> > +    </group>
>>> >
>>> >      <group title="Common Columns">
>>> >        <column name="external_ids">
>>> > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
>>> > index efd2f9a..741228c 100644
>>> > --- a/ovn/ovn-sb.xml
>>> > +++ b/ovn/ovn-sb.xml
>>> > @@ -1220,7 +1220,12 @@ tcp.flags = RST;
>>> >        which
>>> <code>ovn-controller</code>/<code>ovn-controller-vtep</code>
>>> > in
>>> >        turn finds out by monitoring the local hypervisor's Open_vSwitch
>>> >        database, which identifies logical ports via the conventions
>>> > described
>>> > -      in <code>IntegrationGuide.md</code>.
>>> > +      in <code>IntegrationGuide.md</code>. (The exceptions are for
>>> > +      <code>Port_Binding</code> records of <code>type</code> 'gateway',
>>> > +      whose locations are identified by <code>ovn-northd</code> via
>>> > +      the <code>options:gateway-chassis</code> column in this table.
>>> > +      <code>ovn-controller</code> is still responsible for populating the
>>> > +      <code>chassis</code> column.)
>>> >      </p>
>>> >
>>> >      <p>
>>> > @@ -1298,6 +1303,14 @@ tcp.flags = RST;
>>> >              a logical router to a logical switch or to another logical
>>> > router.
>>> >            </dd>
>>> >
>>> > +          <dt><code>gateway</code></dt>
>>> > +          <dd>
>>> > +            One of a pair of logical ports that act as if connected by a patch
>>> > +            cable across multiple chassis.  Useful for connecting a logical
>>> > +            switch with a gateway router (which is only resident on a
>>> > +            particular chassis).
>>> > +          </dd>
>>> > +
>>> >            <dt><code>localnet</code></dt>
>>> >            <dd>
>>> >              A connection to a locally accessible network from each
>>> > @@ -1336,6 +1349,26 @@ tcp.flags = RST;
>>> >        </column>
>>> >      </group>
>>> >
>>> > +    <group title="Gateway Options">
>>> > +      <p>
>>> > +        These options apply to logical ports with <ref
>>> column="type"/> of
>>> > +        <code>gateway</code>.
>>> > +      </p>
>>> > +
>>> > +      <column name="options" key="peer">
>>> > +        The <ref column="logical_port"/> in the <ref table="Port_Binding"/>
>>> > +        record for the other side of the 'gateway' port.  The named <ref
>>> > +        column="logical_port"/> must specify this <ref column="logical_port"/>
>>> > +        in its own <code>peer</code> option.  That is, the two 'gateway'
>>> > +        logical ports must have reversed <ref column="logical_port"/> and
>>> > +        <code>peer</code> values.
>>> > +      </column>
>>> > +
>>> > +      <column name="options" key="gateway-chassis">
>>> > +        The <code>chassis</code> in which the port resides.
>>> > +      </column>
>>> > +    </group>
>>> > +
>>> >      <group title="Localnet Options">
>>> >        <p>
>>> >          These options apply to logical ports with <ref column="type"/> of
>>> > diff --git a/tests/ovn.at b/tests/ovn.at
>>> > index a827b71..9d93064 100644
>>> > --- a/tests/ovn.at
>>> > +++ b/tests/ovn.at
>>> > @@ -2848,3 +2848,187 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>>> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> >
>>> >  AT_CLEANUP
>>> > +
>>> > +
>>> > +AT_SETUP([ovn -- 2 HVs, 2 LRs connected via LS, gateway router])
>>> > +AT_KEYWORDS([ovngatewayrouter])
>>> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
>>> > +ovn_start
>>> > +
>>> > +# Logical network:
>>> > +# Two LRs - R1 and R2 that are connected to each other via LS "join"
>>> > +# in 20.0.0.0/24 network. R1 has switch foo (192.168.1.0/24)
>>> > +# connected to it. R2 has alice (172.16.1.0/24) connected to it.
>>> > +# R2 is a gateway router.
>>> > +
>>> > +
>>> > +
>>> > +# Create two hypervisors and create OVS ports corresponding to logical ports.
>>> > +net_add n1
>>> > +
>>> > +sim_add hv1
>>> > +as hv1
>>> > +ovs-vsctl add-br br-phys
>>> > +ovn_attach n1 br-phys 192.168.0.1
>>> > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>> > +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>>> > +    options:tx_pcap=hv1/vif1-tx.pcap \
>>> > +    options:rxq_pcap=hv1/vif1-rx.pcap \
>>> > +    ofport-request=1
>>> > +
>>> > +
>>> > +sim_add hv2
>>> > +as hv2
>>> > +ovs-vsctl add-br br-phys
>>> > +ovn_attach n1 br-phys 192.168.0.2
>>> > +ovs-vsctl -- add-port br-int hv2-vif1 -- \
>>> > +    set interface hv2-vif1 external-ids:iface-id=alice1 \
>>> > +    options:tx_pcap=hv2/vif1-tx.pcap \
>>> > +    options:rxq_pcap=hv2/vif1-rx.pcap \
>>> > +    ofport-request=1
>>> > +
>>> > +# Pre-populate the hypervisors' ARP tables so that we don't lose any
>>> > +# packets for ARP resolution (native tunneling doesn't queue packets
>>> > +# for ARP resolution).
>>> > +ovn_populate_arp
>>> > +
>>> > +ovn-nbctl create Logical_Router name=R1
>>> > +ovn-nbctl create Logical_Router name=R2 options:chassis="hv2"
>>> > +
>>> > +ovn-nbctl lswitch-add foo
>>> > +ovn-nbctl lswitch-add alice
>>> > +ovn-nbctl lswitch-add join
>>> > +
>>> > +# Connect foo to R1
>>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \
>>> > +network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add
>>> Logical_Router
>>> > R1 \
>>> > +ports @lrp -- lport-add foo rp-foo
>>> > +
>>> > +ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo
>>> \
>>> > +addresses=\"00:00:01:01:02:03\"
>>> > +
>>> > +# Connect alice to R2
>>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \
>>> > +network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add
>>> Logical_Router R2
>>> > \
>>> > +ports @lrp -- lport-add alice rp-alice
>>> > +
>>> > +ovn-nbctl set Logical_port rp-alice type=router
>>> options:router-port=alice
>>> > \
>>> > +addresses=\"00:00:02:01:02:03\"
>>> > +
>>> > +
>>> > +# Connect R1 to join
>>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_join \
>>> > +network=20.0.0.1/24 mac=\"00:00:04:01:02:03\" -- add Logical_Router
>>> R1 \
>>> > +ports @lrp -- lport-add join r1-join
>>> > +
>>> > +ovn-nbctl set Logical_port r1-join type=router
>>> > options:router-port=R1_join \
>>> > +addresses='"00:00:04:01:02:03"'
>>> > +
>>> > +# Connect R2 to join
>>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R2_join \
>>> > +network=20.0.0.2/24 mac=\"00:00:04:01:02:04\" -- add Logical_Router
>>> R2 \
>>> > +ports @lrp -- lport-add join r2-join
>>> > +
>>> > +ovn-nbctl set Logical_port r2-join type=router
>>> > options:router-port=R2_join \
>>> > +addresses='"00:00:04:01:02:04"'
>>> > +
>>> > +
>>> > +#install static routes
>>> > +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>>> > +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \
>>> > +R1 static_routes @lrt
>>> > +
>>> > +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>>> > +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \
>>> > +R2 static_routes @lrt
>>> > +
>>> > +# Create logical port foo1 in foo
>>> > +ovn-nbctl lport-add foo foo1 \
>>> > +-- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>>> > +
>>> > +# Create logical port alice1 in alice
>>> > +ovn-nbctl lport-add alice alice1 \
>>> > +-- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2"
>>> > +
>>> > +
>>> > +# Allow some time for ovn-northd and ovn-controller to catch up.
>>> > +# XXX This should be more systematic.
>>> > +sleep 2
>>> > +
>>> > +ip_to_hex() {
>>> > +    printf "%02x%02x%02x%02x" "$@"
>>> > +}
>>> > +trim_zeros() {
>>> > +    sed 's/\(00\)\{1,\}$//'
>>> > +}
>>> > +
>>> > +# Send ip packets between foo1 and alice1
>>> > +src_mac="f00000010203"
>>> > +dst_mac="000001010203"
>>> > +src_ip=`ip_to_hex 192 168 1 2`
>>> > +dst_ip=`ip_to_hex 172 16 1 2`
>>> >
>>> >
>>> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>>> > +
>>> > +echo "---------NB dump-----"
>>> > +ovn-nbctl show
>>> > +echo "---------------------"
>>> > +ovn-nbctl list logical_router
>>> > +echo "---------------------"
>>> > +ovn-nbctl list logical_router_port
>>> > +echo "---------------------"
>>> > +
>>> > +echo "---------SB dump-----"
>>> > +ovn-sbctl list datapath_binding
>>> > +echo "---------------------"
>>> > +ovn-sbctl list port_binding
>>> > +echo "---------------------"
>>> > +ovn-sbctl dump-flows
>>> > +echo "---------------------"
>>> > +ovn-sbctl list chassis
>>> > +ovn-sbctl list encap
>>> > +echo "---------------------"
>>> > +
>>> > +echo "------ hv1 dump ----------"
>>> > +as hv1 ovs-ofctl show br-int
>>> > +as hv1 ovs-ofctl dump-flows br-int
>>> > +echo "------ hv2 dump ----------"
>>> > +as hv2 ovs-ofctl show br-int
>>> > +as hv2 ovs-ofctl dump-flows br-int
>>> > +echo "----------------------------"
>>> > +
>>> > +# Packet to Expect at alice1
>>> > +src_mac="000002010203"
>>> > +dst_mac="f00000010204"
>>> > +src_ip=`ip_to_hex 192 168 1 2`
>>> > +dst_ip=`ip_to_hex 172 16 1 2`
>>> >
>>> >
>>> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>>> > +
>>> > +
>>> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>>> > +as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
>>> > +
>>> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap |
>>> > trim_zeros > received1.packets
>>> > +echo $expected | trim_zeros > expout
>>> > +AT_CHECK([cat received1.packets], [0], [expout])
>>> > +
>>> > +for sim in hv1 hv2; do
>>> > +    as $sim
>>> > +    OVS_APP_EXIT_AND_WAIT([ovn-controller])
>>> > +    OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>>> > +    OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> > +done
>>> > +
>>> > +as ovn-sb
>>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> > +
>>> > +as ovn-nb
>>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> > +
>>> > +as northd
>>> > +OVS_APP_EXIT_AND_WAIT([ovn-northd])
>>> > +
>>> > +as main
>>> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>> > +
>>> > +AT_CLEANUP
>>> > --
>>> > 1.9.1
>>> >
>>>
>>
>>
>
