[ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

Guru Shetty guru at ovn.org
Tue May 10 14:38:48 UTC 2016


On 9 May 2016 at 14:51, Darrell Ball <dlu998 at gmail.com> wrote:

> I have some initial clarifications first before review
>
> On Fri, May 6, 2016 at 9:21 AM, Gurucharan Shetty <guru at ovn.org> wrote:
>
>> Currently we can connect routers via "peer"ing. This limits
>> the number of routers that can be connected with each other
>> directly to 2.
>>
>
>
> This sounds like you are saying that cannot have a topology like
>
> R1-------R2
>  \           /
>   \        /
>    \      /
>     \   /
>      R3
>
> which is common.
> Did I read the above comment correctly ?
>

The above drawn topology will start getting complex with more gateways. (In
case of k8s each machine is a l3 gateway.). I agree that my usage of word
"directly" may be confusing. I meant routers connected to each other in the
same subnet.


>
>
>
>>
>> One of the design goals for L3 Gateway is to be able to
>> have multiple gateways (each with their own router)
>> connected to a distributed router via a switch.
>>
>
> Its not ideal to have a switch required to connect routers - it somewhat
> defeats the advantages
> of having routers in the first place.
>

I do not have much experience with real world networking setups. There are
probably reasons why your statement is true. From my perspective , all I
see is that a switch is used to connect multiple endpoints in the same
subnet and If I want multiple routers connected to each other in the same
subnet, I need to use a switch. VMware NSX uses something similar in
customer deployments, so I believe it is not out of ordinary to do
something like this.


> This is usually only done if there is existing switching equipment than
> must be traversed
> But in the case, we are just dealing with software where we have total
> flexibility.
>

>From OpenStack perspective, each tenant gets a router and multiple switches
with different subnets. The idea is that the OpenStack network plugin, can
at best split this single tenant router into 2 with a switch in between on
a  subnet that neutron does not use. Taking the same logic forward for k8s,
I can support multiple gateways.

Connecting multiple routers to each other without a switch makes it a pain
to remember the interconnections and create multiple subnets (which may
simply not be available for a networking backend for internal use).


>
>
>>
>> With the above goal in mind, this commit gives the general
>> ability to connect multiple routers via a switch.
>>
>> Signed-off-by: Gurucharan Shetty <guru at ovn.org>
>> ---
>> This needs the following 2 commits under review to
>> first go in.
>> 1. ofproto-dpif: Rename "recurse" to "indentation".
>> 2. ofproto-dpif: Do not count resubmit to later tables against limit.
>> ---
>>  ovn/controller/lflow.c  |   19 ++--
>>  ovn/northd/ovn-northd.c |   56 ++++++++++-
>>  ovn/ovn-nb.xml          |    7 --
>>  tests/ovn.at            |  236
>> +++++++++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 299 insertions(+), 19 deletions(-)
>>
>> diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
>> index 96b7c66..efc427d 100644
>> --- a/ovn/controller/lflow.c
>> +++ b/ovn/controller/lflow.c
>> @@ -215,15 +215,15 @@ add_logical_flows(struct controller_ctx *ctx, const
>> struct lport_index *lports,
>>          if (is_switch(ldp)) {
>>              /* For a logical switch datapath, local_datapaths tells us
>> if there
>>               * are any local ports for this datapath.  If not, we can
>> skip
>> -             * processing logical flows if the flow belongs to egress
>> pipeline
>> -             * or if that logical switch datapath is not patched to any
>> logical
>> -             * router.
>> +             * processing logical flows if that logical switch datapath
>> is not
>> +             * patched to any logical router.
>>               *
>> -             * Otherwise, we still need the ingress pipeline because
>> even if
>> -             * there are no local ports, we still may need to execute
>> the ingress
>> -             * pipeline after a packet leaves a logical router.  Further
>> -             * optimization is possible, but not based on what we know
>> with
>> -             * local_datapaths right now.
>> +             * Otherwise, we still need both ingress and egress pipeline
>> +             * because even if there are no local ports, we still may
>> need to
>> +             * execute the ingress pipeline after a packet leaves a
>> logical
>> +             * router and we need to do egress pipeline for a switch that
>> +             * is connected to only routers.  Further optimization is
>> possible,
>> +             * but not based on what we know with local_datapaths right
>> now.
>>               *
>>               * A better approach would be a kind of "flood fill"
>> algorithm:
>>               *
>> @@ -242,9 +242,6 @@ add_logical_flows(struct controller_ctx *ctx, const
>> struct lport_index *lports,
>>               */
>>
>>              if (!get_local_datapath(local_datapaths, ldp->tunnel_key)) {
>> -                if (!ingress) {
>> -                    continue;
>> -                }
>>
>
> This is removing a previous change that was done for some optimization for
> avoiding
> unnecessary flow creation.
> I am not sure about the process here, but maybe this should be tracked as
> a separate
> patch ?
>
The above change is needed for this patch to work. The optimization above
assumes that a switch always has atleast one real physical endpoint. With
this change, since a switch can only be connected to routers, we need to
remove the optimization. The optimization itself will need more careful
consideration and with more information provided to ovn-controller, it can
ideally be improved, but I would ideally not want l3 gateway work delayed
because of it.



>
>
>
>
>>                  if (!get_patched_datapath(patched_datapaths,
>>                                            ldp->tunnel_key)) {
>>                      continue;
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 9e03606..3a29770 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -2110,7 +2110,13 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>>                  free(actions);
>>                  free(match);
>>              }
>> -        } else if (op->od->n_router_ports) {
>> +        } else if (op->od->n_router_ports && strcmp(op->nbs->type,
>> "router")) {
>> +            /* This is a logical switch port that backs a VM or a
>> container.
>> +             * Extract its addresses. For each of the address, go
>> through all
>> +             * the router ports attached to the switch (to which this
>> port
>> +             * connects) and if the address in question is reachable
>> from the
>> +             * router port, add an ARP entry in that router's pipeline.
>> */
>> +
>>              for (size_t i = 0; i < op->nbs->n_addresses; i++) {
>>                  struct lport_addresses laddrs;
>>                  if (!extract_lport_addresses(op->nbs->addresses[i],
>> &laddrs,
>> @@ -2158,8 +2164,56 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>>
>>                  free(laddrs.ipv4_addrs);
>>              }
>> +        } else if (!strcmp(op->nbs->type, "router")) {
>> +            /* This is a logical switch port that connects to a router.
>> */
>> +
>> +            /* The peer of this switch port is the router port for which
>> +             * we need to add logical flows such that it can resolve
>> +             * ARP entries for all the other router ports connected to
>> +             * the switch in question. */
>> +
>> +            const char *peer_name = smap_get(&op->nbs->options,
>> +                                             "router-port");
>> +            if (!peer_name) {
>> +                continue;
>> +            }
>> +
>> +            struct ovn_port *peer = ovn_port_find(ports, peer_name);
>> +            if (!peer || !peer->nbr || !peer->ip) {
>> +                continue;
>> +            }
>> +
>> +            for (size_t j = 0; j < op->od->n_router_ports; j++) {
>> +                const char *router_port_name = smap_get(
>> +
>> &op->od->router_ports[j]->nbs->options,
>> +                                    "router-port");
>> +                struct ovn_port *router_port = ovn_port_find(ports,
>> +
>>  router_port_name);
>> +                if (!router_port || !router_port->nbr ||
>> !router_port->ip) {
>> +                    continue;
>> +                }
>> +
>> +                /* Skip the router port under consideration. */
>> +                if (router_port == peer) {
>> +                   continue;
>> +                }
>> +
>> +                if (!router_port->ip) {
>> +                    continue;
>> +                }
>> +                char *match = xasprintf("outport == %s && reg0 ==
>> "IP_FMT,
>> +                                        peer->json_key,
>> +                                        IP_ARGS(router_port->ip));
>> +                char *actions = xasprintf("eth.dst = "ETH_ADDR_FMT";
>> next;",
>> +
>> ETH_ADDR_ARGS(router_port->mac));
>> +                ovn_lflow_add(lflows, peer->od, S_ROUTER_IN_ARP_RESOLVE,
>> +                              101, match, actions);
>> +                free(actions);
>> +                free(match);
>> +            }
>>          }
>>      }
>> +
>>      HMAP_FOR_EACH (od, key_node, datapaths) {
>>          if (!od->nbr) {
>>              continue;
>> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>> index c01455d..d7fd595 100644
>> --- a/ovn/ovn-nb.xml
>> +++ b/ovn/ovn-nb.xml
>> @@ -154,13 +154,6 @@
>>            These options apply when <ref column="type"/> is
>> <code>router</code>.
>>          </p>
>>
>> -        <p>
>> -          If a given logical switch has multiple <code>router</code>
>> ports, the
>> -          <ref table="Logical_Router_Port"/> rows that they reference
>> must be
>> -          all on the same <ref table="Logical_Router"/> (for different
>> -          subnets).
>> -        </p>
>> -
>>          <column name="options" key="router-port">
>>            Required.  The <ref column="name"/> of the <ref
>>            table="Logical_Router_Port"/> to which this logical switch
>> port is
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index b9d7ada..6961be0 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -2545,3 +2545,239 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>
>>  AT_CLEANUP
>>
>> +AT_SETUP([ovn -- 2 HVs, 3 LRs connected via LS, static routes])
>> +AT_KEYWORDS([ovnstaticroutes])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# Three LRs - R1, R2 and R3 that are connected to each other via LS
>> "join"
>> +# in 20.0.0.0/24 network. R1 has switchess foo (192.168.1.0/24)
>> +# connected to it. R2 has alice (172.16.1.0/24) and R3 has bob (
>> 10.32.1.0/24)
>> +# connected to it.
>> +
>> +ovn-nbctl create Logical_Router name=R1
>> +ovn-nbctl create Logical_Router name=R2
>> +ovn-nbctl create Logical_Router name=R3
>> +
>> +ovn-nbctl lswitch-add foo
>> +ovn-nbctl lswitch-add alice
>> +ovn-nbctl lswitch-add bob
>> +ovn-nbctl lswitch-add join
>> +
>> +# Connect foo to R1
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \
>> +network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add Logical_Router
>> R1 \
>> +ports @lrp -- lport-add foo rp-foo
>> +
>> +ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo \
>> +addresses=\"00:00:01:01:02:03\"
>> +
>> +# Connect alice to R2
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \
>> +network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add Logical_Router
>> R2 \
>> +ports @lrp -- lport-add alice rp-alice
>> +
>> +ovn-nbctl set Logical_port rp-alice type=router
>> options:router-port=alice \
>> +addresses=\"00:00:02:01:02:03\"
>> +
>> +# Connect bob to R3
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=bob \
>> +network=10.32.1.1/24 mac=\"00:00:03:01:02:03\" -- add Logical_Router R3
>> \
>> +ports @lrp -- lport-add bob rp-bob
>> +
>> +ovn-nbctl set Logical_port rp-bob type=router options:router-port=bob \
>> +addresses=\"00:00:03:01:02:03\"
>> +
>> +# Connect R1 to join
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_join \
>> +network=20.0.0.1/24 mac=\"00:00:04:01:02:03\" -- add Logical_Router R1 \
>> +ports @lrp -- lport-add join r1-join
>> +
>> +ovn-nbctl set Logical_port r1-join type=router
>> options:router-port=R1_join \
>> +addresses='"00:00:04:01:02:03"'
>> +
>> +# Connect R2 to join
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R2_join \
>> +network=20.0.0.2/24 mac=\"00:00:04:01:02:04\" -- add Logical_Router R2 \
>> +ports @lrp -- lport-add join r2-join
>> +
>> +ovn-nbctl set Logical_port r2-join type=router
>> options:router-port=R2_join \
>> +addresses='"00:00:04:01:02:04"'
>> +
>> +
>> +# Connect R3 to join
>> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R3_join \
>> +network=20.0.0.3/24 mac=\"00:00:04:01:02:05\" -- add Logical_Router R3 \
>> +ports @lrp -- lport-add join r3-join
>> +
>> +ovn-nbctl set Logical_port r3-join type=router
>> options:router-port=R3_join \
>> +addresses='"00:00:04:01:02:05"'
>> +
>> +#install static routes
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \
>> +R1 static_routes @lrt
>> +
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=10.32.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \
>> +R1 static_routes @lrt
>> +
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \
>> +R2 static_routes @lrt
>> +
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=10.32.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \
>> +R2 static_routes @lrt
>> +
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \
>> +R3 static_routes @lrt
>> +
>> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \
>> +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \
>> +R3 static_routes @lrt
>> +
>> +# Create logical port foo1 in foo
>> +ovn-nbctl lport-add foo foo1 \
>> +-- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> +
>> +# Create logical port alice1 in alice
>> +ovn-nbctl lport-add alice alice1 \
>> +-- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2"
>> +
>> +# Create logical port bob1 in bob
>> +ovn-nbctl lport-add bob bob1 \
>> +-- lport-set-addresses bob1 "f0:00:00:01:02:05 10.32.1.2"
>> +
>> +# Create two hypervisor and create OVS ports corresponding to logical
>> ports.
>> +net_add n1
>> +
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> +    options:tx_pcap=hv1/vif1-tx.pcap \
>> +    options:rxq_pcap=hv1/vif1-rx.pcap \
>> +    ofport-request=1
>> +
>> +ovs-vsctl -- add-port br-int hv1-vif2 -- \
>> +    set interface hv1-vif2 external-ids:iface-id=alice1 \
>> +    options:tx_pcap=hv1/vif2-tx.pcap \
>> +    options:rxq_pcap=hv1/vif2-rx.pcap \
>> +    ofport-request=2
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +ovs-vsctl -- add-port br-int hv2-vif1 -- \
>> +    set interface hv2-vif1 external-ids:iface-id=bob1 \
>> +    options:tx_pcap=hv2/vif1-tx.pcap \
>> +    options:rxq_pcap=hv2/vif1-rx.pcap \
>> +    ofport-request=1
>> +
>> +
>> +# Pre-populate the hypervisors' ARP tables so that we don't lose any
>> +# packets for ARP resolution (native tunneling doesn't queue packets
>> +# for ARP resolution).
>> +ovn_populate_arp
>> +
>> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> +# XXX This should be more systematic.
>> +sleep 1
>> +
>> +ip_to_hex() {
>> +    printf "%02x%02x%02x%02x" "$@"
>> +}
>> +trim_zeros() {
>> +    sed 's/\(00\)\{1,\}$//'
>> +}
>> +
>> +# Send ip packets between foo1 and alice1
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>> +src_ip=`ip_to_hex 192 168 1 2`
>> +dst_ip=`ip_to_hex 172 16 1 2`
>>
>> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
>> +
>> +# Send ip packets between foo1 and bob1
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>> +src_ip=`ip_to_hex 192 168 1 2`
>> +dst_ip=`ip_to_hex 10 32 1 2`
>>
>> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +
>> +echo "---------NB dump-----"
>> +ovn-nbctl show
>> +echo "---------------------"
>> +ovn-nbctl list logical_router
>> +echo "---------------------"
>> +ovn-nbctl list logical_router_port
>> +echo "---------------------"
>> +
>> +echo "---------SB dump-----"
>> +ovn-sbctl list datapath_binding
>> +echo "---------------------"
>> +ovn-sbctl list port_binding
>> +echo "---------------------"
>> +ovn-sbctl dump-flows
>> +echo "---------------------"
>> +
>> +echo "------ hv1 dump ----------"
>> +as hv1 ovs-ofctl show br-int
>> +as hv1 ovs-ofctl dump-flows br-int
>> +echo "------ hv2 dump ----------"
>> +as hv2 ovs-ofctl show br-int
>> +as hv2 ovs-ofctl dump-flows br-int
>> +echo "----------------------------"
>> +
>> +# Packet to Expect at bob1
>> +src_mac="000003010203"
>> +dst_mac="f00000010205"
>> +src_ip=`ip_to_hex 192 168 1 2`
>> +dst_ip=`ip_to_hex 10 32 1 2`
>>
>> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>> +
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap |
>> trim_zeros > received.packets
>> +echo $expected | trim_zeros > expout
>> +AT_CHECK([cat received.packets], [0], [expout])
>> +
>> +# Packet to Expect at alice1
>> +src_mac="000002010203"
>> +dst_mac="f00000010204"
>> +src_ip=`ip_to_hex 192 168 1 2`
>> +dst_ip=`ip_to_hex 172 16 1 2`
>>
>> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>> +
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/vif2-tx.pcap |
>> trim_zeros > received1.packets
>> +echo $expected | trim_zeros > expout
>> +AT_CHECK([cat received1.packets], [0], [expout])
>> +
>> +for sim in hv1 hv2; do
>> +    as $sim
>> +    OVS_APP_EXIT_AND_WAIT([ovn-controller])
>> +    OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>> +    OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +done
>> +
>> +as ovn-sb
>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +
>> +as ovn-nb
>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +
>> +as northd
>> +OVS_APP_EXIT_AND_WAIT([ovn-northd])
>> +
>> +as main
>> +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
>> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>> +
>> +AT_CLEANUP
>> --
>> 1.7.9.5
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>
>
>



More information about the dev mailing list