[ovs-dev] [PATCH ovn v6 2/4] northd, utils: support for RouteTables in LRs

Odintsov Vladislav VlOdintsov at croc.ru
Sat Nov 13 09:48:10 UTC 2021


Thanks Numan,

I forgot to update the description according to the new changes.
A new patch version, v8, has been submitted:
https://patchwork.ozlabs.org/project/ovn/cover/20211113094353.17690-1-odivlad@gmail.com/
Please check it out.

Regards,
Vladislav Odintsov

On 12 Nov 2021, at 23:07, Numan Siddique <numans at ovn.org> wrote:

On Thu, Nov 11, 2021 at 2:18 PM Odintsov Vladislav <VlOdintsov at croc.ru> wrote:

Hi Han, Numan,

I’ve posted a new version of this series [1] and addressed your comments and suggestions.
I’d appreciate it if you could take a look at this, and I’d be happy if it could be included in the next OVN release.

Thanks.

Thanks Vladislav,

I'll take a look at the patches next week.

Numan


1: https://patchwork.ozlabs.org/project/ovn/cover/20211111191306.6369-1-odivlad@gmail.com/

Regards,
Vladislav Odintsov

On 20 Oct 2021, at 10:09, Han Zhou <hzhou at ovn.org> wrote:




On Tue, Oct 19, 2021 at 3:21 PM Numan Siddique <numans at ovn.org> wrote:

On Tue, Oct 19, 2021 at 3:28 AM Han Zhou <hzhou at ovn.org> wrote:

On Mon, Oct 18, 2021 at 6:35 AM Odintsov Vladislav <VlOdintsov at croc.ru> wrote:



regards,
Vladislav Odintsov

On 16 Oct 2021, at 03:20, Han Zhou <hzhou at ovn.org> wrote:

On Fri, Oct 15, 2021 at 2:36 AM Vladislav Odintsov <odivlad at gmail.com> wrote:



Regards,
Vladislav Odintsov

On 15 Oct 2021, at 08:42, Han Zhou <hzhou at ovn.org> wrote:

On Thu, Oct 14, 2021 at 12:58 AM Vladislav Odintsov <odivlad at gmail.com> wrote:


Hi Han,

Thanks for the review.

Regards,
Vladislav Odintsov

On 14 Oct 2021, at 08:13, Han Zhou <hzhou at ovn.org> wrote:



On Tue, Oct 5, 2021 at 1:26 PM Vladislav Odintsov <odivlad at gmail.com> wrote:


This patch extends the Logical Router's routing functionality.
Now a user may create multiple routing tables within a Logical Router
and assign them to Logical Router Ports.

Traffic coming from a Logical Router Port with an assigned route_table
is checked against global routes, if any (Logical_Router_Static_Routes
with an empty route_table field), next against directly connected routes


This is not accurate. The "directly connected routes" are NOT after the
global routes. Their priority only depends on the prefix length.


and then against Logical_Router_Static_Routes with the same route_table
value as in the Logical_Router_Port options:route_table field.

A new Logical Router ingress table #10 is added: IN_IP_ROUTING_PRE.
In this table, packets coming from LRPs with a configured
options:route_table field are matched on inport, and a unique non-zero
value identifying the route table is written into OVS register 7.

Then, in table #11 (IN_IP_ROUTING), routes that have a non-empty
`route_table` field are added with an additional match on the reg7 value
associated with the appropriate route_table.
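
For illustration, a minimal configuration sketch of the above (router and
port names are hypothetical; the --route-table flag and options:route_table
key are those proposed in this series and may change):

  # Assign LRP "lrp-private" on router "lr0" to route table "rtb-1".
  ovn-nbctl set Logical_Router_Port lrp-private options:route_table=rtb-1

  # A "global" route (empty route_table), matched for traffic from all LRPs.
  ovn-nbctl lr-route-add lr0 192.168.100.0/24 10.0.0.1

  # A route visible only to traffic entering via LRPs in table "rtb-1".
  ovn-nbctl --route-table=rtb-1 lr-route-add lr0 0.0.0.0/0 10.0.0.254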


Hi Vladislav,

First of all, sorry for the delayed review, and thanks for implementing
this new feature.

I have some questions regarding the feature itself. I remember that we had
some discussion earlier for this feature, but it seems I misunderstood the
feature you are implementing here. I thought by multiple routing tables you
were trying to support something like VRF in physical routers, but this
seems to be something different. According to your implementation, instead
of assigning LRPs to different routing tables, an LRP can actually
participate in both the global routing table and the table with a specific
ID. For ingress, global routes are preferred over other routes; for egress
(i.e. forwarding to the next hop), it doesn't even enforce any table-id
check, so a packet routed by an entry of table X can go out of an LRP with
table-id Y. Is my understanding correct about the desired behavior of this
feature?



Yes, your understanding is correct.
This is not VRF. At first glance, VRF can be done just by using an LR per
VRF without any code modifications (maybe there are corner cases, but
that's not something I'm currently working on).

I agree VRF can be achieved with just separate LRs. I am trying to
understand why multiple LRs wouldn't satisfy your use case here.

An LRP can optionally be assigned a routing table name. This means that in
the LR ingress pipeline, packets coming from that specific LRP are checked
against routes with the same route_table value within the appropriate LR.
This is a kind of PBR, an analog of "ip rule add iif <interface name>
lookup <id>".
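
For reference, the Linux analog mentioned above would look like this
(illustrative interface and table numbers):

  # packets entering via eth1 are looked up in routing table 100
  ip rule add iif eth1 lookup 100
  ip route add default via 10.0.0.254 dev eth0 table 100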

There is one specific use case which requires special handling:
directly connected routes (subnet CIDRs from the LRPs connected to this
LR). These routes can't be added manually by the user, though routing
between LRPs belonging to different routing tables is still needed. So
these routes should be added to the global routing table to override
routing for LRPs with configured routing tables. If for some reason a user
wants to prohibit IP connectivity to any LRP (honestly, I can't imagine
why), it can be done by utilising OVN PBR with a drop action or a route
with a "discard" nexthop. But in that case I'd question whether this LRP is
needed in this LR at all.

Also, if a user wants to create routes which apply to all LRPs from all
routing tables, it can be done by adding new entries to the global routing
table.

And the third reason to have a global routing table: a fully
backward-compatible solution. All users who don't need multiple routing
table support just continue using static routing in the same manner,
without any changes.



If this is correct, it doesn't seem to be a common/standard requirement
(or please educate me if I am wrong). Could you explain a little more about
the actual use cases: what kind of real world problems need to be solved by
this feature, or how are you going to use it? For example, why would a port
need to participate in both routing tables? It looks like what you really
need is policy routing instead of multiple isolated routing tables. I
understand that you already use policy routing to implement ACLs, so it is
not convenient to combine other policies (e.g. inport based routing) into
the policy routing stage. If that's the case, would it be more generic to
support multiple policy routing stages? My concern with the current
approach is that it is implemented for a very special use case. It makes
the code more complex, but when there is a slightly different requirement
in the future it becomes insufficient. I am thinking that policy routing
seems more flexible and has more potential to be made more generic. Maybe I
will have a better understanding when I hear more detailed use cases and
considerations from you.



I can't agree that this is an uncommon requirement.
This implementation was inspired by the AWS Route Tables feature [1]:
within a VPC (an LR in terms of OVN), a user may create multiple routing
tables and assign them to different subnets (LRPs) in multiple availability
zones. Auto-generated directly connected routes from LRP CIDRs work in the
same manner as they do in AWS: they apply to all subnets regardless of
their association with a routing table. GCP has similar behaviour [2];
Azure, I guess, too.

Our public cloud (CROC Cloud Platform) supports the AWS behaviour [3], so I
was primarily oriented toward it. Internally we already use this feature
and currently it fits our use case, but again, I can't say it is specific.

If it is for AWS/GCP-like routing features, then from what I understand,
what's implemented in this patch is a little different. To implement a VPC
model in OVN, I think it is ok to have multiple LRs instead of only one LR
for each VPC. So naturally I would use an LR to map to AWS's routing table.
In AWS's document (the link you provided), it says:

"A subnet can only be associated with one route table at a time"

So basically in AWS a subnet is either attached to the default/main route
table or a custom table, but not both, right? However, in your use case, an
LRP (which maps to a subnet) attaches to both "main" and a custom table,
which seems not common to me. Or did I miss something?


That's true about AWS, but it's still a bit inaccurate regarding OVN.
The global routing table in OVN terms is not what the AWS main route table
is. The main route table is just a configuration hint for users for
implicitly associating route tables with subnets. Routing for subnets
implicitly associated via the main routing table functions in the same
manner as a normal explicit route_table-subnet association.

The global routing table in OVN is just a list of routes with higher
priority than routes with a configured "route_table".

I am not suggesting configuring both tables at the same time. But it is
_possible_ to do if required for some reason (for instance, to configure
some service chaining, or just internal VPC services like metadata or other
internal APIs, access to other services).
Normally, if we map an AWS Route Table to OVN, it is mostly a one-to-one
mapping, except for the "Local route":

Example:
AWS Route Table rtb-xxx:
172.31.0.0/16: local route (VPC CIDR for subnets)
0.0.0.0/0: igw-XXX (internet gateway)

AWS Route Table rtb-yyy:
172.31.0.0/16: local route (VPC CIDR for subnets)
5.5.5.5/32: instance-xxx

This maps to the following OVN configuration (for one LR with one subnet
172.31.0.0/24):

implicit route (not present in the logical router static routes table):
172.31.0.0/24: LRP-subnet-xxx - has highest priority over other route
table-oriented routes and can be treated as placed in the global routing
table


What do you mean by highest priority here? It all depends on the prefix
length, right? If you have a static route entry that says 172.31.0.0/25
-> xyz, then this route will have higher priority than the directly
connected subnet.


Yes, that's true, but only for routes within the "global" routing table. I
left this to preserve the previous behaviour.
It is impossible to create a static route within any non-global routing
table which overrides directly connected routes, because the priority for
"global" routes is increased by 100 (though it should be increased by more
than 2*128 to support the same behaviour for IPv6, as you already pointed
out).



Normal static routes:
ip_prefix: 0.0.0.0/0, nexthop: <IP for edge LR's LRP>, route_table: rtb-xxx
ip_prefix: 5.5.5.5/32, nexthop: <IP of some LSP, which belongs to the
VM/container via which the route is built>, route_table: rtb-yyy
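
For illustration, the two routes above could be added like this (a sketch
using the CLI proposed in this series; placeholders kept as in the
example):

  ovn-nbctl --route-table=rtb-xxx lr-route-add lr-vpc 0.0.0.0/0 <edge LRP IP>
  ovn-nbctl --route-table=rtb-yyy lr-route-add lr-vpc 5.5.5.5/32 <instance IP>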

I guess I now understand the reason for the misunderstanding: the global
routing table I referred to earlier consists of the routes with no value in
the "route_table" field plus the directly connected routes. The latter have
no records in the logical router static routes table, but I still referred
to them as routes from the "global routing table". I can think about the
terminology here if it's the root cause of the misunderstanding. What do
you think?


Thanks for explaining. In fact, I understand what you mean about the
"global routing table", but I think you are implementing something
different from what AWS provides, primarily because you use a single LR
instead of an LR per routing table. I understand there are reasons why you
want to do it this way, but here I can think of some challenges with your
approach. For example, in AWS we could do:

route table main:
  subnet S0: 172.31.0.0/24
  routes:
    172.31.0.0/16: local

route table R1:
  subnet S1: 172.31.1.0/24
  routes:
    172.31.0.0/16: local
    172.31.0.0/24: <some FW/GW>

Wow, this is brand-new AWS behaviour that I was not aware of; it appeared
1.5 months ago [1].

Previously, when I was writing this patch series, such a static route was
impossible to add in AWS: you couldn't add routes that are the same as or
more specific than the VPC CIDR block.

In GCP, though, you can't create a route that is the same as or more
specific than a subnet route [2].
I think I can try to update my patch to support the AWS approach here.

Would it be simpler just not to make the global routes higher priority? We
could make it clear to the user that if they add a route in any route table
that overlaps with a directly connected subnet, the longer prefix takes
precedence. Wouldn't that be more flexible for the user?
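
For example (hypothetical prefixes):

  # directly connected:  172.31.0.0/24  -> deliver on the local LRP
  # rtb-yyy user route:  172.31.0.0/25  -> <FW/GW instance IP>
  # For a port in rtb-yyy:
  #   dst 172.31.0.1   matches the /25 (longer prefix) -> FW/GW
  #   dst 172.31.0.200 matches only the /24            -> delivered directly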


But all this is relevant to a single LR per VPC (not per route table,
because of the blocker described earlier).
Do you think it is reasonable given these inputs?


A packet from S1 to S0 will go to the FW/GW, where it may be
dropped/forwarded to S0/or redirected to something outside of AWS ...

While in your approach with OVN, both subnets will be under the same
logical router, with different table IDs assigned to the LRPs and routes.
But when a VM under S1 sends a packet to S0, it will go to the destination
directly, because you are prioritizing the direct-connection routes and
they are under the same global route table, and there is no way to force
the packet to go through some FW/GW from the custom route table. You can
add a route to achieve this in the global table, because in your design the
global routes are still applicable to all LRPs and have higher priority,
but it would impact all the LRPs/subnets, while in the AWS design above it
is supposed to affect subnets under R1 only.

In addition, for my understanding, S0 and S1 in AWS cannot communicate
directly unless there are some specific routes on both route tables to
route them to some gateway. In other words, there is some isolation between
route tables. However, in your design with OVN, it is more of

S0 and S1 can communicate with each other while being in different route
tables without any specific setup; the default unremovable "local route"
ensures this.
A user can only override the "local" route with an existing subnet's CIDR
(more specific than the "local route") to send traffic from S0 to S1
through some specific appliance(s) in S2.
But basic IP connectivity always remains, regardless of route table-subnet
associations.
So I'm still sure that one LR per VPC suits this scenario better, as I
can't imagine why one would place subnets that don't need connectivity in
one VPC.

Maybe I was wrong. I am not really familiar with AWS, and I wanted to try
this myself but didn't get the time :(
If default connectivity for subnets under different route tables is a
requirement (instead of the contrary), then I'd agree that one LR per VPC
seems better, because it is more like policy routing than isolation.

policy routing without isolation at all. I am not saying that your design
is wrong, just that it implements very different behavior than AWS, and I
am trying to understand your real use cases to get a better understanding
of the feature you implemented. (You explained that you just wanted the AWS
behavior, but it looks like that's not exactly the case.)


I will fix the places where the behaviour differs from AWS.
And thank you for pointing out such places.

Well, I think it doesn't have to be all the same as AWS, but yes, follow it
where the AWS design makes more sense.




In my opinion, implementing this feature using PBR is less convenient and
natural for users who are familiar with the behaviour of the public cloud
platforms mentioned above, because configuring routes should be done in the
routes section, and adding a route table property seems natural there, not
in PBR. Moreover, I _think_ using Logical_Router_Static_Route makes
extending this feature to OVN-Interconnection quite easy compared to PBR
(though I didn't try the latter).

I agree that if it is just an AWS-like requirement, PBR is less convenient.

I am trying to understand if it can be achieved with separate LRs. If not,
what's special about the requirement, and does the current approach provide
a solution common enough that more use cases can also benefit from it?
Could you clarify a little more? Thanks again.

That was our initial approach - to use separate LRs for each Route Table.
We rejected that solution because we met some difficulties and a blocker.
See below.

Brief topology description when using an LR per Route Table:
imagine 2 subnets of a VPC in the 1st AZ and one in another AZ, each subnet
in its own Route Table (LR).
All subnets must have IP connectivity, so we have to somehow interconnect
these Route Table LRs.


That's exactly the same case as in AWS, right? If you want direct
connectivity, you would just put them under the same route table in AWS (or
the same LR in OVN), right?


I'm sorry, my example was not good enough.
Let's remove AZs from that case, as they're not relevant to the discussion.
So we've got 2 subnets, S0 and S1, and suppose they've got different route
tables assigned to them.
If we place them in a single LR and assign different route tables, they
will have connectivity out of the box.
Same in AWS: the "local route" is added automatically, can't be removed or
modified, and ensures IP connectivity between all VPC subnets.

Ok, thanks for explaining.



[BLOCKER] It is impossible to place a route in route table 1 via a VM from
a subnet associated with route table 2 when using a per-RTB LR, because in
the RTB-2 LR we would have to add a route from RTB 1, and this breaks route
table isolation: the Route Table 2 LR will start looking up routes, and
there could be routes from other route tables among them. This breaks the
idea of having an LR per Route Table completely. At this point we rejected
that solution and moved to adding route table support to OVN.


Would it be just the same problem in AWS? Why would a user route a subnet
of RTB-1 to a VM under RTB-2 and at the same time want route table
isolation?
With the single-LR solution in OVN, you would not get isolation between the
tables, because 1) all LRPs still share the global routing table, and 2)
for output to the nexthop there is no distinction between the output
interfaces.


No, such a problem does not exist.
By route table "isolation" I meant that routing rules configured in one
routing table must not affect routing in another one.
In this scenario, a user may want to route traffic to the internet from S0
(RTB-0) through an instance in S1 (RTB-1).
For instance, it can be a gateway service (FW, WAF, IPS, etc.) in S1 which
provides secure access with NAT for clients from the VPC to the internet.
So the configuration should be:
- 0.0.0.0/0 via GW-INSTANCE in RTB-0
- 0.0.0.0/0 via edge LR in RTB-1
Instances from all subnets with the rtb-0 route table assigned will then be
routed to the internet through GW-INSTANCE, which can analyse/drop/do
whatever with the traffic.
This is a small example to show how different route tables can be used in a
single VPC; see the sketch below.
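
A rough sketch of that configuration with the CLI proposed in this series
(router name and addresses are hypothetical):

  # rtb-0: default route via the gateway instance in S1
  ovn-nbctl --route-table=rtb-0 lr-route-add lr-vpc 0.0.0.0/0 <GW-INSTANCE IP>
  # rtb-1: default route via the edge LR, so the gateway itself can reach
  # the internet
  ovn-nbctl --route-table=rtb-1 lr-route-add lr-vpc 0.0.0.0/0 <edge LR IP>
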
With multiple OVN LRs per VPC in an AZ, such a topology can't be created:
in the LR for RTB-0 we can add the route 0.0.0.0/0 via <LR0-to-LR1 link
IP>, but in the LR for RTB-1 we can only add a route to the internet
through either the edge LR or GW-INSTANCE. Not both. So this case can't
currently be implemented in OVN.

Thanks for the example. It helps me understand your requirements. While I
believe this could be achieved with current OVN using the policy routing
feature, I understand that, as you mentioned earlier, you need policy
routing for ACL purposes. I also agree that multi-stage policy routing
would seem less user-friendly (and maybe also harder to implement), so I
support the multi-table feature for the purpose of providing extra policy
routing capability.

While I still need to take a closer look at the code, I believe it would be
helpful to add such example use cases to the documentation for users to
better understand the feature.

Hope this helps you understand my problem.


But some more cons:

1. Complex network topology. Interconnecting all LRs, even with some
transit switch, is harder than having one LR where all VPC-related
configuration is done in one place (in an AZ).
2. Directly connected routes. In case we have multiple Route Table LRs, we
have to add a route for each subnet from another LR. With one LR per VPC,
all such routes are installed automatically and learnt via ovn-ic.
3. PBR, LBs. It's much easier to implement PBR and configure Load Balancers
in one LR than in multiple.
4. There could be very many LRs, LRPs, LSPs (datapaths in SB) - excess
database records and huge database growth.
5. Extra client code complexity (updating routes would require configuring
routes in many LRs across different availability zones).
6. We couldn't use ovn-ic route learning: a VPC requires out-of-the-box IP
connectivity between subnets, and if we connect Route Table LRs between
AZs, connecting multiple LRs from different Route Tables would learn routes
without binding them to a route table. This would require additional
isolation at the transit switch level.
7. There were some more problems. Listed here are the ones I could recall.

From our half a year of experience, using one LR per VPC is very
comfortable, and it looks quite extensible in terms of network features.

Let me know if this makes sense.


Thanks for the list of cons! Most of the cons you listed above seem to
apply to the AWS model too, right? It is ok to have different requirements

Honestly, I can't see how these bullets apply to the "AWS model". Maybe
only the ovn-ic one, which requires modifications in the ovn-ic codebase to
support route_tables in routes. But this work is already done.

I think the disagreement was mainly about the understanding of the behavior
of directly connected subnets across different routing tables. I assume my
understanding was wrong for now :)

than AWS, but we'd better define it clearly, and it would be helpful to
have some typical use cases that solve specific problems. I have more
information now after your explanation, but it's still not really clear to
me under what scenarios this feature would be used.

@Numan Siddique <numans at ovn.org> @Mark Michelson <mmichels at redhat.com>
could you let me know your thoughts on this, too? Would you see similar use
cases among Red Hat customers? Or does this design match well with your
potential use cases? I'd like to be careful when implementing something
like this and make sure we are doing it the right way.

I don't think there is any use case in ovn-k8s or openstack neutron which would
use this feature.

Sorry.  I've not followed the discussion here closely.

What is the tl;dr?

@Han Zhou You're fine with this feature and will we be accepting this in OVN ?

Yes, after the discussions, I think the requirement is valid and the
approach addresses it properly without increasing complexity too much.
I had a comment regarding the priority handling. Vladislav mentioned he
will update the patch to support behavior similar to AWS, which allows
users to add routes that override directly connected subnet routes. My
recommendation is simply not to add extra priority to the global routing
table and just let the longer-prefix route take precedence.

Thanks,
Han


Thanks
Numan


One comment here from my side:
by "this design" we should talk about "behaviour similar to the route table
support of AWS/other public platforms".
I think it is a good feature for an open-source SDN, which can be further
used by open-source cloud platforms like OpenStack to extend routing
capabilities.

Agree.


Thanks,
Han

1:
https://aws.amazon.com/blogs/aws/inspect-subnet-to-subnet-traffic-with-amazon-vpc-more-specific-routing/
2: https://cloud.google.com/vpc/docs/routes#subnet-routes


Han



I haven't finished reviewing the code yet, but I have one comment regarding
adding 100 to the priority of the global routes. For IPv6, the priority
ranges from 0 to 128x2=256, so adding 100 is not enough. It would create
overlapping priority ranges, and some table-specific route entries may
override the global routes.
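
To illustrate the overlap (assuming the priority is roughly 2 * prefix
length, so up to 2*128 = 256 for IPv6; the exact formula is in northd):

  # global /64 route with the +100 bump: 2*64 + 100 = 228
  # table-specific /128 route:           2*128      = 256
  # 256 > 228, so the table-specific route wins over the "global" one;
  # the bump must exceed the whole range (> 256) to keep global routes first.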



Thanks, I'll dig into this when you finish the review.


Let me know if I answered your questions or if you have new ones.
Again, many thanks for your time and for digging into this patch series.

1:
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
2: https://cloud.google.com/vpc/docs/routes#subnet-routes
3: https://docs.cloud.croc.ru/en/services/networks/routetables.html

Thanks,
Han
