[ovs-discuss] MTU considerations for OVN

Fri Jun 10 01:06:29 UTC 2016

In my previous message, this is what I mentioned (reproducing it here
just because it doesn't appear in the quoted conversation below):

"One possible solution is to introduce an action in the kernel that
would check packets flowing through the switch against a length
specified by the user (where the 'user' is OVS userspace/OVN in this
case). To use this, we would do a route lookup of the tunnel endpoint
to find the outgoing device, subtract the encapsulation overhead, and
install a flow that checks this length and punts the packet to OVN to
generate an ICMP message."
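
As a rough sketch of the arithmetic in that proposal (not actual OVS code), the length to check against is the underlay MTU minus the encapsulation overhead. The function names and the "forward"/"punt" return values are made up for illustration. The header sizes assume Geneve over an untagged underlay with no outer IP options; the 8-byte option length in the usage note is my understanding of what OVN carries and should be treated as an assumption.

```python
# Header sizes in bytes. Assumes no outer IP options or extension headers.
OUTER_IP = {4: 20, 6: 40}
UDP = 8
GENEVE_BASE = 8
INNER_ETHERNET = 14


def tunnel_payload_mtu(underlay_mtu, ip_version=4, geneve_opts_len=0):
    """Largest inner L3 packet that still fits after Geneve encapsulation."""
    overhead = (OUTER_IP[ip_version] + UDP + GENEVE_BASE
                + geneve_opts_len + INNER_ETHERNET)
    return underlay_mtu - overhead


def check_length(inner_l3_len, limit):
    """Hypothetical datapath decision: forward the packet, or punt it
    to the controller so an ICMP error can be generated."""
    return "punt" if inner_l3_len > limit else "forward"


if __name__ == "__main__":
    limit = tunnel_payload_mtu(1500, geneve_opts_len=8)
    print(limit, check_length(1500, limit))
```

With a 1500-byte underlay and 8 bytes of Geneve options, the limit works out to 1500 - (20 + 8 + 8 + 8 + 14) = 1442 bytes, so a full-size 1500-byte packet from the VM would be punted.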

A possible way of getting the MTU to OVS userspace would be through a
configuration option. I don't think that this is really the hard part
though, and so the rest of the discussion around this should still apply.
In particular, it's not really that hard for OVS userspace to do a
route lookup, so if we are totally sure that the MTU is static we
could just have OVS fetch it. I'm not sure that either approach is all
that generic though.
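
To make the two options concrete, here is a minimal sketch of how userspace might pick the underlay MTU: use the configured value if one is set, otherwise read it from the outgoing device. The function names and the `configured_mtu` parameter are hypothetical, the route lookup that would produce `out_dev` is not shown, and reading `/sys/class/net/<dev>/mtu` assumes Linux.

```python
from pathlib import Path


def device_mtu(dev, sysfs_root="/sys/class/net"):
    """Read a Linux network device's MTU from sysfs."""
    return int((Path(sysfs_root) / dev / "mtu").read_text().strip())


def underlay_mtu(configured_mtu=None, out_dev=None,
                 sysfs_root="/sys/class/net"):
    """Prefer an explicit config value; otherwise fall back to the MTU of
    the device that a route lookup of the tunnel endpoint selected."""
    if configured_mtu is not None:
        return configured_mtu
    if out_dev is None:
        raise ValueError("need either a configured MTU or an outgoing device")
    return device_mtu(out_dev, sysfs_root)
```

Either way the value is a snapshot, which is why this only holds up if the MTU really is static.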

On Thu, Jun 9, 2016 at 10:27 AM, Matt Kassawara <mkassawara at gmail.com> wrote:
> Jesse,
>
> I know this sounds too easy, but can we just tell OVS about the underlying
> physical network MTU via a config option?
>
> On Fri, May 6, 2016 at 1:08 PM, Jesse Gross <jesse at kernel.org> wrote:
>>
>> On Fri, May 6, 2016 at 11:53 AM, Ryan Moats <rmoats at us.ibm.com> wrote:
>> > Jesse Gross <jesse at kernel.org> wrote on 05/06/2016 11:11:10 AM:
>> >
>> >> From: Jesse Gross <jesse at kernel.org>
>> >> To: Ryan Moats/Omaha/IBM at IBMUS
>> >> Cc: Matt Kassawara <mkassawara at gmail.com>, discuss
>> >> <discuss at openvswitch.org>, Thomas Graf <tgraf at suug.ch>
>> >> Date: 05/06/2016 11:11 AM
>> >> Subject: Re: [ovs-discuss] MTU considerations for OVN
>> >>
>> >> On Fri, May 6, 2016 at 8:40 AM, Ryan Moats <rmoats at us.ibm.com> wrote:
>> >> > "discuss" <discuss-bounces at openvswitch.org> wrote on 05/04/2016
>> >> > 06:09:04 PM:
>> >> >
>> >> >> From: Jesse Gross <jesse at kernel.org>
>> >> >> To: Matt Kassawara <mkassawara at gmail.com>
>> >> >> Cc: discuss <discuss at openvswitch.org>
>> >> >> Date: 05/04/2016 06:09 PM
>> >> >> Subject: Re: [ovs-discuss] MTU considerations for OVN
>> >> >> Sent by: "discuss" <discuss-bounces at openvswitch.org>
>> >> >>
>> >> >> On Tue, May 3, 2016 at 3:50 PM, Matt Kassawara <mkassawara at gmail.com>
>> >> >> wrote:
>> >> >> > Jesse,
>> >> >> >
>> >> >> > I'm resurrecting this thread after a fairly lengthy discussion of MTU
>> >> >> > with Ben at the recent OpenStack summit. Have you given the topic any
>> >> >> > further thought toward implementation in a reasonable way? Can you
>> >> >> > elaborate on the architectural limitations? At the moment, the
>> >> >> > OpenStack implementation of OVN doesn't use DPDK.
>> >> >>
>> >> >> The issue that I alluded to before is that when OVS (and by extension
>> >> >> OVN) does L3 processing the packets aren't traversing the Linux IP
>> >> >> stack and so the usual MTU checks don't apply. Instead OVS just does a
>> >> >> single combined lookup for all flow processing and then applies some
>> >> >> actions like set SMAC/DMAC and decrement TTL. Not only is there no
>> >> >> code to check the outgoing MTU but there's no obvious outgoing device
>> >> >> to fetch the desired MTU from.
>> >> >
>> >> > I'm not 100% sure why this would be an issue - IIRC (based on my
>> >> > scanning the code) when a packet is going to be output, it looks like
>> >> > the MTU of the physical device is checked and a fragmentation decision
>> >> > made. Isn't that good enough for our purposes?
>> >>
>> >> Which check in particular do you have in mind?
>> >>
>> >> There are two possibilities that I can think of:
>> >>  * ovs_vport_send() has one but the device it looks at for the MTU is
>> >> a tunnel device, which has an essentially infinite MTU. The real MTU
>> >> that we would need to check also depends on the destination IP address
>> >> of the tunnel but we haven't done a route lookup at this point.
>> >>  * ip_finish_output() in the IP stack. This one does have the
>> >> information that we need but it is outside of the tunnel. Any ICMP
>> >> packets that are generated will be processed through the hypervisor's
>> >> IP stack and won't make it back to the VM. In addition, this check
>> >> doesn't handle GSO packets.
>> >
>> > I see, I was misreading code... my mistake.
>> >
>> > I certainly dislike the idea of separating the MTU calculation from the
>> > datapath. What I was hoping to find was that it would be possible to do
>> > the fragmentation check on the tunnel after the route has been looked up
>> > and the outgoing device is known, but looking through this, I'm not
>> > seeing a good way to do this cleanly (yet) ...
>>
>> I agree.
>>
>> There was a thread a while back on the netdev mailing list related
>> to this, but it reached no real conclusion:
>> https://www.spinics.net/lists/netdev/msg257830.html
>
>