[ovs-discuss] OVN - MTU path discovery

Daniel Alvarez Sanchez dalvarez at redhat.com
Mon Sep 24 12:57:31 UTC 2018


Resending this email as I can't see it in [0] for some reason.
[0] https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/




On Fri, Sep 21, 2018 at 2:36 PM Daniel Alvarez Sanchez <dalvarez at redhat.com>
wrote:

> Hi folks,
>
> After talking to Numan and reading the log from yesterday's IRC meeting,
> it looks like there's some confusion around the issue.
>
> jpettit | I should look at the initial bug report again, but is it not
> sufficient to configure a smaller MTU within the VM?
>
> Imagine the case where some host on the external network (MTU 1500)
> sends 1000-byte UDP packets to the VM (MTU 200). When OVN attempts to deliver
> the packet to the VM, it won't fit and the application running there will
> never get the packet.
>
> With the reference implementation (or if namespaces were used, which Han
> suggests is what NSX does), the packet would be handled by the IP stack on
> the gateway node. An ICMP need-to-frag would be sent back to the sender
> and, if it isn't blocked by some firewall, the IP stack on the sender node
> will fragment this and subsequent packets to fit the MTU on the receiver.
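>
> As a rough sketch of what that need-to-frag looks like on the wire (Scapy,
> with placeholder addresses and the 200-byte MTU from the example above), it
> is just an ICMP type 3 / code 4 that quotes the offending packet:
>
> # Sketch: the ICMP "fragmentation needed" (type 3, code 4) that the gateway's
> # IP stack would send back. Addresses and ports are placeholders; 200 is the
> # next-hop MTU from the example above.
> from scapy.all import ICMP, IP, Raw, UDP, send
>
> NEXT_HOP_MTU = 200
>
> # The oversized UDP packet that could not be delivered to the VM.
> oversized = (IP(src="203.0.113.10", dst="10.0.0.5")
>              / UDP(sport=4000, dport=5000) / Raw(b"x" * 1000))
>
> # Per RFC 792/1191, quote the original IP header plus its first 8 payload bytes.
> quoted = bytes(oversized)[:20 + 8]
>
> reply = (IP(src="198.51.100.1", dst=oversized[IP].src)  # gateway -> original sender
>          / ICMP(type=3, code=4, nexthopmtu=NEXT_HOP_MTU)
>          / quoted)
> send(reply)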
>
> Also, we generally don't want to configure small MTUs on the VMs for
> performance reasons, as that would also impact east/west traffic, where
> jumbo frames appear to work.
>
> Thanks a lot for bringing this up at the meeting!
> Daniel
>
> On Mon, Aug 13, 2018 at 5:23 PM Miguel Angel Ajo Pelayo <
> majopela at redhat.com> wrote:
> >
> > Yeah, later on we found that it was, again, more important than we
> > thought.
> >
> > For example, there are still cases not covered by TCP MSS negotiation
> > (and UDP and other protocols have no equivalent):
> >
> > Imagine you have two clouds, both with an internal MTU (let’s imagine
> > MTUb on cloud B, and MTUa on cloud A), and an external transit
> > network with a 1500 MTU (MTUc).
> >
> > MTUa > MTUc, and MTUb > MTUc.
> >
> > Also, imagine that VMa in cloud A has a floating IP (DNAT_SNAT NAT),
> > and VMb in cloud B also has a floating IP.
> >
> > VMa tries to establish a connection to VMb's FIP and announces
> > MSSa = MTUa - (IP + TCP overhead); VMb ACKs the TCP SYN request
> > with MSSb = MTUb - (IP + TCP overhead).
> >
> > So the agreed MSS will be min(MSSa, MSSb), but… the transit network MSSc
> > will always be smaller: MSSc < min(MSSa, MSSb).
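> >
> > Just to put rough numbers on it (illustrative MTU values; 20-byte IPv4 and
> > TCP headers with no options assumed):
> >
> > IP_HDR, TCP_HDR = 20, 20                 # IPv4 + TCP headers, no options
> >
> > mtu_a, mtu_b, mtu_c = 9000, 8000, 1500   # cloud A, cloud B, transit network
> >
> > mss_a = mtu_a - (IP_HDR + TCP_HDR)       # VMa announces 8960
> > mss_b = mtu_b - (IP_HDR + TCP_HDR)       # VMb announces 7960
> > mss_c = mtu_c - (IP_HDR + TCP_HDR)       # what the transit path can carry: 1460
> >
> > agreed = min(mss_a, mss_b)               # 7960
> > assert agreed > mss_c                    # full-sized segments exceed the transit MTU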
> >
> > In ML2/OVS deployments, those big packets get fragmented at the router
> > edge, and an ICMP notification is sent back to the sender of the packets
> > to indicate that fragmenting at the source is necessary.
> >
> >
> > I guess we can also replicate this with two VMs on the same cloud, with
> > MSSa > MSSb, where they talk to each other via floating IPs.
> >
> >
> > So going back to the point: I guess we need to implement some OpenFlow
> > extension to match packets by size, redirecting those to a slow path
> > (ovn-controller) so we can either fragment them or send an ICMP back to
> > the source for source fragmentation?
> >
> > Any advice on what the procedure would be here (in OpenFlow land,
> > kernel-wise, and even in terms of our source code and design) so we
> > could implement this?
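> >
> > Roughly, the slow-path side could look like this sketch (a hypothetical
> > handler, not existing ovn-controller code; the router address and MTU are
> > placeholders):
> >
> > from scapy.all import ICMP, IP
> >
> > EGRESS_MTU = 1500
> >
> > def handle_oversized(pkt_bytes, egress_mtu=EGRESS_MTU):
> >     """Build the ICMP need-to-frag for an oversized, DF-marked packet."""
> >     pkt = IP(pkt_bytes)
> >     if len(pkt_bytes) <= egress_mtu:
> >         return None                  # fits: the fast path keeps handling it
> >     if pkt.flags.DF:
> >         # Quote the original IP header plus the first 8 payload bytes.
> >         quoted = pkt_bytes[:20 + 8]
> >         return (IP(src="192.0.2.1", dst=pkt.src)   # placeholder router address
> >                 / ICMP(type=3, code=4, nexthopmtu=egress_mtu)
> >                 / quoted)
> >     return None                      # DF not set: could fragment instead (not shown)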
> >
> >
> > Best regards,
> > Miguel Ángel.
> >
> >
> > On 3 August 2018 at 17:41:05, Daniel Alvarez Sanchez (
> > dalvarez at redhat.com) wrote:
> >
> > Maybe ICMP itself is not that critical, but it seems like not having the
> > ICMP 'need to frag' on UDP communications could break applications that
> > rely on it to reduce their packet size? I wonder...
> >
> > Thanks!
> > Daniel
> >
> > On Fri, Aug 3, 2018 at 5:20 PM Miguel Angel Ajo Pelayo <
> > majopela at redhat.com> wrote:
> >>
> >>
> >> We didn't understand why an MTU mismatch worked in one direction (N/S)
> >> but not in the other (S/N)… and we found that it's actually working (at
> >> least for TCP, via MSS negotiation); we had a misconfiguration on one of
> >> the physical interfaces.
> >>
> >> So, in the case of TCP we are fine. TCP is smart enough to negotiate
> >> properly.
> >>
> >> Other protocols like ICMP with the DF flag, or UDP… would not get the
> >> ICMP that notifies the sender about the MTU mismatch.
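> >>
> >> (A quick way to check that by hand, as a sketch with a placeholder
> >> destination: send an oversized, DF-marked echo and see whether a
> >> type 3 / code 4 comes back.)
> >>
> >> from scapy.all import ICMP, IP, Raw, sr1
> >>
> >> probe = IP(dst="203.0.113.20", flags="DF") / ICMP() / Raw(b"x" * 1400)
> >> reply = sr1(probe, timeout=2)
> >>
> >> if reply and reply.haslayer(ICMP) and reply[ICMP].type == 3 and reply[ICMP].code == 4:
> >>     print("got need-to-frag, next-hop MTU:", reply[ICMP].nexthopmtu)
> >> elif reply:
> >>     print("reply received, the path MTU is large enough")
> >> else:
> >>     print("no reply: the need-to-frag was lost or blocked")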
> >>
> >> I suspect that the most common cases are covered, and that it's not
> >> worth pursuing what I was asking for, at least not with high priority,
> >> but I'd like to hear opinions.
> >>
> >>
> >> Best regards,
> >> Miguel Ángel.
> >>
> >> On 3 August 2018 at 08:11:01, Miguel Angel Ajo Pelayo (
> >> majopela at redhat.com) wrote:
> >>
> >> I'm going to capture some example traffic and try to figure out which
> >> RFCs talk about that behaviour so we can come up with a consistent
> >> solution.
> >> I can document it in the project.
> >>
> >> To be honest, when I looked at it, I was expecting that the router would
> >> fragment, and I ended up discovering that we had this path MTU discovery
> >> mechanism in play for IPv4.
> >>
> >> On 2 August 2018 at 22:21:28, Ben Pfaff (blp at ovn.org) wrote:
> >>
> >> On Thu, Aug 02, 2018 at 01:19:57PM -0700, Ben Pfaff wrote:
> >> > On Wed, Aug 01, 2018 at 10:46:07AM -0400, Miguel Angel Ajo Pelayo
> >> > wrote:
> >> > > Hi Ben, ICMP is used as a signal from the router to tell the sender
> >> > > “next hop has a lower MTU, please send smaller packets”. We would
> >> > > need at least something in OVS to slow-path the “bigger than X”
> >> > > packets; at that point ovn-controller could take care of constructing
> >> > > the ICMP packet and sending it to the source.
> >> >
> >> > Yes.
> >> >
> >> > > But I guess that we still need the kernel changes to match on
> >> > > those “big packets”.
> >> >
> >> > Maybe. If we only need to worry about ICMP, though, we can set up OVN
> >> > so that it always slow-paths ICMP.
> >>
> >> Oh, I think maybe I was just being slow. The ICMP is generated, not
> >> processed. Never mind.
>