[ovs-discuss] [OVN] MTU issues with OVN

axel at tripier.fr axel at tripier.fr
Tue Jun 19 14:42:36 UTC 2018


Hello everyone,

I have a project where I use Vagrant to spawn VMs connected to OVS and managed by OVN. Think of it as "A minimalist OpenStack-like tool that is based on Vagrant and OVN" (yes I know, it sounds horrible said like that!)
The workflow is: users request VMs through the tool; the tool creates the Vagrantfile, launches the VM using Vagrant on a hypervisor with free resources (based on qemu+kvm or Hyper-V), and attaches the VM to OVS, where OVN grants it network connectivity to other VMs launched by the same user (plus Internet access through a gateway). VMs from the same user can be on different hypervisors.
VM images are public pre-packaged images chosen by the user from Vagrant Cloud (https://app.vagrantup.com/boxes/search).

I'm encountering an issue where TCP connections between VMs on the same LAN but on different hypervisors sometimes hang.
The cause is the MTU of the VM's network interface: it defaults to 1500, but packets have to be sent over a Geneve tunnel, so they must not exceed 1442 bytes. Larger packets with DF=1 are dropped.
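To make the arithmetic behind the 1442 figure explicit, here is a small sketch. It assumes an IPv4 underlay with a 1500-byte MTU and the Geneve layout OVN uses (one 8-byte option); the constant and function names are only illustrative.

```python
# Bytes added around each inner IP packet when it is carried in a
# Geneve tunnel over an IPv4 underlay (assumed layout, see lead-in).
OUTER_IPV4 = 20         # outer IPv4 header, no IP options
OUTER_UDP = 8           # Geneve runs over UDP
GENEVE_BASE = 8         # fixed Geneve header
OVN_GENEVE_OPTION = 8   # 4-byte option header + 4 bytes of OVN metadata
INNER_ETHERNET = 14     # the VM's Ethernet header travels inside the tunnel

OVERHEAD = (OUTER_IPV4 + OUTER_UDP + GENEVE_BASE
            + OVN_GENEVE_OPTION + INNER_ETHERNET)


def overlay_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that fits in one underlay packet."""
    return underlay_mtu - OVERHEAD


def required_underlay_mtu(wanted_overlay_mtu: int) -> int:
    """Underlay MTU needed so the overlay can keep the given MTU."""
    return wanted_overlay_mtu + OVERHEAD


print(OVERHEAD, overlay_mtu(1500), required_underlay_mtu(1500))
```

With a 1500-byte underlay this gives 58 bytes of overhead and a 1442-byte overlay MTU; keeping 1500 in the overlay would need 1558 on the underlay.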

According to https://specs.openstack.org/openstack/neutron-specs/specs/kilo/mtu-selection-and-advertisement.html, the option OpenStack chose is to advertise the lower MTU to the VM using DHCP ("mtu"="1442" in the `ovn-nbctl create DHCP_Options` options).
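For reference, what the client receives in that case is the DHCP "Interface MTU" option (code 26 in RFC 2132) carrying a 16-bit value. A minimal sketch of its wire encoding follows; the helper names are hypothetical and are not part of OVN or of any DHCP library.

```python
import struct

DHCP_OPT_INTERFACE_MTU = 26  # "Interface MTU" option code from RFC 2132


def encode_mtu_option(mtu: int) -> bytes:
    """Encode the option as code (1 byte), length (1 byte), MTU (2 bytes)."""
    if not 68 <= mtu <= 0xFFFF:  # RFC 2132 sets 68 as the minimum legal value
        raise ValueError(f"MTU out of range: {mtu}")
    return struct.pack("!BBH", DHCP_OPT_INTERFACE_MTU, 2, mtu)


def decode_mtu_option(option: bytes) -> int:
    """Return the MTU carried by a 4-byte Interface MTU option."""
    code, length, mtu = struct.unpack("!BBH", option)
    if code != DHCP_OPT_INTERFACE_MTU or length != 2:
        raise ValueError("not an Interface MTU option")
    return mtu


raw = encode_mtu_option(1442)
print(raw.hex(), decode_mtu_option(raw))
```

Whether the VM then applies the value is entirely up to its DHCP client.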

However, not all DHCP clients apply this option: in my tests, dhcpcd and dhclient apply it, but systemd-networkd doesn't (see https://github.com/systemd/systemd/pull/6950/files).
This matters because a lot of Vagrant boxes (all the systemd-based ones I have encountered so far) use systemd-networkd as their DHCP client.

So I am in a situation where:
- I cannot send the MTU over DHCP, because it won't be accepted by systemd-networkd on the Vagrant boxes;
- I cannot connect to the Vagrant boxes to provision them, as they are connected to OVN and the hypervisor no longer has access to them (there is no Vagrant management interface on the VMs in my setup, so I cannot provision them post-creation);
- and of course I cannot modify all the images on Vagrant Cloud to add "UseMTU=true" to the systemd-networkd config.

Neither PMTUD nor MSS clamping can help here, as the packets do not go through a router (they travel between VMs in the same LAN).

I talked about that with lucasagomes on IRC (thanks a lot for his time!) and he recommended I ask here since there was no easy answer to my problem.

One option is to use jumbo frames on the underlay network so that its MTU is at least 1558 bytes, which lets the overlay network keep its MTU at 1500. But then I can't have hypervisors spread across the Internet (or any other MTU-limited network).

Another option would be to fragment the Geneve-encapsulated packets (after encapsulation) before they go through the tunnel, and reassemble them on the other side. It would hurt performance a lot (fragmentation and reassembly of packets, plus sending one big packet and one very small one each time), but it would solve the issue at least until the MTU is lowered on the VM by its user.
I also saw some hints that OVN could do this ("Although GENEVE and OVN supports IP fragmentation [...]" on https://ovirt.org/develop/release-management/features/network/managed_mtu_for_vm_networks/), but I did not find a way to do it. Is there a way?
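To illustrate the "one big packet and one very small one" cost, here is a rough sketch of how standard IPv4 fragmentation would split an encapsulated full-size packet. It assumes a 1500-byte underlay MTU, 20-byte outer IPv4 headers and 58 bytes of tunnel overhead; the function name is hypothetical, and this only models the sizes, it is not something OVS or OVN exposes.

```python
def ipv4_fragment_sizes(total_len: int, mtu: int, header_len: int = 20) -> list[int]:
    """Return the total length of each IPv4 fragment of one packet.

    Every fragment carries its own IP header, and the payload of every
    fragment except the last must be a multiple of 8 bytes.
    """
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation needed
    payload = total_len - header_len
    max_chunk = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(header_len + chunk)
        payload -= chunk
    return sizes


# A 1500-byte inner packet plus 58 bytes of overhead is 1558 bytes on the underlay.
print(ipv4_fragment_sizes(1500 + 58, 1500))
```

Under these assumptions every full-size packet from the VM turns into a 1500-byte fragment followed by a 78-byte one.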

In any case, the drawback of lowering the MTU is that communication between VMs on the same hypervisor could be much more efficient with a high MTU, with the lower MTU used only when they communicate across hypervisors.
With that in mind, I wonder whether OVS could send "fake" ICMP Fragmentation Needed packets to the sender VM when a packet has to go through a tunnel, has DF=1, and is larger than (MTU - tunnel header size). It probably would not work because the packet is not going through a router, but I have not tried. What are your thoughts on this?
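To be concrete about what I mean, here is a sketch of the check and of the ICMP message (type 3, code 4, with the next-hop MTU field from RFC 1191). This is plain Python modelling the idea under my assumptions (IPv4, 1442-byte tunnel MTU); the function names are made up and nothing here is an existing OVS or OVN feature.

```python
import struct


def should_send_frag_needed(ip_len: int, df_set: bool, crosses_tunnel: bool,
                            tunnel_mtu: int = 1442) -> bool:
    """True when the packet cannot be tunnelled and may not be fragmented."""
    return crosses_tunnel and df_set and ip_len > tunnel_mtu


def inet_checksum(data: bytes) -> int:
    """Standard Internet checksum (one's complement sum of 16-bit words)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def frag_needed_message(original_packet: bytes, next_hop_mtu: int) -> bytes:
    """Build the ICMP part of a Destination Unreachable / Fragmentation Needed.

    The message quotes the original IPv4 header plus the first 8 bytes of
    its payload, as ICMP error messages do.
    """
    ihl = (original_packet[0] & 0x0F) * 4
    quoted = original_packet[:ihl + 8]
    unchecked = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu) + quoted
    checksum = inet_checksum(unchecked)
    return struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + quoted


# Dummy 1500-byte IPv4 packet: version 4, IHL 5, everything else zeroed.
packet = bytes([0x45]) + bytes(1499)
if should_send_frag_needed(len(packet), df_set=True, crosses_tunnel=True):
    icmp = frag_needed_message(packet, 1442)
    print(len(icmp), icmp[:8].hex())
```

The sender would still have to accept such a message coming from something that is not a router on its path, which is the part I am unsure about.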

Have I exhausted all the options to work around the MTU issue without having to modify the MTU in the VM itself? Or are there more things that can be done and that I did not think of?

Thanks,
Axel

