[ovs-dev] [PATCH net 0/2] vxlan: Set a large MTU on ovs-created vxlan devices

Thu Jan 7 18:40:42 UTC 2016

On 01/07/16 at 06:50pm, Hannes Frederic Sowa wrote:
> On 07.01.2016 18:21, Thomas Graf wrote:
> >On 01/07/16 at 08:35am, Jesse Gross wrote:
> >>On Thu, Jan 7, 2016 at 3:49 AM, Thomas Graf <tgraf at suug.ch> wrote:
> >>>A simple start could be to add a new return code for > MTU drops in
> >>>the dev_queue_xmit() path and check for NET_XMIT_DROP_MTU in
> >>>ovs_vport_send() and emit proper ICMPs.
> >>
> >>That could be interesting. The problem in the past was making sure
> >>that ICMPs that are generated fit in the virtual network appropriately
> >>- right addresses, etc. This requires either spoofing addresses or
> >>some additional knowledge about the topology that we don't currently
> >>have in the kernel.
> >
> >Are you worried about emitting an ICMP with a source which is not
> >a local host address?
> 
> We have uRPF enabled for IPv4 by default on all kernels. Thus if we generate
> an IPv4 ICMP packet back with an error message it must have a source address
> which the receiving kernel considers valid. Valid means that sending to the
> source address would have used the same outgoing interface the ICMP error
> came in from.

Agreed. I think this is a given, though, as we would reverse the addresses
the way icmp_send() already does:

        saddr = iph->daddr;
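
Here is a small userspace model that makes the two positions concrete. It applies the strict reverse-path check described above to an ICMP error whose addresses are reversed as in the icmp_send() line quoted. It is a sketch under stated assumptions. The routing table, `icmp_error_addrs()` and `rpf_strict_accepts()` are hypothetical, and addresses are kept in host byte order. The real icmp_send() also has more source-address selection logic than that single line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order (hypothetical helper). */
#define IP4(a, b, c, d) \
	((uint32_t)(a) << 24 | (uint32_t)(b) << 16 | (uint32_t)(c) << 8 | (uint32_t)(d))

/* Only the address fields this sketch needs. */
struct ip_addrs {
	uint32_t saddr;
	uint32_t daddr;
};

/* One route on the host that receives the ICMP error (hypothetical table). */
struct route {
	uint32_t prefix;
	unsigned plen; /* prefix length, 0..32 */
	int ifindex;   /* outgoing interface for this prefix */
};

static uint32_t prefix_mask(unsigned plen)
{
	assert(plen <= 32);
	/* Shifting a 32-bit value by 32 is undefined in C, so /0 is special-cased. */
	return plen == 0 ? 0 : 0xffffffffu << (32 - plen);
}

/* Longest-prefix match; returns the outgoing ifindex, or -1 if unroutable. */
static int route_lookup(const struct route *tbl, size_t n, uint32_t addr)
{
	int best_if = -1;
	long best_len = -1;

	for (size_t i = 0; i < n; i++) {
		uint32_t m = prefix_mask(tbl[i].plen);

		if ((addr & m) == (tbl[i].prefix & m) && (long)tbl[i].plen > best_len) {
			best_len = (long)tbl[i].plen;
			best_if = tbl[i].ifindex;
		}
	}
	return best_if;
}

/* The error goes back to the sender, sourced from the original destination. */
static struct ip_addrs icmp_error_addrs(struct ip_addrs orig)
{
	struct ip_addrs err = { .saddr = orig.daddr, .daddr = orig.saddr };
	return err;
}

/* Strict reverse-path check: accept a packet arriving on in_ifindex only if
 * the route back to its source leaves through that same interface. */
static int rpf_strict_accepts(const struct route *tbl, size_t n,
			      uint32_t saddr, int in_ifindex)
{
	return route_lookup(tbl, n, saddr) == in_ifindex;
}
```

Suppose the sending host routes 192.168.1.0/24 via interface 1 and everything else via interface 2. An error sourced from the original destination 192.168.1.10 and arriving on interface 1 passes the check. An error sourced from an unrelated (hypothetical) bridge address such as 172.16.0.1 and arriving on interface 1 is dropped, because the route back to that address leaves through interface 2.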

> >Can't we just use icmp_send() in the context of the inner header and
> >feed it to the flow table to send it back? It should be the same as
> >for ip_forward().
> 
> The bridge's IP address often has no valid path as seen from the end host
> system receiving the ICMP error, because Open vSwitch is not really part
> of the L3 forwarding chain.

I don't think the IP of the bridge ever comes into play. It shouldn't.
I'm not even sure what could be considered the address of the bridge
;-)

> Faking the address from the packet (e.g. using the destination address of
> the original packet) will make traceroute go nuts.

I think you are worried about an ICMP error from a hop which does not
decrement TTL. That's a good point, and I think we should only send an
ICMP error if the TTL is decremented in the action list of the flow for
which we have seen an MTU-based drop (or TTL=0).
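
The rule above can be sketched as a small decision function, combined with the NET_XMIT_DROP_MTU return code proposed at the start of the thread. This is a userspace model under stated assumptions, not kernel code. `XMIT_DROP_MTU`, `sketch_dev_xmit()`, the action types and `decide_icmp()` are all hypothetical, and the enum values do not correspond to the kernel's NET_XMIT_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transmit results; XMIT_DROP_MTU plays the role of the
 * proposed NET_XMIT_DROP_MTU. Values are illustrative only. */
enum xmit_ret { XMIT_OK, XMIT_DROP, XMIT_DROP_MTU };

/* Hypothetical, heavily simplified flow actions. */
enum action_type { ACT_OUTPUT, ACT_SET_ETH_ADDRS, ACT_DEC_TTL };

struct action {
	enum action_type type;
	int arg; /* output port for ACT_OUTPUT; unused otherwise */
};

enum icmp_decision { SEND_NOTHING, SEND_FRAG_NEEDED, SEND_TIME_EXCEEDED };

/* Stand-in for the device transmit path: report an oversized frame with a
 * distinct code instead of a plain drop. */
static enum xmit_ret sketch_dev_xmit(size_t len, size_t mtu)
{
	return len > mtu ? XMIT_DROP_MTU : XMIT_OK;
}

/* A flow acts as an L3 hop only if its action list decrements the TTL. */
static bool flow_decrements_ttl(const struct action *acts, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (acts[i].type == ACT_DEC_TTL)
			return true;
	return false;
}

/* After running a flow's actions: stay silent like a bridge for pure L2
 * flows, and behave like a router for flows that decrement the TTL. */
static enum icmp_decision decide_icmp(enum xmit_ret ret, int ttl_after,
				      const struct action *acts, size_t n)
{
	if (!flow_decrements_ttl(acts, n))
		return SEND_NOTHING;
	if (ttl_after <= 0)
		return SEND_TIME_EXCEEDED;
	if (ret == XMIT_DROP_MTU)
		return SEND_FRAG_NEEDED;
	return SEND_NOTHING;
}
```

The decision is keyed on the presence of a TTL decrement, following the criterion above. A flow that only switches frames therefore keeps the silent-drop behavior of an Ethernet bridge, even when its packets exceed the egress MTU.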

I don't really see a difference between ip_forward(), some
sophisticated tc action or OVS. As soon as any of them decrements the
TTL and performs L3 forwarding, it should send out ICMP errors to allow
for proper PMTU discovery.
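
For PMTU discovery, a router-like hop returns an ICMP Destination Unreachable message (type 3) with code 4, "fragmentation needed and DF set". Its previously unused field carries the next-hop MTU (RFC 1191), and it quotes the offending IP header plus the first 8 bytes of its payload (RFC 792). The sketch below builds only that ICMP message, with its RFC 1071 checksum. The function names are hypothetical, and it leaves out the outer IP header, the DF check and the source-address question discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ICMP_TYPE_DEST_UNREACH 3
#define ICMP_CODE_FRAG_NEEDED 4 /* fragmentation needed and DF set */

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8; /* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Builds a "fragmentation needed" message for the IPv4 packet in orig
 * (orig_len bytes) into out, which must hold 8 + IHL*4 + 8 bytes (76 bytes
 * covers any IPv4 header). Returns the ICMP message length. */
static size_t build_frag_needed(uint8_t *out, const uint8_t *orig,
				size_t orig_len, uint16_t next_hop_mtu)
{
	size_t ihl = (size_t)(orig[0] & 0x0f) * 4; /* IPv4 header length in bytes */
	size_t quoted = ihl + 8;                    /* header + first 8 payload bytes */
	uint16_t csum;

	assert(orig_len >= 20 && ihl >= 20);
	if (quoted > orig_len)
		quoted = orig_len;

	out[0] = ICMP_TYPE_DEST_UNREACH;
	out[1] = ICMP_CODE_FRAG_NEEDED;
	out[2] = out[3] = 0;                   /* checksum, filled in below */
	out[4] = out[5] = 0;                   /* unused */
	out[6] = (uint8_t)(next_hop_mtu >> 8); /* next-hop MTU (RFC 1191) */
	out[7] = (uint8_t)(next_hop_mtu & 0xff);
	memcpy(out + 8, orig, quoted);

	csum = inet_checksum(out, 8 + quoted);
	out[2] = (uint8_t)(csum >> 8);
	out[3] = (uint8_t)(csum & 0xff);
	return 8 + quoted;
}
```

Consider a VXLAN-over-IPv4 tunnel on a 1500-byte underlay. The next-hop MTU such a flow would presumably report is 1450, because the inner Ethernet header (14 bytes) plus the VXLAN (8), UDP (8) and outer IPv4 (20) headers consume 50 bytes.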

> Normally Ethernet devices don't return ICMP error messages. E.g. a broken
> jumbo-frame configuration just leads to silent packet loss because the
> packet is discarded before a router can handle it. Thus, in the case of a
> local OVS installation, it would be best if the error were transported
> back to the client application directly via the network call stack. This
> might be very difficult if we enqueue the packet to a backlog queue and
> reschedule softirqs. Probably we now need some way of faking source
> addresses from bridges... :/

I think the major complication comes from the assumption that OVS is
a bridge. As stated above, this is not necessarily the case. If a flow
is doing L3 forwarding, we should send ICMPs as expected from a
router.