[ovs-dev] [PATCH] VxLAN-gpe implementation

Hannes Frederic Sowa hannes at redhat.com
Wed Jun 8 19:46:59 UTC 2016


On 08.06.2016 17:38, Alexander Duyck wrote:
> On Wed, Jun 8, 2016 at 7:48 AM, Hannes Frederic Sowa <hannes at redhat.com> wrote:
>> On 08.06.2016 14:51, Jiri Benc wrote:
>>> On Mon, 6 Jun 2016 14:22:58 -0700, Jesse Gross wrote:
>>>> On Sat, Jun 4, 2016 at 6:39 AM, Yi Yang <yi.y.yang at intel.com> wrote:
>>>> [...]
>>>>>  datapath/vport-netdev.c                           |   3 +-
>>>>>  datapath/vport-vxlan.c                            |  17 ++-
>>>>
>>>> These changes aren't upstream yet. Please do that before backporting them here.
>>>>
>>>> However, the changes to vport-vxlan.c are modifying compatibility code
>>>> that shouldn't be extended further. Instead, just use the existing
>>>> VXLAN netlink interfaces that have already been created to enable
>>>> these features.
>>>>
>>>> There is also a number of other patches to the OVS kernel module/VXLAN
>>>> that have not been backported. Pravin started doing this work but it
>>>> hasn't been applied yet. In general, I think it makes sense to
>>>> backport patches in order so that the diffs of the patches match those
>>>> of upstream.
>>>>
>>>> Finally, I have a question about receive side offloading with
>>>> VXLAN-gpe. This is primarily an upstream issue but is present in the
>>>> code being backported here as well. The VXLAN code sets up receive
>>>> offloads for all ports regardless of whether they are classic VXLAN or
>>>> L2/L3 GPE and expects NICs to parse the packets. I don't think this is
>>>> safe because there are a number of NICs out there that predate the
>>>> existence of GPE and therefore won't do this parsing correctly. I
>>>> think that it is necessary to disable receive offloading for
>>>> non-Ethernet VXLAN-GPE unless the offloading interface is extended.
>>>
>>> Coincidentally, I was talking about this with Hannes a few days ago.
>>> I'm adding him to CC.
>>>
>>> I guess you're referring to ndo_add_vxlan_port, right? I agree that
>>> this interface needs changes, especially considering that we know
>>> whether the UDP port belongs to VXLAN or VXLAN-GPE. But from my
>>> understanding of how drivers use this callback, the worst thing that
>>> could happen is suboptimal generation of rx hashes and thus steering
>>> the packets to a different receive queue than in the optimal case.
>>> Surely something to fix but it seems it won't cause much functional
>>> troubles with the current code?
>>
>> I am not sure if we must upgrade the interface. Can't drivers always
>> configure vxlan-gpe and are always backwards compatible?
>>
>> Non vxlan-gpe capable hardware would have to abort checksum offloading
>> as soon as they can't interpret the vxlan header anyway, so the packets
>> end up on the slow path and nothing bad should happen.
>>
>> Possibly some hardware will verify inner checksums despite it could not
>> understand the vxlan header completely. In this case we probably will
>> drop the packet in the driver. Anyway, I would be in favor to simply
>> present one knob, namely vxlan-offloading, to the user, instead a knob
>> for each version of vxlan.
>>
>> Unfortunately I couldn't get a definitive answer from the specs
>> regarding the checksuming details.
>>
>> Bye,
>> Hannes
> 
> This is starting to sound like the same conversation we had on netdev
> when the ndo_add_geneve_port was added.  One easy fix for guaranteeing
> that we can perform the checksum offload is to just enable the outer
> UDP checksum.  Then we can still perform GRO and use the outer UDP
> source port for generating a hash.  If possible we should make this
> the default for all new UDP based tunnels going forward simply because
> it allows for backwards-compatibility with existing offloads.

Yes, but this time we are only in vxlan-versioning-world only. :)

Hmm, it is a good question why we didn't do that already. Do you
remember the reasons?

> Really I wonder if we shouldn't just start pulling out all the support
> for the ndo_add_vxlan/geneve_port code anyway, or at least start
> forcing it to include more data such as an endpoint IP/IPv6 address

As far as I know Intel cards don't support adding UDP tuples but only
simple ports to the list of udp protocol offloads.

On the other side, I think the current way how we do it is quite
okayish. At the same time where we establish the udp offload we
unconditionally add the socket to the wildcard socket table (local_ip ==
0). At least no other socket can bind to such a socket (maybe
IP_FREEBIND etc. are an exception here).

> and tunnel type.  There end up being a number of cases where the
> offload could end up asserting itself where it isn't actually wanted
> such as a mixed environment where the PF is setting up the offloads
> via ndo_add_vxlan_port and then trying to apply that to VFs or other
> PFs on the same device which aren't operating in the same network
> namespace.  The worst case ends up being that you have UDP checksum
> offloads stripped for flows that would have otherwise been reporting
> valid checksum because the interfaces think they have tunnels with an
> invalid inner checksum, or not L4 checksum at all.

Okay, if the networking card mixes up offloads across vf interfaces we
certainly can't trust them. Is this the case or did I misunderstood your
example?

Otherwise regarding the vxlan offloading config on the cards, I think we
are fine right now. Even during namespace change of an device we
correctly flush the ports from the firmware.

Bye,
Hannes




More information about the dev mailing list