[ovs-dev] [PATCH v2 03/16] datapath: compat for NAT.

Sat Jun 18 01:26:38 UTC 2016

> On Jun 16, 2016, at 4:39 PM, Jesse Gross <jesse at kernel.org> wrote:
> 
> On Tue, Jun 14, 2016 at 3:25 PM, Jarno Rajahalme <jarno at ovn.org> wrote:
>> diff --git a/datapath/linux/compat/include/linux/netfilter/nf_nat.h b/datapath/linux/compat/include/linux/netfilter/nf_nat.h
>> new file mode 100644
>> index 0000000..210b9a7
>> --- /dev/null
>> +++ b/datapath/linux/compat/include/linux/netfilter/nf_nat.h
>> @@ -0,0 +1,15 @@
>> +#ifndef _LINUX_NF_NAT_WRAPPER_H
>> +#define _LINUX_NF_NAT_WRAPPER_H
>> +
>> +#include_next <linux/netfilter/nf_nat.h>
>> +
>> +/* Linux kernel 3.13 and older do not have NF_NAT_RANGE_PROTO_RANDOM_FULLY
>> + * (unless backported by the distribution), but we fake it to maintain OVS API
>> + * compatibility.  In this case NAT port allocation may happen sequentially
>> + * instead.
>> + */
>> +#ifndef NF_NAT_RANGE_PROTO_RANDOM_FULLY
>> +#define NF_NAT_RANGE_PROTO_RANDOM_FULLY (1 << 4)
>> +#endif
> 
> I think that on kernels where this isn't defined this value will
> likely be meaningless and ignored, right? If that's true then I think
> 0 would be a better choice to avoid the possibility of collision with
> another backport. However, would it make sense to just define this to
> NF_NAT_RANGE_PROTO_RANDOM to be a closer approximation?
> 

This was difficult without touching the conntrack.c itself, but you are right, PROTO_RANDOM is a better approximation. I added the needed combat code to conntrack.c itself for v3.

>> diff --git a/datapath/linux/compat/include/net/netfilter/nf_nat_core.h b/datapath/linux/compat/include/net/netfilter/nf_nat_core.h
>> new file mode 100644
>> index 0000000..6b17ab7
>> --- /dev/null
>> +++ b/datapath/linux/compat/include/net/netfilter/nf_nat_core.h
>> @@ -0,0 +1,88 @@
>> +#ifndef _NF_NAT_CORE_WRAPPER_H
>> +#define _NF_NAT_CORE_WRAPPER_H
>> +
>> +#include_next <net/netfilter/nf_nat_core.h>
>> +
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(4,6,0)
>> +
>> +/* Linux 4.6 and newer do not depend on skb_dst being set, so this workaround
>> + * is not needed there.
>> + */
>> +static inline unsigned int
>> +rpl_nf_nat_packet(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
>> +                 unsigned int hooknum, struct sk_buff *skb)
>> +{
>> +       /* Change skb to CHECKSUM_PARTIAL to avoid running code that
>> +        * expects skb_dst being set.
>> +        */
> 
> What code triggers the problem? This code kind of scares me, it seems
> like it is easy to miss things. (What about protocols for which there
> is no checksum offload? I suppose many of them aren't possible to NAT
> but I there might be some.)

It is the checksum recalculation code for IPv4 and IPv6 that is called from NAT helper after a TCP or UDP packet payload has been mangled. As such, a better location for this is right before the helper call. Unfortunately the helper is called via a function pointer, so a simple backport in the compat directory does not work. Therefore I added this piece to conntrack.c itself, which also allowed removing code duplication for finding the transport header for IPv6.

> 
>> +       if (skb->ip_summed != CHECKSUM_PARTIAL) {
>> +               switch (skb->protocol) {
>> +               case ETH_P_IP:
> 
> There's a byte order issue here: skb->protocol is in network byte
> order but the ETH_P_* constants are in host byte order.

Oops. This is now avoided by sharing code that was already in conntrack.c.

These conntrack.c changes are now new separate patches after the main NAT backport in v3 I'll post in a moment.

  Jarno