[ovs-dev] [PATCH] datapath: add ipv6 'set' action

Ansis Atteka aatteka at nicira.com
Sun Nov 4 15:44:15 UTC 2012


On Tue, Oct 30, 2012 at 1:11 AM, Jesse Gross <jesse at nicira.com> wrote:
> On Sun, Oct 28, 2012 at 2:21 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>> This patch adds ipv6 set action functionality. It allows to change
>> traffic class, flow label, hop-limit, ipv6 source and destination
>> address fields.
>>
>> Signed-off-by: Ansis Atteka <aatteka at nicira.com>
>
> Can you add an item to NEWS?
ok
>
> I got a few sparse errors when I applied this:
>
>   CHECK   /home/jgross/openvswitch/datapath/linux/actions.c
> /home/jgross/openvswitch/datapath/linux/actions.c:270:12: warning:
> incorrect type in assignment (different base types)
> /home/jgross/openvswitch/datapath/linux/actions.c:270:12:    expected
> restricted __be32 [usertype] fl
> /home/jgross/openvswitch/datapath/linux/actions.c:270:12:    got int
> /home/jgross/openvswitch/datapath/linux/actions.c:273:41: warning:
> incorrect type in argument 2 (different base types)
> /home/jgross/openvswitch/datapath/linux/actions.c:273:41:    expected
> unsigned int [unsigned] [usertype] fl
> /home/jgross/openvswitch/datapath/linux/actions.c:273:41:    got
> restricted __be32 const [usertype] ipv6_label
>
> These look like real problems to me on certain architectures because
> the constants that you are using for bit manipulations will have
> different results depending on endianness.
ok
>
> I think we also need to do some validation of these actions in
> datapath.c similar to what we do currently for IPv4 set.  I think
> right now these actions just get rejected immediately because they
> aren't expected by the validation code.
ok
>
> We could implement the equivalent actions in the userspace datapath as
> well.  There aren't any complex dependencies here so it would be ideal
> to maintain parity.
ok. Doing that right now.
>
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index ec9b595..7e60b0f 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> +static void set_ipv6_addr(struct sk_buff *skb, struct ipv6hdr *nh,
>> +                         __be32 addr[4], const __be32 new_addr[4])
>> +{
>> +       int transport_len = skb->len - skb_transport_offset(skb);
>> +
>> +       if (nh->nexthdr == IPPROTO_TCP) {
>> +               if (likely(transport_len >= sizeof(struct tcphdr)))
>> +                       inet_proto_csum_replace16(&tcp_hdr(skb)->check, skb,
>> +                                                 addr, new_addr, 1);
>> +       } else if (nh->nexthdr == IPPROTO_UDP) {
>> +               if (likely(transport_len >= sizeof(struct udphdr))) {
>> +                       struct udphdr *uh = udp_hdr(skb);
>> +
>> +                       if (uh->check ||
>> +                           get_ip_summed(skb) == OVS_CSUM_PARTIAL) {
>> +                               inet_proto_csum_replace16(&uh->check, skb,
>> +                                                         addr, new_addr, 1);
>> +                               if (!uh->check)
>> +                                       uh->check = CSUM_MANGLED_0;
>> +                       }
>> +               }
>> +       }
>
> IPv6 has an extra little twist when it comes to checksums compared to
> IPv4: routing headers.  In an IPv6 packet, the destination IP address
> used for the purposes of checksum calculation in the pseudo header is
> the final destination address.  This means that if you have a routing
> header then it is that address that matters and not one in the IP
> header.  This is likely to be a pretty rare case but we should handle
> it.
There could be a chain of ipv6 extension headers. And the Routing
Header could be in the middle of that chain. Can you confirm, if
ipv6_find_hdr() function is good enough to get the routing header?

Also, in the routing header there could be multiple ipv6 addresses
that need to be visited. I guess, that, If routing header is present,
but ipv6_rt_hdr.segments_left is equal to 0, then we would still need
to change the ipv6 destination address in the main IPv6 header and not
the last ipv6 address in the routing header (rfc2460; 4.4). Is that
right?

Type 0 Routing Header seems to be deprecated by rfc5095. To make
things simpler, would it make sense to:
1. always change the nh.daddr; and
2. update L4 checksums, if and only if nh.daddr is also the final
destination address?
Maybe you already assumed that we should always change nh.daddr and
not the ipv6_rt_hdr.last_daddr? I couldn't quite tell from your
paragraph above.

>
> Also, due to the presence of extension headers in general, nh->nexthdr
> might not point to the L4 protocol.  We should use the extracted L4
> protocol in the flow instead.
Right,

>
>> +static void set_ipv6_tc(struct ipv6hdr *nh, __u8 tc)
>
> In general, I would prefer the versions of the host byte order types
> without underscores (i.e. u8, u32, etc.) for components entirely
> within the kernel.
ok
>
>> +{
>> +       nh->priority = (nh->priority & 0xf0) | (tc >> 4);
>
> nh->priority is only 4 bits, so you shouldn't have to do the mask.
>
>> +       nh->flow_lbl[0] = (nh->flow_lbl[0] & 0x0f) | ((tc & 0x0f) << 4);
>> +}
>> +
>> +static void set_ipv6_fl(struct ipv6hdr *nh, __u32 fl)
>> +{
>> +       nh->flow_lbl[0] = (nh->flow_lbl[0] & 0xf0) | ((fl & 0x0f0000) >> 16);
>> +       nh->flow_lbl[1] = (fl & 0x00ff00) >> 8;
>> +       nh->flow_lbl[2] = fl & 0x0000ff;
>
> I would make all of these hex constants the full 4 bytes.  Whenever I
> see only 3 being used with a 4 byte value I think that I've
> miscounted.
ok
>
>> +static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *ipv6_key)
>> +{
>> +       struct ipv6hdr *nh;
>> +       int err;
>> +       __u8 tc;
>> +       __be32 fl;
>
> Based on your usage below, I think that fl is supposed to be in host byte order.
Ok
>
>> +       __be32 *saddr;
>> +       __be32 *daddr;
>> +
>> +       err = make_writable(skb, skb_network_offset(skb) +
>> +                                sizeof(struct ipv6hdr));
>> +       if (unlikely(err))
>> +               return err;
>> +
>> +       nh = ipv6_hdr(skb);
>> +       saddr = (__be32 *)&nh->saddr;
>> +       daddr = (__be32 *)&nh->daddr;
>> +
>> +       if (memcmp(ipv6_key->ipv6_src, saddr, sizeof(ipv6_key->ipv6_src)))
>> +               set_ipv6_addr(skb, nh, saddr, ipv6_key->ipv6_src);
>> +
>> +       if (memcmp(ipv6_key->ipv6_dst, daddr, sizeof(ipv6_key->ipv6_dst)))
>> +               set_ipv6_addr(skb, nh, daddr, ipv6_key->ipv6_dst);
>> +
>> +       tc = (nh->priority << 4) | (nh->flow_lbl[0] >> 4);
>> +       if (ipv6_key->ipv6_tclass != tc)
>> +               set_ipv6_tc(nh, ipv6_key->ipv6_tclass);
>> +
>> +       fl = (nh->flow_lbl[0] & 0x0f) << 16 | nh->flow_lbl[1] << 8 |
>> +             nh->flow_lbl[2];
>> +       if (ipv6_key->ipv6_label != fl)
>> +               set_ipv6_fl(nh, ipv6_key->ipv6_label);
>
> There's a mismatch here between the byte order fl and ipv6_key->ipv6_label.
>
>> +       if (ipv6_key->ipv6_hlimit != nh->hop_limit)
>> +               nh->hop_limit = ipv6_key->ipv6_hlimit;
Ok
>
> I think you can drop the checks for traffic class, flow label, and hop
> limit and just do the assignments directly.  The cost to do the check
> is basically the same as the actual operation so the check doesn't
> really save us anything even in the common case.
ok
>
>> diff --git a/datapath/compat.h b/datapath/compat.h
>> index 3113b96..fb660d2 100644
>> --- a/datapath/compat.h
>> +++ b/datapath/compat.h
>
> We usually put functions that are backported directly from Linux in
> the corresponding header file in the compat directory.  Normally that
> would be net/checksum.h in this case but this function depends on the
> OVS checksum constants, so it should go in datapath/checksum.h.  It
> should also use the OVS constants like the example of
> inet_proto_csum_replace4() in the same file.
ok
>
>> +#if LINUX_VERSION_CODE < KERNEL_VERSION(3,6,0)
>> +static inline void inet_proto_csum_replace16(__sum16 *sum,
>
> I think this function is coming in 3.7, not 3.6.
This function was added just before 3.6-rc2 git tag. So does this mean
that I have to check against 3.7 version instead?

git describe --tags 2cf545e835aae92173ef0b1f4af385e9c40f21e8



More information about the dev mailing list