[ovs-dev] [IPv6 Redux 5/6] nicira-ext: Support matching IPv6 traffic.

Justin Pettit jpettit at nicira.com
Mon Jan 24 01:55:33 UTC 2011


On Jan 23, 2011, at 11:11 AM, Jesse Gross wrote:

> On Sun, Jan 23, 2011 at 3:31 AM, Justin Pettit <jpettit at nicira.com> wrote:
>> On Jan 22, 2011, at 5:41 PM, Jesse Gross wrote:
>>>> +
>>>> +       /* We don't process jumbograms. */
>>>> +       if (!ntohs(nh->payload_len))
>>>> +               return -EINVAL;
>>> 
>>> Since this is testing against zero, the byte swap is a waste of time.
>>> Zero is always zero.
>> 
>> Right.  In quite a few places throughout the code, we use this for documentation/warning-others-who-use-this-field purposes; plus, I figured the compiler might optimize it out.  Regardless, I removed it, since you seem to have a preference.
> 
> It's usually best to assume the compiler is pretty dumb because most
> of the time you will be right.  In this case, for a 16-bit swap on x86
> with the Linux implementation it actually is optimized out.  However,
> if it were a 32-bit value then it does make a difference here.
> 
> On the fast path at least, documentation in the form of a comment is
> better if necessary.  Alternately, you could do:
> if (nh->payload_len == htons(0))
> 
> This can be handled at compile time for all sizes, architectures, and
> values since the compiler will do constant folding.

Thanks for the explanation.

>>>> +       while (1) {
>>>> +               if ((nexthdr != NEXTHDR_HOP)
>>>> +                               && (nexthdr != NEXTHDR_ROUTING)
>>>> +                               && (nexthdr != NEXTHDR_DEST)
>>>> +                               && (nexthdr != NEXTHDR_MOBILITY)
>>>> +                               && (nexthdr != NEXTHDR_AUTH)
>>>> +                               && (nexthdr != NEXTHDR_FRAGMENT)) {
>>>> +                       /* It's either a terminal header (e.g., TCP, UDP) or one we
>>>> +                        * don't understand.  In either case, we're done with the
>>>> +                        * packet, so use it to fill in 'nw_proto'. */
>>>> +                       break;
>>>> +               }
>>> 
>>> There's an existing Linux function that is almost exactly the same as
>>> this loop: ipv6_skip_exthdr().  It would be much better if we could
>>> use it instead, especially since I see some protocol handling bugs
>>> below.  I see that it handles NEXTHDR_MOBILITY and NEXTHDR_NONE
>>> differently but otherwise the logic is very similar.
>> 
>> I agree it would be great to reuse the kernel's code, but do you think it's worth using?  It doesn't handle Mobility headers, doesn't complain about nested fragments (I'm not positive this is a protocol violation based on my reading (I think it's silent on the issue), but it seems unlikely to be legitimate), and its handling of the None header is odd.  If any new headers are introduced, we'll have to rebuild this functionality, if we want compatibility with older kernels.  Any issues with the code I wrote to parse extenders should be ferreted out quickly.
> 
> Yes, I think we should use it.  It allows for reduced code, fewer
> bugs, is a slightly more efficient implementation, and would be
> necessary for upstreaming.  Here are the differences:
> ...

Thanks for digging into this.

> Any backwards compatibility issues can be solved easily by putting a
> copy of the most recent kernel's function in the compat directory.
> It's self-contained so it should not be difficult at all.  In
> addition, if there are pieces that you feel are missing from the
> function, then you should propose upstream changes as a sanity check.

Maybe we can start with correcting the explanatory comment to match the implementation.  :-)  Should I send a patch to correct their explanation of return values and the handling of "nexthdrp"?  The one time I reported a bug it was ignored, but I guess patches are more likely to be applied. 

>>>> +       skb_set_transport_header(skb, nh_ofs + nh_len);
>>>> +       key->nw_proto = nexthdr;
>>> 
>>> The IPv4 code pulls enough into the linear data area for the L4
>>> protocols as well, which means that those checks only test skb->len.
>>> We either need to do something similar here or modify the other
>>> checks.
>> 
>> I just did the pull directly in parse_ipv6hdr().  It seems like it may be cleaner to do it in the transport headers in the long run.
> 
> The reason why it was done the way it is currently is to reduce the
> number of calls to pskb_may_pull().  That doesn't really apply to IPv6
> right now because of the extension header parsing but it might make
> more sense if we use ipv6_skip_exthdr() because we won't be calling
> pskb_may_pull() repeatedly then.

Since I'm now using ipv6_skip_exthdr(), there's just one pull at the end.

>>>> --- a/include/openvswitch/datapath-protocol.h
>>>> +++ b/include/openvswitch/datapath-protocol.h
>>>> @@ -235,6 +235,8 @@ struct odp_flow_key {
>>>>     uint8_t  nw_tos;            /* IP ToS (DSCP field, 6 bits). */
>>>>     uint8_t  arp_sha[6];        /* ARP source hardware address. */
>>>>     uint8_t  arp_tha[6];        /* ARP target hardware address. */
>>>> +    uint32_t ipv6_src[4];       /* IPv6 source address. */
>>>> +    uint32_t ipv6_dst[4];       /* IPv6 destination address. */
>>> 
>>> This structure is really getting unwieldy.  After the ND additions,
>>> it's almost triple the previous size.  I guess it is OK as a temporary
>>> measure but it needs to be addressed soon.
>> 
>> I agree.  I was anticipating that this would go in after the netlink changes, so I didn't attempt to optimize it.  This will hopefully get replaced in a few days.
> 
> That's fine but the optimizations are independent of the Netlink
> changes and doesn't come automatically with the switchover.

Sorry, I had misunderstood you earlier.  I was focused on the communication aspect of the structure and not the lookup.  I thought you were commenting that the structure was growing larger than strictly necessary, since we could combine the IPv6 and IPv4 addresses into the same structure.  We should plan on doing the variable length lookup sooner rather than later, and that should address this problem.

>>> Most of these comments also apply to the related functions in this file.
>> 
>> Okay, I cleaned them up.  I'm not sure the performance benefit from having both 32-bit and 8-bit versions is worth the code bloat.  Someplace like "ipv6_addr_bitand() is fairly compact, but the others are huge.  Thoughts?
> 
> Unless it's performance critical, it's probably not worth it.
> However, depending on what you did for alignment/strict aliasing (like
> if you are making a copy) you might be able to put it in an array of
> longs to automatically get the native word size without any duplicate
> code.

None of those calls are on the fast path, so I just got rid of the 32-bit versions.  (I kept the ipv6_addr_bitand() but it's 32-bit version is more reasonable, and it's more likely that someone would like it to be fast.)

I'll send out a respin in a few minutes with these changes.

--Justin






More information about the dev mailing list