[ovs-dev] MPLS and VLAN QinQ patch

Ravi.Kerur at telekom.com Ravi.Kerur at telekom.com
Thu May 24 23:24:00 UTC 2012


Thanks Ben for your thorough review. Comments inline <rk>

-----Original Message-----
From: dev-bounces at openvswitch.org [mailto:dev-bounces at openvswitch.org] On Behalf Of Ben Pfaff
Sent: Thursday, May 24, 2012 4:02 PM
To: Kerur, Ravi
Cc: dev at openvswitch.org
Subject: Re: [ovs-dev] MPLS and VLAN QinQ patch

On Thu, May 24, 2012 at 11:58:53PM +0200, Ravi.Kerur at telekom.com wrote:
> Ben Pfaff writes:
> > get_l3_ttl_and_tos()
> > --------------------
> > 
> > Why does get_l3_ttl_and_tos() consider an unknown IP version as 
> > success, with a default?  This seems odd.
> 
> To handle non-IP traffic i.e push/pop MPLS over non-IP traffic e.g. 
> ARP(practical?), VPLS. I had also mentioned in previous discussion 
> that MPLS ttl actions will not work for these type of traffic but 
> other actions lke push_mpls,pop_mpls, set_mpls_label/set_mpls_tc would 
> be. In this function I keep them to default for non-IP traffic.

But that only comes up in one situation: copying the TTL inward or outward when there is exactly one MPLS label.  In other situations (pushing an MPLS label or popping one) we know the correct inner Ethertype either because it's in the packet or because it's an argument to the action.  When we know it, we should use it. 

<rk> Cases for multiple MPLS headers currently happen only at Provider core routers and yes in that case we don't have to worry, I was thinking more in terms of Provider-Edge or MPLS termination point.  For those multiple MPLS label case it is already handled and I use existing mpls header to copy.  I will closely look at it one more time probably I am missing your point. I think you are probably saying extract ethtype from packet and use it to derive ttl and tos?

> In get_l3_ttl_and_tos(), I'm pretty sure that "ih->tos & 0x07" is 
> wrong.  By my reading, it extracts the two ECN bits and the 
> lowest-order DSCP bit.  Is that really what you want?
> 
> I will check again. I thought IP DSCP bits are 0 - 5 and 6,7 for ECN. 
> Here the goal is to copy lower 3 bits of DSCP/TOS or copy from mapping 
> table(based on Service Level Agreement) or configuration from the 
> controller. Copying from SLA mapping table can be added later, but 
> configuration can override EXP bits now using "set_mpls_tc" action.

Please do check again.  We have
    #define IP_ECN_MASK 0x03
    #define IP_DSCP_MASK 0xfc
in packets.h and I'm sure that's correct.

<rk>  I had tested with "ping -Q <1-7> <dst-ip>" and I had seen correct TOS bits copied into MPLS headers. I will fix it as needed.

> > In get_l3_ttl_and_tos(), I wonder about "ih6->ip6_vfc >> 4", too.
> > By my reading, this will always equal 6, since we know that this is 
> > IPv6.  I don't think the "6" in IPv6 means that it always has to 
> > have a TOS of 6.
> 
> IPV6 tos bits are 8 bits long(starting from bit 21), I just take lower 
> 3 bits from there. Same argument as above.

Please check again.  We have
    #define IP_VER(ip_ihl_ver) ((ip_ihl_ver) >> 4) and
    #define IP6_VER(ip6_vfc) ((ip6_vfc) >> 4) in packets.h and I'm sure that's correct.

<rk> Will double check.

> > In get_l3_ttl_and_tos(), I don't see any check that the payload is 
> > long enough to hold the entire IPv4 or IPV6 header.
> 
> I don't see the need, but will fix it if needed. 

By omitting the check we could incur a segfault.  Please check the length.

<rk> I have added the check. I got confused with checking the payload itself and was questioning why?

> > lib/packets.c
> > -------------
> > 
> > I see many uses of __be<N> in this file.  Those are kernel types.
> > In userspace, use ovs_be<N>.
> 
> I double checked I don't see any __be*

Hmm, I must have been looked elsewhere.  Sorry.

> > Why is it still not allowed to push an MPLS header onto non-IP data?
> > I think that it should be.
> 
> It does push MPLS onto non-IP data. I have tested it on ARP traffic.

push_mpls() calls push_mpls_lse() in this case.  push_mpls_lse() starts out this:
    /* neither ipv4 or ipv6 return; */
which implies that it does nothing to non-IP.  Please remove the comment.

<rk> I  have fixed all the comments and its format as well.

Thanks,
Ravi

Thanks,

Ben.
_______________________________________________
dev mailing list
dev at openvswitch.org
http://openvswitch.org/mailman/listinfo/dev



More information about the dev mailing list