[ovs-dev] MPLS and VLAN QinQ patch

Ravi.Kerur at telekom.com Ravi.Kerur at telekom.com
Thu May 24 21:58:53 UTC 2012


Replies Inline <rk>

-----Original Message-----
From: Ben Pfaff [mailto:blp at nicira.com] 
Sent: Thursday, May 24, 2012 1:45 PM
To: Kerur, Ravi
Cc: dev at openvswitch.org
Subject: Re: [ovs-dev] MPLS and VLAN QinQ patch

On Tue, May 22, 2012 at 07:36:32PM +0200, Ravi.Kerur at telekom.com wrote:
> Attached MPLS and VLAN QinQ patch after rebasing to following commit
> 
> commit 046f1f89e6d7716581de207dd0c54421926bc25b
> Author: Ethan Jackson <ethan at nicira.com<mailto:ethan at nicira.com>>
> Date:   Mon May 21 13:20:18 2012 -0700
> 
> Patch(s) have undergone additional integration testing and 
> incorporates initial code review comments from Ben.

Here's my second and final round of comments for this version of the patch.

Thanks, Ravi!  This is getting much closer.

High-level comments
-------------------

OF1.1 says that, on MPLS push, the MPLS traffic class of the new MPLS tag is only derived from an existing inner MPLS tag, never from an IP TOS, and that if there is no inner MPLS tag it must be set to 0.  I think your code tries to derive it from inner IP headers.

<rk> Generally IP TOS(older version) or IP DSCP values are copied to MPLS EXP bits, 6 bits of DSCP are mapped to 3 EXP bits in MPLS as there can be no more than 5 types of classification it can be easily done. I thought this would make more practical sense since its already deployed than the OF 1.1. Sorry for the confusion here, in my initial patch I had followed OF 1.1 spec and never used TOS from IP, changed it in this version of code as I found it more practical.  


lib/ofp-util.c
--------------

In validate_actions(), the following "if" can never trigger because mpls_ttl is an 8-bit quantity:
+            if (mpls_ttl & ~0xff) {
+                error = OFPERR_OFPBAC_BAD_ARGUMENT;
+            }

<rk> My mistake, will fix it.

get_l3_ttl_and_tos()
--------------------

Why does get_l3_ttl_and_tos() consider an unknown IP version as success, with a default?  This seems odd.

<rk> To handle non-IP traffic i.e push/pop MPLS over non-IP traffic e.g. ARP(practical?), VPLS. I had also mentioned in previous discussion that MPLS ttl actions will not work for these type of traffic but other actions lke push_mpls,pop_mpls, set_mpls_label/set_mpls_tc would be. In this function I keep them to default for non-IP traffic.

In get_l3_ttl_and_tos(), it's unnecessary to check both 'ih' and 'ih6'
for NULL, since they point to the same location.

<rk> agreed, will keep just one.

In get_l3_ttl_and_tos(), I'm pretty sure that "ih->tos & 0x07" is wrong.  By my reading, it extracts the two ECN bits and the lowest-order DSCP bit.  Is that really what you want?

<rk> I will check again. I thought IP DSCP bits are 0 - 5 and 6,7 for ECN. Here the goal is to copy lower 3 bits of DSCP/TOS or copy from mapping table(based on Service Level Agreement) or configuration from the controller. Copying from SLA mapping table can be added later, but configuration can override EXP bits now using "set_mpls_tc" action.

In get_l3_ttl_and_tos(), I wonder about "ih6->ip6_vfc >> 4", too.  By my reading, this will always equal 6, since we know that this is IPv6.
I don't think the "6" in IPv6 means that it always has to have a TOS of 6.

<rk> IPV6 tos bits are 8 bits long(starting from bit 21), I just take lower 3 bits from there. Same argument as above.

In the push_mpls() case, at least, it would be better to check the Ethertype, since we have it, than the first byte of the payload.

<rk> will fix it.

In get_l3_ttl_and_tos(), I don't see any check that the payload is long enough to hold the entire IPv4 or IPV6 header.

<rk> I don't see the need, but will fix it if needed. 

lib/packets.c
-------------

I see many uses of __be<N> in this file.  Those are kernel types.  In userspace, use ovs_be<N>.

<rk> I double checked I don't see any __be*

Most of the functions in this file have a "void" return type and end in an explicit "return;".  You can remove the "return;" since it serves no purpose.

<rk> Old coding style habit. Will remove them.

set_new_mpls_lse() is a little puzzling.  It only sets the fields that are nonzero in 'label'.  What if a caller wants to set a zero value?
(Are zero values reserved in MPLS?)

Why is it still not allowed to push an MPLS header onto non-IP data?
I think that it should be.

<rk> It does push MPLS onto non-IP data. I have tested it on ARP traffic.

In push_mpls(), I don't think either of the checks in the first "if"
is necessary.  If they are, then you can drop the "eh == NULL" check because the packet->size check is sufficient to ensure eh != NULL.
You can definitely drop the "packet->size >= sizeof(struct eth_header)" check in the second "if" in the function, because it duplicates that check from four lines earlier\.

<rk> will fix it.

The cases in push_mpls() are a bit confusing.  What if you separate it into steps:

        1. Determine correct TC, TTL, and BOS based on what's already
           in the packet.  This is the part that changes based on
           what's already in the packet.

        2. Push MPLS header based on information from previous step.
           I'm pretty sure that a single procedure works here,
           regardless of the initial state of the packet.

<rk> I will see if I can simplify further. Code flow is as follows

1. CASE IPv4/IPv4: get_l3_ttl_and_tos(Get layer-3 tos and ttl), push_mpls_lse(def_label, tc, ttl and bos), adjust ofpbuf.
2. CASE MPLS: get_inner_tag_values(copy inner tag values, set stack=0), push_mpls and adjust ofpbuf.
3. CASE VLAN: get_l3_ttl_and_tos, push_mpls and adjust ofpbuf.

I will further simplify this.
 
In pop_mpls(), I think that the initial test should be for size that is at least the size of an ethernet header followed by an MPLS header.

In pop_mpls(), one should write:
        if (mh->mpls_lse & ntohl(MPLS_STACK_MASK)) { instead of:
        if (ntohl(mh->mpls_lse) & MPLS_STACK_MASK) { for performance.

In pop_mpls(), one may write:
            eh->eth_type = ethtype;
instead of:
            char *t_ethtype = (char *)packet->l2_5 - 2;
            ovs_be16 *n_ethtype = (ovs_be16*) t_ethtype;
            *n_ethtype = ethtype;

<rk> will fix it.

ofproto-dpif.c
--------------

I think that say, two, dec_mpls_ttl actions in a single flow, starting from a TTL of, say, 2, will fail to detect reaching TTL 0.  Also, the existing implementation of dec ttl for IP stop translating actions after reaching TTL 0.  Presumably the MPLS implementation should do the same.


ovs-dpctl.c
-----------

In do_normalize_actions(), I'd start using a switch statement.

<rk> will fix it.

Thanks,
Ravi



More information about the dev mailing list