[ovs-dev] MPLS important comments (was: Re: Patch for MPLS)

Ravi.Kerur at telekom.com Ravi.Kerur at telekom.com
Mon Mar 12 23:39:02 UTC 2012


Attached latest patch after incorporating code review comments. I will go through code review comments one more time to make sure nothing is missed.

Jesse: If you could please review kernel changes it will be helpful.

I have tested code after changes on CentOS 6.2 and Ubuntu 11.10. 

"make check" and "make distcheck" matches latest git code results. 

Thanks
Ravi

-----Original Message-----
From: Kerur, Ravi 
Sent: Monday, March 12, 2012 1:58 PM
To: Ben Pfaff
Cc: dev at openvswitch.org
Subject: RE: MPLS important comments (was: Re: [ovs-dev] Patch for MPLS)

Thanks Ben for the review. Please see inline for <rk>, mostly agreeing to changes, but some requires inputs/clarifications. 

-----Original Message-----
From: Ben Pfaff [mailto:blp at nicira.com] 
Sent: Friday, March 09, 2012 11:20 AM
To: Kerur, Ravi
Cc: dev at openvswitch.org
Subject: MPLS important comments (was: Re: [ovs-dev] Patch for MPLS)

Thank you for the patch!  I have some comments.  I'm dividing them
into two groups.  The first group, in this email, is comments that
point out likely bugs.  The second group, which I will send
separately, is comments that point out less important issues.

I didn't review the kernel code.  That will have to wait for Jesse.


include/linux/openvswitch.h
---------------------------

We don't usually create structs for the arguments when the arguments
are simple types such as a single integer.  Instead, just indicate in
the comment on the side the type and in the big comment on "enum
ovs_action_attr" the effect of the action.  Actually every new action
needs to be explained in the big comment.

Your change renumbers OVS_ACTION_ATTR_SAMPLE but we can't do that
because it's part of the ABI.

<rk> When I started implementation I thought it will be good to store it in a variable and might have to reuse in some other part of the code. On a second look it doesn't look like that's the case. I have made necessary changes.

lib/dpif-netdev.c
-----------------

OVS_ACTION_ATTR_SET_MPLS_LABEL takes an entire MPLS LSE as its
argument, and the odp-util.c code parses an entire MPLS LSE (except
the bottom-of-stack bit, which I've commented on elsewhere), but the
dpif-netdev.c code only sets the "label" bits from the LSE into the
packet.  The "datapath" version of the code, on the other hand, does
something more complicated (it appears to set all the fields within
the LSE that are nonzero into the packet).  This looks very
inconsistent and probably wrong.

<rk> OVS_ACTION_ATTR_SET_MPLS_LABEL was a misnomer. Changed it to OVS_ACTION_ATTR_SET_MPLS_LSE. Moreover, userspace code was not tested until after potential fix was sent out. It's now inline with kernel and rest of the code base.

lib/flow.c
----------

This adds some support for 802.1ad but I'd prefer to keep that
separate from MPLS support, so please remove the 802.1ad stuff from
this patch.

<rk> Done.

flow_extract() now has code to skip over inner VLAN tags.  What's the
rationale for that?  It seems unwise to me, because it makes it
impossible for a controller to distinguish a packet with one VLAN tag
from one with a dozen.  The implementation is also not correct as far
as I can tell, because it can overrun the end of the packet.

For the same reason I wonder whether we should really skip over all
the MPLS labels?

<rk> I don't know of any use-case where a packet can carry more than 2 vlan tags. Idea is to parse all vlan_tags/mpls_labels such that packet->l3 is correctly set after the parse. While parsing multiple tags/labels update flow->vlan_tci and flow->mpls_lse only after parsing first tag/label. If this is not the right way to do it let me know how you plan to handle this. I will incorporate it.

I found the flow_extract() treatment confusing.  It skips past all the
MPLS labels, which would imply that we are going to look at packet
data beyond them, but that will in fact never happen (because the
dl_type is always MPLS).  We do want to skip the MPLS header so that
we can set the l3 pointer properly.

<rk> Please see above response.

flow_is_mpls_label_valid() looks wrong to me, because I think that
the left-shift in computing 'tmp_label' could discard some high 1-bits
that aren't really valid.  I would consider writing it as:
    return !(mpls_label & htonl(MPLS_LABEL_MASK >> MPLS_LABEL_SHIFT));
which I think has the desired effect.

<rk> Fixed this.

flow_format() doesn't show the "bottom of stack" bit.  Should it?  (It
looks like the kernel populates it from the packet, so I would expect
so.)

<rk> Good to have it. Added stack information.

I have a concern about flow_hash_symmetric_l4().  This function is not
well-documented, but it's intent is that packets in either direction
in a pair of related flows (e.g. the two flows that make up a TCP
connection) will have the same hash.  Will two flows related in that
way have the same MPLS LSE?  This function should only hash the
mpls_lse if the answer is "yes".

<rk> Theoretically the answer is "yes" but I am not 100% on it. I would like to leave it there unless I see it's not needed at all.

lib/meta-flow.c
---------------

I guess that MFF_MPLS_LABEL and MFP_MPLS_TC should have a
prerequisite, because the MPLS label may only be matched if the
Ethernet type indicates that the packet is a MPLS packet (right?).  So
I would add an MFP_MPLS and appropriate support for it.

<rk> Fixed it.

I think that the definitions in mf_is_value_valid() and mf_get_value()
are contradictory.  mf_is_value_valid() insists that the label be in
the least significant 20 bits of value->be32, but mf_get_value() puts
it in the most significant 20 bits.  I suggest that both should use
the least significant 20 bits.

I think that mf_random_value() has the same problem.

<rk> Code is written with flow->mpls_lse following MPLS format because it is easier for anyone debugging it. Issue is when a user configures for a match based on a label it's in lower 20 bits. In mf_set_value, flow->mpls_lse is updated and it updates according to MPLS format. For nx_put_match, it follows the spec by putting the label in lower 20bits. mf_is_value_valid is called from nx_pull_match in which case label is in lower 20bits and hence the deviation. 

mf_set_value, mf_get_value and mf_random_value follow the same format. I am bit reluctant to make changes now since it requires complete code changes and I am not seeing any added value in doing it.


lib/nx-match.c
--------------

Like flow_format(), I would expect format_mpls_label() to print the
"bottom of stack" bit.  Similarly with the parsing code in
parse_odp_actions() and parse_odp_key_attr().

format_odp_action() and other code expects that
OVS_ACTION_ATTR_POP_MPLS has an ethertype argument, but
parse_odp_action() doesn't supply one.

<rk> Fixed it.

lib/odp-util.h
--------------

In parse_odp_action(), the number here should be 16, not 11, in two
places:
+    if (!strncmp(s, "copy_mpls_ttl_in", 11)) {
+        nl_msg_put_flag(actions, OVS_ACTION_ATTR_COPY_TTL_IN);
+        return 11;
+    }

and here it should be 17, not 12, in two places:
+    if (!strncmp(s, "copy_mpls_ttl_out", 12)) {
+        nl_msg_put_flag(actions, OVS_ACTION_ATTR_COPY_TTL_OUT);
+        return 12;
+    }

<rk> Initially it was just "copy_ttl_in/out", later on I changed it to "copy_mpls_ttl_in/out", but forgot to update it here. Fixed it now.

lib/ofp-parse.c
---------------

In parse_protocol(), there are two entries for "mpls".  Only the first
one will have an effect, the second one would need a different name.

<rk> Agreed. Have changed it i.e mpls -> unicast, mplsm -> multicast

lib/ofp-print.c
---------------

In ofp_print_action(), please put "0x" into the following formats.
Otherwise, the values printed will be parsed as decimal if they are
fed back into the parser:
+   case OFPUTIL_NXAST_PUSH_MPLS:
+        nampush = (const struct nx_action_push_mpls *) a;
+        ds_put_format(s, "push_mpls:%"PRIx16, ntohs(nampush->ethertype));
+        break;
+
+    case OFPUTIL_NXAST_POP_MPLS:
+        nampop = (const struct nx_action_pop_mpls *) a;
+        ds_put_format(s, "pop_mpls:%"PRIx16, ntohs(nampop->ethertype));
+        break;

<rk> Fixed.


In ofp_match_to_string(), I don't think it's a good idea to abbreviate
both MPLS ethertypes the same way.  Otherwise, the shorthand loses
information.

<rk> Fixed.


lib/packets.c
-------------

dec_mpls_ttl() will decrement a ttl of 0 to 255.  Is that acceptable
behavior?  Perhaps the comment should mention it.

copy_mpls_ttl_in() and copy_mpls_ttl_out() assume that the l3 header
is present and that it is IPv4, or alternatively that an inner MPLS
header is present, without checking.

set_mpls_lse_tc() is buggy.  It needs to do the shift before the mask.

set_mps_lse_stack() is buggy the same way.

push_mpls() reads the IP header without first checking that it is
present.

<rk> As mentioned earlier, userspace was not tested at all until last week. Taken care of it after unit-testing.

It appears that push_mpls() refuses to add an MPLS header if there is
a VLAN tag and the inner Ethertype is not IPv4.  Why?

<rk> I didn't think any other L3 protocol would fit into the use-case. For IPV6, it has its own label and I am not sure there would be a case for <VLAN/IPV6> packet to push a new MPLS label. Make sense? Or are you referring to other L3 protocol aka IPX...?

ofproto/ofproto-dpif.c
----------------------

Did you carefully work out how the actions are implemented?  Does the
right thing happen (a no-op) if an OpenFlow flow has a "push mpls"
action followed by a "pop mpls" action?  etc.

<rk> I don't think push_mpls followed by pop_mpls as a nop is a valid use case. However, pop_mpls followed by push_mpls is a valid use case for label swap. I have tested code for both scenarios with latest code and it works fine.

I'm not sure, in fact, that it makes sense to try to delay committing
MPLS actions.  It seems tricky, to me, at least.  I'd be tempted to
write them to the ODP actions immediately rather than trying to defer
it.  The code would be simpler and more obviously correct, at any rate.

<rk> When I started off implementation I wanted to take similar approach of the existing code since there is nothing special in MPLS. Is there a specific reason why existing code does it? I have made necessary changes not to delay mpls actions.


tests/test-mpls.c
-----------------

I don't understand why this program is here.  You should be able to
write unit tests for this feature without actually opening raw
sockets, etc.  We have sufficient tools to test packets through the
"dummy" interfaces, which some of the existing unit tests.  At any
rate I believe that this test program requires "root" privileges to
work.

<rk> test-mpls.c is the main unit-test function I have used for all mpls and regression testing. For "root" privileges, I believe even "make install" or "make check" requires it, so I am not sure why this is a special case?

You didn't add any unit tests, just this test program.  That means
that "make check" doesn't actually test anything MPLS related.  We
need at least basic unit tests.

<rk> I will add unit-test cases for mpls. Since I need to do this for vlan qinq as well, I would like to combine them and send it as a separate patch. 

<rk> Now, a request. It's really hard to carry around this huge change for a long time, because everytime I merge I see conflict in different areas of code and have to retest it each time. If you could let me push the modified code(I will send out updated patch shortly) into targeted release it would really help me. I will work on unit-tests shortly after the push and get back to you at the earliest. I have made sure "make check" matches latest code in master. Since new code is not breaking existing functionality it would really help me if it finds a home.

Thanks,
Ravi

Thanks,

Ben.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-MPLS-functionality-based-on-OF-1.1-Specification(1).patch
Type: application/octet-stream
Size: 123208 bytes
Desc: 0001-Add-MPLS-functionality-based-on-OF-1.1-Specification(1).patch
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20120313/94da2634/attachment-0004.obj>


More information about the dev mailing list