[ovs-dev] MPLS important comments (was: Re: Patch for MPLS)

Tue Mar 13 18:38:11 UTC 2012

Thanks Ben, on what Linux version you are testing this? When I sent out the patch yesterday I had tested "make C=1" on Ubuntu 11.10(3.0 kernel) and CentOS 6.2(2.6 kernel) and had not seen sparse issues. I re-ran "make C=1" again today with the patch I sent out yesterday and didn't see any warnings. Please let me know on what version you are running this I will fix it.

Thanks
Ravi

-----Original Message-----
From: Ben Pfaff [mailto:blp at nicira.com]
Sent: Tuesday, March 13, 2012 10:36 AM
To: Kerur, Ravi
Cc: jesse at nicira.com; dev at openvswitch.org
Subject: Re: MPLS important comments (was: Re: [ovs-dev] Patch for MPLS)

Thanks for the patch, it applies cleanly against current "master".  (I
had missed it on the first pass through my inbox this morning.)

I see the following "sparse" warnings for the kernel code.  It
otherwise builds cleanly, and I can confirm that the unit tests pass
for me.

actions.c:244:36: warning: incorrect type in assignment (different base types)
actions.c:244:36:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:244:36:    got restricted __be32 [usertype] <noident>
actions.c:265:32: warning: cast to restricted __be32
actions.c:265:32: warning: cast to restricted __be32
actions.c:265:32: warning: cast to restricted __be32
actions.c:265:32: warning: cast to restricted __be32
actions.c:265:32: warning: cast to restricted __be32
actions.c:265:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:325:32: warning: cast to restricted __be32
actions.c:346:45: warning: incorrect type in assignment (different base types)
actions.c:346:45:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:346:45:    got restricted __be32 [usertype] <noident>
actions.c:362:32: warning: cast to restricted __be32
actions.c:362:32: warning: cast to restricted __be32
actions.c:362:32: warning: cast to restricted __be32
actions.c:362:32: warning: cast to restricted __be32
actions.c:362:32: warning: cast to restricted __be32
actions.c:362:32: warning: cast to restricted __be32
actions.c:369:45: warning: incorrect type in assignment (different base types)
actions.c:369:45:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:369:45:    got restricted __be32 [usertype] <noident>
actions.c:386:32: warning: cast to restricted __be32
actions.c:386:32: warning: cast to restricted __be32
actions.c:386:32: warning: cast to restricted __be32
actions.c:386:32: warning: cast to restricted __be32
actions.c:386:32: warning: cast to restricted __be32
actions.c:386:32: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:393:41: warning: cast to restricted __be32
actions.c:396:58: warning: incorrect type in assignment (different base types)
actions.c:396:58:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:396:58:    got restricted __be32 [usertype] <noident>
actions.c:412:32: warning: cast to restricted __be32
actions.c:412:32: warning: cast to restricted __be32
actions.c:412:32: warning: cast to restricted __be32
actions.c:412:32: warning: cast to restricted __be32
actions.c:412:32: warning: cast to restricted __be32
actions.c:412:32: warning: cast to restricted __be32
actions.c:419:53: warning: incorrect type in assignment (different base types)
actions.c:419:53:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:419:53:    got restricted __be32 [usertype] <noident>
actions.c:421:41: warning: cast to restricted __be32
actions.c:421:41: warning: cast to restricted __be32
actions.c:421:41: warning: cast to restricted __be32
actions.c:421:41: warning: cast to restricted __be32
actions.c:421:41: warning: cast to restricted __be32
actions.c:421:41: warning: cast to restricted __be32
actions.c:425:53: warning: incorrect type in assignment (different base types)
actions.c:425:53:    expected unsigned int [unsigned] [usertype] <noident>
actions.c:425:53:    got restricted __be32 [usertype] <noident>

Thanks,

Ben.

On Tue, Mar 13, 2012 at 12:39:02AM +0100, Ravi.Kerur at telekom.com wrote:
> Attached latest patch after incorporating code review comments. I
> will go through code review comments one more time to make sure
> nothing is missed.
>
> Jesse: If you could please review kernel changes it will be helpful.
>
> I have tested code after changes on CentOS 6.2 and Ubuntu 11.10.
>
> "make check" and "make distcheck" matches latest git code results.
>
> Thanks
> Ravi
>
> -----Original Message-----
> From: Kerur, Ravi
> Sent: Monday, March 12, 2012 1:58 PM
> To: Ben Pfaff
> Cc: dev at openvswitch.org
> Subject: RE: MPLS important comments (was: Re: [ovs-dev] Patch for MPLS)
>
> Thanks Ben for the review. Please see inline for <rk>, mostly agreeing to changes, but some requires inputs/clarifications.
>
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at nicira.com]
> Sent: Friday, March 09, 2012 11:20 AM
> To: Kerur, Ravi
> Cc: dev at openvswitch.org
> Subject: MPLS important comments (was: Re: [ovs-dev] Patch for MPLS)
>
> Thank you for the patch!  I have some comments.  I'm dividing them
> into two groups.  The first group, in this email, is comments that
> point out likely bugs.  The second group, which I will send
> separately, is comments that point out less important issues.
>
> I didn't review the kernel code.  That will have to wait for Jesse.
>
>
> include/linux/openvswitch.h
> ---------------------------
>
> We don't usually create structs for the arguments when the arguments
> are simple types such as a single integer.  Instead, just indicate in
> the comment on the side the type and in the big comment on "enum
> ovs_action_attr" the effect of the action.  Actually every new action
> needs to be explained in the big comment.
>
> Your change renumbers OVS_ACTION_ATTR_SAMPLE but we can't do that
> because it's part of the ABI.
>
> <rk> When I started implementation I thought it will be good to store it in a variable and might have to reuse in some other part of the code. On a second look it doesn't look like that's the case. I have made necessary changes.
>
> lib/dpif-netdev.c
> -----------------
>
> OVS_ACTION_ATTR_SET_MPLS_LABEL takes an entire MPLS LSE as its
> argument, and the odp-util.c code parses an entire MPLS LSE (except
> the bottom-of-stack bit, which I've commented on elsewhere), but the
> dpif-netdev.c code only sets the "label" bits from the LSE into the
> packet.  The "datapath" version of the code, on the other hand, does
> something more complicated (it appears to set all the fields within
> the LSE that are nonzero into the packet).  This looks very
> inconsistent and probably wrong.
>
> <rk> OVS_ACTION_ATTR_SET_MPLS_LABEL was a misnomer. Changed it to OVS_ACTION_ATTR_SET_MPLS_LSE. Moreover, userspace code was not tested until after potential fix was sent out. It's now inline with kernel and rest of the code base.
>
> lib/flow.c
> ----------
>
> This adds some support for 802.1ad but I'd prefer to keep that
> separate from MPLS support, so please remove the 802.1ad stuff from
> this patch.
>
> <rk> Done.
>
> flow_extract() now has code to skip over inner VLAN tags.  What's the
> rationale for that?  It seems unwise to me, because it makes it
> impossible for a controller to distinguish a packet with one VLAN tag
> from one with a dozen.  The implementation is also not correct as far
> as I can tell, because it can overrun the end of the packet.
>
> For the same reason I wonder whether we should really skip over all
> the MPLS labels?
>
> <rk> I don't know of any use-case where a packet can carry more than 2 vlan tags. Idea is to parse all vlan_tags/mpls_labels such that packet->l3 is correctly set after the parse. While parsing multiple tags/labels update flow->vlan_tci and flow->mpls_lse only after parsing first tag/label. If this is not the right way to do it let me know how you plan to handle this. I will incorporate it.
>
> I found the flow_extract() treatment confusing.  It skips past all the
> MPLS labels, which would imply that we are going to look at packet
> data beyond them, but that will in fact never happen (because the
> dl_type is always MPLS).  We do want to skip the MPLS header so that
> we can set the l3 pointer properly.
>
> <rk> Please see above response.
>
> flow_is_mpls_label_valid() looks wrong to me, because I think that
> the left-shift in computing 'tmp_label' could discard some high 1-bits
> that aren't really valid.  I would consider writing it as:
>     return !(mpls_label & htonl(MPLS_LABEL_MASK >> MPLS_LABEL_SHIFT));
> which I think has the desired effect.
>
> <rk> Fixed this.
>
> flow_format() doesn't show the "bottom of stack" bit.  Should it?  (It
> looks like the kernel populates it from the packet, so I would expect
> so.)
>
> <rk> Good to have it. Added stack information.
>
> I have a concern about flow_hash_symmetric_l4().  This function is not
> well-documented, but it's intent is that packets in either direction
> in a pair of related flows (e.g. the two flows that make up a TCP
> connection) will have the same hash.  Will two flows related in that
> way have the same MPLS LSE?  This function should only hash the
> mpls_lse if the answer is "yes".
>
> <rk> Theoretically the answer is "yes" but I am not 100% on it. I would like to leave it there unless I see it's not needed at all.
>
> lib/meta-flow.c
> ---------------
>
> I guess that MFF_MPLS_LABEL and MFP_MPLS_TC should have a
> prerequisite, because the MPLS label may only be matched if the
> Ethernet type indicates that the packet is a MPLS packet (right?).  So
> I would add an MFP_MPLS and appropriate support for it.
>
> <rk> Fixed it.
>
> I think that the definitions in mf_is_value_valid() and mf_get_value()
> are contradictory.  mf_is_value_valid() insists that the label be in
> the least significant 20 bits of value->be32, but mf_get_value() puts
> it in the most significant 20 bits.  I suggest that both should use
> the least significant 20 bits.
>
> I think that mf_random_value() has the same problem.
>
> <rk> Code is written with flow->mpls_lse following MPLS format because it is easier for anyone debugging it. Issue is when a user configures for a match based on a label it's in lower 20 bits. In mf_set_value, flow->mpls_lse is updated and it updates according to MPLS format. For nx_put_match, it follows the spec by putting the label in lower 20bits. mf_is_value_valid is called from nx_pull_match in which case label is in lower 20bits and hence the deviation.
>
> mf_set_value, mf_get_value and mf_random_value follow the same format. I am bit reluctant to make changes now since it requires complete code changes and I am not seeing any added value in doing it.
>
>
> lib/nx-match.c
> --------------
>
> Like flow_format(), I would expect format_mpls_label() to print the
> "bottom of stack" bit.  Similarly with the parsing code in
> parse_odp_actions() and parse_odp_key_attr().
>
> format_odp_action() and other code expects that
> OVS_ACTION_ATTR_POP_MPLS has an ethertype argument, but
> parse_odp_action() doesn't supply one.
>
> <rk> Fixed it.
>
> lib/odp-util.h
> --------------
>
> In parse_odp_action(), the number here should be 16, not 11, in two
> places:
> +    if (!strncmp(s, "copy_mpls_ttl_in", 11)) {
> +        nl_msg_put_flag(actions, OVS_ACTION_ATTR_COPY_TTL_IN);
> +        return 11;
> +    }
>
> and here it should be 17, not 12, in two places:
> +    if (!strncmp(s, "copy_mpls_ttl_out", 12)) {
> +        nl_msg_put_flag(actions, OVS_ACTION_ATTR_COPY_TTL_OUT);
> +        return 12;
> +    }
>
> <rk> Initially it was just "copy_ttl_in/out", later on I changed it to "copy_mpls_ttl_in/out", but forgot to update it here. Fixed it now.
>
> lib/ofp-parse.c
> ---------------
>
> In parse_protocol(), there are two entries for "mpls".  Only the first
> one will have an effect, the second one would need a different name.
>
> <rk> Agreed. Have changed it i.e mpls -> unicast, mplsm -> multicast
>
> lib/ofp-print.c
> ---------------
>
> In ofp_print_action(), please put "0x" into the following formats.
> Otherwise, the values printed will be parsed as decimal if they are
> fed back into the parser:
> +   case OFPUTIL_NXAST_PUSH_MPLS:
> +        nampush = (const struct nx_action_push_mpls *) a;
> +        ds_put_format(s, "push_mpls:%"PRIx16, ntohs(nampush->ethertype));
> +        break;
> +
> +    case OFPUTIL_NXAST_POP_MPLS:
> +        nampop = (const struct nx_action_pop_mpls *) a;
> +        ds_put_format(s, "pop_mpls:%"PRIx16, ntohs(nampop->ethertype));
> +        break;
>
> <rk> Fixed.
>
>
> In ofp_match_to_string(), I don't think it's a good idea to abbreviate
> both MPLS ethertypes the same way.  Otherwise, the shorthand loses
> information.
>
> <rk> Fixed.
>
>
> lib/packets.c
> -------------
>
> dec_mpls_ttl() will decrement a ttl of 0 to 255.  Is that acceptable
> behavior?  Perhaps the comment should mention it.
>
> copy_mpls_ttl_in() and copy_mpls_ttl_out() assume that the l3 header
> is present and that it is IPv4, or alternatively that an inner MPLS
> header is present, without checking.
>
> set_mpls_lse_tc() is buggy.  It needs to do the shift before the mask.
>
> set_mps_lse_stack() is buggy the same way.
>
> push_mpls() reads the IP header without first checking that it is
> present.
>
> <rk> As mentioned earlier, userspace was not tested at all until last week. Taken care of it after unit-testing.
>
> It appears that push_mpls() refuses to add an MPLS header if there is
> a VLAN tag and the inner Ethertype is not IPv4.  Why?
>
> <rk> I didn't think any other L3 protocol would fit into the use-case. For IPV6, it has its own label and I am not sure there would be a case for <VLAN/IPV6> packet to push a new MPLS label. Make sense? Or are you referring to other L3 protocol aka IPX...?
>
> ofproto/ofproto-dpif.c
> ----------------------
>
> Did you carefully work out how the actions are implemented?  Does the
> right thing happen (a no-op) if an OpenFlow flow has a "push mpls"
> action followed by a "pop mpls" action?  etc.
>
> <rk> I don't think push_mpls followed by pop_mpls as a nop is a valid use case. However, pop_mpls followed by push_mpls is a valid use case for label swap. I have tested code for both scenarios with latest code and it works fine.
>
> I'm not sure, in fact, that it makes sense to try to delay committing
> MPLS actions.  It seems tricky, to me, at least.  I'd be tempted to
> write them to the ODP actions immediately rather than trying to defer
> it.  The code would be simpler and more obviously correct, at any rate.
>
> <rk> When I started off implementation I wanted to take similar approach of the existing code since there is nothing special in MPLS. Is there a specific reason why existing code does it? I have made necessary changes not to delay mpls actions.
>
>
> tests/test-mpls.c
> -----------------
>
> I don't understand why this program is here.  You should be able to
> write unit tests for this feature without actually opening raw
> sockets, etc.  We have sufficient tools to test packets through the
> "dummy" interfaces, which some of the existing unit tests.  At any
> rate I believe that this test program requires "root" privileges to
> work.
>
> <rk> test-mpls.c is the main unit-test function I have used for all mpls and regression testing. For "root" privileges, I believe even "make install" or "make check" requires it, so I am not sure why this is a special case?
>
> You didn't add any unit tests, just this test program.  That means
> that "make check" doesn't actually test anything MPLS related.  We
> need at least basic unit tests.
>
> <rk> I will add unit-test cases for mpls. Since I need to do this for vlan qinq as well, I would like to combine them and send it as a separate patch.
>
> <rk> Now, a request. It's really hard to carry around this huge change for a long time, because everytime I merge I see conflict in different areas of code and have to retest it each time. If you could let me push the modified code(I will send out updated patch shortly) into targeted release it would really help me. I will work on unit-tests shortly after the push and get back to you at the earliest. I have made sure "make check" matches latest code in master. Since new code is not breaking existing functionality it would really help me if it finds a home.
>
> Thanks,
> Ravi
>
> Thanks,
>
> Ben.