[ovs-dev] [PATCH v4 1/4] Add support for 802.1ad (QinQ tunneling)

Sun Jul 31 00:22:47 UTC 2016

Thanks! I'm working on code changes according to your comments. I
think we need more discussion about the ethertype matching. Please see
inline.

Thanks,
Xiao

On Thu, Jul 28, 2016 at 2:40 AM, Ben Pfaff <blp at ovn.org> wrote:
> On Tue, Jul 12, 2016 at 11:38:54PM +0800, Xiao Liang wrote:
>> Flow key handleing changes:
>> - Add VLAN header array in struct flow, to record multiple 802.1q VLAN
>>   headers.
>> - Add dpif multi-VLAN capability probing. If datapath supports multi-VLAN,
>>   increase the maximum depth of nested OVS_KEY_ATTR_ENCAP.
>>
>> Refacter VLAN handling in dpif-xlate:
>> - Introduce 'xvlan' to track VLAN stack during flow processing.
>> - Input and output VLAN translation according to the xbundle type.
>>
>> Push VLAN action support:
>> - Allow ethertype 0x88a8 in VLAN headers and push_vlan action.
>> - Support push_vlan on dot1q packets.
>>
>> Add new port VLAN mode "dot1q-tunnel":
>> - Example:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100
>>   Pushes another VLAN 100 header on packets (tagged and untagged) on ingress,
>>   and pops it on egress.
>> - Customer VLAN check:
>>     ovs-vsctl set Port p1 vlan_mode=dot1q-tunnel tag=100 cvlans=10,20
>>   Only customer VLAN of 10 and 20 are allowed.
>>
>> Signed-off-by: Xiao Liang <shaw.leon at gmail.com>
>
> Thanks for working on this!  I can see that it was a great deal of
> work.  In addition to the "sparse" related fixes from a few days ago,
> I have some more detailed comments.
>
> It looks like this patch causes some tests to fail and then later
> patches fix them.  This is not our practice: a patch should never make
> tests fails (at least not intentionally).  Instead, if a patch
> requires changes to tests, then the same patch should update the
> tests.  If the later patches are required to fix tests, they should be
> folded into this one.

I will squash the commits.

>
> Our practice is to give arrays names that are plural, like 'vlans'.
>
> Let's talk about VLANs in the flow struct.  The use of an array of
> TPID+TCI seems fine.  I think, however, that we need to define and
> enforce an invariant: if vlan[i] matches on anything, for i > 0, then
> every vlan[j], for j < i, needs to match on (tci & VLAN_CFI) != 0.
> Otherwise we can end up with nonsensical matches like one that matches
> on vlan[0] being not present (i.e. match on (vlan[0] & VLAN_CFI) == 0)
> but vlan[1] being present (i.e. match on (vlan[1] & VLAN_CFI) != 0).
> We enforce a similar invariant for MPLS.
>

You are right, thanks.

> I'm concerned about backward compatibility.  Consider some application
> built on Open vSwitch using OpenFlow.  Today, it can distinguish
> single-tagged and double-tagged packets (that use outer Ethertype
> 0x8100), as follows:
>
>     - A single-tagged packet has vlan_tci != 0 and some non-VLAN
>       dl_type.
>
>     - A double-tagged packet has vlan_tci != 0 and a VLAN dl_type.
>
> With this patch, this won't work, because neither kind of packet has a
> VLAN dl_type.  Instead, applications need to first match on the outer
> VLAN, then pop it off, then match on the inner VLAN.  This difference
> could lead to security problems in applications.  An application
> might, for example, want to pop an outer VLAN and forward the packet,
> but only if there is no inner VLAN.  If it is implemented according to
> the previous rules, then it will not notice the inner VLAN.

Maybe some applications are implemented this way, but they are
probably wrong. OpenFlow says eth_type is "ethernet type of the
OpenFlow packet payload, after VLAN tags", so it is the payload
ethtype for a double-tagged packet. It's the same for single-tagged
packet: application must explicitly match vlan_tci to decide whether
it has VLAN tag.
The problem is that there is currently no way to peek inner VLAN
without popping the outer one. I heard from Tom that you have plan to
support TPID matching. Is it possible to add extension match fields
like TPID1, TPID2 to match both TPIDs? This may also provide a way to
differentiate 0x8100 and 0x88a8.

>
> There are probably multiple ways to deal with this problem.  Let me
> propose one.  It is somewhat awkward, so there might be a more
> graceful way.  Basically the idea is to retain the current view from
> an OpenFlow perspective:
>
>     - Packet without tags: vlan_tci == 0, dl_type is non-VLAN
>     - Packet with 1 tag:   vlan_tci != 0, dl_type is non-VLAN
>     - Packet with 2+ tags: vlan_tci != 0, dl_type is 2nd VLAN
>
> So, when a packet with 2 tags pops off the outermost one, dl_type
> becomes the inner Ethertype (such as IPv4) and vlan_tci becomes the
> single remaining VLAN.
>
> I think there's another backward compatibility risk.  This patch adds
> support for TPID 0x88a8 without adding any way for OpenFlow
> applications to distinguish 88a8 from 8100.  This means that
> applications that previously handled 0x8100 VLANs will now handle
> 0x88a8 VLANs whereas previously they were able to filter these out.  I
> don't have a wonderful idea on how to handle this but I think we need
> some way.  (The OpenFlow spec is totally unhelpful here.)
>
> The tests seem incomplete in that they do not seem to add much in the
> way of tests for OpenFlow handling of multiple VLANs.  I'd feel more
> confident if it added some.
>

I found some tests added in Tom's patches. I'm trying to include them
and other tests.

> In a few places it would be more convenient to make struct
> flow_vlan_hdr into a union, like this:
>
>     union flow_vlan_hdr {
>         ovs_be32 qtag;
>         struct {
>             ovs_be16 tpid;  /* ETH_TYPE_VLAN_DOT1Q or ETH_TYPE_DOT1AD */
>             ovs_be16 tci;
>         };
>     };

That's a good idea.

>
> We could take advantage of it like this, for example:
>
> @@ -326,16 +326,12 @@ parse_mpls(const void **datap, size_t *sizep)
>  }
>
>  static inline bool
> -parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
> +parse_vlan(const void **datap, size_t *sizep, union flow_vlan_hdr *vlan_hdrs)
>  {
>      int encaps;
>      const ovs_be16 *eth_type;
> -    struct qtag_prefix {
> -        ovs_be16 eth_type;
> -        ovs_be16 tci;
> -    };
>
> -    memset(vlan_hdrs, 0, sizeof(struct flow_vlan_hdr) * FLOW_MAX_VLAN_HEADERS);
> +    memset(vlan_hdrs, 0, sizeof(union flow_vlan_hdr) * FLOW_MAX_VLAN_HEADERS);
>      data_pull(datap, sizep, ETH_ADDR_LEN * 2);
>
>      eth_type = *datap;
> @@ -343,11 +339,11 @@ parse_vlan(const void **datap, size_t *sizep, struct flow_vlan_hdr *vlan_hdrs)
>      for (encaps = 0;
>           eth_type_vlan(*eth_type) && encaps < FLOW_MAX_VLAN_HEADERS;
>           encaps++) {
> -        if (OVS_LIKELY(*sizep
> -                       >= sizeof(struct qtag_prefix) + sizeof(ovs_be16))) {
> -            const struct qtag_prefix *qp = data_pull(datap, sizep, sizeof *qp);
> -            vlan_hdrs[encaps].tpid = qp->eth_type;
> -            vlan_hdrs[encaps].tci = qp->tci | htons(VLAN_CFI);
> +        if (OVS_LIKELY(*sizep >= sizeof(ovs_be32) + sizeof(ovs_be16))) {
> +            const ovs_16aligned_be32 *qp = data_pull(datap, sizep, sizeof *qp);
> +            vlan_hdrs[encaps].qtag = get_16aligned_be32(qp);
> +            vlan_hdrs[encaps].tci |= htons(VLAN_CFI);
> +
>              eth_type = *datap;
>          } else {
>              return false;
> diff --git a/lib/flow.h b/lib/flow.h
> index 4882e26..4851a32 100644
> --- a/lib/flow.h
> +++ b/lib/flow.h
> @@ -735,8 +735,8 @@ static inline uint16_t
>  miniflow_get_vid(const struct miniflow *flow, size_t n)
>  {
>      if (n < FLOW_MAX_VLAN_HEADERS) {
> -        uint32_t u32 = MINIFLOW_GET_U32(flow, vlan[n]);
> -        return vlan_tci_to_vid(((struct flow_vlan_hdr *)&u32)->tci);
> +        union flow_vlan_hdr hdr = { .qtag = MINIFLOW_GET_BE32(flow, vlan[n]) };
> +        return vlan_tci_to_vid(hdr.tci);
>      }
>      return 0;
>  }
> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 56a6145..6838d7d 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -5560,8 +5560,7 @@ commit_vlan_action(const struct flow* flow, struct flow *base,
>      for (base_n--, flow_n--;
>           base_n >= 0 && flow_n >= 0;
>           base_n--, flow_n--) {
> -        if (memcmp(&base->vlan[base_n], &flow->vlan[flow_n],
> -                   sizeof(struct flow_vlan_hdr))) {
> +        if (base->vlan[base_n].qtag != flow->vlan[flow_n].qtag) {
>              break;
>          }
>      }
> @@ -5569,7 +5568,7 @@ commit_vlan_action(const struct flow* flow, struct flow *base,
>      /* Pop all mismatching vlan of base, push thoses of flow */
>      for (; base_n >= 0; base_n--) {
>          nl_msg_put_flag(odp_actions, OVS_ACTION_ATTR_POP_VLAN);
> -        memset(&wc->masks.vlan[base_n], 0xff, sizeof(wc->masks.vlan[base_n]));
> +        wc->masks.vlan[base_n] = OVS_BE32_MAX;
>      }
>
>      for (; flow_n >= 0; flow_n--) {
>
> It seems risky to fail to add a way for match_format() to print out
> matches past the first VLAN.  It is likely to cause great confusion at
> some point.
>
Yes, I need to think carefully about this.

> This code uses the term "shift" for what is usually termed "push".  A
> "shift" can go in either direction.  I'd use "push".
>
Yes, "push" looks symmetric. I used "shift" because it leaves room for
a header rather than push data.

> I suggest that flow_{shift,push}_vlan() should take an optional 'wc'
> argument like so many other functions that modify flows.  This would
> make it more like the MPLS code and thus easier to understand.
>
> I'm not confident about the change to handle_packet_upcall().
>
> struct xvlan xvlan might be easier to use as just a flow_vlan_hdr.

flow_vlan_hdr is in big-endian and in which vlan-id/pcp are encoded. I
defined struct xvlan to avoid conversion each time.

>
> The commit_vlan_action() function would be better if it was more like
> commit_mpls_action(), to make it easier to verify correctness.  The
> main difference is that at the datapath level VLAN only has push and
> pop, not "set", but that's a straightforward change.  I'd introduce
> functions to count and find common VLAN tags, like we have for MPLS.

Agree, extracting a function makes it more clear.

>
> I'll probably have further comments on later versions of this patch.
>
> Thanks again for working on this!