[ovs-dev] [PATCH v4] Extend OVS IPFIX exporter to export tunnel headers

Wenyu Zhang wenyuz at vmware.com
Mon Jul 21 09:40:30 UTC 2014


Hello Pravin,

I added some new comments inline, may you take a look? Please ingore the firest reply if you haven't looked at it.

And I have sent out patch v5, with the addressed comments except two points I have explained in comments.

Bests,
Wenyu

-----Original Message-----
From: Wenyu Zhang 
Sent: Monday, July 21, 2014 2:43 PM
To: 'Pravin Shelar'
Cc: Romain Lenglet; Ben Pfaff; Jesse Gross; dev at openvswitch.org
Subject: RE: [PATCH v4] Extend OVS IPFIX exporter to export tunnel headers

Thanks for the review. My comments inline.

Bests,
Wenyu

-----Original Message-----
From: Pravin Shelar [mailto:pshelar at nicira.com]
Sent: Saturday, July 19, 2014 3:51 AM
To: Wenyu Zhang
Cc: Romain Lenglet; Ben Pfaff; Jesse Gross; dev at openvswitch.org
Subject: Re: [PATCH v4] Extend OVS IPFIX exporter to export tunnel headers

sparse warning


  CHECK   /home/pravin/ovs/w8/datapath/linux/vport.c
/home/pravin/ovs/w8/datapath/linux/../flow.h:126:15: warning: memset with byte count of 0
  CC [M]  /home/pravin/ovs/w8/datapath/linux/vport.o

Wenyu: It is a data structure alignment issue, I will correct it.

On Fri, Jul 18, 2014 at 1:59 AM, Wenyu Zhang <wenyuz at vmware.com> wrote:
> Extend IPFIX exporter to export tunnel headers when both input and 
> output of the port.
> Add three other_config options in IPFIX table: enable-input-sampling, 
> enable-output-sampling and enable-tunnel-sampling, to control whether 
> sampling tunnel info, on which direction (input or output).
> Insert sampling action before output action and the output tunnel port 
> is sent to datapath in the sampling action.
> Make datapath collect output tunnel info and send it back to userpace 
> in upcall message with a new additional optional attribute.
> Add a tunnel ports map to make the tunnel port lookup faster in 
> sampling upcalls in IPFIX exporter. Make the IPFIX exporter generate 
> IPFIX template sets with enterprise elements for the tunnel info, save 
> the tunnel info in IPFIX cache entries, and send IPFIX DATA with tunnel info.
> Add flowDirection element in IPFIX templates.
>
> Signed-off-by: Wenyu Zhang <wenyuz at vmware.com>
> Acked-by: Romain Lenglet <rlenglet at vmware.com>
> ---
>  datapath/actions.c                    |   16 +-
>  datapath/datapath.c                   |   42 +++-
>  datapath/datapath.h                   |    2 +
>  datapath/flow.h                       |   55 +++-
>  datapath/flow_netlink.c               |   58 ++++-
>  datapath/flow_netlink.h               |    2 +
>  datapath/vport-geneve.c               |   37 ++-
>  datapath/vport-gre.c                  |   35 ++-
>  datapath/vport-lisp.c                 |   39 ++-
>  datapath/vport-vxlan.c                |   38 ++-
>  datapath/vport.c                      |   33 +++
>  datapath/vport.h                      |   11 +
>  include/linux/openvswitch.h           |   21 +-
>  lib/dpif-linux.c                      |    2 +
>  lib/dpif.h                            |    1 +
>  lib/odp-util.c                        |  183 +++++++++-----
>  lib/odp-util.h                        |    2 +
>  lib/packets.h                         |    9 +-
>  ofproto/automake.mk                   |    3 +
>  ofproto/ipfix-enterprise-entities.def |   19 ++
>  ofproto/ofproto-dpif-ipfix.c          |  444 ++++++++++++++++++++++++++++++---
>  ofproto/ofproto-dpif-ipfix.h          |   14 +-
>  ofproto/ofproto-dpif-upcall.c         |   20 +-
>  ofproto/ofproto-dpif-xlate.c          |   51 +++-
>  ofproto/ofproto-dpif.c                |   20 ++
>  ofproto/ofproto.h                     |    3 +
>  ofproto/tunnel.c                      |    4 +
>  tests/odp.at                          |    9 +-
>  utilities/ovs-vsctl.8.in              |    8 +-
>  vswitchd/bridge.c                     |    9 +
>  vswitchd/vswitch.ovsschema            |    5 +-
>  vswitchd/vswitch.xml                  |   27 ++
>  32 files changed, 1045 insertions(+), 177 deletions(-)  create mode
> 100644 ofproto/ipfix-enterprise-entities.def
>
> diff --git a/datapath/actions.c b/datapath/actions.c index
> 39a21f4..226b849 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> @@ -501,7 +501,9 @@ static int output_userspace(struct datapath *dp, 
> struct sk_buff *skb,  {
>         struct dp_upcall_info upcall;
>         const struct nlattr *a;
> -       int rem;
> +       int rem, err;
> +       struct ovs_tunnel_info output_tunnel_info;
> +       struct vport *vport;
>
>         BUG_ON(!OVS_CB(skb)->pkt_key);
>
> @@ -509,6 +511,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>         upcall.key = OVS_CB(skb)->pkt_key;
>         upcall.userdata = NULL;
>         upcall.portid = 0;
> +       upcall.out_tun_info = NULL;
>
>         for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>                  a = nla_next(a, &rem)) { @@ -520,6 +523,17 @@ static 
> int output_userspace(struct datapath *dp, struct sk_buff *skb,
>                 case OVS_USERSPACE_ATTR_PID:
>                         upcall.portid = nla_get_u32(a);
>                         break;
> +
> +               case OVS_USERSPACE_ATTR_TUNNEL_OUT_PORT:
> +                       /* Get out tunnel info*/
> +                       vport = ovs_vport_ovsl_rcu(dp, nla_get_u32(a));
> +                       if (vport->ops->get_out_tun_info){
> +                               err = vport->ops->get_out_tun_info(vport, skb, &output_tunnel_info);
> +                               if (err == 0) {
> +                                       upcall.out_tun_info = &output_tunnel_info;
> +                               }
> +                        }
> +                       break;
>                 }
>         }
>
According to ovs coding style, variable are declared in local block, so you can move vport and err to respective local block.
Wenyu: OK, I can move it.

In case of error from get_out_tun_info() or invalid vport, we can skip user upcall, since userspace will discard this data anyways.
Wenyu: It seems better to keep calling user upcall, because that the normal info, not including tunnel info, about this packet can be sent to userspace, and userspace can record the nomral info of this packet at least.

....

>
>  static size_t upcall_msg_size(const struct nlattr *userdata,
> +                              bool out_tun_key,
>                               unsigned int hdrlen)  {
>         size_t size = NLMSG_ALIGN(sizeof(struct ovs_header)) @@ -418,6
> +429,10 @@ static size_t upcall_msg_size(const struct nlattr 
> +*userdata,
>         if (userdata)
>                 size += NLA_ALIGN(userdata->nla_len);
>
> +       /* OVS_PACKET_ATTR_OUT_TUNNEL_KEY */
> +        if (out_tun_key)
> +               size += nla_total_size(tun_key_attr_size());
> +
>         return size;
>  }
>

Can you pass down upcall_info to upcall_msg_size(), it look more consistent that way.
Wenyu: It souds reasonable, I will change it.

>  static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
> -                                        const struct iphdr *iph, __be64 tun_id,
> -                                        __be16 tun_flags,
> -                                        struct geneve_opt *opts,
> -                                        u8 opts_len)
> +                                         const struct iphdr *iph,
> +                                         const void *tph,
> +                                         __be64 tun_id,
> +                                         __be16 tun_flags,
> +                                         struct geneve_opt *opts,
> +                                         u8 opts_len)
>  {

Rather than passing down tph, can just pass tp_src_port and tp_dst_port? we can avoid if conditions in this function.
Wenyu: Ok.  To avoid the if condition may help performance.

>         tun_info->tunnel.tun_id = tun_id;
>         tun_info->tunnel.ipv4_src = iph->saddr; @@ -79,6 +83,21 @@ 
> static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
>         tun_info->tunnel.ipv4_ttl = iph->ttl;
>         tun_info->tunnel.tun_flags = tun_flags;
>
> +       /* For the tunnel types on the top of IPsec,the tp_src and tp_dst of
> +        * the upper tunnel are used.
> +        * E.g: GRE over IPSEC, the tp_src and tp_port are zero.
> +        */
> +       if (tph &&
> +           (iph->protocol == IPPROTO_UDP ||
> +            iph->protocol == IPPROTO_TCP)) {
> +               const struct udphdr * udp_hdr = tph;
> +               tun_info->tunnel.tp_src = udp_hdr->source;
> +               tun_info->tunnel.tp_dst = udp_hdr->dest;
> +       } else {
> +               tun_info->tunnel.tp_src = 0;
> +               tun_info->tunnel.tp_dst = 0;
> +        }
> +
>         /* clear struct padding. */
>         memset((unsigned char *) &tun_info->tunnel + OVS_TUNNEL_KEY_SIZE, 0,
>                sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE); @@
> -87,6 +106,30 @@ static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
>         tun_info->options_len = opts_len;  }
>
> +static inline void ovs_flow_out_tun_info_init(struct ovs_tunnel_info *out_tun_info,
> +                                             const struct ovs_tunnel_info *tun_info,
> +                                             __be32 saddr,
> +                                             __be16 tp_src,
> +                                             __be16 tp_dst) {
> +       out_tun_info->tunnel.tun_id = tun_info->tunnel.tun_id;
> +       out_tun_info->tunnel.ipv4_src = saddr;
> +       out_tun_info->tunnel.ipv4_dst = tun_info->tunnel.ipv4_dst;
> +       out_tun_info->tunnel.ipv4_tos = tun_info->tunnel.ipv4_tos;
> +       out_tun_info->tunnel.ipv4_ttl = tun_info->tunnel.ipv4_ttl;
> +       out_tun_info->tunnel.tun_flags = tun_info->tunnel.tun_flags;
> +
> +       out_tun_info->tunnel.tp_src = tp_src;
> +       out_tun_info->tunnel.tp_dst = tp_dst;
> +
> +       /* clear struct padding. */
> +       memset((unsigned char *) &tun_info->tunnel + OVS_TUNNEL_KEY_SIZE, 0,
> +              sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE);
> +
> +       out_tun_info->options = tun_info->options;
> +       out_tun_info->options_len = tun_info->options_len; }
> +

If you change ovs_flow_tun_info_init() according to comments above, there is no need for separate function ovs_flow_out_tun_info_init().
You can use ovs_flow_tun_info_init() to setup out_tun_key.

Wenyu: The two functions are different due to that the ipv4_src/ipv4_dst/ipv4_tos/ipv4_ttl are from different data.
For ovs_flow_tun_info_init(), these fields are from iphdr.
But for ovs_flow_out_tun_info_init(), ipv4_src are from the result of route lookup, and the others are from input tunnel info.
Otherwise we transfer the 4 fields one by one in parameters too, the two function can be merged into one. But I am afraid whether the parameters will be too many, such like:
static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
                                                 __be32 saddr, __be32 daddr, u8 tos, u8 ttl,
                                                 __be16 tp_src,
                                                 __be16 tp_dst,
                                                 __be64 tun_id,
                                                 __be16 tun_flags,
                                                 struct geneve_opt *opts,
                                                 u8 opts_len);

Wenyu: I am thinking about it and find a better way to do the merging:
      Introduce a function ovs_flow_tun_info_init__()  with all the parameters to do the real assignment.
      ovs_flow_tun_info_init() will call ovs_flow_tun_info_init__(), and its parameters keep unchanged.
     And I use ovs_flow_tun_info_init__() to setup out_tun_key directly. 

>
> +static int geneve_get_out_tun_info(struct vport *vport, struct sk_buff *skb,
> +                                  struct ovs_tunnel_info
> +*out_tun_info) {
> +       struct geneve_port *geneve_port = geneve_vport(vport);
> +
> +       if (unlikely(!OVS_CB(skb)->tun_info))
> +               return -EINVAL;
You can move this check inside ovs_vport_get_out_tun_info().
Wenyu:  I think that it seems reasonable 
Wenyu:  After checking the codes again, I perfer to keep the check here. Because the tp_src and tp_dst need be prepared before ovs_vport_get_out_tun_info() is called, we'd better do the check before calculating tp_src and tp_dst.
> +
> +
> +

extra blank lines.
Wenyu: OK, I will remove them.

> +       /*
> +        * Get tp_src and tp_dst, refert to geneve_build_header().
> +        */
> +       return ovs_vport_get_out_tun_info(out_tun_info, OVS_CB(skb)->tun_info,
> +                                         ovs_dp_get_net(vport->dp),
> +                                         IPPROTO_UDP, skb->mark,
> +                                         vxlan_src_port(1, USHRT_MAX, 
> + skb),
> +                                         
> + inet_sport(geneve_port->sock->sk));
> +
> +}
> +

.....
>  }
> +
> +int ovs_vport_get_out_tun_info(struct ovs_tunnel_info *out_tun_info,
> +                              const struct ovs_tunnel_info *tun_info,
> +                              struct net *net,
> +                              u8 ipproto,
> +                              u32 skb_mark,
> +                              __be16 tp_src,
> +                              __be16 tp_dst) {
> +       struct rtable *rt;
...
> +
> +       ip_rt_put(rt);
> +
> +       /* Generate out_tun_info based on tun_info, saddr, tp_src and tp_dst*/
> +        ovs_flow_out_tun_info_init(out_tun_info, tun_info, saddr, 
> + tp_src, tp_dst);
> +
Use tab for indentation.
Wenyu: Thanks for pointing it out. I will correct it.

> +       return 0;
> +}
> diff --git a/datapath/vport.h b/datapath/vport.h
....

> -       OVS_PACKET_ATTR_USERDATA,    /* OVS_ACTION_ATTR_USERSPACE arg. */
> +       OVS_PACKET_ATTR_PACKET,         /* Packet data. */
> +       OVS_PACKET_ATTR_KEY,            /* Nested OVS_KEY_ATTR_* attributes. */
> +       OVS_PACKET_ATTR_ACTIONS,        /* Nested OVS_ACTION_ATTR_* attributes. */
> +       OVS_PACKET_ATTR_USERDATA,       /* OVS_ACTION_ATTR_USERSPACE arg. */
> +       OVS_PACKET_ATTR_OUT_TUNNEL_KEY, /* Nested
> + OVS_TUNNEL_KEY_ATTR_* attributes */
>         __OVS_PACKET_ATTR_MAX
>  };
>

After reading patch I think "egress_tunnel_info" better than out_tunnel_key, Can you change all names to use egress_tunnel_info suffix?
Wenyu:  The " OVS_PACKET_ATTR_OUT_TUNNEL_KEY " is the response to " OVS_USERSPACE_ATTR_TUNNEL_OUT_PORT", which is the part of the sample action just before output action. And there is an element in the ipfix's usderdata named output_odp_port.  The "egress_tunnel_info" sounds good separately, but "out_tun_key" sounds more consistent with other related elements.  I perfer to keep it if no other consideration.


More information about the dev mailing list