[ovs-dev] [PATCH 2/2] dpif-netdev: Translate Geneve options per-flow, not per-packet.

Jarno Rajahalme jrajahalme at nicira.com
Mon Aug 3 22:45:29 UTC 2015


With few suggestions and questions below:

Acked-by: Jarno Rajahalme <jrajahalme at nicira.com>

  Jarno

> On Jul 29, 2015, at 8:09 PM, Jesse Gross <jesse at nicira.com> wrote:
> 
> The kernel implementation of Geneve options stores the TLV option
> data in the flow exactly as received, without any further parsing.
> This is then translated to known options for the purposes of matching
> on flow setup (which will then install a datapath flow in the form
> the kernel is expecting).
> 
> The userspace implementation behaves a little bit differently - it
> looks up known options as each packet is received. The reason for this
> is there is a much tighter coupling between datapath and flow translation
> and the representation is generally expected to be the same. This works
> but it incurs work on a per-packet basis that could be done per-flow
> instead.
> 
> This introduces a small translation step for Geneve packets between
> datapath and flow lookup for the userspace datapath in order to
> allow the same kind of processing that the kernel does.
> 
> There is a second benefit to this as well: for some operations it is
> preferable to keep the options exactly as they were received on the wire,
> which this enables. One example is that for packets that are executed from
> ofproto-dpif-upcall to the datapath, this avoids the translation of
> Geneve metadata. Since this conversion is potentially lossy (for unknown
> options), keeping everything in the same format removes the possibility
> of dropping options if the packet comes back up to userspace and the
> Geneve option translation table has changed. To help with these types of
> operations, most functions can understand both formats of data and seamlessly
> do the right thing.
> 
> Signed-off-by: Jesse Gross <jesse at nicira.com>
> ---
> lib/automake.mk               |   1 +
> lib/dpif-netdev.c             |  55 ++++++-
> lib/flow.c                    |  48 ++++--
> lib/flow.h                    |  13 +-
> lib/geneve.h                  |  63 ++++++++
> lib/meta-flow.c               |   6 +-
> lib/netdev-vport.c            |  26 ++--
> lib/odp-execute.c             |   2 +-
> lib/odp-util.c                |  58 ++++---
> lib/odp-util.h                |  12 +-
> lib/packets.h                 |  41 +----
> lib/tun-metadata.c            | 352 ++++++++++++++++++++++++++++++------------
> lib/tun-metadata.h            |  74 ++++++---
> ofproto/ofproto-dpif-sflow.c  |   2 +-
> ofproto/ofproto-dpif-upcall.c |   2 +-
> tests/tunnel-push-pop.at      |   2 +-
> 16 files changed, 534 insertions(+), 223 deletions(-)
> create mode 100644 lib/geneve.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index faca968..5b6e9e8 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -81,6 +81,7 @@ lib_libopenvswitch_la_SOURCES = \
> 	lib/fatal-signal.h \
> 	lib/flow.c \
> 	lib/flow.h \
> +	lib/geneve.h \
> 	lib/guarded-list.c \
> 	lib/guarded-list.h \
> 	lib/hash.c \
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index f587df5..c31a7e0 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1884,8 +1884,8 @@ dpif_netdev_mask_from_nlattrs(const struct nlattr *key, uint32_t key_len,
>     if (mask_key_len) {
>         enum odp_key_fitness fitness;
> 
> -        fitness = odp_flow_key_to_mask(mask_key, mask_key_len, key, key_len,
> -                                       &wc->masks, flow);
> +        fitness = odp_flow_key_to_mask_udpif(mask_key, mask_key_len, key,
> +                                             key_len, &wc->masks, flow);
>         if (fitness) {
>             /* This should not happen: it indicates that
>              * odp_flow_key_from_mask() and odp_flow_key_to_mask()
> @@ -1919,7 +1919,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr *key, uint32_t key_len,
> {
>     odp_port_t in_port;
> 
> -    if (odp_flow_key_to_flow(key, key_len, flow)) {
> +    if (odp_flow_key_to_flow_udpif(key, key_len, flow)) {
>         /* This should not happen: it indicates that odp_flow_key_from_flow()
>          * and odp_flow_key_to_flow() disagree on the acceptable form of a
>          * flow.  Log the problem as an error, with enough details to enable
> @@ -3014,11 +3014,25 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
>                  struct ofpbuf *actions, struct ofpbuf *put_actions)
> {
>     struct dp_netdev *dp = pmd->dp;
> +    struct flow_tnl orig_tunnel;
> +    int err;
> 
>     if (OVS_UNLIKELY(!dp->upcall_cb)) {
>         return ENODEV;
>     }
> 
> +    orig_tunnel.flags = flow->tunnel.flags;
> +    if (flow->tunnel.flags & FLOW_TNL_F_UDPIF) {
> +        orig_tunnel.metadata.present.len = flow->tunnel.metadata.present.len;
> +        memcpy(orig_tunnel.metadata.opts.gnv, flow->tunnel.metadata.opts.gnv,
> +               flow->tunnel.metadata.present.len);
> +        err = tun_metadata_from_geneve_udpif(&orig_tunnel, &orig_tunnel,
> +                                             &flow->tunnel);
> +        if (err) {
> +            return err;
> +        }
> +    }
> +

All these new blocks in dp_netdev_upcall() would benefit from short comments. I.e., “Upcall processing expects the geneve options to be in the translated format, but we need to retain the raw format for datapath use.” 

>     if (OVS_UNLIKELY(!VLOG_DROP_DBG(&upcall_rl))) {
>         struct ds ds = DS_EMPTY_INITIALIZER;
>         char *packet_str;
> @@ -3046,8 +3060,39 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, struct dp_packet *packet_,
>         ds_destroy(&ds);
>     }
> 
> -    return dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata,
> -                         actions, wc, put_actions, dp->upcall_aux);
> +    err = dp->upcall_cb(packet_, flow, ufid, pmd->core_id, type, userdata,
> +                        actions, wc, put_actions, dp->upcall_aux);
> +    if (err && err != ENOSPC) {
> +        return err;
> +    }
> +

/* Translate tunnel metadata masks to datapath format. */

> +    if (wc) {
> +        if (wc->masks.tunnel.metadata.present.map) {
> +            struct geneve_opt opts[GENEVE_TOT_OPT_SIZE /
> +                                   sizeof(struct geneve_opt)];
> +
> +            tun_metadata_to_geneve_udpif_mask(&flow->tunnel,
> +                                              &wc->masks.tunnel,
> +                                              orig_tunnel.metadata.opts.gnv,
> +                                              orig_tunnel.metadata.present.len,
> +                                              opts);
> +
> +            memset(&wc->masks.tunnel.metadata, 0,
> +                   sizeof wc->masks.tunnel.metadata);
> +            memcpy(&wc->masks.tunnel.metadata.opts.gnv, opts,
> +                   orig_tunnel.metadata.present.len);
> +        }
> +        wc->masks.tunnel.metadata.present.len = 0xff;
> +    }
> +

/* Restore tunnel metadata. */

Might add a sentence why restoring the original is better than translation, and how this works with the mask (unknown options will be wildcarded).

> +    if (orig_tunnel.flags & FLOW_TNL_F_UDPIF) {
> +        memcpy(&flow->tunnel.metadata.opts.gnv, orig_tunnel.metadata.opts.gnv,
> +               orig_tunnel.metadata.present.len);
> +        flow->tunnel.metadata.present.len = orig_tunnel.metadata.present.len;
> +        flow->tunnel.flags |= FLOW_TNL_F_UDPIF;
> +    }
> +
> +    return err;
> }
> 
> static inline uint32_t
> diff --git a/lib/flow.c b/lib/flow.c
> index 352e9b8..d3d25e4 100644
> --- a/lib/flow.c
> +++ b/lib/flow.c
> @@ -462,9 +462,22 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
>         miniflow_push_words(mf, tunnel, &md->tunnel,
>                             offsetof(struct flow_tnl, metadata) /
>                             sizeof(uint64_t));
> -        if (md->tunnel.metadata.opt_map) {
> -            miniflow_push_words(mf, tunnel.metadata, &md->tunnel.metadata,
> -                                 sizeof md->tunnel.metadata / sizeof(uint64_t));
> +
> +        if (!(md->tunnel.flags & FLOW_TNL_F_UDPIF)) {
> +            if (md->tunnel.metadata.present.map) {
> +                miniflow_push_words(mf, tunnel.metadata, &md->tunnel.metadata,
> +                                    sizeof md->tunnel.metadata /
> +                                    sizeof(uint64_t));
> +            }
> +        } else {
> +            if (md->tunnel.metadata.present.len) {
> +                miniflow_push_words(mf, tunnel.metadata.present,
> +                                    &md->tunnel.metadata.present, 1);
> +                miniflow_push_words(mf, tunnel.metadata.opts.gnv,
> +                                    md->tunnel.metadata.opts.gnv,
> +                                    DIV_ROUND_UP(md->tunnel.metadata.present.len,
> +                                    sizeof(uint64_t)));

Wrong indentation here.

> +            }
>         }
>     }
>     if (md->skb_priority || md->pkt_mark) {
> @@ -815,7 +828,7 @@ flow_get_metadata(const struct flow *flow, struct match *flow_metadata)
>     if (flow->tunnel.gbp_flags) {
>         match_set_tun_gbp_flags(flow_metadata, flow->tunnel.gbp_flags);
>     }
> -    tun_metadata_get_fmd(&flow->tunnel.metadata, flow_metadata);
> +    tun_metadata_get_fmd(&flow->tunnel, flow_metadata);
>     if (flow->metadata != htonll(0)) {
>         match_set_metadata(flow_metadata, flow->metadata);
>     }
> @@ -1161,9 +1174,16 @@ void flow_wildcards_init_for_packet(struct flow_wildcards *wc,
>         WC_MASK_FIELD(wc, tunnel.gbp_id);
>         WC_MASK_FIELD(wc, tunnel.gbp_flags);
> 
> -        if (flow->tunnel.metadata.opt_map) {
> -            wc->masks.tunnel.metadata.opt_map = flow->tunnel.metadata.opt_map;
> -            WC_MASK_FIELD(wc, tunnel.metadata.opts);
> +        if (!(flow->tunnel.flags & FLOW_TNL_F_UDPIF)) {
> +            if (flow->tunnel.metadata.present.map) {
> +                wc->masks.tunnel.metadata.present.map =
> +                                              flow->tunnel.metadata.present.map;
> +                WC_MASK_FIELD(wc, tunnel.metadata.opts.u8);
> +            }
> +        } else {
> +            WC_MASK_FIELD(wc, tunnel.metadata.present.len);
> +            memset(wc->masks.tunnel.metadata.opts.gnv, 0xff,
> +                   flow->tunnel.metadata.present.len);
>         }
>     } else if (flow->tunnel.tun_id) {
>         WC_MASK_FIELD(wc, tunnel.tun_id);
> @@ -1253,9 +1273,17 @@ flow_wc_map(const struct flow *flow, struct miniflow *map)
> 
>     map->tnl_map = 0;
>     if (flow->tunnel.ip_dst) {
> -        map->tnl_map = MINIFLOW_TNL_MAP(tunnel);
> -        if (!flow->tunnel.metadata.opt_map) {
> -            map->tnl_map &= ~MINIFLOW_TNL_MAP(tunnel.metadata);
> +        map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel,
> +                                          offsetof(struct flow_tnl, metadata));
> +        if (!(flow->tunnel.flags & FLOW_TNL_F_UDPIF)) {
> +            if (flow->tunnel.metadata.present.map) {
> +                map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel.metadata,
> +                                                 sizeof(flow->tunnel.metadata));

This could be MINIFLOW_TNL_MAP() instead?

> +            }
> +        } else {
> +            map->tnl_map |= MINIFLOW_TNL_MAP(tunnel.metadata.present.len);
> +            map->tnl_map |= MINIFLOW_TNL_MAP__(tunnel.metadata.opts.gnv,
> +                                             flow->tunnel.metadata.present.len);
>         }
>     }
> 
> 

(snip)

> diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
> index bd95c8e..0f1724a 100644
> --- a/tests/tunnel-push-pop.at
> +++ b/tests/tunnel-push-pop.at
> @@ -132,7 +132,7 @@ AT_CHECK([ovs-ofctl dump-ports int-br | grep 'port  5'], [0], [dnl
>   port  5: rx pkts=1, bytes=98, drop=0, errs=0, frame=0, over=0, crc=0
> ])
> AT_CHECK([ovs-appctl dpif/dump-flows int-br], [0], [dnl
> -tunnel(tun_id=0x7b,src=1.1.2.92,dst=1.1.2.88,ttl=64,geneve({class=0xffff,type=0x80,len=4,0xa/0xf}),flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(6081),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:drop
> +tunnel(tun_id=0x7b,src=1.1.2.92,dst=1.1.2.88,ttl=64,geneve({class=0xffff,type=0x80,len=4,0xa/0xf}{class=0xffff,type=0,len=4}),flags(-df-csum+key)),skb_mark(0),recirc_id(0),in_port(6081),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:drop

Why this changed?

> ])
> 
> OVS_VSWITCHD_STOP
> -- 
> 2.1.4
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev




More information about the dev mailing list