[ovs-dev] [PATCH v5] tunneling: Avoid recirculation on datapath by computing the recirculate actions at translate time.

Joe Stringer joe at ovn.org
Wed May 17 18:24:25 UTC 2017


On 17 May 2017 at 04:37, Zoltán Balogh <zoltan.balogh at ericsson.com> wrote:
> Hi Joe
>
> I started to rework my patch based on your comments and suggestions. I had some
> difficulties with the last one. Let us focus on this below.
>
>> >> >      if (credit_counts) {
>> >> > +        uint64_t stats_n_bytes = 0;
>> >> > +
>> >> > +        if (rule->truncated_packet_size) {
>> >> > +            stats_n_bytes = MIN(rule->truncated_packet_size, stats->n_bytes);
>> >> > +        } else {
>> >> > +            stats_n_bytes = stats->n_bytes;
>> >> > +        }
>> >>
>> >> Is this fixing a separate issue? I'm confused about whether truncated
>> >> packet stats are being correctly attributed today on master. If so, I
>> >> think this should split out into a separate patch. You might also
>> >> consider whether it makes more sense to modify 'stats' earlier in the
>> >> path so that each rule doesn't need to individually apply the stats
>> >> adjustment. I could imagine a store/restore of the stats plus
>> >> modifying for truncation during the xlate_output_trunc_action()
>> >> processing rather than pushing this complexity all the way into the
>> >> rule stats attribution.
>>
>> >
>> > Incorrect bytes statistics on the underlay bridge is an unwanted side-effect of
>> > the original "Avoid recirculate" commit. By exluding recirculation, there is no
>> > upcall for the tunnelled packets, and the packet size is not set by
>> > upcall_xlate() again:
>> >
>> >   static void
>> >   upcall_xlate(struct udpif *udpif, struct upcall *upcall,
>> >                struct ofpbuf *odp_actions, struct flow_wildcards *wc)
>> >   {
>> >       struct dpif_flow_stats stats;
>> >       struct xlate_in xin;
>> >
>> >       stats.n_packets = 1;
>> >       stats.n_bytes = dp_packet_size(upcall->packet);
>> >       stats.used = time_msec();
>> >       stats.tcp_flags = ntohs(upcall->flow->tcp_flags);
>> >
>> >       xlate_in_init(&xin, upcall->ofproto,
>> >                     ofproto_dpif_get_tables_version(upcall->ofproto),
>> >                     upcall->flow, upcall->in_port, NULL,
>> >                     stats.tcp_flags, upcall->packet, wc, odp_actions);
>> >
>> >       if (upcall->type == DPIF_UC_MISS) {
>> >           xin.resubmit_stats = &stats;
>> >
>> > Here upcall_xlate() sets xin.resubmint_stats which will be used for calculating
>> > statistic. These are const values, that's why I have not updated them earlier.
>> >
>> >     /* If nonnull, flow translation credits the specified statistics to each
>> >      * rule reached through a resubmit or OFPP_TABLE action.
>> >      *
>> >      * This is normally null so the client has to set it manually after
>> >      * calling xlate_in_init(). */
>> >     const struct dpif_flow_stats *resubmit_stats;
>> >
>> > If it does not harm, then the const modifier could be removed and stats updated
>> > at earlier stage.
>>
>> I see, thanks for the explanation. Mostly I'm just trying to push the
>> question of "how can we make this code easier to read and
>> understand?". Const is nice because it tells you these fields won't
>> change. But I think it's probably a little easier if the truncation of
>> the statistics is performed where the truncation occurs, so that the
>> core stats attribution logic can be oblivious of this particular
>> feature.
>
> I removed the const qualifier from resubmit_stats, removed the truncation from
> rule_dpif_credit_stats__() and applied it to resubmit_stats during translation.
> That works fine for a packet translated due to an upcall. At the end we get a
> sinlge datapath flow in the netdev datapath. It should look like this:
>
> netdev at ovs-netdev: hit:4 missed:10
>         br-int:
>                 br-int 65534/1: (tap)
>                 br-int-ns1 1/3: (system)
>                 gre0 2/5: (gre: remote_ip=10.0.0.20)
>         br-p:
>                 br-p 65534/2: (tap)
>                 br-p-ns2 3/4: (system)
>
> flow-dump from non-dpdk interfaces:
> recirc_id(0),in_port(3),eth_type(0x0800),ipv4(tos=0/0x3,ttl=64,frag=no), packets:0, bytes:0, used:never, actions:trunc(200),clone(tnl_push(tnl_port(5)
> ,header(size=38,type=3,eth(dst=cc:cc:cc:dd:dd:de,src=02:3e:b1:dc:0e:42,dl_type=0x0800),ipv4(src=10.0.0.10,dst=10.0.0.20,proto=47,tos=0,ttl=64,frag=0x4
> 000),gre((flags=0x0,proto=0x6558))),out_port(2)),set(ipv4(tos=0/0x3,ttl=63)),4)
>
>
> At this point I get a problem, when a new packet arrives but there is no need
> for translation since we have the datapath flow in the netdev datapath.
> In this case the byte and packet statistics are calculated by the
> rule_dpif_credit_stats__() function as well.
>
> static void
> rule_dpif_credit_stats__(struct rule_dpif *rule,
>                          const struct dpif_flow_stats *stats,
>                          bool credit_counts)
>     OVS_REQUIRES(rule->stats_mutex)
> {
>     if (credit_counts) {
>         rule->stats.n_packets += stats->n_packets;
>         rule->stats.n_bytes += stats->n_bytes;
>     }
>     rule->stats.used = MAX(rule->stats.used, stats->used);
> }
>
> However, in this case the 'stats' argument does not come from resubmit_stats.
> It is derived from dpif_flow_stats and udpif_flow stats resulting in the
> original packet size.
> I guess, this is due to fact, that we have a single datapath flow which can't
> reflect the size of the original and the altered packet size. Someone could
> confirm this, to make sure I'm not wrong.

upcall_xlate() generates the 'dpif_flow_stats' from the packet in the
upcall path. It then populates xin.resubmit_stats and calls
xlate_actions().

For revalidator path, it is similar but populated from xlate_key().

What I imagined happening during translation is that when the flow
with the truncate action is handled, the 'resubmit_stats' would be
modified for subsequent translation. That is to say, in
xlate_output_trunc_action() where it calls out to the
xlate_output_action() common code, there would be store/restore of the
resubmit_stats.

Backing up a bit for context, the stats attribution goes roughly like this:
* First upcall, handler thread calls through the translate code with a
packet. The resubmit_stats are derived from that packet. This goes
through xlate_actions().
* First dump of flow from revalidator thread fetches the flow and runs
the same xlate_actions() with whatever stats it has (may be zero).
This time, whenever stats attribution or side effects occur, an
xlate_cache entry is generated.
* Second and subsequent dumps of flows fetches the flow and shortcuts
the xlate_actions() by using the xlate_cache instead - ie a call to
xlate_push_stats().

So, in the same place where the resubmit_stats is manipulated, you
would also need to generate a new XC entry which would manipulate the
stats - this would be a 'side-effect'. I'd imagine that prior to the
full output translation there would be a XC_TRUNCATE(truncated_size)
then afterwards there would be an XC_TRUNCATE_RESET(). Or it could be
just XC_SET_SIZE(...) where 0 is reset and non-zero is a truncate
size. In the implementation/execution in xlate_push_stats() when
performing XC_TRUNCATE you would need to store the original push_stats
size somewhere, then calculate a new 'n_bytes' based on the number of
packets and existing bytes*. For XC_TRUNCATE_RESET(), it would restore
the original push_stats size.

* Hmm, I'm not sure the calculation will be 100% here. Let's say there
were 3 packets hit the flow, 50B, 200B, 300B. If output(max_len=100)
was executed, then we don't know how many of the packets were
truncated. The maximum number of bytes that could be transmitted is
300, but the actual number was 250. We could divide the n_bytes by
n_packets, subtract the max_len and then multiply back up by the
number of packets, which works for this case assuming floating point
arithmetic but is slightly off if using integer math..

> These functions are invoked:
>
> static void
> revalidate(struct revalidator *revalidator)
> {
> ...
>         const struct dpif_flow *f;
> ...
>         n_dumped = dpif_flow_dump_next(dump_thread, flows, ARRAY_SIZE(flows));
> ...
>          for (f = flows; f < &flows[n_dumped]; f++, index++) {
>             long long int used = f->stats.used;
> ...
>                 result = revalidate_ukey(udpif, ukey, &f->stats, &odp_actions,
>                                          reval_seq, &recircs);
> ...
> }
>
> static enum reval_result
> revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>                 const struct dpif_flow_stats *stats,
>                 struct ofpbuf *odp_actions, uint64_t reval_seq,
>                 struct recirc_refs *recircs)
>     OVS_REQUIRES(ukey->mutex)
> {
> ...
>     push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes
>                     ? stats->n_bytes - ukey->stats.n_bytes
>                     : 0);
> ...
>         xlate_push_stats(ukey->xcache, &push);
> ...
> }
>
> void
> xlate_push_stats(struct xlate_cache *xcache,
>                  const struct dpif_flow_stats *stats)
> {
> ...
>         xlate_push_stats_entry(entry, stats);
> ...
> }
>
> void
> xlate_push_stats_entry(struct xc_entry *entry,
>                        const struct dpif_flow_stats *stats)
> {
> ...
>         rule_dpif_credit_stats(entry->rule, stats);
> ...
> }
>
> void
> rule_dpif_credit_stats(struct rule_dpif *rule,
>                        const struct dpif_flow_stats *stats)
> {
> ...
>         rule_dpif_credit_stats__(rule, stats, true);
> ...
> }
>
> So, I'm not sure if stats correction can be solved at an earlier stage but in
> rule_dpif_credit_stats__(). I would like to get some help or verification
> from experts. Comment are welcome.


More information about the dev mailing list