[ovs-dev] [PATCH RFC] ofproto-dpif-upcall: Increment dump_seq before instead of after dump.

Thu Jul 9 01:32:46 UTC 2015

On 8 July 2015 at 17:10, Ben Pfaff <blp at nicira.com> wrote:
> On Wed, Jul 08, 2015 at 01:12:26PM -0700, Joe Stringer wrote:
>> On 7 July 2015 at 21:42, Ben Pfaff <blp at nicira.com> wrote:
>> > On Tue, Jul 07, 2015 at 08:13:58PM -0700, Joe Stringer wrote:
>> >> On 6 July 2015 at 22:20, Ben Pfaff <blp at nicira.com> wrote:
>> >> > When process_upcall() passes a miss upcall to upcall_xlate(), the new
>> >> > ukey's dump_seq member is initialized from the current dump_seq.  Later,
>> >> > when udpif_revalidator() calls revalidate(), any dumped flow for which
>> >> > ukey->dump_seq equals the current dump_seq is skipped.  However, until now
>> >> > the current dump_seq was only incremented *after* a revalidation run is
>> >> > completed.  That means that, if a ukey added is added between revalidation
>> >> > runs, it will be skipped on the subsequent revalidation run.  This commit
>> >> > fixes the problem by incrementing dump_seq just before a revalidation run
>> >> > instead of just after.
>> >>
>> >> I think this change makes sense. In cases with few flows (or generally
>> >> low revalidator workload), it makes stats and forwarding behaviour
>> >> more accurate, then in cases with high revalidator workload there
>> >> should be no effect (the end of one round melds into the start of the
>> >> next).
>> >>
>> >> The few areas that I can think of which are relevant are:
>> >> - ofproto-dpif's run() function uses udpif_dump_seq() to determine
>> >> when all the stats have just finished being updated, indicating it's a
>> >> good time to do periodic work like timing out openflow flows and
>> >> rebalancing bonds. It seems like this would be best performed between
>> >> revalidation rounds, but I don't have a good way of quantifying how
>> >> much this matters.
>> >> - In the testsuite we use 'ovs-appctl revalidator/wait' (which is
>> >> based on dump_seq) to determine when the flows have been dumped and
>> >> stats updated, which will be triggered at the end of revalidation
>> >> currently. This change would make it more likely that the testsuite
>> >> calls revalidator/wait while the revalidators are sleeping, and
>> >> returns before the flows are actually dumped. This may or may not be a
>> >> problem in practice, unfortunately the testuite only shows up this
>> >> kind of issue as intermittent failures on various tests. Anyway, this
>> >> could be trivially solved if it shows to be a problem.
>> >> - The race conditions mentioned in ukey_install() for the userspace
>> >> datapath are more likely, due to there being a smaller window between
>> >> dump_seq being incremented and revalidators reaching the GC stage. In
>> >> practice I doubt it's a problem.
>> >
>> > Thanks for all the thoughts!
>> >
>> > Do you think that it's best to address all of these by experimenting and
>> > fixing any observed problems (how?), or by adding comments that describe
>> > the design and the possible shortcomings, or by adding to the commit
>> > message, or by doing something else?
>>
>> In practice, the cases above require the intersection of multiple
>> corner cases before they present themselves. This makes them harder to
>> address. I'm also aware that having spent lots of time in this code
>> I'm prone to overstate the probability of some of these issues :-)
>>
>> One general approach would be to divide 'dump_seq' into
>> 'dump_start_seq' and 'dump_end_seq'. The first two cases above can use
>> the end_seq, while the ukey logic can use the start_seq. I expect that
>> this will fix the test case you mention with a smaller change to the
>> current logic. This addresses the first two cases above.
>>
>> Case-by-case:
>> - The run() issue is most likely to present itself when we have
>> openflow flows with a small idle time, eg 1 second. We might be able
>> to produce a case where we install such a flow, then send some traffic
>> to hit that flow, and the flow is timed out despite the packets
>> arriving. However, with 1-second revalidation cycles we're always
>> going to have poor accuracy in this kind of case, so this is probably
>> best addressed through documentation. If we don't already document
>> what can be expected about the stats collection, then we probably
>> should.
>> - The revalidator/wait issue is most likely to show up as intermittent
>> test failures for tests like "ofproto-dpif - ofproto/trace from dpctl
>> output", where stats are wrong for a test because revalidators didn't
>> run, or we're filtering flow dump messages and expecting them to be
>> there (but they aren't, because revalidators didn't run to dump the
>> flows yet). If such tests fail intermittently on travis, we can modify
>> the logic to, eg, perform revalidator/wait twice, or modify the
>> udpif_run() to only respond to unix connections after a couple of
>> revalidation rounds.
>> - In the worst case, the ukey_install() issue will show up as
>> extraneous warnings about "failed to flow_del", because the flow is
>> marked as "existing in the datapath", but it wasn't yet installed.
>> Ultimately the solution to this problem is to refactor the upcall_cb()
>> interface to split flow installation into two phases, like it happens
>> for the linux datapath. This might not even be much work, I just
>> didn't consider it worthwhile due to the low probability of hitting
>> the issue.
>>
>> Most of these cases will come up more frequently on single-core boxes
>> with high parallelism, due to more crazy scheduling of OVS threads.
>> The limited resources in the travis-ci environment can pick up on some
>> of these issues.
>
> Thanks for the detailed analyses!
>
> Should I move forward with this by doing the division into start_seq and
> end_seq as you suggest?

Yeah, that sounds like a good idea, thanks!