[ovs-dev] [PATCH] ofproto: fix interactions between flow monitors and barriers

YAMAMOTO Takashi yamamoto at valinux.co.jp
Fri Feb 28 03:25:30 UTC 2014


> On Thu, Feb 27, 2014 at 02:50:47PM +0900, YAMAMOTO Takashi wrote:
>> >> On Wed, Feb 05, 2014 at 07:04:05PM +0900, YAMAMOTO Takashi wrote:
>> >>> Following OpenFlow 1.4 semantics, make barriers wait for
>> >>> flow monitor replies.  This should fix a race in
>> >>> "ofproto - flow monitoring pause and resume" test.
>> >>> 
>> >>> Signed-off-by: YAMAMOTO Takashi <yamamoto at valinux.co.jp>
>> >> 
>> >> I still don't understand this.  You cite the following:
>> >> 
>> >>      * OpenFlow 1.4.0 p.110:
>> >>      *    A OFPMP_FLOW_MONITOR multipart reply can not cross a barrier
>> >>      *    handshake.  The switch must always send the OFPMP_FLOW_MONITOR
>> >>      *    multipart reply for a given flow table change before the reply
>> >>      *    to a OFPT_BARRIER_REQUEST request that follows the request
>> >>      *    responsible for the flow table change.
>> >> 
>> >> but can you elaborate on your thought process here?  After all, when
>> >> we're paused we're not sending any flow table changes at all.
>> > 
>> > right.  and when resumed, we send flow table changes.
>> > the "ofproto - flow monitoring pause and resume" test seems to
>> > expect that it can wait for these flow table changes after resume
>> > using a barrier.  the expectation seems valid from my reading of
>> > the specification.
>> 
>> fwiw, i still see the failure with today's master.
> 
> OK, I spent some time trying to understand this, and I think that I
> follow now.  (My misunderstanding was partly because it doesn't
> actually fail for me.)
> 
> This test starts an ovs-ofctl monitoring the flow table and makes that
> process block so that it isn't actually reading any of the monitor
> replies, causing the queue to fill up and ovs-vswitchd to send down a
> "pause" message.  Then it unblocks, sends a barrier, waits for the
> reply, and then tells ovs-ofctl to exit.  It checks the log to make
> sure that the output includes a resume message.  Thus, it implicitly
> depended on the resume coming before the barrier reply or, at least,
> before it tells ovs-ofctl to exit.
> 
> If I'm reading it right, this commit fixes the problem by never
> sending a barrier reply while a monitor is paused, instead waiting for
> the monitor to resume first.  This will certainly avoid the problem.
> I am not sure that the spec actually requires it (if it does, then it
> is due to lack of alertness on my part, because the OpenFlow spec is
> based on what Open vSwitch does).  It seems to me, though, that it
> adds a risk of not ever making forward progress, because it is
> possible that we resume and then pause again before the barrier reply
> is sent.
> 
> Thus, I am inclined to consider this a bug in the test, and fixing it
> by waiting until the "resume" shows up in the monitor log.
> 
> What do you think?

thanks for taking you time for this.

your understanding of the situation and what my patch does
is right.  we can avoid the problem by fixing the test in
that way.

probably either the spec or my reading of it is wrong.
but how it should be?  simply a barrier is not usable for the
purpose to ensure receiving all flow updates for previous mods,
as the peer switch is free to pause/resume in the meantime?
in that case, what is the restriction "A OFPMP_FLOW_MONITOR
multipart reply can not cross a barrier handshake" intended for?

the test in question uses multiple controllers, one for monitoring
and another for flow_mod.  i guess that the spec was written with
a single controller assumption and doesn't cover the test case.

YAMAMOTO Takashi

> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev



More information about the dev mailing list