[ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time

Lilijun (Jerry, Cloud Networking) jerry.lilijun at huawei.com
Thu Dec 13 07:07:10 UTC 2018


Hi, Ben

You're right, thanks for your help.

I have got the cause of this problem:   there is some mistake in my code that causes the current thread's waiter can't be inserted into &waiter->thread->waiters in the function seq_wait__(). So it can't be woke up when other threads have changed the global_seqno.

> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: Monday, December 10, 2018 11:49 PM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun at huawei.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> 
> I mean that the seq_wait() call should cause ovsrcu_synchronize() to wake
> up immediately when it can return.  If it doesn't, something is wrong.
> 
> On Mon, Dec 10, 2018 at 07:30:09AM +0000, Lilijun (Jerry, Cloud Networking)
> wrote:
> > Hi Ben,
> >
> > Do you mean that there some threads in in ovsrcu_threads list were
> wrongly using seq_wait or rcu functions?
> >
> > Thanks, I will do some more research on this problem.
> >
> > > -----Original Message-----
> > > From: Ben Pfaff [mailto:blp at ovn.org]
> > > Sent: Monday, December 10, 2018 10:08 AM
> > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun at huawei.com>
> > > Cc: dev at openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block time
> > >
> > > If that's the case, then there's something wrong with seq_wait() or
> > > the use of it in this context.  The timer logic here is just to issue log
> messages.
> > >
> > > It would be better to figure out the real problem, because that
> > > would reduce the 100 ms (or 1000 ms) latency to near-zero.
> > >
> > > On Mon, Dec 10, 2018 at 01:50:41AM +0000, Lilijun (Jerry, Cloud
> > > Networking)
> > > wrote:
> > > > Hi, Ben
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Do you mean that the function poll_timer_wait_until(start +
> > > warning_threshold) call only  wakeup to do some need log messages
> > > and
> > > ovsrcu_synchronize() will wait for seq_wait()?
> > > >
> > > > According to my understanding, ovsrcu_synchronize() will return
> > > > until
> > > every thread in ovsrcu_threads list have change their seqno to
> target_seqno.
> > > When some thread call ovsrcu_quiesce(), global_seqno is changed.
> > > Then
> > > seq_wait() will check if global_seqno is changed and call
> > > poll_immediate_wake_at() to finish the poll_block()'s waiting.
> > > >
> > > > We assume that the first call of seq_wait() in this loop check
> > > > global_seqno
> > > failed for other thread hasn't been ready to call ovsrcu_quiesce().
> > > Then the loop will sleep 1000ms(set by warning_threshold) by
> > > poll_block() and check the thread's seqno or global_seqno again when
> > > no other threads finish the poll_block()'s waiting in early.
> > > >
> > > > If we set warning_threshold to 100ms, the look will check the
> > > > seqno more
> > > frequently and may break earlier.  I have test it myself, when
> > > warning_threshold is set to 100 as default, the loop 's time indeed
> > > reduce to 100ms, that's maybe some evidence.
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Ben Pfaff [mailto:blp at ovn.org]
> > > > > Sent: Friday, December 07, 2018 10:37 PM
> > > > > To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun at huawei.com>
> > > > > Cc: dev at openvswitch.org
> > > > > Subject: Re: [ovs-dev] [PATCH] [PATCH] rcu: reduce RCU block
> > > > > time
> > > > >
> > > > > On Fri, Dec 07, 2018 at 02:03:27PM +0800, Lilijun wrote:
> > > > > > When calling ovsrcu_synchronize(), it always block 1000ms
> > > > > > because we poll block until elapsed time become greater than
> > > > > > warning_threshold(1000).That's too long for some configuration
> > > commands.
> > > > > > So this patch reduces warning_threshold's default value to 100
> > > > > > and print logs after it have elapsed more than 1000ms.
> > > > > >
> > > > > > Signed-off-by: Lilijun <jerry.lilijun at huawei.com>
> > > > >
> > > > > This only makes any sense if the seq_wait() doesn't work.  Do
> > > > > you have evidence that is the case?


More information about the dev mailing list