[ovs-dev] [PATCH v2 6/8] ovs-thread: Quiesce when joining pthreads
grive at u256.net
Wed Jun 30 13:11:02 UTC 2021
On Wed, Jun 30, 2021, at 11:09, Ilya Maximets wrote:
> On 5/20/21 3:35 PM, Gaetan Rivet wrote:
> > Joining pthreads makes the caller quiescent. It should register as such,
> > as joined threads may wait on an RCU callback executing before quitting,
> > deadlocking the caller.
> Hi, Gaetan.
> This patch doesn't look right to me. The problem is that users of this
> function has no idea that the quiescent state will be entered by this
> function. And this is really hard to track down, because it can be called
> very deep inside some separate part of the code base that user at the
> top level might not even know about. For example, a lot of call chains
> in ovsdb-server may lead to xpthread_join called from ovsdb/log.c.
> Even though ovsdb-server is single-threaded now, insertion of the
> ovsrcu_quiesce_start() into xpthread_join() will effectively mean that
> ovsdb-server will never be able to use RCU if it will become multi-threaded
> someday, e.g. it will not be able to use CMAPs.
> So, instead of doing that, callers should enter quiescent state before
> joining, and this should be done at the highest level of a call chain
> possible. We have the same thing with cond_wait() implementation, you
> may add similar comment to the join() function as we have for cond_wait().
> Best regards, Ilya Maximets.
I think you are right, this is dangerous and puts the caller in a position
where they might break RCU without realizing.
Without quiescing, joining on a thread that has RCU callbacks scheduled
has 2 different behaviors: if the RCU is not blocking this is fine and
joining will resolve ; if it is blocking then both threads enter a deadlock.
The solution would be to explicitly quiesce as you mentioned, but this is what
I tried to avoid with this series: requiring developers to 'fix' code that is
only an issue if this very specific compilation option is used.
I don't yet see a good solution to it.
Documenting only is bound to leave incorrect code through, that would be fine
most of the time but will break on people trying to use RCU blocking.
I could try to detect the deadlock (I don't see how yet), issue a warning and
make the joined thread non-blocking / wake it if already blocked.
Thanks for the review,
More information about the dev