[ovs-dev] [PATCH V3 1/2] ofproto-dpif-upcall: Allow main thread to pause all revalidators.

Joe Stringer joestringer at nicira.com
Sat Aug 29 18:34:22 UTC 2015

Thanks for working on this, Alex. I've considered implementing an
approach like this before, but haven't had a strong reason to.

On 29 August 2015 at 00:42, Alex Wang <ee07b291 at gmail.com> wrote:
> This commit adds logic using ovs barrier to allow main thread pause
> all revalidators.  This new feature will be used in a later patch.
> Signed-off-by: Alex Wang <ee07b291 at gmail.com>


> @@ -762,6 +791,11 @@ udpif_revalidator(void *arg)
>      revalidator->id = ovsthread_id_self();
>      for (;;) {
> +        /* Pauses all revalidators if wanted. */
> +        if (latch_is_set(&udpif->pause_latch)) {
> +            revalidator_pause(revalidator);
> +        }
> +

Is there anything that ensures all revalidators are either before this
statement, or after this statement, when the latch is modified? We've
had issues with barriers before where not all revalidators hit a
barrier, and that can cause deadlocks or hung threads in some cases.
For instance, suppose the system is under heavy load (let's say,
nested virtualization). One revalidator thread runs and proceeds past
this check, a second revalidator is delayed before executing it, the
main thread then modifies the latch, and only then does the second
revalidator get scheduled to run. The second thread sees the latch set
but the first did not, so there won't be enough threads hitting the
barrier to let it continue.

I think that this type of issue doesn't affect exit_latch, because
exit_latch is only checked by the lead revalidator thread, which then
raises udpif->reval_exit before the barrier, and the value of
udpif->reval_exit is checked by all revalidators after the barrier.
Perhaps it would make sense in this case to follow that logic too?
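
To make that suggestion concrete, here is a minimal standalone sketch of
the "leader decides, everyone reads after the barrier" pattern. It is not
the OVS code: it assumes plain POSIX pthread barriers and C11 atomics in
place of ovs_barrier and latch, every name in it (struct pool,
simulate_pause_rounds, and so on) is hypothetical, and the pause action
is reduced to a per-thread counter so the agreement between threads can
be checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for the shared udpif state. */
struct pool {
    pthread_barrier_t barrier;
    atomic_bool pause_requested;  /* Stand-in for pause_latch. */
    atomic_bool exit_requested;   /* Stand-in for exit_latch. */
    bool reval_pause;             /* Leader's decision for this round. */
    bool reval_exit;              /* Leader's decision for this round. */
};

struct worker {
    struct pool *pool;
    int id;
    long pause_rounds;            /* Rounds in which this thread paused. */
};

static void *
worker_main(void *arg)
{
    struct worker *w = arg;
    struct pool *p = w->pool;

    for (;;) {
        if (w->id == 0) {
            /* Only the leader samples the flags, once per round. */
            p->reval_pause = atomic_load(&p->pause_requested);
            p->reval_exit = atomic_load(&p->exit_requested);
        }
        /* After this barrier every thread sees the same decision. */
        pthread_barrier_wait(&p->barrier);
        bool pause = p->reval_pause;
        bool quit = p->reval_exit;
        /* Keep the leader from overwriting the decision for the next
         * round until all threads have read this round's value. */
        pthread_barrier_wait(&p->barrier);

        if (quit) {
            break;
        }
        if (pause) {
            w->pause_rounds++;    /* A real pause would block here. */
        }
    }
    return NULL;
}

/* Returns 1 if all workers paused in exactly the same rounds, 0 if they
 * disagreed, and -1 on bad arguments or setup failure. */
int
simulate_pause_rounds(int n_workers, int toggles)
{
    struct pool p;
    struct worker workers[MAX_WORKERS];
    pthread_t threads[MAX_WORKERS];
    int consistent = 1;

    if (n_workers < 1 || n_workers > MAX_WORKERS) {
        return -1;
    }
    atomic_init(&p.pause_requested, false);
    atomic_init(&p.exit_requested, false);
    p.reval_pause = p.reval_exit = false;
    if (pthread_barrier_init(&p.barrier, NULL, (unsigned) n_workers) != 0) {
        return -1;
    }
    for (int i = 0; i < n_workers; i++) {
        workers[i] = (struct worker) { .pool = &p, .id = i, .pause_rounds = 0 };
        int rc = pthread_create(&threads[i], NULL, worker_main, &workers[i]);
        assert(rc == 0);
        (void) rc;
    }
    /* The "main thread" flips the flag at arbitrary points in time. */
    for (int i = 0; i < toggles; i++) {
        atomic_store(&p.pause_requested, i % 2 == 0);
        sched_yield();
    }
    atomic_store(&p.exit_requested, true);
    for (int i = 0; i < n_workers; i++) {
        pthread_join(threads[i], NULL);
        if (workers[i].pause_rounds != workers[0].pause_rounds) {
            consistent = 0;
        }
    }
    pthread_barrier_destroy(&p.barrier);
    return consistent;
}
```

Calling simulate_pause_rounds(4, 1000) from a small main() should return
1 no matter when the flag flips, because no worker ever reads the flag
directly. If each worker instead called atomic_load(&p->pause_requested)
itself, two workers could disagree within one round, which is the
situation described above.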
