[ovs-dev] [PATCH] ofproto-dpif-upcall: Batch upcalls.

Jarno Rajahalme jrajahalme at nicira.com
Wed Aug 28 16:39:56 UTC 2013


On Aug 27, 2013, at 11:02 PM, Ben Pfaff <blp at nicira.com> wrote:

> On Tue, Aug 27, 2013 at 03:50:06PM -0700, Jarno Rajahalme wrote:
>> 
>> On Aug 27, 2013, at 3:13 PM, Ben Pfaff <blp at nicira.com> wrote:
>> 
>>> On Tue, Aug 27, 2013 at 03:04:54PM -0700, Jarno Rajahalme wrote:
>>>> Batching reduces overheads and enables up to 4 times the upcall processing
>>>> performance in a specialized test case.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
>>> 
>>> Nice!
>>> 
>>>> +    for (n = 0; n < udpif->n_handlers; ++n) {
>>>> +        handler = &udpif->handlers[n];
>>>> +        if (handler->n_new_upcalls) {
>>>> +            handler->n_new_upcalls = 0;
>>>> +            xpthread_cond_signal(&handler->wake_cond);
>>>> +        }
>>> 
>>> Usually there's a race if one signals a condition variable without
>>> holding the corresponding mutex.  Did you check that there is no race
>>> here?
>> 
>> I did not notice anything hanging or the like, if that's what you
>> mean. Documentation says:
> 
> One specific race that I was concerned about is:
> 
> 1. This code in udpif_miss_handler() checks n_upcalls and sees that it
>   is zero.
> 
>        if (!handler->n_upcalls) {
> 
> 2. This code in recv_upcalls() signals wake_cond:
> 
>    for (n = 0; n < udpif->n_handlers; ++n) {
>        handler = &udpif->handlers[n];
>        if (handler->n_new_upcalls) {
>            handler->n_new_upcalls = 0;
>            xpthread_cond_signal(&handler->wake_cond);
>        }
>    }
> 
> 3. This code in udpif_miss_handler() starts waiting on wake_cond:
> 
>            ovs_mutex_cond_wait(&handler->wake_cond, &handler->mutex);
> 
> Maybe this race cannot happen, because n_upcalls only changes with the
> mutex taken.  I guess that's the case.

It seems that the cond_signal can indeed happen before the corresponding cond_wait, but only if n_upcalls became zero because the handler was already running in parallel with the dispatcher and emptied the queue right before the dispatcher sent the wake-up signal. In that case waiting is the right thing to do, as the wake-up signal was unnecessary.

The mutex is taken before new upcalls are pushed, so the handler would have to enter the cond_wait before the dispatcher can push new upcalls (and increment n_upcalls). Hence there is no race.
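
To make the hand-off ordering concrete, here is a minimal sketch of the pattern (plain pthread calls standing in for the ovs_mutex_*()/xpthread_*() wrappers, and illustrative names rather than the actual ofproto-dpif-upcall structures):

    #include <pthread.h>
    #include <stddef.h>

    struct fake_handler {
        pthread_mutex_t mutex;
        pthread_cond_t wake_cond;
        size_t n_upcalls;          /* Only read or written with 'mutex' held. */
    };

    /* Dispatcher side: the count is bumped with the mutex held. */
    void
    dispatcher_push(struct fake_handler *h)
    {
        pthread_mutex_lock(&h->mutex);
        h->n_upcalls++;            /* list_push_back() would go here too. */
        pthread_mutex_unlock(&h->mutex);

        /* Signaling outside the mutex is safe here: if the handler has
         * already drained the queue, this wake-up is merely unnecessary;
         * if it is about to wait, it must first take the mutex and will
         * then see n_upcalls > 0 and skip the wait. */
        pthread_cond_signal(&h->wake_cond);
    }

    /* Handler side: re-checks the predicate under the mutex before
     * waiting, so a signal delivered before the wait cannot be lost. */
    size_t
    handler_take_batch(struct fake_handler *h)
    {
        size_t n;

        pthread_mutex_lock(&h->mutex);
        while (h->n_upcalls == 0) {
            pthread_cond_wait(&h->wake_cond, &h->mutex);
        }
        n = h->n_upcalls;          /* Pop the queued upcalls in reality. */
        h->n_upcalls = 0;
        pthread_mutex_unlock(&h->mutex);
        return n;
    }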

From the above I see that when the dispatcher and the handler are running in parallel, it is usually not necessary to signal the handler each time FLOW_MISS_MAX_BATCH upcalls have been pushed. This incremental signals only when the actual number of upcalls in the list reaches the limit, or when no more upcalls are incoming and there is (was) something in the queue:

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 112c78d..0344dcb 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -566,13 +566,13 @@ recv_upcalls(struct udpif *udpif)
             ovs_mutex_lock(&handler->mutex);
             if (handler->n_upcalls < MAX_QUEUE_LENGTH) {
                 list_push_back(&handler->upcalls, &upcall->list_node);
-                handler->n_upcalls++;
+                handler->n_new_upcalls = ++handler->n_upcalls;
                 ovs_mutex_unlock(&handler->mutex);
 
-                if (++handler->n_new_upcalls >= FLOW_MISS_MAX_BATCH) {
-                    handler->n_new_upcalls = 0;
+                if (handler->n_new_upcalls >= FLOW_MISS_MAX_BATCH) {
                     xpthread_cond_signal(&handler->wake_cond);
                 }
+
                 if (!VLOG_DROP_DBG(&rl)) {
                     struct ds ds = DS_EMPTY_INITIALIZER;
 
@@ -590,12 +590,11 @@ recv_upcalls(struct udpif *udpif)
         } else {
             ovs_mutex_lock(&udpif->upcall_mutex);
             if (udpif->n_upcalls < MAX_QUEUE_LENGTH) {
-                udpif->n_upcalls++;
+                n_udpif_new_upcalls = ++udpif->n_upcalls;
                 list_push_back(&udpif->upcalls, &upcall->list_node);
                 ovs_mutex_unlock(&udpif->upcall_mutex);
 
-                if (++n_udpif_new_upcalls >= FLOW_MISS_MAX_BATCH) {
-                    n_udpif_new_upcalls = 0;
+                if (n_udpif_new_upcalls >= FLOW_MISS_MAX_BATCH) {
                     seq_change(udpif->wait_seq);
                 }
             } else {
@@ -613,7 +612,6 @@ recv_upcalls(struct udpif *udpif)
         }
     }
     if (n_udpif_new_upcalls) {
-        n_udpif_new_upcalls = 0;
         seq_change(udpif->wait_seq);
     }
 }

(The code you cited in step 2 remains the same, right before the last hunk above.)
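
For clarity, with the incremental applied the per-handler enqueue and the final flush read roughly as follows (consolidated from the hunks above and the loop you quoted; not a verbatim copy of the file):

    /* Per-upcall enqueue in recv_upcalls(): count under the mutex, but
     * signal the handler only once a full batch has accumulated. */
    ovs_mutex_lock(&handler->mutex);
    if (handler->n_upcalls < MAX_QUEUE_LENGTH) {
        list_push_back(&handler->upcalls, &upcall->list_node);
        handler->n_new_upcalls = ++handler->n_upcalls;
        ovs_mutex_unlock(&handler->mutex);

        if (handler->n_new_upcalls >= FLOW_MISS_MAX_BATCH) {
            xpthread_cond_signal(&handler->wake_cond);
        }
    }

    /* After the receive loop: wake any handler that still has queued
     * upcalls it was not yet signaled about, resetting its new-upcall
     * count. */
    for (n = 0; n < udpif->n_handlers; ++n) {
        handler = &udpif->handlers[n];
        if (handler->n_new_upcalls) {
            handler->n_new_upcalls = 0;
            xpthread_cond_signal(&handler->wake_cond);
        }
    }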

I will send a v2 if you are happy with the analysis and the incremental does not introduce new problems that I cannot immediately see.

  Jarno

> 
> Ethan, unless you see something, I'm happy with this.  (I'll be at
> VMworld tomorrow and probably not continuing the conversation.)
> 
> Acked-by: Ben Pfaff <blp at nicira.com>



