[ovs-discuss] ovs-vswitchd 2.0 has high cpu usage

Zhou, Han hzhou8 at ebay.com
Tue Dec 3 03:33:24 UTC 2013


Hi Alex,

Thanks for your kind feedback.

On Tuesday, December 03, 2013 3:01 AM, Alex Wang wrote:
> This is the case when the rate of incoming upcalls is slower than the
> "dispatcher" reading speed.  After the "dispatcher" breaks out of the for
> loop, it is necessary to wake up the "handler" threads that have upcalls,
> since the processing latency matters.
>
> A batch mode may help, but more research needs to be done on reducing
> the latency.  Have you done any experiments on related issues?

Yes, I tried replacing ovs_mutex_cond_wait() with pthread_cond_timedwait()
(10 ms timeout) in the handler, and removed the final cond_signal loop in the
dispatcher.  It reduced vswitchd's CPU cost significantly in the earlier
hping3 test at the same throughput, but with a simple ping test on an idle
system the timeout mechanism introduces up to 20 ms of latency.  It is hard
to strike a balance with this approach ...
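
For concreteness, the handler side of that experiment looked roughly like
the sketch below.  This is illustrative only, not the actual
ofproto-dpif-upcall code; the struct and field names are made up:

#include <errno.h>
#include <pthread.h>
#include <time.h>

struct handler {
    pthread_mutex_t mutex;
    pthread_cond_t wake_cond;
    int n_upcalls;              /* Upcalls queued by the dispatcher. */
};

static void *
handler_thread(void *h_)
{
    struct handler *h = h_;

    for (;;) {
        pthread_mutex_lock(&h->mutex);
        while (!h->n_upcalls) {
            struct timespec deadline;

            /* Wake up after at most 10 ms even if the dispatcher never
             * signals us; this is what removes the need for the final
             * cond_signal loop, at the cost of idle-path latency. */
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_nsec += 10 * 1000 * 1000;
            if (deadline.tv_nsec >= 1000000000) {
                deadline.tv_sec++;
                deadline.tv_nsec -= 1000000000;
            }
            if (pthread_cond_timedwait(&h->wake_cond, &h->mutex,
                                       &deadline) == ETIMEDOUT) {
                break;
            }
        }
        /* ... drain and process any queued upcalls ... */
        pthread_mutex_unlock(&h->mutex);
    }
    return NULL;
}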

BTW, I noticed a wasted cond_signal in the current code, introduced by a
previous patch ("ofproto-dpif-upcall: reduce number of wakeup"), and
suggested a patch:
http://openvswitch.org/pipermail/dev/2013-November/034427.html
Could you take a look and perhaps merge it with your patch for improving
fairness?
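
(The shape of the fix is just the usual wakeup-reduction rule; the snippet
below illustrates the general technique only, not the linked patch, reusing
the made-up struct from the sketch above:

static void
dispatcher_enqueue(struct handler *h /*, struct upcall *upcall */)
{
    pthread_mutex_lock(&h->mutex);
    /* ... append the upcall to h's queue ... */

    /* Signal only on the empty-to-non-empty transition: a handler that
     * already had upcalls queued was either signaled already or is still
     * awake processing them, so another signal is wasted. */
    if (h->n_upcalls++ == 0) {
        pthread_cond_signal(&h->wake_cond);
    }
    pthread_mutex_unlock(&h->mutex);
}
)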

>
> > > 2. Why does ovs-vswitchd occupy so much CPU in the short-lived flow
> > > test before my change? And why does it drop so dramatically? What's
> > > the contention between ovs-vswitchd and upcall_handler?
>
> Yes, we also know that.  And we will start solving this soon.
>
It seems fmbs (flow-miss batches) are still handled by the main ovs-vswitchd
thread rather than by the handler threads?  What is the division of work
between the miss-handler and ovs-vswitchd, now and in the future?
Anyway, it is great that you are solving this.

> > > A better solution for this bottleneck of the dispatcher, in my opinion,
> > > could be that each handler thread receives the upcalls assigned to it
> > > from the kernel directly, so that no condition wait and signal are
> > > involved, which avoids unnecessary context switches and the futex
> > > scaling problem in a multicore environment. The selection of the
> > > handler can be done by the kernel with the same kind of hash, but
> > > putting packets into different per-handler queues; this way packet
> > > order is preserved. Could this be a valid proposal?
>
> Yes, I agree, this sounds like the direction we will go in the long term.
> But for now, we are focusing on partially addressing this in userspace,
> since:
>
> - we want to address the fairness issue as well, and it is much easier to
>   model the solution in userspace first.
> - the goal is to guarantee upcall-handling fairness even under a DoS-type
>   attack.

Understood, and we will also run more tests on the behavior when the
dispatcher becomes the bottleneck.
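
For concreteness, the per-handler receive model I described above would look
roughly like the sketch below on the userspace side.  Names and setup are
made up, socket creation is elided, and the kernel side (hashing each flow
miss to a fixed per-handler queue so packet order is preserved) is not shown:

#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

struct handler {
    int nl_sock;                /* Netlink socket owned by this handler;
                                 * the kernel hashes each flow miss to a
                                 * fixed socket, preserving packet order. */
    char buf[65536];
};

static void *
handler_thread(void *h_)
{
    struct handler *h = h_;

    for (;;) {
        /* Block in the kernel until an upcall arrives on this handler's
         * own queue: no dispatcher thread, no cond wait/signal, and no
         * futex contention between handlers. */
        ssize_t n = recv(h->nl_sock, h->buf, sizeof h->buf, 0);
        if (n > 0) {
            /* ... parse the genl upcall message and handle the miss ... */
        }
    }
    return NULL;
}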

Best regards,
Han


