[ovs-discuss] ovs-vswitchd 2.0 has high cpu usage

Ethan Jackson ethan at nicira.com
Tue Dec 3 20:54:51 UTC 2013


Hi Han,

Thanks for taking the time to do some profiling work on this.  In
response to this thread, I'd like to get a couple of points across
which hopefully you'll find helpful.

It's important to note that OVS 2.0 is the first release with
multithreading support.  In this release we focused on getting the
basic structure right and on correctness.  There's a ton of
interesting optimization work that can be done from this point
forward.  Specifically, we're going to look at structural changes to
further improve performance and the switch's ability to scale to
multiple threads.  We'll also invest in improving the efficiency of
each thread, possibly by implementing something like RCU.  Given that
it's early days, it's not surprising that these things aren't as
efficient as they could be yet.
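
For those unfamiliar with the idea, here is a conceptual sketch of
RCU in userspace.  It uses liburcu names purely for illustration --
this is an assumption about one possible direction, not what OVS
does today: readers take no lock, while a writer publishes a new copy
of the data and frees the old one only after all pre-existing readers
have finished.

    /* Conceptual RCU sketch using liburcu -- illustrative only, not OVS
     * code.  Assumes a single writer, that every reader thread has
     * called rcu_register_thread(), and that active_cfg is initialized
     * before any reader runs. */
    #include <urcu.h>
    #include <stdlib.h>

    struct cfg { int max_idle_ms; };
    static struct cfg *active_cfg;

    int
    read_max_idle(void)
    {
        /* Fast path: no mutex, just a read-side critical section. */
        int v;

        rcu_read_lock();
        v = rcu_dereference(active_cfg)->max_idle_ms;
        rcu_read_unlock();
        return v;
    }

    void
    update_cfg(int max_idle_ms)
    {
        /* Slow path: publish a new copy, wait for old readers, reclaim. */
        struct cfg *old = active_cfg;
        struct cfg *new = malloc(sizeof *new);

        new->max_idle_ms = max_idle_ms;
        rcu_assign_pointer(active_cfg, new);
        synchronize_rcu();
        free(old);
    }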

With regard to the fmbs (flow miss batches): these are a temporary
solution and clearly aren't ideal.  I'm working on some patches which
should be ready this week that remove them entirely and instead
handle flow installation from the miss handler threads directly.
Obviously, this is going to have a significant impact on the
performance characteristics of the switch, so it may be worth
watching for these patches to be merged.

There's a tradeoff between CPU utilization and latency.  When a new
flow miss enters userspace, we can either hold onto it in the hope of
forming a batch, or process it immediately, reducing the setup time
for new connections.  We've chosen the latter approach for a couple
of reasons.
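
To make the trade-off concrete, here is a rough sketch of the two
strategies.  This is not the actual ofproto-dpif-upcall code; the
types and helper names are hypothetical stand-ins:

    /* Illustrative only.  'process' stands in for whatever turns an
     * upcall into a datapath flow and forwards the queued packet. */
    #include <stddef.h>

    struct upcall { int dummy; };      /* stand-in for a miss upcall */

    /* Strategy A (the approach described above): handle each miss as
     * soon as it is read, minimizing the latency seen by the first
     * packet of a new flow. */
    static void
    handle_immediately(struct upcall *u, void (*process)(struct upcall *))
    {
        process(u);
    }

    /* Strategy B: hold misses until the batch fills (or a deadline
     * passes, not shown), trading added latency for fewer wakeups and
     * less CPU per upcall. */
    #define BATCH_MAX 50

    struct batch { struct upcall *u[BATCH_MAX]; size_t n; };

    static void
    handle_batched(struct batch *b, struct upcall *u,
                   void (*process)(struct upcall *))
    {
        b->u[b->n++] = u;
        if (b->n == BATCH_MAX) {
            for (size_t i = 0; i < b->n; i++) {
                process(b->u[i]);
            }
            b->n = 0;
        }
    }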

First, latency is extremely important for many applications, so it's
worth optimizing for.  It'd be interesting to try the netperf TCP_CRR
test with your 10 ms delay patch.  My suspicion is that it'd be
significantly worse than with the more responsive approach.  Also, as
the system becomes more loaded, batching will happen naturally
anyway, so the cost of this approach is relatively low under load.
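
(For reference, a TCP_CRR run is something along the lines of
"netperf -H <dut_ip> -t TCP_CRR -l 60", with netserver running on the
device under test.  TCP_CRR measures connect/request/response
transactions per second, so every transaction exercises the new-flow
path and any added miss-handling latency shows up directly in the
result.)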

Second, any system with a large number of threads is likely used
more or less exclusively as a switch.  For hypervisors, users can
easily reduce the number of threads, and thus the CPU utilization, if
they're worried about it.  Given that in most cases these highly
threaded systems are used just to forward traffic, the amount of CPU
they use should come second to their actual performance moving
packets around.  Rhetorically: would you rather have a top-of-rack
switch with high latency and 500% CPU utilization, or one with low
latency and 1000%?

Ethan

On Mon, Dec 2, 2013 at 7:33 PM, Zhou, Han <hzhou8 at ebay.com> wrote:
> Hi Alex,
>
> Thanks for your kind feedback.
>
> On Tuesday, December 03, 2013 3:01 AM, Alex Wang wrote:
>> This is the case when the rate of incoming upcalls is slower than the
>> "dispatcher" reading speed.  After the "dispatcher" breaks out of the for
>> loop, it is necessary to wake up the "handler" threads that have upcalls,
>> since the processing latency matters.
>>
>> A batch mode may help, but more research needs to be done on reducing
>> the latency.  Have you done any experiments on related issues?
>
> Yes, I tried replacing ovs_mutex_cond_wait() with pthread_cond_timedwait() in
> the handler with a 10 ms timeout, and removed the final cond_signal loop in
> the dispatcher.  It reduced the CPU cost of vswitchd significantly in the
> previous hping3 test with the same throughput; but when idle, a simple ping
> test shows that the timeout mechanism introduces up to 20 ms of latency.  It
> is hard to strike a balance with this approach ...
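>
> Roughly, the change looked like this (simplified to plain pthreads
> rather than OVS's ovs_mutex/condvar wrappers, and with a stand-in
> for the handler's queue length):
>
>     #include <errno.h>
>     #include <pthread.h>
>     #include <time.h>
>
>     static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
>     static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
>     static int n_upcalls;       /* stand-in for the handler's queue length */
>
>     /* Wait for work, but give up after ~10 ms so the handler re-checks
>      * its queue even if the dispatcher never signals it. */
>     static void
>     wait_for_upcalls(void)
>     {
>         struct timespec deadline;
>
>         clock_gettime(CLOCK_REALTIME, &deadline);
>         deadline.tv_nsec += 10 * 1000 * 1000;          /* +10 ms */
>         if (deadline.tv_nsec >= 1000000000L) {
>             deadline.tv_sec++;
>             deadline.tv_nsec -= 1000000000L;
>         }
>
>         pthread_mutex_lock(&mtx);
>         while (n_upcalls == 0) {
>             if (pthread_cond_timedwait(&cond, &mtx, &deadline) == ETIMEDOUT) {
>                 break;          /* timed out: fall through and poll */
>             }
>         }
>         pthread_mutex_unlock(&mtx);
>     }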
>
> BTW, I noticed that there is a wasted cond_signal in the current code,
> introduced by a previous patch ("ofproto-dpif-upcall: reduce number of
> wakeup"), and suggested a patch:
> http://openvswitch.org/pipermail/dev/2013-November/034427.html
> Could you take a look and possibly merge it with your patch for improving
> fairness?
>
>>
>> > > 2. Why does ovs-vswitchd occupy so much CPU in the short-lived flow test
>> > > before my change?  And why does it drop so dramatically?  What's the
>> > > contention between ovs-vswitchd and upcall_handler?
>>
>> Yes, we are aware of that as well.  And we will start solving this soon.
>>
> It seems fmbs are still handled by ovs-vswitchd itself rather than by the
> handler threads?  What's the division of work between the miss handlers and
> ovs-vswitchd (current and future)?  Anyway, it is great that you are solving
> this.
>
>> > > A better solution for this bottleneck of the dispatcher, in my opinion,
>> > > could be that each handler thread receives the upcalls assigned to it
>> > > from the kernel directly, so that no condition-variable wait and signal
>> > > is involved, which avoids unnecessary context switches and the futex
>> > > scaling problem in a multicore environment.  The selection of the
>> > > handler can be done by the kernel with the same kind of hash, but
>> > > putting packets into per-handler queues; this way packet order is
>> > > preserved.  Can this be a valid proposal?
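
(An illustrative sketch of the per-handler hashing idea described
above -- the names are hypothetical, not the actual kernel datapath
code.  Hashing on the flow key keeps every packet of a given flow on
the same handler, which is what preserves per-flow ordering:)

    #include <stdint.h>

    /* Pick the per-handler queue for an upcall.  Because the hash is a
     * pure function of the flow key, all packets of one flow map to the
     * same handler and are processed in order. */
    static unsigned int
    choose_handler(uint32_t flow_key_hash, unsigned int n_handlers)
    {
        return flow_key_hash % n_handlers;
    }
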
>>
>> Yes, I agree, this sounds like the direction we will go in the long term.
>> But for now, we are focusing on partially addressing this in userspace,
>> since:
>>
>> - we want to address the fairness issue as well, and it is much easier to
>>   model the solution in userspace first.
>> - the goal is to guarantee upcall-handling fairness even under a DoS-type
>>   attack.
>
> Understood, and we will also run more tests on the behavior when the
> dispatcher becomes the bottleneck.
>
> Best regards,
> Han


