[ovs-dev] 4K resubmit limit and stack size usage

Eelco Chaudron echaudro at redhat.com
Fri Apr 24 10:06:02 UTC 2020



On 23 Apr 2020, at 17:32, William Tu wrote:

> On Thu, Apr 23, 2020 at 6:29 AM Eelco Chaudron <echaudro at redhat.com> wrote:
>>
>> Hi Ben et al.
>>
>> We recently had an issue where OVS crashed because it ran out of stack
>> space while processing an OVN flow loop :)  I was hoping it would jump
>> out of the loop, but due to the change "790c5d269 ofproto-dpif: Do not
>> count resubmit to later tables against limit.", the resubmit loop can
>> be up to 4K.
>>
>> When the clone action is used (and others), the stack size increases
>> quite drastically; some tests showed that over 19M was needed to reach
>> the 4K limit. Even a simple resubmit-to-resubmit jump back and forth
>> until 4K is reached requires a 3.5M stack size.
>>
>> Some small changes, like using malloc for mf_subvalue, and for
>> actset_stub in clone_xlate_actions(), allowed the worst case to go from
>> around 19M to 12M, but still, this is a lot of stack memory.
>>
>> One idea could be that, on the last action in the list, we try to unwind
>> the stack (recursion) to the previous nonfinal action and then continue
>> processing this action. I'm not too familiar with the xlate code, but it
>> looks quite complex already, so I'm not sure if this is an option :) I'm
>> also not sure if this gives us enough relief in all the OVN scenarios,
>> as they use a lot of resubmits in a single action list.
>>
>> Another idea Dumitru had was to delay clone() execution until you get
>> back to the root actionset. So when you hit a clone() action you store
>> the state in the ctx, and then go over the list once you return (this
>> could result in a growing list). But you do not end up processing the
>> clones on the branch of the tree. The only problem is that this results
>> in out-of-order processing of the action list, i.e. clone(resubmit(,5)),2
>> will first send out the packet on 2 and then on the destination of
>> the clone() action. I guess this is a blocking thing, as the OpenFlow
>> specification specifies that action lists should be executed in order.
>>
>> Any other ideas on the above or on how to optimize the stack usage?
>>
> Hi Eelco,
> Can we just increase the stack space to a larger value?
> E.g., setting ulimit -s to 32 MB
> William

You are right, I should have explained in the problem statement why this 
might not be desirable.

Let's assume you have a system with 56 cores. In this case, you will get 
roughly 56 threads, each taking 32M of stack, so about 1.7G in total. To 
make it even worse, we run OVS with the mlockall() option, so all of it 
gets reserved and pinned in memory...
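
As a quick sanity check of that number, here is the arithmetic as a small 
Python sketch. It assumes one thread per core and that the full stack of 
every thread ends up locked in memory; the helper name pinned_stack_bytes 
is made up for this illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def pinned_stack_bytes(n_threads, stack_mib):
    """Worst-case stack memory if every thread's full stack is pinned.

    Assumption for illustration: each thread gets a stack of exactly
    stack_mib MiB and all of it is locked in memory.
    """
    return n_threads * stack_mib * MIB


total = pinned_stack_bytes(56, 32)
print(f"{total / GIB:.2f} GiB")  # prints "1.75 GiB"
```

So the "1.7G" above is 1792M, or 1.75G if you count in GiB.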

I know the number of threads can be tuned with 
n-revalidator-threads/n-handler-threads and the stack size with systemd 
(in our case). But it would be good to minimize the stack usage in 
general, so we can avoid all this setup-specific tuning.
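
To make the "unwind on the last action" idea from my original mail a bit 
more concrete, here is a toy Python model. It is not the xlate code: the 
function translate(), the tables dict, and the ("resubmit", t) and 
("output", p) tuples are all invented for this sketch. A resubmit that is 
the last action in its list is followed by looping in the same frame, 
while a nonfinal resubmit still recurses.

```python
RESUBMIT_LIMIT = 4096  # stands in for the 4K resubmit limit


def translate(tables, start, limit=RESUBMIT_LIMIT):
    """Toy translation; returns (outputs, max_depth, resubmits).

    tables maps a table id to a list of (kind, arg) actions, where kind
    is "output" (arg = port) or "resubmit" (arg = table id).
    """
    outputs = []
    max_depth = 0
    resubmits = 0

    def run(table_id, depth):
        nonlocal max_depth, resubmits
        while True:
            max_depth = max(max_depth, depth)
            actions = tables[table_id]
            tail = None
            for i, (kind, arg) in enumerate(actions):
                if kind == "output":
                    outputs.append(arg)
                elif kind == "resubmit":
                    if resubmits >= limit:
                        return  # limit reached: stop this branch
                    resubmits += 1
                    if i == len(actions) - 1:
                        tail = arg  # final action: reuse this frame
                    else:
                        run(arg, depth + 1)  # nonfinal: must recurse
            if tail is None:
                return
            table_id = tail

    run(start, 1)
    return outputs, max_depth, resubmits


# Two tables that resubmit to each other as their only (final) action.
ping_pong = {0: [("resubmit", 1)], 1: [("resubmit", 0)]}
print(translate(ping_pong, 0))  # ([], 1, 4096)
```

In this toy model the back-and-forth loop reaches the 4K limit at a 
recursion depth of 1. It also shows the limitation I mentioned: any 
resubmit that is not the last action in its list still adds a frame, so 
action lists with many nonfinal resubmits would see less benefit.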

//Eelco


