[ovs-discuss] ovs-vswitchd mlockall and stack size

Ben Pfaff blp at nicira.com
Tue Jul 15 22:47:14 UTC 2014


I agree.

That sounds like a reasonable step to take.

Thanks for the analysis.  (Please include it in the commit message.)

On Mon, Jul 14, 2014 at 06:06:09PM +0100, Anoob Soman wrote:
> Hi Ben,
> 
> Thanks for the suggestions. I did some quick tests and analysis of
> the stack usage on ovs-2.1.2 (planning to do the same on master
> later); here are some of the findings.
> 
> The list below shows the stack usage (in bytes) of each function. I
> have collected only those functions which use more than 1KB of
> stack.
> dpif_linux_operate             114536
> udpif_upcall_handler            70856
> nl_sock_recv__                  65656
> json_from_stream                8216
> udpif_revalidator               7672
> revalidator_sweep__             6008
> system_stats_thread_func        4872
> netdev_linux_run                4408
> dpif_linux_port_poll            4328
> nln_run                         4208
> nl_sock_transact_multiple       3368
> xlate_actions                   3112
> ofproto_trace                   2760
> handle_openflow__               2072
> do_xlate_actions                1928
> handle_flow_stats_request       1784
> ofp_print_nxst_flow_monitor_reply 1712
> handle_aggregate_stats_request  1368
> netdev_linux_sys_get_stats      1352
> append_group_stats &
> ofproto_dpif_execute_actions    1320
> parse_odp_key_mask_attr         1304
> xlate_group_bucket              1272
> dpif_ipfix_cache_expire &
> describe_fd                     1256
> dpif_linux_execute              1160
> handle_flow_monitor_request     1112
> sfl_agent_sysError              1080
> handle_meter_mod                1048
> sfl_agent_error                 1032
> 
> As you can see, a few of these functions are in the packet
> processing path.
> 
> Assuming we run the "AT_SETUP([ofproto-dpif - infinite
> resubmit])" test, which causes do_xlate_actions (and friends) to
> recurse (64 levels deep), roughly 400KB of stack would be used by
> "udpif_upcall_handler". I think the stack usage of
> udpif_revalidator should be the same as that of
> udpif_upcall_handler (if not less). I limited the stack size of all
> the pthreads to 512KB and was able to run both the tests you
> mentioned.
> 
> I ran valgrind (--tool=massif) against ovs-vswitchd with the
> "AT_SETUP([ofproto-dpif - infinite resubmit])" test; valgrind
> reported a peak stack usage of around 400KB, and
> "AT_SETUP([ofproto-dpif - exponential resubmit chain])" used around
> 700KB. This was with 4 vCPUs (6 pthreads). Note, though, that
> valgrind reports the total stack usage across all threads.
> 
> This makes me believe that 1MB of stack should be enough for each
> pthread, and that 512KB would be tight. Let me know your thoughts.
> I will send out a patch which limits the pthread stack size to
> 1024KB and makes it configurable via "other-config".
> 
> Thanks,
> Anoob.
> 
> On 08/07/14 17:47, Ben Pfaff wrote:
> >I guess that the biggest effect on stack size would be the flow table
> >and in particular how much recursion flow processing causes.  There are
> >a few tests that force as-deep-as-possible recursion:
> >
> >     AT_SETUP([ofproto-dpif - infinite resubmit])
> >
> >I don't think that forcing all packets to userspace would have much of
> >an effect.  (The closest equivalent would be to disable megaflows;
> >there's an "ovs-appctl" command for that, look in "ovs-appctl help".)
> >
> >Another hint toward the maximum stack requirement is to look through
> >the generated asm for large stack allocations, e.g.:
> >
> >         objdump -dr vswitchd/ovs-vswitchd|sed -n 's/^.*sub.*$0x\([0-9a-f]\{1,\}\),%esp/\1/p'|sort|uniq|less
> >
> >which shows that we have at least one place where we allocate 327,788
> >bytes on the stack (!).  I hope that is not in the flow processing path!
> >
> >On Tue, Jul 08, 2014 at 05:36:07PM +0100, Anoob Soman wrote:
> >>I have been running tests with a 1MB stack size and ovs-vswitchd
> >>seems to hold up pretty well. I will try some more experiments to
> >>find the maximum depth of the stack, but I am afraid this will
> >>depend entirely on the test I am running. Any suggestion on what
> >>sort of test I should be running? Moreover, the "force-miss-model"
> >>other-config is missing from 2.1.x, as there is no concept of
> >>facets. Is there a way to force all packets to be processed in
> >>userspace, other than running "ovs-dpctl del-flows" periodically?
> >>
> >>Thanks,
> >>Anoob.
> >>On 08/07/14 17:15, Ben Pfaff wrote:
> >>>On Tue, Jul 08, 2014 at 05:08:43PM +0100, Anoob Soman wrote:
> >>>>Since Open vSwitch moved to a multi-threaded model, the RSS usage
> >>>>of ovs-vswitchd has increased quite significantly compared to the
> >>>>last release we used (ovs-1.4.x). Part of the problem is the use
> >>>>of mlockall (with MCL_CURRENT|MCL_FUTURE) in ovs-vswitchd, which
> >>>>causes every pthread's stack and the heap to be locked into RAM.
> >>>>ovs-vswitchd (2.1.x) running on an 8-vCPU dom0 (10 pthreads) uses
> >>>>around 89MB of RSS (80MB just for stacks), without any VMs running
> >>>>on the host. One way to reduce RSS would be to reduce the number
> >>>>of "n-handler-threads" and "n-revalidator-threads", but I am not
> >>>>sure about the performance impact of reducing these thread counts.
> >>>>I am wondering if the stack size of the pthreads can be reduced
> >>>>(using pthread_attr_setstacksize). By default the pthread maximum
> >>>>stack size is 8MB, and mlockall locks all of this 8MB into RAM.
> >>>>What would be an optimal stack size to use?
> >>>I think it would be very reasonable to reduce the stack sizes, but I
> >>>don't know the "correct" size off-hand.  Since you're looking at the
> >>>problem already, perhaps you should consider some experiments.
> 


