Dear All,
there are various threads here, as well as in other forums, dealing with segmentation-fault problems related to the stack size and OpenMP. Unfortunately, the information is (at least to my understanding) not consistent, and because some threads are rather old, it might also be outdated. My starting point is https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/, where the recommendation for OpenMP applications is to remove the operating-system stack limit via
ulimit -s unlimited
or something like that, rather than giving the compiler a threshold (via -heap-arrays) for the size of arrays it may put on the stack (i.e. all arrays known at compile time to be larger than the threshold are put on the heap). Unfortunately, the other linked thread (https://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault) recommends using "-heap-arrays" regardless of whether the application uses OpenMP or not. Also, https://software.intel.com/en-us/forums/topic/501500#comment-1779157 recommends not setting the stack size to unlimited.
I tried setting the stack size to a larger value (the default on Ubuntu is 8 MiB), but this might cause problems with other software using threads, and it is also hard for me to estimate the required stack size in advance.
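One detail that is easy to miss here (worth double-checking against the OpenMP runtime documentation): `ulimit -s` controls only the stack of the initial thread, while the stacks of OpenMP worker threads are sized by the `OMP_STACKSIZE` environment variable (Intel's runtime also reads `KMP_STACKSIZE`, which takes precedence). A minimal sketch of the two knobs; the 16M value is just an example, not a recommendation:

```shell
# Show the current soft stack limit of the *initial* thread (in KiB):
ulimit -s

# Removing the limit for the initial thread (as the Intel article
# suggests) does NOT affect OpenMP worker threads:
# ulimit -s unlimited

# Worker-thread stacks are sized via OMP_STACKSIZE, set before the
# program starts; Intel's runtime also honors KMP_STACKSIZE:
export OMP_STACKSIZE=16M
echo "$OMP_STACKSIZE"
```

So raising only the ulimit may still leave the worker threads with their default stack size.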
The function in question is:

function mesh_build_cellnodes(nodes,Ncellnodes)
 implicit none
 integer(pInt), intent(in) :: Ncellnodes                    !< requested number of cellnodes
 real(pReal), dimension(3,mesh_Nnodes), intent(in) :: nodes
 real(pReal), dimension(3,Ncellnodes) :: mesh_build_cellnodes
 integer(pInt) :: &
   e,t,n,m, &
   localCellnodeID
 real(pReal), dimension(3) :: &
   myCoords

 mesh_build_cellnodes = 0.0_pReal
!$OMP PARALLEL DO PRIVATE(e,localCellnodeID,t,myCoords)
 do n = 1_pInt,Ncellnodes                                   ! loop over cell nodes
   e = mesh_cellnodeParent(1,n)
   localCellnodeID = mesh_cellnodeParent(2,n)
   t = mesh_element(2,e)                                    ! get element type
   myCoords = 0.0_pReal
   do m = 1_pInt,FE_Nnodes(t)
     myCoords = myCoords + nodes(1:3,mesh_element(4_pInt+m,e)) &
                         * FE_cellnodeParentnodeWeights(m,localCellnodeID,t)
   enddo
   mesh_build_cellnodes(1:3,n) = myCoords / sum(FE_cellnodeParentnodeWeights(:,localCellnodeID,t))
 enddo
!$OMP END PARALLEL DO
end function mesh_build_cellnodes
where "mesh_Nnodes" can be in the range of a thousand to several million. According to https://software.intel.com/en-us/forums/topic/301590#comment-1524955, I understand that the "-heap-arrays" option will force the compiler to allocate the automatic function-result array "mesh_build_cellnodes" (and any array temporaries) on the heap rather than the stack, independently of the threshold value given, because its size is not known at compile time. My current solution is to set
-heap-arrays 8
as a compiler option; the threshold is given in KiB, so arrays larger than 8 KiB go on the heap. This seems a conservative value relative to the default stack size, which on Ubuntu is 8 MiB, i.e.
ulimit -s 8192
My application runs fine with that (but crashes when I do not use the compiler option). The other possibility would be to remove the stack-size limit (and omit the compiler option).
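For this particular function there may also be a third option that sidesteps the stack question entirely: making the function result allocatable, so this array always lives on the heap regardless of compiler flags or ulimit settings. A sketch only, untested against my code base; it assumes the same pInt/pReal kinds and mesh_* module data as above, and callers need an explicit interface (e.g. via a module):

```fortran
function mesh_build_cellnodes(nodes,Ncellnodes)
 implicit none
 integer(pInt), intent(in) :: Ncellnodes
 real(pReal), dimension(3,mesh_Nnodes), intent(in) :: nodes
 ! allocatable result: always heap-allocated, independent of
 ! -heap-arrays and of any stack limit
 real(pReal), dimension(:,:), allocatable :: mesh_build_cellnodes

 allocate(mesh_build_cellnodes(3,Ncellnodes))
 mesh_build_cellnodes = 0.0_pReal
 ! ... the OpenMP loop stays exactly as in the original version ...
end function mesh_build_cellnodes
```

This would not help with other large automatic arrays elsewhere in the code, of course, but it removes the dependency on compiler flags for the worst offender.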
Finally, my question is which of the two methods is better for OpenMP applications (1 to 32 threads), and what advantages and disadvantages each has. I know that setting a size limit for arrays to be put on the stack might influence performance, and I am currently doing some profiling.
Thanks in advance for your contributions and apologies for raising this question again.