Dear All,
there are various threads here, as well as in other forums, dealing with segmentation-fault problems related to the stack size and OpenMP. Unfortunately, the information is (at least to my understanding) not consistent, and because some threads are rather old, it might also be outdated. My starting point is https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/, where the recommendation for OpenMP applications is to remove the operating-system stack limit via
ulimit -s unlimited
or something like that, rather than giving the compiler a threshold (via -heap-arrays) for the size of arrays it may put on the stack (i.e. all arrays known at compile time to be larger than the threshold are put on the heap). Unfortunately, the other linked thread (https://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault) recommends using "-heap-arrays" regardless of whether the application uses OpenMP or not. Also, https://software.intel.com/en-us/forums/topic/501500#comment-1779157 recommends not setting the stack size to unlimited.
I tried setting the stack size to a larger value (the default on Ubuntu is 8 MiB), but this might cause problems with other software using threads, and it is also hard for me to estimate the required stack size in advance.
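One detail that is easy to miss here (worth double-checking against the OpenMP runtime documentation): `ulimit -s` controls only the stack of the initial thread, while the stacks of OpenMP worker threads are sized by the `OMP_STACKSIZE` environment variable (Intel's runtime also reads `KMP_STACKSIZE`, which takes precedence). A minimal sketch of the two knobs; the 16M value is just an example, not a recommendation:

```shell
# Show the current soft stack limit of the *initial* thread (in KiB):
ulimit -s

# Removing the limit for the initial thread (as the Intel article
# suggests) does NOT affect OpenMP worker threads:
# ulimit -s unlimited

# Worker-thread stacks are sized via OMP_STACKSIZE, set before the
# program starts; Intel's runtime also honors KMP_STACKSIZE:
export OMP_STACKSIZE=16M
echo "$OMP_STACKSIZE"
```

So raising only the ulimit may still leave the worker threads with their default stack size.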
The function in question is:

function mesh_build_cellnodes(nodes,Ncellnodes)
 implicit none
 integer(pInt), intent(in) :: Ncellnodes                    !< requested number of cellnodes
 real(pReal), dimension(3,mesh_Nnodes), intent(in) :: nodes
 real(pReal), dimension(3,Ncellnodes) :: mesh_build_cellnodes
 integer(pInt) :: &
   e,t,n,m, &
   localCellnodeID
 real(pReal), dimension(3) :: &
   myCoords

 mesh_build_cellnodes = 0.0_pReal
!$OMP PARALLEL DO PRIVATE(e,localCellnodeID,t,myCoords)
 do n = 1_pInt,Ncellnodes                                   ! loop over cell nodes
   e = mesh_cellnodeParent(1,n)
   localCellnodeID = mesh_cellnodeParent(2,n)
   t = mesh_element(2,e)                                    ! get element type
   myCoords = 0.0_pReal
   do m = 1_pInt,FE_Nnodes(t)
     myCoords = myCoords + nodes(1:3,mesh_element(4_pInt+m,e)) &
                         * FE_cellnodeParentnodeWeights(m,localCellnodeID,t)
   enddo
   mesh_build_cellnodes(1:3,n) = myCoords / sum(FE_cellnodeParentnodeWeights(:,localCellnodeID,t))
 enddo
!$OMP END PARALLEL DO
end function mesh_build_cellnodes
where "mesh_Nnodes" can be in the range of a thousand to several million. According to https://software.intel.com/en-us/forums/topic/301590#comment-1524955, I understand that the "-heap-arrays" option will force the compiler to allocate the automatic function-result array "mesh_build_cellnodes" (and any array temporaries) on the heap rather than the stack, independently of the threshold value given, because its size is not known at compile time. My current solution is to set
-heap-arrays 8
as a compiler option; the threshold is given in KiB, so arrays larger than 8 KiB go on the heap. This seems a conservative value relative to the default stack size, which on Ubuntu is 8 MiB, i.e.
ulimit -s 8192
My application runs fine with that (but crashes when I do not use the compiler option). The other possibility would be to remove the stack-size limit (and omit the compiler option).
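For this particular function there may also be a third option that sidesteps the stack question entirely: making the function result allocatable, so this array always lives on the heap regardless of compiler flags or ulimit settings. A sketch only, untested against my code base; it assumes the same pInt/pReal kinds and mesh_* module data as above, and callers need an explicit interface (e.g. via a module):

```fortran
function mesh_build_cellnodes(nodes,Ncellnodes)
 implicit none
 integer(pInt), intent(in) :: Ncellnodes
 real(pReal), dimension(3,mesh_Nnodes), intent(in) :: nodes
 ! allocatable result: always heap-allocated, independent of
 ! -heap-arrays and of any stack limit
 real(pReal), dimension(:,:), allocatable :: mesh_build_cellnodes

 allocate(mesh_build_cellnodes(3,Ncellnodes))
 mesh_build_cellnodes = 0.0_pReal
 ! ... the OpenMP loop stays exactly as in the original version ...
end function mesh_build_cellnodes
```

This would not help with other large automatic arrays elsewhere in the code, of course, but it removes the dependency on compiler flags for the worst offender.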
Finally, my question is which of the two methods is better for OpenMP applications (1 to 32 threads), and what advantages and disadvantages each has. I know that setting a size limit for arrays to be put on the stack might influence performance, and I am currently doing some profiling.
Thanks in advance for your contributions and apologies for raising this question again.