Channel: Intel® Fortran Compiler

Stack Size and OpenMP


Dear All,

There are various threads here, as well as in other forums, dealing with segmentation faults related to the stack size and OpenMP. Unfortunately, the information is (at least to my understanding) not consistent, and because some threads are rather old, it might also be outdated. My primary source is https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/, where the recommendation for OpenMP applications is to remove the operating system's stack size limit via

ulimit -s unlimited

or to set it to a high value, rather than giving the compiler a limit (via -heap-arrays) on the size of arrays that it may put on the stack (i.e. all arrays known at compile time to be larger are put on the heap). Unfortunately, the other thread linked in the aforementioned article (https://software.intel.com/en-us/articles/intel-fortran-compiler-increased-stack-usage-of-80-or-higher-compilers-causes-segmentation-fault) recommends using "-heap-arrays" regardless of whether the application uses OpenMP. Also, https://software.intel.com/en-us/forums/topic/501500#comment-1779157 recommends not setting the stack size to unlimited.
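For reference, the two knobs involved can be inspected and set from the shell. A minimal sketch, assuming a bash-like shell on Linux (OMP_STACKSIZE is the standard OpenMP environment variable; KMP_STACKSIZE is the Intel-specific equivalent):

```shell
# Show the current soft stack limit for the initial thread;
# on Linux the units are KiB (8192 means 8 MiB).
ulimit -s

# Removing the limit affects only the initial thread; guarded here,
# since a lower hard limit may forbid raising the soft limit.
ulimit -s unlimited 2>/dev/null || true

# OpenMP worker-thread stacks are sized by the runtime instead:
# OMP_STACKSIZE is the portable variable (KMP_STACKSIZE is the
# Intel-specific equivalent).
export OMP_STACKSIZE=64M
echo "$OMP_STACKSIZE"
```

Note that `ulimit -s` governs only the initial thread; the worker threads created by the OpenMP runtime take their stack size from OMP_STACKSIZE.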

I tried setting the stack size to a larger value (the default on Ubuntu is 8 MiB), but this sometimes causes problems with other software that uses threads. Also, it is hard for me to estimate the required stack size.

A critical part of my code looks like

function mesh_build_cellnodes(nodes,Ncellnodes)

	 implicit none
	 integer(pInt),                         intent(in) :: Ncellnodes
	 real(pReal), dimension(3,mesh_Nnodes), intent(in) :: nodes
	 real(pReal), dimension(3,Ncellnodes) :: mesh_build_cellnodes

	 integer(pInt) :: &
	   e,t,n,m, &
	   localCellnodeID
	 real(pReal), dimension(3) :: &
	   myCoords

	 mesh_build_cellnodes = 0.0_pReal
	!$OMP PARALLEL DO PRIVATE(e,localCellnodeID,t,myCoords)
	 do n = 1_pInt,Ncellnodes
	   e = mesh_cellnodeParent(1,n)
	   localCellnodeID = mesh_cellnodeParent(2,n)
	   t = mesh_element(2,e)
	   myCoords = 0.0_pReal
	   do m = 1_pInt,FE_Nnodes(t)
	     myCoords = myCoords + nodes(1:3,mesh_element(4_pInt+m,e)) &
	                         * FE_cellnodeParentnodeWeights(m,localCellnodeID,t)
	   enddo
	   mesh_build_cellnodes(1:3,n) = myCoords / sum(FE_cellnodeParentnodeWeights(:,localCellnodeID,t))
	 enddo
	!$OMP END PARALLEL DO

end function mesh_build_cellnodes

where "mesh_Nnodes" can be in the range of 1000 to several million. According to https://software.intel.com/en-us/forums/topic/301590#comment-1524955, I understand that the "-heap-arrays" option will force the compiler to allocate "nodes" on the heap rather than the stack, independently of the value given, because its size is not known at compile time. A possible solution is to pass

-heap-arrays 8

as a compiler option, since this seems to be a reasonable value for the stack size. In fact, on Ubuntu it is

ulimit -s 8192

My application runs fine with that (but crashes when I do not use the compiler option). The other possibility is still to remove the stack size limit (and omit the compiler option).
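For completeness, the two configurations I am comparing would look roughly like this on the command line (source and binary names are placeholders; -qopenmp is the current spelling of the OpenMP flag, older compiler versions used -openmp; the -heap-arrays threshold is given in kilobytes):

```shell
# Variant 1: keep the default 8 MiB stack limit and move
# temporaries larger than 8 kB to the heap.
ifort -qopenmp -heap-arrays 8 -o mesh mesh.f90

# Variant 2: no -heap-arrays; remove the stack limit before running.
ifort -qopenmp -o mesh mesh.f90
ulimit -s unlimited
./mesh
```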

Since removing the stack size limit (and omitting the heap-arrays option) significantly improves performance at a slightly higher memory consumption (according to the time command, see attached file), I would rather use this option.

Therefore, my question is whether this method has any disadvantages for an application running with 1 to 32 threads.

Thanks in advance for your contributions and apologies for raising this question again.

