I have a segfault that I would appreciate some help with. A nearly minimal code example that reproduces it is attached.
The background is that I am developing a code that handles big matrices, which should be distributed over CPUs along one index (labeled z in the example). I want to determine the distribution at run time, based on the number of processes as returned by an MPI routine. The way I have set it up is to have a module "global", used by all other modules, that holds some auxiliary variables related to the partitioning. In the main program I then obtain the number of processes and allocate these variables (in the example code only ny and nz, integers that appear in loop bounds, and kz, an allocatable array). Note that I have removed all MPI-related code from the example, setting nprocs and myrank by simple assignments. A sketch of the construction follows below.
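For concreteness, here is a minimal sketch of what I mean (this is not the attached file; the literal values and the initialization loop are illustrative placeholders):

    module global
      implicit none
      integer :: nprocs, myrank      ! set by simple assignment here, by MPI calls in the real code
      integer :: ny, nz              ! loop bounds, fixed at run time
      integer, allocatable :: kz(:)  ! per-process array, allocated in the main program
    end module global

    program dns
      use global
      implicit none
      integer :: i
      nprocs = 4        ! placeholder for a call to MPI_Comm_size
      myrank = 0        ! placeholder for a call to MPI_Comm_rank
      ny = 32           ! placeholder values; the real code computes these
      nz = 32 / nprocs
      allocate(kz(nz))
      do i = 1, nz
        kz(i) = myrank*nz + i - 1   ! illustrative initialization
      end do
      ! ... routines in other modules then use ny, nz and kz via "use global" ...
    end program dns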
When I compile the attached code on our small cluster, running Linux version 2.6.18-164.11.1.el5 (Red Hat 4.1.2-46) and ifort version 11.1, I find that
* with optimization -O1 and -O2 the code runs and terminates cleanly;
* with optimization -O3 I get:
> ifort -O3 -traceback -o test.x DNS_int.f90
> ./test.x
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image       PC                Routine          Line     Source
test.x      0000000000403062  hit3d_mp_rhs3_   44       hit3d.f90
test.x      0000000000402EC0  hit3d_mp_rhs_    22       hit3d.f90
test.x      0000000000402C53  MAIN__           33       DNS_int.f90
test.x      0000000000402ACC  Unknown          Unknown  Unknown
libc.so.6   0000003A9B01D994  Unknown          Unknown  Unknown
test.x      00000000004029D9  Unknown          Unknown  Unknown
It would seem that the root cause is the way kz is handled: if I declare it statically, just like kx and ky, rather than as allocatable, the segfault disappears (see the illustration below). That would not be a solution, though, as I need to allocate it dynamically.
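To illustrate, the difference is between the allocatable declaration I use and a fixed-size one like kx and ky (the bound 32 here is just a placeholder):

    ! what I need: allocatable, sized from nprocs at run time -- segfaults at -O3
    integer, allocatable :: kz(:)
    ! what avoids the segfault: a fixed-size declaration like kx and ky
    integer :: kz(32)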
My questions:
1) Is the construction I use correct? If not, please suggest a correct way to do this (to allocate kz based on a value of nprocs determined at run time).
2) If it is correct, then is this a compiler bug? Is there a work-around that keeps my code portable and the executable near-optimal?
Two more observations that may be relevant:
* Adding certain compiler flags sometimes makes the segfault go away. For instance, combining -O3 with any of -check pointers, -check bounds, -check uninit, or -no-vec makes it disappear.
* When I compile the code on my laptop, running Linux version 3.11.0-26-generic (Ubuntu 13.10) with ifort 12.1.0, there is no segfault at all at any optimization level.
Any help with this would be greatly appreciated.