Hi,
I am running an MPI CFD code on my school's cluster using 100 CPUs.
Due to my program's structure, this is the maximum number of CPUs I can use. At the same time, because of the number of grid points used in my code, I am reaching the available memory limit.
The code ran and hung at one spot. After debugging, I found that it hangs in one of the subroutines. In this subroutine, I have to update the values of different variables across all CPUs using MPI.
There are some local arrays which I need to create. If I declare them using:
subroutine mpi_var_...
real(8) :: var_ksta(row_num*size_x*size_y), ...
...
end subroutine
The code hangs.
However, if I do this:
subroutine mpi_var_...
real(8), allocatable :: var_ksta(:), ...
allocate (var_ksta(row_num*size_x*size_y))
...
deallocate (var_ksta, STAT=status(1))
end subroutine
The code works. So how is memory allocated differently in these two situations?
If I am not tied down by the memory limit, is the first version faster than the second, or the same (with the allocation/deallocation slowing it down)?
Thanks!