Hello,
I wonder if anyone has the time and inclination to have a look at the code below for
any possible improvements.
The extract included here is the heaviest user of CPU time in a large-ish simulation code.
A typical run takes 6-9 months of running around the clock (24/7) with 6 threads on six cores.
The OpenMP part is working very well, and I doubt there is much improvement left in the multithreading itself.
The compiler call used for the whole code is:
ifort -O3 -r8 -openmp -fpp -parallel -mcmodel=medium -i-dynamic -shared-intel
Would there be a benefit if part or all of it were written in assembler?
Lots and lots of thanks for any suggestions.
--
! typical values
! N1 = 768
! N2 = N3 = 12
! M3 = M2 = 42

Do KEL = 1, N3
  Do JEL = 1, N2

    ... [address calculations]

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE( I, J, K, JA, KA, JJ, KK )
!$OMP DO
    Do K = 1, M3
      Do J = 1, M2

        JJ = (J-1)*NX32

        ! - copy into work arrays for later fft.
        Do I = 1, N1
          WK_1( JJ+I, K ) = U( J_Jump+J, K_Jump+K, I )
          WK_2( JJ+I, K ) = V( J_Jump+J, K_Jump+K, I )
          WK_3( JJ+I, K ) = W( J_Jump+J, K_Jump+K, I )
        End Do

        Do I = 1, N1, 2
          ! - du/dx
          WKX_1( JJ+I,   K ) = -Wv(i)*U( J_Jump+J, K_Jump+K, I+1 )
          WKX_1( JJ+I+1, K ) =  Wv(i)*U( J_Jump+J, K_Jump+K, I )
          ! - dv/dx
          WKX_2( JJ+I,   K ) = -Wv(i)*V( J_Jump+J, K_Jump+K, I+1 )
          WKX_2( JJ+I+1, K ) =  Wv(i)*V( J_Jump+J, K_Jump+K, I )
          ! - dw/dx
          WKX_3( JJ+I,   K ) = -Wv(i)*W( J_Jump+J, K_Jump+K, I+1 )
          WKX_3( JJ+I+1, K ) =  Wv(i)*W( J_Jump+J, K_Jump+K, I )
        End Do

        ! - Y derivatives.
        Do JA = 1, M2
          Do I = 1, N1
            WK_4( JJ+I, K ) = WK_4( JJ+I, K ) + RDY*DYGL(J,JA)*U( J_jump+JA, K_jump+K, I ) ! du/dy
            WK_5( JJ+I, K ) = WK_5( JJ+I, K ) + RDY*DYGL(J,JA)*V( J_jump+JA, K_jump+K, I ) ! dv/dy
            WK_6( JJ+I, K ) = WK_6( JJ+I, K ) + RDY*DYGL(J,JA)*W( J_jump+JA, K_jump+K, I ) ! dw/dy
          End Do
        End Do

        ! - Z derivatives.
        Do KA = 1, M3
          Do I = 1, N1
            WK_7( JJ+I, K ) = WK_7( JJ+I, K ) + RDZ*DZGL(K,KA)*U( J_jump+J, K_jump+KA, I ) ! du/dz
            WK_8( JJ+I, K ) = WK_8( JJ+I, K ) + RDZ*DZGL(K,KA)*V( J_jump+J, K_jump+KA, I ) ! dv/dz
            WK_9( JJ+I, K ) = WK_9( JJ+I, K ) + RDZ*DZGL(K,KA)*W( J_jump+J, K_jump+KA, I ) ! dw/dz
          End Do
        End Do

      End Do
    End Do ! eo single element loop.
!$OMP END DO
!$OMP END PARALLEL

    ... [other stuff]

  end do
end do
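
In case it helps anyone who wants to experiment, the Y-derivative accumulation above can be isolated into a small stand-alone program. This is only a sketch under assumptions of mine, not the real code: the array sizes are tiny stand-ins, NX32 is assumed equal to N1, J_Jump and K_Jump are taken as zero, the data are random, and the names `ydiff_demo`, `coef`, `wk_a` and `wk_b` are made up for illustration. It runs the loop once as written and once with the product RDY*DYGL(J,JA) hoisted into a scalar, then prints the largest difference between the two results. Whether hoisting changes anything at -O3 is not something this sketch demonstrates, since the compiler may well do it already.

```fortran
program ydiff_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Tiny stand-in sizes (the real run uses N1 = 768, M2 = M3 = 42).
  integer, parameter :: n1 = 8, m2 = 4, m3 = 3
  ! Assumption: NX32 is the row stride of the work arrays; set to n1 here.
  integer, parameter :: nx32 = n1
  real(dp) :: u(m2, m3, n1), dygl(m2, m2)
  real(dp) :: wk_a(m2*nx32, m3), wk_b(m2*nx32, m3)
  real(dp) :: rdy, coef
  integer  :: i, j, k, ja, jj

  rdy = 0.5_dp                 ! arbitrary test value
  call random_number(u)        ! random test data, not physical fields
  call random_number(dygl)
  wk_a = 0.0_dp                ! accumulators must start from zero
  wk_b = 0.0_dp

  do k = 1, m3
    do j = 1, m2
      jj = (j-1)*nx32

      ! Version A: same form as the du/dy line in the extract.
      do ja = 1, m2
        do i = 1, n1
          wk_a(jj+i, k) = wk_a(jj+i, k) + rdy*dygl(j,ja)*u(ja, k, i)
        end do
      end do

      ! Version B: loop-invariant coefficient hoisted out of the I loop.
      do ja = 1, m2
        coef = rdy*dygl(j,ja)
        do i = 1, n1
          wk_b(jj+i, k) = wk_b(jj+i, k) + coef*u(ja, k, i)
        end do
      end do
    end do
  end do

  print *, 'max abs difference A vs B:', maxval(abs(wk_a - wk_b))
end program ydiff_demo
```

A small harness like this also makes it easy to time variants, for example different loop orders, without waiting on the full simulation.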