I have a large code base that uses OpenMP 4.0 directives (target, simd, ...).
In one module the compiler throws lots of warnings about "loops not vectorized with simd", although those loops should vectorize.
I cut the code down to the bare minimum that still produces this behaviour:

    SUBROUTINE simdTest
       IMPLICIT NONE
       INTEGER :: i, j, k, sr, tn,nzb,nzt,nxl,nxr,nys,nyn
       REAL :: s1, s2, s3, s4
       REAL, DIMENSION(:,:,:), ALLOCATABLE :: u,v,pt,rmask,sums_l
       REAL, DIMENSION(:,:), ALLOCATABLE :: usws,vsws,shf

       !$omp parallel do schedule(runtime) private(s1,s2,s3)
       DO k = nzb, nzt+1
          !$omp simd collapse( 2 ) reduction( +: s1, s2, s3 )
          DO i = nxl, nxr
             DO j = nys, nyn
                s1 = s1 + u(k,j,i) * rmask(j,i,sr)
                s2 = s2 + v(k,j,i) * rmask(j,i,sr)
                s3 = s3 + pt(k,j,i) * rmask(j,i,sr)
             ENDDO
          ENDDO
          sums_l(k,1,tn) = s1
          sums_l(k,2,tn) = s2
          sums_l(k,4,tn) = s3
       ENDDO

       !$omp parallel do reduction( +: s1, s2, s3, s4) schedule(runtime)
       DO i = nxl, nxr
          DO j = nys, nyn
             s1 = s1 + usws(j,i) * rmask(j,i,sr)
             s2 = s2 + vsws(j,i) * rmask(j,i,sr)
             s3 = s3 + shf(j,i) * rmask(j,i,sr)
             s4 = s4 + 0.0
          ENDDO
       ENDDO
       sums_l(nzb,12,tn) = s1
       sums_l(nzb,14,tn) = s2
       sums_l(nzb,16,tn) = s3
    END SUBROUTINE

If you compile this with "ifort -openmp -O2", it warns about the first loop. If you remove literally anything (even from the second loop), the first loop vectorizes.
The message from vec-report is "subscript too complex".
Could you explain that? IMO, not vectorizing the first loop would lead to a significant performance loss.