Channel: Intel® Fortran Compiler

optimization help/vectorization/SIMD questions


Hi,

Consider the following code snippet: 

    do i = 1, size(rhs,dim=2)
       if (n == biggest_int) exit !Overflow!
       n1 = n
       n = n + 1
       n1on = real(n1,WP)/real(n,WP)
       ! Add SIMD dir?
       !!!!DIR$ SIMD PRIVATE(p,k)
       do concurrent (j=1:size(lhs)) ! I'm nervous about p and k getting stepped on
          delta(j) = rhs(j,i) - local_res(j)%M(1)
          local_res(j)%M(1) = local_res(j)%M(1) + delta(j)/real(n,WP)
          !DIR$ LOOP COUNT (1,2,3,4,5)
          do p = local_res(j)%p,2,-1 !iterate backwards because new M4 depends on old M3,M2 etc.
             sum(j) = 0
             !DIR$ LOOP COUNT (0,1,2,3,4)
             do k = 1,p-2
                sum(j) = sum(j) + &
                     local_res(j)%binkp(k,p)*((-delta(j)/n)**k)*n1on*local_res(j)%M(p-k)
             end do
             local_res(j)%M(p) = n1on*local_res(j)%M(p) + sum(j) + &
                  ((delta(j)/n)**p)*(n1**p + n1*((-1)**p))/n
          end do
          local_res(j)%n   = n
          local_res(j)%min = min(lhs(j)%min,rhs(j,i))
          local_res(j)%max = max(lhs(j)%max,rhs(j,i))
       end do
    end do

Note that the outermost do loop carries data dependencies: it performs a reduction over the second dimension of rhs.

Iterations of the next do loop (expressed as `do concurrent`) may be performed in any order. My reading rests on this requirement from ‘Modern Fortran Explained’, p. 360:

 any variable referenced is either previously defined in the same iteration, or its value is not affected by any other iteration;

To me this means that the inner loop indices p and k, used within this `do concurrent` loop, are *NOT* in danger of getting stepped on, since they are “previously defined in the same iteration” of the `do concurrent` loop. Is this correct?
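
One way to make that intent explicit, offered as a minimal sketch rather than a drop-in fix: the Fortran 2008 `block` construct lets p and k be declared inside the `do concurrent` body, so each iteration of j gets its own copies regardless of how the compiler treats host-scope variables. The program name, the array `tri`, the accumulator `acc`, and the toy arithmetic are all made up for illustration. Only the loop nesting mirrors the snippet above.

```fortran
program block_local_indices
   implicit none
   integer, parameter :: nj = 8   ! hypothetical stand-in for size(lhs)
   integer :: j
   integer :: tri(nj)

   do concurrent (j = 1:nj)
      block
         ! Declared inside the block, so every iteration of j
         ! gets its own p, k and acc. No initializer on the
         ! declaration, because that would imply SAVE.
         integer :: p, k, acc
         acc = 0
         do p = j, 2, -1          ! same backward pattern as the loop over p
            do k = 1, p - 2       ! same bounds as the loop over k
               acc = acc + k
            end do
         end do
         tri(j) = acc             ! each iteration writes only its own element
      end block
   end do

   ! Sanity check of the toy arithmetic: tri(j) equals j*(j-1)*(j-2)/6.
   if (tri(nj) /= 56) error stop "unexpected result"
   print *, tri
end program block_local_indices
```

Fortran 2018 also adds locality specifiers such as `local(p, k)` on the `do concurrent` statement itself. Whether a given compiler version accepts them is something to check in its documentation.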

The loops *inside* the `do concurrent` are order dependent. The loop over p runs backwards because each `local_res(j)%M(p)` is updated from the old values (those not yet overwritten in this pass over p) of `local_res(j)%M(2)` … `local_res(j)%M(p-1)`.
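
The same ordering constraint shows up in a much smaller setting. The sketch below is an analogy, not the moment update itself: it advances a row of binomial coefficients in place, where each new entry needs the old value of its lower-index neighbour. The array `row` and the program name are hypothetical.

```fortran
program backward_in_place
   implicit none
   integer :: row(5), p

   row = [1, 3, 3, 1, 0]          ! C(3,0) .. C(3,3), padded with a zero

   ! Going from high to low index means row(p-1) still holds its old
   ! value when row(p) is updated. A forward loop would read entries
   ! that were already overwritten in the same pass.
   do p = 5, 2, -1
      row(p) = row(p) + row(p-1)
   end do

   ! row now holds C(4,0) .. C(4,4)
   if (any(row /= [1, 4, 6, 4, 1])) error stop "unexpected result"
   print *, row
end program backward_in_place
```

Because each pass over p is a true recurrence, the order within one j cannot be changed. Any parallelism has to come from the loop over j.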

The loop over k has a data dependency because it is a sum reduction. The sum may be performed in any order, so long as two iterations don’t try to write the sum at once.

Typically the upper bound of the `do concurrent` loop will be quite large, so my idea was to try to do some optimizations on this loop, such as SIMD or vectorization. Am I on the right track here?

If I enable the directive on line 7 of the snippet (by removing three of the four !s), the compiler gives an error:

error #8592: Within a SIMD region, a DO-loop control-variable must not be specified in a PRIVATE/REDUCTION/FIRSTPRIVATE/LASTPRIVATE SIMD clause.   [P]
          do p = local_res(j)%p,2,-1 !iterate backwards because new M4 depends on old M3,M2 etc.
-------------^

Does this mean I don’t need the private clause and that p and k are safe from getting stepped on, or does this mean that the directive is telling the compiler that loops over p and k have no data dependencies?

Any and all advice is greatly welcome.

Thanks!

