Hi,
Consider the following code:
```fortran
SUBROUTINE FOO(lower, upper, a1, a2, a3)
    IMPLICIT NONE
    INTEGER(4) :: lower, upper, a1, a2, a3
    INTEGER(4) :: idx

    idx = lower
    DO WHILE (idx <= upper)
      SELECT CASE (idx)
        CASE (0)
          a1 = a1 + 2
        CASE (1)
          a2 = a2 + 4
        CASE (2)
          a3 = a3 + 6
      END SELECT
      idx = idx + 1
    END DO
END SUBROUTINE FOO
```
When it is compiled with Intel Fortran 14.0.2 or 15.0.0 using the following command:
ifort -c -fopenmp -O2 test.f90
the compiler emits an extra store to the variable a3. We can confirm this by examining the assembly code, generated with:
ifort -S -fopenmp -O2 test.f90
This produces test.s, which contains the following assembly code:
```asm
foo_:
# parameter 1: %rdi
# parameter 2: %rsi
# parameter 3: %rdx
# parameter 4: %rcx
# parameter 5: %r8
..B1.1:                         # Preds ..B1.0
        movslq    (%rdi), %r11                                  #7.5
        movslq    (%rsi), %rax                                  #8.5
        xorl      %esi, %esi                                    #8.5
        cmpq      %rax, %r11                                    #8.19
        jg        ..B1.10       # Prob 9%                       #8.19
                                # LOE rax rdx rcx rbx rbp rsi r8 r11 r12 r13 r14 r15
..B1.2:                         # Preds ..B1.1
        subq      %r11, %rax                                    #8.5
        movl      (%r8), %edi                                   #15.11
        incq      %rax                                          #8.5
        movl      (%rcx), %r9d                                  #13.11
        movl      (%rdx), %r10d                                 #11.11
        movq      %rbx, -24(%rsp)                               #8.5
        movq      %rbp, -16(%rsp)                               #8.5
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.3:                         # Preds ..B1.8 ..B1.2
        movq      %r11, %rbx                                    #17.7
        addq      %rsi, %rbx                                    #17.7
        je        ..B1.7        # Prob 25%                      #9.20
                                # LOE rax rdx rcx rbx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.4:                         # Preds ..B1.3
        cmpq      $1, %rbx                                      #9.20
        jne       ..B1.6        # Prob 66%                      #9.20
                                # LOE rax rdx rcx rbx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.5:                         # Preds ..B1.4
        addl      $4, %r9d                                      #13.11
        movl      %r9d, (%rcx)                                  #13.11
        jmp       ..B1.8        # Prob 100%                     #13.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.6:                         # Preds ..B1.4
        cmpq      $2, %rbx                                      #15.11
        lea       6(%rdi), %ebp                                 #15.11
        cmove     %ebp, %edi                                    #15.11  #### <---- ISSUE HERE
        jmp       ..B1.8        # Prob 100%                     #15.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.7:                         # Preds ..B1.3
        addl      $2, %r10d                                     #11.11
        movl      %r10d, (%rdx)                                 #11.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.8:                         # Preds ..B1.7 ..B1.5 ..B1.6
        incq      %rsi                                          #8.5
        cmpq      %rax, %rsi                                    #8.5
        jb        ..B1.3        # Prob 82%                      #8.5
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.9:                         # Preds ..B1.8
        movq      -24(%rsp), %rbx                               #
        movq      -16(%rsp), %rbp                               #
        movl      %edi, (%r8)                                   #15.11  #### <---- ISSUE HERE
                                # LOE rbx rbp r12 r13 r14 r15
..B1.10:                        # Preds ..B1.1 ..B1.9
        ret                                                     #19.1
```
As you can see, block B1.6 implements the CASE(2) branch of the Fortran code with a conditional move (cmove) instead of a jump. A conditional move can only write to a register, never to memory, so the compiler keeps a3 in %edi for the whole loop: it is loaded once before the loop (movl (%r8), %edi in block B1.2) and written back to memory only after the loop, in block B1.9 (movl %edi, (%r8)). Both the cmove and the write-back are marked "ISSUE HERE" in the listing. Unfortunately, this write-back is executed whenever the loop runs at least once (lower <= upper), even if CASE(2) is never taken and a3 is never modified.
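To make the transformation concrete, here is our own hand-written, source-level approximation of the assembly listing; it is a sketch based on our reading of blocks B1.2, B1.6 and B1.9, not actual compiler output. The module and routine names (foo_model, foo_as_written, foo_as_compiled) are hypothetical, and the INTENT attributes were added for clarity. The second routine keeps a3 in a local variable, updates it with MERGE in place of the cmove, and writes it back once after the loop. Called from a single thread, both routines give the same results:

```fortran
module foo_model
  implicit none
contains

  ! FOO as written in the question, renamed; INTENTs added for clarity.
  subroutine foo_as_written(lower, upper, a1, a2, a3)
    integer(4), intent(in)    :: lower, upper
    integer(4), intent(inout) :: a1, a2, a3
    integer(4) :: idx
    idx = lower
    do while (idx <= upper)
      select case (idx)
      case (0)
        a1 = a1 + 2
      case (1)
        a2 = a2 + 4
      case (2)
        a3 = a3 + 6
      end select
      idx = idx + 1
    end do
  end subroutine foo_as_written

  ! Hypothetical approximation of the -O2 code: a3 lives in a local
  ! (%edi in the listing) for the whole loop and is written back once.
  subroutine foo_as_compiled(lower, upper, a1, a2, a3)
    integer(4), intent(in)    :: lower, upper
    integer(4), intent(inout) :: a1, a2, a3
    integer(4) :: idx, a3_reg
    if (lower > upper) return              ! B1.1: empty loop, no stores
    a3_reg = a3                            ! B1.2: load a3 once
    do idx = lower, upper
      select case (idx)
      case (0)
        a1 = a1 + 2                        ! B1.7: store inside the branch
      case (1)
        a2 = a2 + 4                        ! B1.5: store inside the branch
      case default
        a3_reg = merge(a3_reg + 6, a3_reg, idx == 2)  ! B1.6: cmove
      end select
    end do
    a3 = a3_reg                            ! B1.9: write-back, even if
                                           ! CASE (2) was never taken
  end subroutine foo_as_compiled

end module foo_model
```

For example, with lower = 0 and upper = 1, foo_as_compiled never reaches CASE (2) but still writes a3 back with its unchanged value. On its own that is harmless; it only becomes a problem once another thread writes a3 concurrently.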
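When this code runs in a multithreaded environment, this extra store causes a data race. Imagine that we have 3 threads sharing a1, a2 and a3, each calling FOO with bounds such that it executes a different branch. Every thread loads a3 into a register before its loop and writes that value back afterwards, so if the thread running CASE(2) updates a3 between the load and the write-back of a thread running CASE(0) or CASE(1), that thread overwrites a3 with its stale copy and the update is lost.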
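The lost update can be replayed serially, without real threads, by spelling out one unlucky interleaving of two of the threads. This is a sketch under our own assumptions: thread A calls FOO(0, 0, ...) and thread B calls FOO(2, 2, ...), both running the register-promoted code, and the names (lost_update_demo, replay_as_written, replay_interleaving) are hypothetical. As written, A never touches a3, so B's update always survives; with the introduced store, A writes back its stale copy:

```fortran
module lost_update_demo
  implicit none
contains

  ! Source semantics: thread A (CASE (0)) only writes a1 and thread B
  ! (CASE (2)) only writes a3, so no interleaving can lose B's update.
  function replay_as_written() result(a3_final)
    integer(4) :: a3_final
    integer(4) :: a1, a3                 ! shared by both threads
    a1 = 0
    a3 = 0
    a1 = a1 + 2                          ! A: CASE (0)
    a3 = a3 + 6                          ! B: CASE (2)
    a3_final = a3
  end function replay_as_written

  ! Compiled code (assumed): each thread keeps a3 in a register for the
  ! whole loop (load in B1.2) and writes it back at the end (B1.9).
  function replay_interleaving() result(a3_final)
    integer(4) :: a3_final
    integer(4) :: a1, a3                 ! shared by both threads
    integer(4) :: a3_reg_A, a3_reg_B     ! per-thread register copies
    a1 = 0
    a3 = 0
    a3_reg_A = a3                        ! A, B1.2: loads a3 (0)
    a1 = a1 + 2                          ! A, B1.7: CASE (0), its only real work
    a3_reg_B = a3                        ! B, B1.2: loads a3 (0)
    a3_reg_B = a3_reg_B + 6              ! B, B1.6: CASE (2) via cmove
    a3 = a3_reg_B                        ! B, B1.9: stores 6
    a3 = a3_reg_A                        ! A, B1.9: stores its stale 0
    a3_final = a3
  end function replay_interleaving

end module lost_update_demo
```

In this replay a3 ends up as 0 instead of 6. Any interleaving in which B's write-back lands between A's load and A's write-back loses the update in the same way, and the same holds for the thread running CASE (1).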
Interestingly, changing the loop above from
DO WHILE( ... )
to
DO idx = lower, upper
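does not trigger this issue. The problem also does not occur with -O1 instead of -O2. The flag -common-args also seems to avoid it (although our code has no aliasing between the dummy arguments); I guess this is a side effect rather than a real fix: if the dummy arguments may alias, the compiler can no longer keep a3 in a register across the stores to a1 and a2.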
Given that Fortran does not make any assumptions about the memory model, the emitted code is probably legal (it is correct from a single-thread point of view). Is there a way to ensure that Intel Fortran does not introduce this sort of extra store to memory?
Kind regards