Channel: Intel® Fortran Compiler

Extra stores emitted by the compiler and data-races

Hi,

Consider the following code:

SUBROUTINE FOO(lower, upper, a1, a2, a3)
    IMPLICIT NONE
    INTEGER(4) :: lower, upper, a1, a2, a3
    INTEGER(4) :: idx

    idx = lower
    DO WHILE (idx <= upper)
      SELECT CASE (idx)
        CASE (0)
          a1 = a1 + 2
        CASE (1)
          a2 = a2 + 4
        CASE (2)
          a3 = a3 + 6
      END SELECT
      idx = idx + 1
    END DO
END SUBROUTINE FOO

When this code is compiled with Intel Fortran 14.0.2 or 15.0.0 using the following command:

ifort -c -fopenmp -O2 test.f90

the compiler emits an extra store to the location of the variable a3. This can be confirmed by examining the assembly code:

ifort -S -fopenmp -O2 test.f90

This generates test.s, which contains the following assembly code:

 

foo_:
# parameter 1: %rdi
# parameter 2: %rsi
# parameter 3: %rdx
# parameter 4: %rcx
# parameter 5: %r8
..B1.1:                         # Preds ..B1.0
        movslq    (%rdi), %r11                                  #7.5
        movslq    (%rsi), %rax                                  #8.5
        xorl      %esi, %esi                                    #8.5
        cmpq      %rax, %r11                                    #8.19
        jg        ..B1.10       # Prob 9%                       #8.19
                                # LOE rax rdx rcx rbx rbp rsi r8 r11 r12 r13 r14 r15
..B1.2:                         # Preds ..B1.1
        subq      %r11, %rax                                    #8.5
        movl      (%r8), %edi                                   #15.11
        incq      %rax                                          #8.5
        movl      (%rcx), %r9d                                  #13.11
        movl      (%rdx), %r10d                                 #11.11
        movq      %rbx, -24(%rsp)                               #8.5
        movq      %rbp, -16(%rsp)                               #8.5
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.3:                         # Preds ..B1.8 ..B1.2
        movq      %r11, %rbx                                    #17.7
        addq      %rsi, %rbx                                    #17.7
        je        ..B1.7        # Prob 25%                      #9.20
                                # LOE rax rdx rcx rbx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.4:                         # Preds ..B1.3
        cmpq      $1, %rbx                                      #9.20
        jne       ..B1.6        # Prob 66%                      #9.20
                                # LOE rax rdx rcx rbx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.5:                         # Preds ..B1.4
        addl      $4, %r9d                                      #13.11
        movl      %r9d, (%rcx)                                  #13.11
        jmp       ..B1.8        # Prob 100%                     #13.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.6:                         # Preds ..B1.4
        cmpq      $2, %rbx                                      #15.11
        lea       6(%rdi), %ebp                                 #15.11
        cmove     %ebp, %edi                                   #15.11 #### <---- ISSUE HERE
        jmp       ..B1.8        # Prob 100%                     #15.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.7:                         # Preds ..B1.3
        addl      $2, %r10d                                     #11.11
        movl      %r10d, (%rdx)                                 #11.11
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.8:                         # Preds ..B1.7 ..B1.5 ..B1.6
        incq      %rsi                                          #8.5
        cmpq      %rax, %rsi                                    #8.5
        jb        ..B1.3        # Prob 82%                      #8.5
                                # LOE rax rdx rcx rsi r8 r11 r12 r13 r14 r15 edi r9d r10d
..B1.9:                         # Preds ..B1.8
        movq      -24(%rsp), %rbx                               #
        movq      -16(%rsp), %rbp                               #
        movl      %edi, (%r8)                                   #15.11 #### <---- ISSUE HERE
                                # LOE rbx rbp r12 r13 r14 r15
..B1.10:                        # Preds ..B1.1 ..B1.9
        ret                                                     #19.1

As you can see in block ..B1.6 (the first line marked ISSUE HERE), the CASE (2) branch is implemented with a conditional move (cmove) rather than a store. On x86 the destination of a conditional move must be a register, never memory, so the compiler keeps a3 in a register (%edi) for the whole loop: it is loaded once before the loop in ..B1.2 and written back to memory after the loop in ..B1.9 (the second line marked ISSUE HERE). Unfortunately, this write-back is executed unconditionally whenever the loop runs at least once, even if no iteration took the CASE (2) branch.
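To make the transformation easier to follow, here is a hypothetical source-level model of what the -O2 code above does. FOO_AS_COMPILED and the locals r1, r2 and r3 are illustrative names standing in for the registers %r10d, %r9d and %edi; this is a sketch reconstructed by reading the listing, not source that ifort produces.

```fortran
! Hypothetical source-level model of the -O2 code for FOO.
! FOO_AS_COMPILED, r1, r2 and r3 are illustrative names for what the
! compiler keeps in %r10d, %r9d and %edi; ifort does not emit this source.
MODULE foo_as_compiled_model
    IMPLICIT NONE
CONTAINS
    SUBROUTINE FOO_AS_COMPILED(lower, upper, a1, a2, a3)
        INTEGER(4), INTENT(IN)    :: lower, upper
        INTEGER(4), INTENT(INOUT) :: a1, a2, a3
        INTEGER(4) :: idx, r1, r2, r3

        IF (lower > upper) RETURN     ! ..B1.1: empty loop, nothing is stored
        r1 = a1                       ! ..B1.2: all three values are loaded once
        r2 = a2
        r3 = a3
        idx = lower
        DO WHILE (idx <= upper)
          SELECT CASE (idx)
            CASE (0)
              r1 = r1 + 2
              a1 = r1                 ! ..B1.7: stored only on this path
            CASE (1)
              r2 = r2 + 4
              a2 = r2                 ! ..B1.5: stored only on this path
            CASE DEFAULT
              ! ..B1.6: lea + cmove, a3 is updated in the register only
              r3 = MERGE(r3 + 6, r3, idx == 2)
          END SELECT
          idx = idx + 1
        END DO
        a3 = r3                       ! ..B1.9: unconditional write-back
    END SUBROUTINE FOO_AS_COMPILED
END MODULE foo_as_compiled_model
```

Run on a single thread, this model gives exactly the same results as FOO, which is why the transformation is legal from a sequential point of view; the difference only becomes visible when another thread writes a3 between the load in ..B1.2 and the write-back in ..B1.9.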

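When this code runs in a multi-threaded environment, the extra store causes a data race. Imagine three threads sharing a1, a2 and a3, each calling FOO with a range that selects a different branch (for example lower = upper = 0, 1 and 2 respectively). The threads executing CASE (0) and CASE (1) still load a3 before their loop and store their unchanged copy back afterwards. If one of them loads a3 before the CASE (2) thread stores its result and writes back after it, the stale value overwrites a3 and the +6 update is lost.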
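The lost update can be reproduced deterministically, without real threads, by replaying one bad interleaving by hand. In this sketch thread A calls FOO(0, 0, ...) and thread B calls FOO(2, 2, ...) on the same variables; REPLAY_LOST_UPDATE, rA and rB are hypothetical names, with rA and rB modelling each thread's private register copy of a3 (the %edi of the listing).

```fortran
! Deterministic replay of one bad interleaving of two calls to the -O2 FOO.
! Thread A runs FOO(0, 0, ...) (CASE (0)); thread B runs FOO(2, 2, ...)
! (CASE (2)). rA and rB model each thread's register copy of a3.
MODULE race_replay
    IMPLICIT NONE
CONTAINS
    SUBROUTINE REPLAY_LOST_UPDATE(a1, a3)
        INTEGER(4), INTENT(INOUT) :: a1, a3
        INTEGER(4) :: rA, rB

        rA = a3          ! A, ..B1.2: loads a3 before its loop
        rB = a3          ! B, ..B1.2: loads a3 before its loop
        rB = rB + 6      ! B, ..B1.6: idx == 2, cmove picks the new value
        a3 = rB          ! B, ..B1.9: write-back, a3 now holds the +6
        a1 = a1 + 2      ! A, ..B1.7: the intended store to a1
        a3 = rA          ! A, ..B1.9: write-back of A's stale copy of a3
    END SUBROUTINE REPLAY_LOST_UPDATE
END MODULE race_replay
```

Starting from zero, a serial execution of the Fortran source would end with a1 = 2 and a3 = 6, but this replay ends with a3 = 0. Any interleaving in which A's load comes before B's write-back and A's write-back comes after it loses the increment in the same way.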

Interestingly, changing the loop above from

DO WHILE( ... )

to

DO idx = lower, upper

avoids the issue. The problem also disappears with -O1 instead of -O2. The -common-args flag seems to avoid it as well, even though our code has no aliasing between the dummy arguments; I guess this is by chance, or because assuming aliasing prevents the compiler from keeping a3 in a register across the stores to a1 and a2.
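For reference, the counted-loop variant would look like the sketch below (FOO_DO is a hypothetical name; only the loop construct differs from FOO). This reflects what was observed with these two compiler versions at -O2 and nothing more: the language does not forbid a later version from applying the same register promotion to a counted loop, so it is a workaround rather than a guarantee.

```fortran
! Counted-loop variant of FOO (FOO_DO is an illustrative name).
! With ifort 14.0.2/15.0.0 at -O2 this form was observed not to emit the
! extra store to a3; that is compiler behaviour, not a language guarantee.
SUBROUTINE FOO_DO(lower, upper, a1, a2, a3)
    IMPLICIT NONE
    INTEGER(4) :: lower, upper, a1, a2, a3
    INTEGER(4) :: idx

    DO idx = lower, upper      ! executes zero times when lower > upper
      SELECT CASE (idx)
        CASE (0)
          a1 = a1 + 2
        CASE (1)
          a2 = a2 + 4
        CASE (2)
          a3 = a3 + 6
      END SELECT
    END DO
END SUBROUTINE FOO_DO
```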

 

Since Fortran does not specify a memory model, the emitted code is probably legal (it is correct from a single-threaded point of view). Is there a way to ensure that Intel Fortran does not emit this sort of extra store to memory?
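One possible workaround, sketched below, is to give the shared dummy arguments the Fortran 2003 VOLATILE attribute. This assumes ifort treats VOLATILE like C volatile, so that every reference goes to memory and the value cannot be cached in a register and written back later; it is worth confirming by re-inspecting the -S output. FOO_VOLATILE and foo_volatile_mod are hypothetical names. A procedure with VOLATILE dummies needs an explicit interface, hence the module (callers must USE it), and VOLATILE also inhibits register allocation for these variables, so the loop may get slower. It only removes the invented store: if two threads could ever take the same branch concurrently, the updates themselves would still need synchronization, for example !$OMP ATOMIC.

```fortran
! Sketch of FOO with the shared dummies marked VOLATILE (Fortran 2003).
! Assumption: the compiler then must not keep a1/a2/a3 in registers
! across the loop, so no unconditional write-back of a3 is generated.
! Verify with ifort -S; FOO_VOLATILE is an illustrative name.
MODULE foo_volatile_mod
    IMPLICIT NONE
CONTAINS
    SUBROUTINE FOO_VOLATILE(lower, upper, a1, a2, a3)
        INTEGER(4), INTENT(IN)              :: lower, upper
        INTEGER(4), INTENT(INOUT), VOLATILE :: a1, a2, a3
        INTEGER(4) :: idx

        idx = lower
        DO WHILE (idx <= upper)
          SELECT CASE (idx)
            CASE (0)
              a1 = a1 + 2
            CASE (1)
              a2 = a2 + 4
            CASE (2)
              a3 = a3 + 6
          END SELECT
          idx = idx + 1
        END DO
    END SUBROUTINE FOO_VOLATILE
END MODULE foo_volatile_mod
```

Single-threaded results are unchanged, for example FOO_VOLATILE(0, 2, a1, a2, a3) starting from zero still gives 2, 4 and 6.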

Kind regards
