Hi,
I have a question about AVX instructions. I compiled my code with ifort 13 using -O2 and -xHost. I want to enable 256-bit wide AVX to perform four 64-bit floating-point operations per cycle.
Here is my first code piece:
623 !DIR$ SIMD
624 do ii = 1, Nc
625    ! diagonal components first
626    StrnRt(ii,1) = JAC(ii) * ( &
627         MT1(ii,1) * VelGrad1st(ii,1) &
628       + MT1(ii,2) * VelGrad1st(ii,3) )
...
640 end do
The assembly files show that the following instructions were generated for line 627:
vmulsd 8(%r8,%r14,8), %xmm1, %xmm3 #627.38
vmulpd %xmm6, %xmm5, %xmm11 #627.38
vmulpd %ymm5, %ymm4, %ymm10 #627.38
I understand why I got vmulsd. My question is: why was vmulpd %xmm6, %xmm5, %xmm11 generated, and what does it mean? I thought vmulpd was an AVX instruction and should use ymm registers to get 256-bit wide vectorization.
For the second code piece:
643 !DIR$ SIMD
644 do ii = 1, Nc
645    ! diagonal components first
646    StrnRt(ii,1) = JAC(ii) * ( &
647         MT1(ii,1) * VelGrad1st(ii,1) &
648       + MT1(ii,2) * VelGrad1st(ii,4) &
649       + MT1(ii,3) * VelGrad1st(ii,7) )
...
685 end do
The assembly files show that the following instructions were generated for line 647:
vmulsd (%r12), %xmm4, %xmm6 #647.38
vmulpd %xmm11, %xmm10, %xmm0 #647.38
Here again I got vmulpd with xmm registers, and this time I did not get vmulpd with ymm at all. I am worried that this piece of code performs only two 64-bit floating-point operations per cycle rather than four.
I truly appreciate your help.
Best regards,
Wentao