Hi,
I compiled the following small program with "-O3 -shared-intel" on three different clusters:
- cluster1: Intel(R) Xeon(R) CPU X5675 with ifort 12.1.0
- cluster2: Intel(R) Xeon(R) CPU X5650 with ifort 12.1.0
- cluster3: Intel(R) Xeon(R) CPU E5-2650 v2 with ifort 15.0.0
      program main
c
      implicit none
      integer jma, kma, ntstepmax
      integer na
      integer nfx,nfy,nfz
      real lnx,lny,lnz
      parameter (jma = 139, kma = 16)
      parameter (ntstepmax = 100)
      parameter (nfx = 1180, nfy = 8, nfz = 14)
      parameter (lnx = 590, lny = 4, lnz = 7)
      parameter (na = 1)
c
      integer ntstep
      integer i,j,k,i2,j2,k2,l
      real a(na,-nfx:nfx,-nfy:nfy,-nfz:nfz)
      real xu(-nfx:nfx,-nfy+1:jma+nfy,-nfz+1:kma+nfz)
      real yu(1:jma+2,1:kma+2)
c
      do ntstep = 1,ntstepmax
c
         write (*,*) ' ........ ntstep = ',ntstep
c
         l = 1
c
         do k = 1,kma
            do j = 1,jma
               yu(j+1,k+1) = 0.0
c
               do k2 = -nfz,nfz
                  do j2 = -nfy,nfy
                     do i2 = -nfx,nfx
c$$$               do i2 = -nfx,nfx
c$$$                  do j2 = -nfy,nfy
c$$$                     do k2 = -nfz,nfz
                        yu(j+1,k+1) = yu(j+1,k+1) +
     &                       xu(i2,j+j2,k+k2)*a(l,i2,j2,k2)
                     enddo
                  enddo
               enddo
c
            enddo
         enddo
c
      enddo
c
      end
The run times are quite strange:
- cluster1: 0m29s
- cluster2: 0m37s
- cluster3: 2m32s
Between cluster1 and cluster2 the time difference is small and could be explained by the difference in CPU frequency between the two clusters.
But why is cluster3 so slow? It takes about five times as long as cluster1, although its hardware is quite new (last year) compared with cluster1 and cluster2 (hardware and software from 2011).
Is it a problem with the code above, or an optimization problem?
Any help or suggestions would be appreciated.
Best regards,
Guillaume De Nayer