Channel: Intel® Fortran Compiler

Mixed-programming with CUDA C to create DLL for Excel

Hello All,

In the past, I have successfully created Fortran DLLs with OpenMP for use with Excel VBA. Now I would like to integrate some CUDA C GPU code, and I am trying to use the Fortran 2003 C interoperability features to make Intel Fortran talk to CUDA C. I have been able to build an executable that shows the expected behavior. However, when I compile the same code as a DLL and call it from Excel, Excel crashes without warning and without any diagnostic information. If anyone has observed this behavior and found a workaround, I would be glad to get any kind of help. My development configuration and test code are as follows.

Thanks in advance,

Sam V

 

Build setup: Win 6 x64; Microsoft Excel 2010 VBA; Intel Composer XE 2013 IA-32 with Visual Studio 2008; NVIDIA CUDA C v5.5

Example code:

Fortran code (excelcuda.f90; uncomment/comment the relevant lines to compile as an executable)

!program main
!implicit none
!real*4::xx(4),yy(4)
!xx=1.D0
!yy=2.D0
!write(*,*) xx, yy
!call myarrtest(xx,yy,4)
!write(*,*) xx, yy
!end program


subroutine myarrtest(arrin,arrout,sz1)

!DEC$ ATTRIBUTES DLLEXPORT,STDCALL,REFERENCE,DECORATE,ALIAS:'myarrtest'::myarrtest
!DEC$ ATTRIBUTES REFERENCE::arrin,arrout,sz1

USE, INTRINSIC :: ISO_C_BINDING
implicit none

INTERFACE
    SUBROUTINE kernel_wrapper (flt_a, flt_b, int_n) BIND(C)
    IMPORT
    INTEGER(C_INT), INTENT(IN) :: int_n
    REAL(C_FLOAT), INTENT(INOUT) :: flt_a(int_n)  ! overwritten with a + b by the wrapper
    REAL(C_FLOAT), INTENT(IN) :: flt_b(int_n)
    END SUBROUTINE kernel_wrapper
END INTERFACE

integer*4::i
integer*4,intent(in)::sz1
real*4,dimension(sz1),intent(in)::arrin
real*4,dimension(sz1),intent(inout)::arrout  ! read and then overwritten by the kernel

!do i=1,sz1
!arrout(i)=arrin(i)+arrout(i)
!end do

CALL kernel_wrapper(arrout, arrin, sz1)

end subroutine

CUDA C kernel (cudakernel.cu)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda.h>
#include <cuda_runtime.h>


// simple kernel function that adds two vectors
__global__ void vect_add(float *a, float *b, int N)
{
   int idx = threadIdx.x;
   if (idx<N) a[idx] = a[idx] + b[idx];
}

// function called from main fortran program
extern "C" void kernel_wrapper(float *a, float *b, int *Np)
{
   float  *a_d, *b_d;  // declare GPU vector copies

   int blocks = 1;     // uses 1 block of
   int N = *Np;        // N threads on GPU

   // Allocate memory on GPU
   cudaMalloc( (void **)&a_d, sizeof(float) * N );
   cudaMalloc( (void **)&b_d, sizeof(float) * N );

   // copy vectors from CPU to GPU
   cudaMemcpy( a_d, a, sizeof(float) * N, cudaMemcpyHostToDevice );
   cudaMemcpy( b_d, b, sizeof(float) * N, cudaMemcpyHostToDevice );

   // call function on GPU
   vect_add<<< blocks, N >>>( a_d, b_d, N);

   // copy vectors back from GPU to CPU
   cudaMemcpy( a, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost );
   cudaMemcpy( b, b_d, sizeof(float) * N, cudaMemcpyDeviceToHost );

   // free GPU memory
   cudaFree(a_d);
   cudaFree(b_d);
   return;
}
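
The wrapper above launches a single block of N threads, which only works while N stays within the per-block thread limit (1024 on recent GPUs). A minimal host-side sketch in plain C of how the block count could be derived instead is given here. The helper name `blocks_for` and the idea of a fixed block size are assumptions for illustration and are not part of my code above; the kernel index would also have to become `blockIdx.x * blockDim.x + threadIdx.x`.

```c
/* Hypothetical helper (not in the original code): number of blocks needed
   to cover n elements when each block runs threads_per_block threads.
   Returns 0 for non-positive inputs. */
static int blocks_for(int n, int threads_per_block)
{
    if (n <= 0 || threads_per_block <= 0)
        return 0;
    /* Ceiling division written so that it cannot overflow for large n. */
    return n / threads_per_block + (n % threads_per_block != 0);
}
```

With this, the launch could in principle be written as `vect_add<<< blocks_for(N, 256), 256 >>>( a_d, b_d, N );`, where 256 is an arbitrary block size chosen for the example.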

 

The above pieces of code were compiled using the following commands:

nvcc -c -m32 -O3 cudakernel.cu
ifort -dll -libs:dll -iface:stdcall excelcuda.f90 cudakernel.obj cuda.lib cudart.lib

The resulting DLL is called from Excel VBA with the following statements:

Declare Sub myarrtest Lib "excelcuda.dll" (ByRef x As Single, ByRef y As Single, ByRef n As Long)
...
...
Call myarrtest(vbarr(1), fortarr(1), n1)
...
...

 

