Hello All,
In the past, I have successfully created Fortran DLLs with OpenMP for use with Excel VBA. I would now like to integrate some CUDA C GPU code, and I am trying to use the Fortran 2003 C interoperability features to make Intel Fortran talk to CUDA C. I have been able to create an executable that shows the expected behavior. However, when I compile the same code as a DLL and use it inside Excel, Excel crashes without warning and without any diagnostic information. If anyone has observed this behavior and found a workaround, I would be glad to get any kind of help. My development configuration and test code are as follows.
Thanks in advance,
Sam V
Build setup: Win 6 x64; Microsoft Excel 2010 VBA; Intel Composer XE 2013 IA-32 with Visual Studio 2008; NVIDIA CUDA C v5.5
Example code:
Fortran code (excelcuda.f90)
(uncomment/comment the relevant lines to compile it as an executable)
!program main
!implicit none
!real*4::xx(4),yy(4)
!xx=1.D0
!yy=2.D0
!write(*,*) xx, yy
!call myarrtest(xx,yy,4)
!write(*,*) xx, yy
!end program

subroutine myarrtest(arrin,arrout,sz1)
!DEC$ ATTRIBUTES DLLEXPORT,STDCALL,REFERENCE,DECORATE,ALIAS:'myarrtest'::myarrtest
!DEC$ ATTRIBUTES REFERENCE::arrin,arrout,sz1
USE, INTRINSIC :: ISO_C_BINDING
implicit none
INTERFACE
  SUBROUTINE kernel_wrapper (flt_a, flt_b, int_n) BIND(C)
    IMPORT
    INTEGER(C_INT), INTENT(IN) :: int_n
    REAL(C_FLOAT), INTENT(IN) :: flt_a(int_n), flt_b(int_n)
  END SUBROUTINE kernel_wrapper
END INTERFACE
integer*4::i
integer*4,intent(in)::sz1
real*4,dimension(sz1),intent(in)::arrin
real*4,dimension(sz1),intent(out)::arrout
!do i=1,sz1
!arrout(i)=arrin(i)+arrout(i)
!end do
CALL kernel_wrapper(arrout, arrin, sz1)
end subroutine
CUDA C kernel (cudakernel.cu)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda.h>
#include <cuda_runtime.h>

// simple kernel function that adds two vectors
__global__ void vect_add(float *a, float *b, int N)
{
    int idx = threadIdx.x;
    if (idx<N) a[idx] = a[idx] + b[idx];
}

// function called from main fortran program
extern "C" void kernel_wrapper(float *a, float *b, int *Np)
{
    float *a_d, *b_d;  // declare GPU vector copies
    int blocks = 1;    // uses 1 block of
    int N = *Np;       // N threads on GPU

    // Allocate memory on GPU
    cudaMalloc( (void **)&a_d, sizeof(float) * N );
    cudaMalloc( (void **)&b_d, sizeof(float) * N );

    // copy vectors from CPU to GPU
    cudaMemcpy( a_d, a, sizeof(float) * N, cudaMemcpyHostToDevice );
    cudaMemcpy( b_d, b, sizeof(float) * N, cudaMemcpyHostToDevice );

    // call function on GPU
    vect_add<<< blocks, N >>>( a_d, b_d, N);

    // copy vectors back from GPU to CPU
    cudaMemcpy( a, a_d, sizeof(float) * N, cudaMemcpyDeviceToHost );
    cudaMemcpy( b, b_d, sizeof(float) * N, cudaMemcpyDeviceToHost );

    // free GPU memory
    cudaFree(a_d);
    cudaFree(b_d);
    return;
}
The above pieces of code were compiled using the following commands:
nvcc -c -m32 -O3 cudakernel.cu
ifort -dll -libs:dll -iface:stdcall excelcuda.f90 cudakernel.obj cuda.lib cudart.lib
The resulting DLL is used within Excel VBA using the following statements:
Declare Sub myarrtest Lib "excelcuda.dll" (ByRef x As Single, ByRef y As Single, ByRef n As Long)
...
...
Call myarrtest(vbarr(1), fortarr(1), n1)
...
...
You don't want to use -iface stdcall - you have the ATTRIBUTES for the DLL routine and that is sufficient.
You can debug this by specifying Excel as the program to run for your DLL project under Debugging and set a breakpoint at your DLL routine. This may give you a clue as to where the problem occurs. You may also want to create an executable that links to the DLL (specifying STDCALL for the DLL routine) and see how that works.
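One way to narrow down where the problem occurs is to temporarily swap the CUDA object for a host-only stand-in with the same C signature, so that the Fortran-to-C linkage and the Excel call can be exercised without any GPU calls. The sketch below is an assumption for illustration, not part of the original code: it keeps the name and argument list of `kernel_wrapper` (two float arrays and the length passed by reference) and does the addition on the CPU.

```c
#include <stddef.h>

/* Host-only stand-in for the CUDA kernel_wrapper (illustrative assumption,
   not the original implementation). Same C signature as in cudakernel.cu:
   two float arrays and the element count passed by reference. */
void kernel_wrapper(float *a, float *b, int *Np)
{
    /* Bail out quietly on a bad pointer instead of dereferencing it. */
    if (a == NULL || b == NULL || Np == NULL) {
        return;
    }

    int n = *Np;  /* the Fortran interface declares int_n without VALUE, so a pointer arrives */

    /* Same arithmetic as the vect_add kernel: a[i] = a[i] + b[i]. */
    for (int i = 0; i < n; ++i) {
        a[i] = a[i] + b[i];
    }
}
```

If the DLL built with this stand-in runs cleanly from Excel, the calling convention and argument passing on the Fortran side are probably fine and the CUDA part deserves a closer look. If it still crashes, the problem is more likely in how the DLL routine itself is exported or called.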
Hello Steve,
Thanks for your prompt response. Removing the -iface:stdcall option fixed the issue. This comes as a bit of a surprise, as I was able to compile a DLL using Fortran+C (without CUDA) and there was no issue at all. Also, if I compiled an EXE, there was no problem. Excel crashes only when I compile the DLL with -iface:stdcall in addition to the attribute. Nonetheless, the issue is solved. I am able to successfully integrate the CUDA routines with Excel now, thanks to you.
On a different note, I tried debugging by launching Excel as an exe (devenv /debugexe <path-to-excel.exe> <workbook.xlsm>). However, it appears that one cannot debug the CUDA side without the NVIDIA Nsight plug-in in Visual Studio. But I got the general idea. Thanks for the tip.
Regards,
Sam
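For anyone who hits a similar silent crash and has no debugger on the CUDA side: since a DLL loaded by Excel has no console, one low-tech option is to append a line to a log file after each stage of the wrapper. The helper below is only a sketch. The name `dll_log` and its arguments are hypothetical and do not come from the code in this thread.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the original code): append one
   "stage: code" line to a log file. Returns 0 on success, -1 on failure. */
int dll_log(const char *path, const char *stage, int code)
{
    FILE *fp = fopen(path, "a");  /* append, so earlier stages are kept */
    if (fp == NULL) {
        return -1;
    }

    int written = fprintf(fp, "%s: %d\n", stage, code);

    /* Closing flushes the line to disk before the next stage runs. */
    if (fclose(fp) != 0 || written < 0) {
        return -1;
    }
    return 0;
}
```

Inside `kernel_wrapper`, the value returned by calls such as `cudaMalloc` and `cudaMemcpy` (a `cudaError_t`, where `cudaSuccess` is 0) could be cast to `int` and passed as `code`. The last line in the file then shows how far the routine got before the crash.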
