I'm working on a predominantly serial (single-threaded) code with the aim of speeding it up via multithreading on the CPU and GPU.
So far, I've been able to achieve significant speedups using OpenMP and a few strategically placed PARALLEL DO directives.
Next, I'd like to move to the GPU, as the bottleneck loops look easy to distribute. I've been trying to find a solution for the last two weeks or so, to no avail. I've considered CUDA, OpenCL (via Fortran bindings), and OpenMP GPU offload, but I'm not sure whether these support my development environment and hardware (ifort and/or ifort beta, NVIDIA GTX 1080 Ti, and Windows Server 2012).
I was wondering if anyone has a suggestion to achieve GPU parallelization for my case. I'd like to stay in Fortran primarily but can consider other languages if they are the only solution.
If you're using Intel compilers, then OpenCL is currently your only option. I have done this in the past (with a toy program). At some point in the future, oneAPI should be able to do this as well, but that day is not here yet.
Thanks for the reply, Steve.
Do you have that toy program or more information lying around somewhere? Did you use Fortran bindings for OpenCL and, if so, which one?
I've seen earlier posts from you referring to a forum post here, but the link seems to be broken now.