I have some standard code that I OMPized (and MPIized, and GPUized and MICized and...) and figured I should try DO CONCURRENT as well. Now, my first naive attempt was to replace:
!$omp parallel do default(private) &
!$omp shared(m,np,ict,icb,nb,overcast) &
...
!$omp shared(caib, caif)
RUN_LOOP: do i=1,m

with:

RUN_LOOP: do concurrent (i=1:m)
With that change the code does run, but it isn't parallel at all. I can setenv OMP_NUM_THREADS to 4 or 28 and see no difference in speed.
This was compiled with -qopenmp. Hoping to get some effect, I tried -qopenmp -parallel. That definitely spawned threads, but in a bad way: OMP_NUM_THREADS=1 took ~5 seconds, OMP_NUM_THREADS=4 took ~12 seconds.
So, is there a nice standard treatise/tutorial on how to take a code that works with OpenMP and convert it to use DO CONCURRENT?
Note: I should add that I did have to assure the compiler that all my subroutines were pure. I'm fairly certain they are (all nicely INTENTed and everything), but they are big subroutines...
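To illustrate the purity requirement mentioned above, here is a minimal sketch of a PURE subroutine called from inside DO CONCURRENT. The module `kernels`, the routine `column_sum`, and the array names are hypothetical stand-ins, not the routines from the original code.

```fortran
module kernels
  implicit none
contains
  ! PURE promises no side effects: every argument carries an INTENT and
  ! nothing outside the argument list is modified.
  pure subroutine column_sum(col, total)
    real, intent(in)  :: col(:)
    real, intent(out) :: total
    total = sum(col)
  end subroutine column_sum
end module kernels

program pure_in_do_concurrent
  use kernels
  implicit none
  integer, parameter :: m = 4, n = 3
  real :: field(n, m), totals(m)
  integer :: i

  field = 1.0

  ! Each iteration reads one column and writes one element of totals,
  ! so the iterations do not depend on each other.
  do concurrent (i = 1:m)
    call column_sum(field(:, i), totals(i))
  end do

  ! Self-check: every column holds n ones.
  if (any(totals /= real(n))) error stop "unexpected totals"
  print *, "ok"
end program pure_in_do_concurrent
```

Any procedure referenced inside a DO CONCURRENT construct has to be pure, which is why the explicit interface (here via the module) matters: the compiler needs to see the PURE prefix at the call site.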
In Intel's compiler, DO CONCURRENT does not parallelize unless -parallel is set. But there's no guarantee of parallelization even so - it depends on whether the compiler thinks it is safe and effective. Unlike with OpenMP, there is not (yet - coming in Fortran 2018) syntax to specify the locality of variables within the loop. That you have a number of shared variables makes me suspect that the compiler did not think parallelization was safe.
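Until the locality syntax is available, one way to give the compiler per-iteration scratch variables is to declare them in a BLOCK construct inside the loop (Fortran 2008). The sketch below uses made-up names (`a`, `b`, `tmp`) and is only meant to show the idiom; whether a given compiler then actually parallelizes the loop is still up to the compiler.

```fortran
program do_concurrent_block
  implicit none
  integer, parameter :: m = 1000
  real :: a(m), b(m)
  integer :: i

  do i = 1, m
    a(i) = real(i)
  end do

  do concurrent (i = 1:m)
    ! tmp is declared inside the BLOCK, so each iteration has its own
    ! copy and there is no shared scalar for the compiler to worry about.
    block
      real :: tmp
      tmp = 2.0 * a(i)
      b(i) = tmp + 1.0
    end block
  end do

  ! Self-check against the same expression evaluated on whole arrays.
  if (any(b /= 2.0 * a + 1.0)) error stop "unexpected result"
  print *, "ok"
end program do_concurrent_block
```

This roughly plays the role of OpenMP's private clause for scalars and small temporaries; variables declared outside the loop remain visible to every iteration.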
Thanks. That was our thought. We tried having fun with '-par-threshold=0' and other options, but it just never worked. I mean, it changed the optrpt, but nothing else.
As an aside, when 2018 is supported, what exactly will the spec look like? A colleague and I tried to parse the standard and we think:
DO CONCURRENT (i=1:m) LOCAL(x,y,z) LOCAL_INIT(q,r,s) SHARED(g,h,i)
but we aren't sure. I guess I need a tldr (https://github.com/tldr-pages/tldr) for the man page that is the Standard...which I guess are the Metcalf or Brainerd books. :)
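For reference, a minimal sketch of the Fortran 2018 locality clauses in use. The variable names are invented for the example, and it assumes a compiler that already implements the F2018 locality specifiers. As far as I can tell from the standard, the index variable itself may not appear in any of the lists, so it is left out here.

```fortran
program do_concurrent_locality
  implicit none
  integer, parameter :: m = 8
  real :: a(m), b(m), scale, tmp, offset
  integer :: i

  scale  = 2.0
  offset = 1.0
  tmp    = 0.0
  a = [(real(i), i = 1, m)]

  ! default(none): every variable used in the loop body must be listed.
  ! local(tmp):         each iteration gets an undefined private tmp.
  ! local_init(offset): each iteration gets a private copy of offset,
  !                     initialized from the value outside the loop.
  ! shared(...):        a, b and scale are the same objects in all iterations.
  do concurrent (i = 1:m) default(none) local(tmp) local_init(offset) shared(a, b, scale)
    tmp = scale * a(i)
    offset = offset + tmp   ! changes this iteration's copy only
    b(i) = offset
  end do

  ! Self-check: each b(i) should be 2*a(i) + 1.
  if (any(b /= 2.0 * a + 1.0)) error stop "unexpected result"
  print *, "ok"
end program do_concurrent_locality
```

LOCAL corresponds loosely to OpenMP's private, LOCAL_INIT to firstprivate, and SHARED to shared; DEFAULT(NONE) forces every variable in the body to be classified explicitly.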