All,
I have some standard code that I OMPized (and MPIized, and GPUized and MICized and...) and figured I should try DO CONCURRENT as well. Now, my first naive attempt was to replace:
!$omp parallel do default(private) &
!$omp shared(m,np,ict,icb,nb,overcast) &
...
!$omp shared(caib, caif)
RUN_LOOP: do i=1,m
with:
RUN_LOOP: do concurrent (i=1:m)
With that change the code does run, but it isn't parallel at all. I can setenv OMP_NUM_THREADS to 4 or 28 and there is no difference in speed.
This was compiled with -qopenmp. Hoping to see some effect, I tried -qopenmp -parallel. That definitely spawned threads, but in a bad way: OMP_NUM_THREADS=1 took ~5 seconds, while OMP_NUM_THREADS=4 took ~12 seconds.
So, is there a nice standard treatise/tutorial on how to take a code that works with OpenMP and convert to use DO CONCURRENT?
Note: I should add that I did have to assure the compiler that all my subroutines were pure. I'm fairly certain they are (all nicely INTENTed and everything), but they are big subroutines...
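For anyone following along, here is a minimal sketch of what "making the subroutines pure" looks like in practice. The names `column_physics` and `process_column` and the array shapes are hypothetical stand-ins, not the routines from the actual code; the point is only that every procedure called inside a DO CONCURRENT body has to be declared PURE, and that each iteration should touch only its own slice of the data.

```fortran
module column_physics
   implicit none
contains
   ! Hypothetical stand-in for a large per-column routine.
   ! PURE promises the compiler there are no side effects.
   pure subroutine process_column(np, tin, tout)
      integer, intent(in)  :: np
      real,    intent(in)  :: tin(np)
      real,    intent(out) :: tout(np)
      integer :: k
      tout(1) = tin(1)
      do k = 2, np
         tout(k) = 0.5 * (tin(k) + tin(k-1))
      end do
   end subroutine process_column
end module column_physics

program demo_do_concurrent
   use column_physics, only: process_column
   implicit none
   integer, parameter :: m = 8, np = 4
   real :: a(np, m), b(np, m)
   integer :: i

   call random_number(a)

   ! Each iteration reads and writes only column i, so iterations are independent.
   RUN_LOOP: do concurrent (i = 1:m)
      call process_column(np, a(:, i), b(:, i))
   end do RUN_LOOP

   print *, 'max difference from input: ', maxval(abs(b - a))
end program demo_do_concurrent
```

Whether a compiler actually runs such a loop in parallel is a separate question; the PURE attribute only removes one obstacle.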
In Intel's compiler, DO CONCURRENT does not parallelize unless -parallel is set. But there's no guarantee of parallelization even so - it depends on whether the compiler thinks it is safe and effective. Unlike with OpenMP, there is not (yet - coming in Fortran 2018) syntax to specify the locality of variables within the loop. That you have a number of shared variables makes me suspect that the compiler did not think parallelization was safe.
Steve,
Thanks. That was our thought. We tried having fun with '-par-threshold=0' and other options, but it just never worked. I mean, it changed the optrpt, but nothing else.
As an aside, when 2018 is supported, what exactly will the spec look like? A colleague and I tried to parse the standard and we think:
DO CONCURRENT (i=1:m) LOCAL(x,y,z) LOCAL_INIT(q,r,s) SHARED(g,h,i)
but we aren't sure. I guess I need a tldr (https://github.com/tldr-pages/tldr) for the man page that is the Standard...which I guess are the Metcalf or Brainerd books. :)
Yes, that's the correct syntax.
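As a rough illustration of those Fortran 2018 locality clauses, here is a small self-contained sketch. The variable names and the arithmetic are made up for the example, and it assumes a compiler that already accepts the 2018 locality syntax, which not every release does. DEFAULT(NONE) is added so that every variable used in the loop body has to be given an explicit locality.

```fortran
program demo_locality
   implicit none
   integer, parameter :: m = 1000
   real :: g(m), h(m)
   real :: x, q
   integer :: i

   call random_number(g)
   q = 2.0

   ! x: per-iteration scratch value, undefined on entry to each iteration (LOCAL)
   ! q: per-iteration copy that starts with the outer value of q (LOCAL_INIT)
   ! g, h: the same arrays are seen by all iterations (SHARED)
   do concurrent (i = 1:m) default(none) local(x) local_init(q) shared(g, h)
      x = q * g(i)
      h(i) = x + 1.0
   end do

   print *, 'sum(h) = ', sum(h)
end program demo_locality
```

This maps fairly directly onto the OpenMP clauses in the original loop: LOCAL plays the role of private, LOCAL_INIT of firstprivate, and SHARED of shared.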