Hello everyone,
I have used OpenMP on my outer do loops. Can I also apply OpenMP to my inner loops in the same way, or is OpenMP limited to outer loops only? How can I parallelize my inner loops without affecting the OpenMP parallelization of the outer loops?
I would appreciate any advice on this matter.
Best regards.
3 Replies
Quoting - ash1
I have used OpenMP on my outer do loops. Can I also apply OpenMP to my inner loops in the same way, or is OpenMP limited to outer loops only? How can I parallelize my inner loops without affecting the OpenMP parallelization of the outer loops?
Quoting - tim18
This is possible, with the nested-parallelism support enabled by OMP_NESTED. The reasons for doing it would be unusual and specialized. The more usual way to parallelize an inner loop is vectorization rather than threading.
Is it possible to combine vectorization of the inner loops with OpenMP on the outer loops, and if so, how? The best improvement I have obtained from OpenMP on the outer loop is a factor of two. What can I do to get further speedup, and what is the limit?
Thank you.
Quoting - ash1
Is it possible to combine vectorization of the inner loops with OpenMP on the outer loops, and if so, how? The best improvement I have obtained from OpenMP on the outer loop is a factor of two. What can I do to get further speedup, and what is the limit?
Thank you.
Under -openmp the compiler has little freedom to interchange loop nest levels. If it auto-vectorizes the inner loop when -openmp is not set but fails to do so with -openmp, look for ways to restructure the source so that you get both optimizations together.
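A minimal sketch of that combination, written in C rather than Fortran (the directive corresponds to `!$omp parallel do` on the outer do loop). The kernel `scale_add` and its arguments are hypothetical, chosen only for illustration. The outer loop is threaded and the inner loop is left as a plain unit-stride loop for the compiler to auto-vectorize. Whether it actually vectorizes depends on the compiler and its options, so check the vectorization report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: c[i][j] = a[i][j] + s * b[i][j] on row-major
   rows x cols arrays. */
void scale_add(int rows, int cols, const float *restrict a,
               const float *restrict b, float s, float *restrict c)
{
    assert(rows >= 0 && cols >= 0);

    /* Outer loop: rows are divided among the OpenMP threads. */
    #pragma omp parallel for
    for (int i = 0; i < rows; ++i) {
        size_t base = (size_t)i * (size_t)cols;

        /* Inner loop: unit stride and independent iterations; restrict
           rules out aliasing, so the compiler is free to auto-vectorize. */
        for (int j = 0; j < cols; ++j)
            c[base + j] = a[base + j] + s * b[base + j];
    }
}
```

If the pragma is ignored (OpenMP not enabled at compile time), the function still runs correctly in serial, which makes it easy to compare the vectorization report with and without threading.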
Memory bandwidth often limits performance. When it is not a factor, a reasonable goal is speedup that scales linearly with the number of cores times the width of the vector instructions (4 elements for single precision, 2 for double), provided the loop lengths are suitable. When memory bandwidth is a factor, performance depends strongly on minimizing unnecessary data movement.
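As a rough worked example of that upper bound, the arithmetic can be sketched as below. The function name `ideal_speedup` is hypothetical, and the 16-byte vector width is an assumption chosen because it reproduces the 4 (single) and 2 (double) figures quoted above. This is a ceiling for the compute-bound case only, not a prediction.

```c
#include <assert.h>

/* Ideal speedup bound when memory bandwidth is not limiting:
   cores * (vector register bytes / element bytes). */
int ideal_speedup(int cores, int vector_bytes, int element_bytes)
{
    assert(cores > 0 && element_bytes > 0 && vector_bytes >= element_bytes);
    int lanes = vector_bytes / element_bytes;  /* elements per vector op */
    return cores * lanes;
}
```

For instance, `ideal_speedup(4, 16, 4)` gives 16 for single precision on four cores, and `ideal_speedup(4, 16, 8)` gives 8 for double precision. A measured speedup of 2 from threading alone is therefore well short of the bound, which usually points at memory bandwidth or an unvectorized inner loop.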
The compiler assumes a loop length of 100 when deciding on vectorization, unless the source code provides enough information about the actual length. Outer loop lengths of several hundred may give the best OpenMP speedup with vectorized inner loops.
As my examples (http://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors) show, you may need the OpenMP if clause to restrict threading to cases where the outer loop is sufficiently long.
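The if clause can be sketched as follows, again in C (in Fortran the same clause goes on `!$omp parallel do`). This is not taken from the linked examples; the kernel `scale_rows` and the threshold of 200 rows are hypothetical, and the right cutoff has to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: out[i][j] = factor * in[i][j], row-major. */
void scale_rows(int rows, int cols, const double *restrict in,
                double factor, double *restrict out)
{
    /* Assumed threshold below which threading overhead is not repaid. */
    static const int min_parallel_rows = 200;

    assert(rows >= 0 && cols >= 0);

    /* Threads are used only when the outer loop is long enough;
       otherwise the region runs on a single thread. */
    #pragma omp parallel for if(rows >= min_parallel_rows)
    for (int i = 0; i < rows; ++i) {
        size_t base = (size_t)i * (size_t)cols;

        /* Inner loop left for the compiler to auto-vectorize. */
        for (int j = 0; j < cols; ++j)
            out[base + j] = factor * in[base + j];
    }
}
```

The result is the same on both paths; only the scheduling changes, so short outer loops avoid the cost of starting a parallel region.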
With the -parallel option, the compiler assumes a small outer loop trip count, which typically leads it not to parallelize when the vectorizable inner loop is assumed to have length 100.
