I have implemented a multithread option in my code. It runs and gives the right answer, but it is many times slower than the single-thread version. How much overhead is wasted in creating and exiting from threads? That seems to take quite a while. Are there any references I can use for guidance?
1 Solution
When the loop has relatively few iterations and the work done per iteration is lightweight, for example
do i=1,100
a(i) = 0.0
end do
then the overhead of threading the loop exceeds the runtime of the loop on a single thread.
If you choose to use the auto-parallel feature, add command-line switches and/or compiler directives to control when and where parallelization occurs.
It tends to be better to add parallelization by way of OpenMP directives.
Jim Dermpsey
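As an illustration of the point above, here is a minimal sketch that times the same kind of small loop three ways: serially, with an unconditional OpenMP `parallel do`, and with an `if` clause that only goes parallel above a trip-count threshold. The program name, the repeat count `nrep`, and the `threshold` value are assumptions chosen for illustration, not measured recommendations; the useful cutoff depends on the machine and on the work per iteration. It needs to be compiled with OpenMP enabled.

```fortran
program small_loop_overhead
  use omp_lib
  implicit none
  integer, parameter :: n = 100             ! trip count, as in the loop above
  integer, parameter :: threshold = 100000  ! assumed cutoff; tune by measurement
  integer, parameter :: nrep = 10000        ! repeat so the elapsed time is measurable
  real :: a(n)
  double precision :: t0, t_serial, t_always, t_guarded
  integer :: i, rep

  ! Serial loop
  t0 = omp_get_wtime()
  do rep = 1, nrep
     do i = 1, n
        a(i) = real(rep + i)
     end do
  end do
  t_serial = omp_get_wtime() - t0

  ! Always parallel: pays the fork/join cost on every repetition
  t0 = omp_get_wtime()
  do rep = 1, nrep
     !$omp parallel do
     do i = 1, n
        a(i) = real(rep + i)
     end do
     !$omp end parallel do
  end do
  t_always = omp_get_wtime() - t0

  ! Guarded: the region runs on one thread unless n exceeds the threshold
  t0 = omp_get_wtime()
  do rep = 1, nrep
     !$omp parallel do if(n > threshold)
     do i = 1, n
        a(i) = real(rep + i)
     end do
     !$omp end parallel do
  end do
  t_guarded = omp_get_wtime() - t0

  print *, 'serial          (s):', t_serial
  print *, 'parallel always (s):', t_always
  print *, 'parallel if     (s):', t_guarded
  print *, 'checksum:', sum(a)   ! use the result so the loops are not optimized away
end program small_loop_overhead
```

Raising `n` in steps and rerunning should show roughly where the parallel version starts to pay for itself on a given system.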
2 Replies
The overhead of creating threads tends to be higher on Windows than on Linux, the platform for which many of the references you will find are written. Threading errors such as false sharing are more likely to produce symptoms like the ones you describe. These can be difficult to diagnose when you aren't familiar with the application.
Helping with this kind of diagnosis is always a goal of analysis tools such as Parallel Studio. You might also turn off your own threading and see whether /Qparallel with /Qpar-report at various levels gives you any clues about where it can and cannot parallelize. That option may perform some loop interchanges to accomplish its job.
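For readers unfamiliar with false sharing, the following is a minimal sketch of how it can arise and one common way to avoid it. It is not taken from the original poster's code; the program name, the problem size `n`, and the `partial` array are hypothetical. In the first loop each thread updates its own element of a shared array, but neighbouring elements sit on the same cache line, so the threads can keep invalidating each other's cached copy. The second loop uses an OpenMP `reduction`, which gives each thread a private accumulator.

```fortran
program false_sharing_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 10000000        ! hypothetical problem size
  double precision, allocatable :: partial(:)
  double precision :: total, t0
  integer :: i, tid, nthreads

  nthreads = omp_get_max_threads()
  allocate(partial(0:nthreads-1))
  partial = 0.0d0

  ! Prone to false sharing: adjacent elements of partial share a cache line
  t0 = omp_get_wtime()
  !$omp parallel private(tid)
  tid = omp_get_thread_num()
  !$omp do
  do i = 1, n
     partial(tid) = partial(tid) + 1.0d0 / dble(i)
  end do
  !$omp end do
  !$omp end parallel
  total = sum(partial)
  print *, 'shared array: sum =', total, ' time (s) =', omp_get_wtime() - t0

  ! Reduction: each thread accumulates privately, results are combined at the end
  total = 0.0d0
  t0 = omp_get_wtime()
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + 1.0d0 / dble(i)
  end do
  !$omp end parallel do
  print *, 'reduction:    sum =', total, ' time (s) =', omp_get_wtime() - t0

  deallocate(partial)
end program false_sharing_sketch
```

How large the difference is depends on the compiler and optimization level; an optimizer may keep `partial(tid)` in a register and hide the effect, so treat the timings as indicative only.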
