Hi everyone,
I just started to use OpenMP for my code, but I can't understand the meaning of the thread number.
As far as I know, a uni-processor has only one thread. However, when I tested my code while varying OMP_NUM_THREADS from 1 to 64, the computational time sometimes got smaller and sometimes larger. What is the relationship between the physical number of threads in the processor and the value of OMP_NUM_THREADS?
One more question concerns a slightly funny situation. When I compared the elapsed time of the OpenMP code with OMP_NUM_THREADS=64 on both a uni-processor and a dual-processor, surprisingly the time on the uni-processor was shorter than on the dual-processor! Can anyone give me an explanation? (I used a dual-processor Linux box for testing. For the uni-processor case I rebooted the box without the SMP kernel and tested; for the dual-processor case I booted the SMP kernel and tested.)
I need your help and your help will be greatly appreciated.
Thanks.
Isaac
- Tags:
- Parallel Computing
1 Reply
When you set OMP_NUM_THREADS, you override any relationship between the number of processors and the number of threads that OpenMP would choose by default. If you set more threads than processors, some threads have to wait for an opportunity to run, so you should expect performance to degrade when you choose too many threads.
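As a rough illustration of keeping the thread count tied to the processor count, here is a minimal sketch in C. The helper names `clamp_threads` and `team_size` are hypothetical, made up for this example; `omp_get_num_procs` and `omp_get_num_threads` are standard OpenMP runtime calls. The fallback definitions are only there so the file still builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_num_procs(void)   { return 1; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Hypothetical helper: clamp a requested thread count to [1, nprocs]. */
int clamp_threads(int requested, int nprocs)
{
    if (nprocs < 1)
        nprocs = 1;
    if (requested < 1)
        return 1;
    return requested > nprocs ? nprocs : requested;
}

/* Hypothetical helper: start a parallel region with at most one thread
   per processor and report the team size the runtime actually used. */
int team_size(int requested)
{
    int nthreads = clamp_threads(requested, omp_get_num_procs());
    int team = 1;

    #pragma omp parallel num_threads(nthreads)
    {
        /* One thread records the team size; the others wait at the
           implicit barrier that ends the single construct. */
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```

Calling `team_size(64)` on a two-processor box would request at most 2 threads rather than 64, so no thread sits waiting for a processor. The runtime may still hand out fewer threads than requested.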
If you do any reading about threading or OpenMP, you will see there are several common problems which could block speedup by threaded parallelism. If it were not so, there wouldn't be justification for supporting tools such as Intel Thread Profiler.
OpenMP compilers generally don't pay any attention to false sharing, even in simple cases where it might be possible to diagnose. On modern processors which have significant Instruction Level Parallelism when running a single thread, you could easily find ways to defeat ILP with threaded parallelism. These ideas hardly make a dent in the list of possibilities.
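To make the false-sharing point concrete, here is a minimal sketch of one common workaround: giving each thread its own padded accumulator so that no two threads write to the same cache line. The 64-byte line size, the `MAX_THREADS` cap, and the names `padded_double` and `sum_padded` are assumptions for illustration, not something taken from a particular compiler or processor manual.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

#define CACHE_LINE  64   /* assumed cache line size in bytes */
#define MAX_THREADS 64   /* assumed upper bound on the team size */

/* One accumulator per thread, padded so consecutive slots are a full
   cache line apart and never share a line. */
typedef struct {
    double value;
    char   pad[CACHE_LINE - sizeof(double)];
} padded_double;

/* Hypothetical helper: sum x[0..n-1] using per-thread padded slots. */
double sum_padded(const double *x, size_t n)
{
    padded_double partial[MAX_THREADS] = {{0}};
    long len = (long)n;
    int nthreads = omp_get_max_threads();
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    #pragma omp parallel num_threads(nthreads)
    {
        int tid = omp_get_thread_num();

        /* Each thread updates only its own slot. With a plain
           double partial[] these writes would land on shared cache
           lines and the threads would slow each other down. */
        #pragma omp for
        for (long i = 0; i < len; i++)
            partial[tid].value += x[i];
    }

    /* Combine the per-thread results serially; unused slots are zero. */
    double total = 0.0;
    for (int t = 0; t < nthreads; t++)
        total += partial[t].value;
    return total;
}
```

In real code a `reduction(+:total)` clause on the loop is usually the simpler choice for a sum like this; the padded array is shown only because it makes the memory layout issue visible.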