I am currently trying to parallelize an old sequential project. The project consists of two static libraries and one main program. Using VTune I found the costly loops, so I decided to go with OpenMP. I added several OpenMP directives (all of them omp parallel do), compiled the project with the /Qopenmp option, and ran it with OMP_NUM_THREADS=1. I expected a small overhead, but it actually reached 30%. VTune shows that libiomp spends most of its time in kmp_fork_call. I thought the OpenMP threads were created only once for the whole program execution, but my measurements suggest otherwise. Can anyone explain this? Even better, can you tell me how to solve it? I hope some compiler option can fix the problem.
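To look at this overhead in isolation, here is a minimal sketch in C rather than the poster's Fortran omp parallel do. The function names are hypothetical. It times a short loop that is called many times, once inside a parallel region and once without one. Build it with the OpenMP flag (for example /Qopenmp) and run it with OMP_NUM_THREADS=1. The difference in time per call roughly measures the cost of entering and leaving the parallel region.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one of the hot loops found with VTune:
   a short loop that the program calls many times. */
static void scale_parallel(double *a, size_t n, double s)
{
    /* Every call opens a new parallel region, so the OpenMP runtime
       performs a fork/join on each call, even with OMP_NUM_THREADS=1. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        a[i] *= s;
}

/* Same loop without OpenMP, used as the baseline. */
static void scale_serial(double *a, size_t n, double s)
{
    for (size_t i = 0; i < n; ++i)
        a[i] *= s;
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average wall-clock seconds per call of fn over `calls` calls on a[0..n). */
static double time_per_call(void (*fn)(double *, size_t, double),
                            double *a, size_t n, int calls)
{
    double t0 = now_seconds();
    for (int c = 0; c < calls; ++c)
        fn(a, n, 1.0000001);
    return (now_seconds() - t0) / calls;
}
```

Compare time_per_call(scale_parallel, ...) with time_per_call(scale_serial, ...) for the same small n. The smaller the loop, the larger the fork/join cost is relative to the useful work. If the program is built without the OpenMP flag, the pragma is ignored and the two versions should time about the same.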
Quoting - piv-fr
Yes, the OpenMP threads are created once and live in a thread pool for the life of the program. In fact, there is a "hot team" of threads that actively spin, ready for dispatch at the next (non-nested) parallel region.
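The thread reuse described above can be observed with a small sketch in C. The names are hypothetical. A C11 thread-local counter belongs to an operating-system thread, not to one parallel region. If the runtime reuses the same pooled threads, every team member reports that it has run all the regions. Threads created fresh for each region would report 1.

The sketch assumes the runtime keeps the same mapping from thread number to thread across regions of the same size. Common runtimes usually do this, but the OpenMP specification does not guarantee it. The counter also accumulates across calls, so call the function only once per program run.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 256

/* Per-OS-thread counter: C11 thread-local storage lives as long as the
   underlying thread, not as long as a single parallel region. */
static _Thread_local int regions_seen = 0;

/* Runs `regions` parallel regions. In the last one, each thread stores in
   seen[thread number] how many regions its OS thread has executed so far.
   Returns the team size (1 when compiled without OpenMP). */
static int count_regions_per_thread(int regions, int seen[MAX_THREADS])
{
    int team = 1;
    for (int r = 0; r < regions; ++r) {
        #pragma omp parallel
        {
            int id = 0;
#ifdef _OPENMP
            id = omp_get_thread_num();
            #pragma omp single
            team = omp_get_num_threads();
#endif
            ++regions_seen;
            if (r == regions - 1 && id < MAX_THREADS)
                seen[id] = regions_seen;
        }
    }
    return team;
}
```

With pooled threads, seen[0] through seen[team - 1] should all equal regions. Only seen[0] is certain, because the thread that encounters the region is always thread 0.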
The environment variable KMP_BLOCKTIME controls how long threads spin before they go to sleep. Try setting it to a large value, or even to 'infinite' for an unlimited spin time. For example, on Windows*:
set KMP_BLOCKTIME=infinite
The variable defaults to 200 ms. If the value is too small, the pooled threads go to sleep between parallel regions and have to be woken at the next one, and kmp_fork_call() will definitely show up in the VTune profile.
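To see what KMP_BLOCKTIME changes, here is a minimal sketch in C. The function names are hypothetical. It times a nearly empty parallel region in two situations: when the region follows the previous one immediately, and when it follows a serial gap.

If the gap is longer than the block time (for example 0.5 s against the 200 ms default), the pooled threads should be asleep by the time the region starts, so the region also pays for waking them. With KMP_BLOCKTIME=infinite, that difference should shrink. The sketch assumes the Intel (libiomp) or LLVM OpenMP runtime, which reads KMP_BLOCKTIME, and a run with more than one thread. Timings depend on the machine.

```c
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Busy-wait on the calling thread for `seconds`, standing in for serial
   code that runs between two parallel regions. */
static void serial_gap(double seconds)
{
    double end = now_seconds() + seconds;
    while (now_seconds() < end)
        ;
}

/* Average wall-clock cost of a nearly empty parallel region, where each
   region is preceded by `gap` seconds of serial work. */
static double region_cost(double gap, int reps)
{
    double total = 0.0;
    for (int r = 0; r < reps; ++r) {
        serial_gap(gap);
        double t0 = now_seconds();
        #pragma omp parallel
        {
            volatile int work = 0; /* private, trivial body */
            (void)work;
        }
        total += now_seconds() - t0;
    }
    return total / reps;
}
```

For example, compare region_cost(0.0, 100) with region_cost(0.5, 10). Do this once with the default setting and once after set KMP_BLOCKTIME=infinite.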
Patrick Kennedy
Intel Developer Support
