Hi,
If one process launches two threads (A1 and A2), can one of those threads (say A2) launch 8 more threads (B1, ..., B8) so that a total of 9 threads run in parallel?
Currently, my simple test code shows that running A1 to completion and then running A2 (which launches 8 threads) is much faster than launching A1 and A2 simultaneously. But I am not sure whether my code does this the correct way, or how to use nested OpenMP efficiently.
Thanks,
How many hardware threads are available on your system?
Can you provide a code sketch or sample program?
Are you timing only the first execution of the nested calls, or multiple executions?
(Discard the first run, then average or take the smallest of the next 5 runs.)
Jim Dempsey
Hi,
I have a four-core CPU. The test code looks like this:
Sequential call:
---------------------------------------------------------------------------------------
#include <omp.h>
#include <stdio.h>

double start = omp_get_wtime();
myHeavyFunc();                  /* runs to completion before the parallel region */
omp_set_num_threads(4);
#pragma omp parallel
{
    int thread_id = omp_get_thread_num();
    if (thread_id == 0)
        func();
    if (thread_id == 1)
        func();
    if (thread_id == 2)
        func();
    if (thread_id == 3)
        func();
}
double finish = omp_get_wtime();   /* stop the clock after the region ends */
printf("test time is %e\n", finish - start);
---------------------------------------------------------------------------------------
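Nested call:
---------------------------------------------------------------------------------------
#include <omp.h>
#include <stdio.h>

double start = omp_get_wtime();
omp_set_max_active_levels(2);   /* nesting is often disabled by default; without this
                                   the inner region may run with a single thread */
omp_set_num_threads(2);
#pragma omp parallel
{
    int thread_id = omp_get_thread_num();
    if (thread_id == 0)
        myHeavyFunc();
    if (thread_id == 1) {
        omp_set_num_threads(4);   /* applies to regions this thread starts next */
        #pragma omp parallel
        {
            int inner_id = omp_get_thread_num();   /* 0-3, numbered within the inner team */
            if (inner_id == 0)
                func();
            if (inner_id == 1)
                func();
            if (inner_id == 2)
                func();
            if (inner_id == 3)
                func();
        }
    }
}
double finish = omp_get_wtime();   /* stop the clock after both teams finish */
printf("test time is %e\n", finish - start);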
---------------------------------------------------------------------------------------
Based on the timing in the pseudocode above, the sequential call is faster. But when I use 3 threads in the inner OpenMP region of the nested call, it becomes faster, as expected (assuming that four cores occupied by four threads is the best case).
In my real code, myHeavyFunc() does nothing but launch a GPU kernel. So although it is "heavy", the work is done on the GPU side, and that thread is not supposed to occupy any CPU resources. I don't know whether the OS will put that thread in a waiting state and give the hardware resources to the other CPU computing threads.
I hope this gives you a rough idea of what I am doing. Thanks for the help!
Two things:
1) How many cores (or HT hardware threads) are on your system?
2) Add this in front of your timed section of code, so that the thread teams already exist when timing starts:
---------------------------------------------------------------------------------------
omp_set_max_active_levels(2);   /* enable nesting, or the inner team is never created */
omp_set_num_threads(2);
#pragma omp parallel
{
unsigned int thread_id = omp_get_thread_num();
if(thread_id==0)
doNothing();
if(thread_id==1){
omp_set_num_threads(4);
#pragma omp parallel
{
unsigned int thread_id = omp_get_thread_num();
if(thread_id==0)
doNothing();
if(thread_id==1)
doNothing();
if(thread_id==2)
doNothing();
if(thread_id==3)
doNothing();
}
}
}
---------------------------------------------------------------------
Now run your timed section of code.
The next thing to do is to time each thread: store the times in an array, and be wary of reusing thread_id as the index, since the inner team's thread numbers start again at 0.
Jim Dempsey