I'm working on an image-processing project using OpenMP. My code is simple, shown below. It runs smoothly on my Linux platform with GCC 4.3.3, but incredibly slowly on Windows XP (Visual Studio 2005 with Parallel Studio 2011). Hotspot analysis shows that the bottleneck is SleepEx in kernel32.dll.
Any ideas?
unsigned char **a_data,
              **b_data,
              **c_data,
              *p,
              *p_a,
              *p_b,
              *p_c;
unsigned long nr,
              nc;
nr = nc = 64;
/* Each image is one contiguous block plus a table of row pointers. */
a_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<(int)nr; i++) {
    a_data[i] = p + i*nc;   /* row stride is the column count */
}
b_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<(int)nr; i++) {
    b_data[i] = p + i*nc;
}
c_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<(int)nr; i++) {
    c_data[i] = p + i*nc;
}
/* Add b and c row by row; a parallel region is opened once per row. */
for(int i=0; i<(int)nr; i++) {
    p_a = a_data[i];
    p_b = b_data[i];
    p_c = c_data[i];
    #pragma omp parallel for
    for(int j=0; j<(int)nc; j++) {
        p_a[j] = p_b[j] + p_c[j];
    }
}
1 Reply
Your parallel for loop is too small to do anything useful in parallel. The iteration space is nc = 64, and the work per iteration is the addition of two char values.
If you enclose the posted code in a subroutine and then time many calls to that subroutine, the preponderance of the time will be in the malloc calls preceding your loop.
Jim Dempsey
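One way to act on the point about loop size, offered only as a sketch and not as something posted in the thread: place the parallel for on the outer row loop, so that a parallel region is entered once per image instead of once per row, and each thread gets whole rows of work. The helper names alloc_image, free_image and add_images are hypothetical, and with a 64 x 64 image the total work may still be too small for threading to pay off.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an nr x nc image as one contiguous block
   plus a table of row pointers. Returns NULL on failure. */
static unsigned char **alloc_image(int nr, int nc)
{
    unsigned char **rows = malloc((size_t)nr * sizeof *rows);
    unsigned char *block = malloc((size_t)nr * (size_t)nc);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (int i = 0; i < nr; i++)
        rows[i] = block + (size_t)i * (size_t)nc;
    return rows;
}

/* Hypothetical helper: release an image created by alloc_image. */
static void free_image(unsigned char **rows)
{
    if (rows != NULL) {
        free(rows[0]);   /* the contiguous pixel block */
        free(rows);      /* the row-pointer table */
    }
}

/* a = b + c, element by element (sums wrap modulo 256).
   The parallel region is opened once for the whole image; rows are
   divided among the threads and each row is processed serially. */
static void add_images(unsigned char **a, unsigned char **b,
                       unsigned char **c, int nr, int nc)
{
    int i;  /* signed int index, as OpenMP 2.0 compilers require */
    #pragma omp parallel for
    for (i = 0; i < nr; i++) {
        for (int j = 0; j < nc; j++)
            a[i][j] = (unsigned char)(b[i][j] + c[i][j]);
    }
}
```

Allocating the three images once with alloc_image and then timing many calls to add_images keeps malloc out of the measurement. If the compiler is run without its OpenMP switch, the pragma is ignored and the same code runs serially, which gives a baseline for comparison.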