
OpenMP generates large overhead in kernel32.dll (SleepEx)

intelbenz

I'm working on an image processing project using OpenMP. My code is simple and is shown below. The program ran smoothly on my Linux platform with gcc 4.3.3, but it ran incredibly slowly on Windows XP (Visual Studio 2005 with Parallel Studio 2011). A hotspot analysis showed that the bottleneck was SleepEx in kernel32.dll.

Any ideas?




unsigned char **a_data,
              **b_data,
              **c_data,
              *p,
              *p_a,
              *p_b,
              *p_c;
unsigned long nr,
              nc;
nr = nc = 64;

/* each image: nr row pointers into one contiguous nr*nc block */
a_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<nr; i++){
    a_data[i] = p + i*nc;   /* row stride is nc (equal to nr here) */
}
b_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<nr; i++){
    b_data[i] = p + i*nc;
}
c_data = (unsigned char **) malloc(nr*sizeof(unsigned char *));
p = (unsigned char *) malloc(nr*nc*sizeof(unsigned char));
for(int i=0; i<nr; i++){
    c_data[i] = p + i*nc;
}

/* a = b + c, one row at a time; a parallel region is entered for every row */
for(int i=0; i<nr; i++){
    p_a = a_data[i];
    p_b = b_data[i];
    p_c = c_data[i];
    #pragma omp parallel for
    for(int j=0; j<nc; j++) {
        p_a[j] = p_b[j] + p_c[j];
    }
}
jimdempseyatthecove
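Your parallel for loop is too small to do anything useful in parallel. The iteration space is nc = 64, and the work performed per iteration is the addition of two char values, so the overhead of entering and leaving the parallel region on every row outweighs the work itself.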
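
As an illustration of the point above (an editorial sketch, not something proposed in the reply), one common way to reduce this overhead is to put the parallel region on the outer row loop, so it is entered once per image instead of once per row. The helper names alloc_image, free_image and add_images are hypothetical, and the sketch assumes nr > 0 and nc > 0. Whether this helps for a 64 x 64 image is something you would need to measure; the total work may still be too small to benefit from threads.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an nr x nc image as one contiguous block
   plus an array of row pointers, as in the question. Assumes nr, nc > 0. */
unsigned char **alloc_image(int nr, int nc)
{
    unsigned char **rows = malloc((size_t)nr * sizeof *rows);
    unsigned char *block = calloc((size_t)nr * (size_t)nc, sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (int i = 0; i < nr; i++)
        rows[i] = block + (size_t)i * nc;   /* row stride is nc */
    return rows;
}

/* Hypothetical helper: release an image created by alloc_image. */
void free_image(unsigned char **rows)
{
    if (rows != NULL) {
        free(rows[0]);   /* rows[0] is the start of the pixel block */
        free(rows);
    }
}

/* a = b + c. The parallel region wraps the row loop, so the thread team
   is started and joined once per call rather than once per row. */
void add_images(unsigned char **a, unsigned char **b, unsigned char **c,
                int nr, int nc)
{
    #pragma omp parallel for
    for (int i = 0; i < nr; i++) {
        /* row pointers are declared inside the loop, so each thread
           gets its own private copies */
        unsigned char *p_a = a[i];
        const unsigned char *p_b = b[i];
        const unsigned char *p_c = c[i];
        for (int j = 0; j < nc; j++)
            p_a[j] = (unsigned char)(p_b[j] + p_c[j]);
    }
}
```

Note that in the posted code p_a, p_b and p_c are declared outside the loop; if the pragma were simply moved to the outer loop without making them private, the threads would share and overwrite those pointers.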

If you enclosed the posted code in a subroutine and then timed many calls to that subroutine, the preponderance of the time would be spent in the malloc calls (preceding your loop).
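
A rough way to try this experiment is sketched below, as an editorial illustration under stated assumptions rather than the exact test the reply has in mind. It times the allocation phase and the addition phase separately over many calls. The function names add_rows, time_alloc and time_add are hypothetical, and the images are simplified to flat buffers without the row-pointer arrays.

```c
#include <stdlib.h>
#include <time.h>

/* Row-by-row add with the parallel region inside the row loop,
   mirroring the structure of the posted code. */
void add_rows(unsigned char *a, const unsigned char *b,
              const unsigned char *c, int nr, int nc)
{
    for (int i = 0; i < nr; i++) {
        unsigned char *p_a = a + (size_t)i * nc;
        const unsigned char *p_b = b + (size_t)i * nc;
        const unsigned char *p_c = c + (size_t)i * nc;
        #pragma omp parallel for
        for (int j = 0; j < nc; j++)
            p_a[j] = (unsigned char)(p_b[j] + p_c[j]);
    }
}

/* Seconds reported by clock() for `calls` rounds of allocating and
   freeing three nr x nc buffers. */
double time_alloc(int calls, int nr, int nc)
{
    size_t n = (size_t)nr * (size_t)nc;
    clock_t t0 = clock();
    for (int k = 0; k < calls; k++) {
        unsigned char *a = malloc(n);
        unsigned char *b = malloc(n);
        unsigned char *c = malloc(n);
        /* volatile writes keep the compiler from discarding the allocations */
        if (a) *(volatile unsigned char *)a = 0;
        if (b) *(volatile unsigned char *)b = 0;
        if (c) *(volatile unsigned char *)c = 0;
        free(a);
        free(b);
        free(c);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Seconds reported by clock() for `calls` rounds of the addition on
   preallocated buffers; returns -1.0 if allocation fails. */
double time_add(int calls, int nr, int nc)
{
    size_t n = (size_t)nr * (size_t)nc;
    unsigned char *a = malloc(n);
    unsigned char *b = calloc(n, 1);
    unsigned char *c = calloc(n, 1);
    double seconds = -1.0;
    if (a && b && c) {
        clock_t t0 = clock();
        for (int k = 0; k < calls; k++)
            add_rows(a, b, c, nr, nc);
        seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    }
    free(a);
    free(b);
    free(c);
    return seconds;
}
```

Comparing time_alloc(100000, 64, 64) with time_add(100000, 64, 64), built once with and once without OpenMP enabled, should give a feel for where the time goes. Keep in mind that clock() is coarse, and on many platforms it reports processor time summed over all threads, so a wall-clock timer may be a better choice for the OpenMP build.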

Jim Dempsey