Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Fast "sum" routine

AndrewC
New Contributor III
I need to efficiently compute the element sum of a double-precision vector (a[0]+a[1]+...+a[n-1]). Is there a routine in MKL for this? The BLAS ?asum routines compute the sum of the magnitudes, unfortunately.

Andrew
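To illustrate the distinction drawn in the question: a plain element sum keeps the signs, whereas a sum of magnitudes (what ?asum computes) discards them. This is a minimal sketch in standard C++ that does not call MKL; the helper names element_sum and magnitude_sum are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: signed sum a[0] + a[1] + ... + a[n-1].
double element_sum(const std::vector<double>& a) {
    return std::accumulate(a.begin(), a.end(), 0.0);
}

// Hypothetical helper: sum of |a[i]|, the quantity an ?asum-style routine returns.
double magnitude_sum(const std::vector<double>& a) {
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += std::fabs(a[i]);
    }
    return total;
}
```

The two results agree only when no element is negative, which is why ?asum cannot stand in for a general element sum.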
4 Replies
TimP
Honored Contributor III
Intel compiler optimizations do this effectively.
AndrewC
New Contributor III
I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Are there any specific optimization directives I should use?
It does seem that multithreading/parallelization would help here as well...

Andrew
TimP
Honored Contributor III
A loop such as
sum = 0; for (int i = 0; i < n; ++i) sum += a[i];
(with sum as a local variable declared the same type as a[])
should optimize easily. For example, on Xeon or P4, use options
icc -O -xW
or, for an SSE3 machine -xP.
-O1 may be superior to -O2 for loops of moderate length.
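The loop described above can be written out as a complete function. This is only a sketch: the name sum_elements is hypothetical, and whether the compiler vectorizes the loop depends on the options used (such as those mentioned above) and on the compiler version.

```cpp
// Hypothetical helper: sums n doubles with a local accumulator of the
// same type as the array elements, as suggested in the reply above.
double sum_elements(const double* a, int n) {
    double sum = 0.0;               // local accumulator, same type as a[]
    for (int i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Keeping the accumulator local, rather than accumulating through a pointer or a global, makes it easier for the compiler to hold the running total in a register.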
AndrewC
New Contributor III
As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...

The build log window did report that the OpenMP-defined loop was parallelized.

double result=0.0;
const double *data=s.data();
int nEntries=s.rows()*s.cols();
#pragma omp parallel for reduction(+:result)
for (int i=0;i<nEntries;i++){
    result+=data[i];
}
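
For readers who want to try the reduction without the matrix object s used in the post, here is a self-contained sketch of the same pattern operating on a raw pointer. The function name parallel_sum is hypothetical. It assumes the code is built with OpenMP enabled (the post uses /Qopenmp); if OpenMP is not enabled, the pragma is ignored and the loop simply runs serially.

```cpp
// Hypothetical helper: sums n doubles using an OpenMP reduction.
// Each thread accumulates a private partial sum; the partial sums are
// combined into result when the parallel loop finishes.
double parallel_sum(const double* data, int n) {
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < n; ++i) {
        result += data[i];
    }
    return result;
}
```

Because the partial sums are combined in an unspecified order, the floating-point result can differ in the last bits from a serial sum of the same data.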