Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Fast "sum" routine

AndrewC
New Contributor III
876 Views
I need to efficiently compute the element sum of a double-precision vector (a[0]+a[1]+...+a[n-1]). Is there a routine in MKL for this? Unfortunately, the BLAS ?asum routines compute the sum of the magnitudes.

Andrew
4 Replies
TimP
Honored Contributor III
Intel compiler optimizations do this effectively.
AndrewC
New Contributor III
I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Any specific optimization directives I should use?
This does seem like a case where multithreading/parallelization would help as well...

Andrew
TimP
Honored Contributor III
A loop such as
double sum = 0;
for (int i = 0; i < n; ++i) sum += a[i];
(with sum as a local variable declared with the same type as the elements of a[])
should optimize easily. For example, on Xeon or P4, use options
icc -O -xW
or, for an SSE3 machine -xP.
-O1 may be superior to -O2 for loops of moderate length.
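As a minimal sketch of the loop described above, wrapped in a function so it can be compiled and timed on its own. The name sum_serial and the use of std::size_t for the length are assumptions made for illustration, not part of the original post. Note that when a compiler vectorizes a floating-point reduction it reorders the additions, so the result may differ in the last bits from a strict left-to-right sum.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (name chosen for illustration): plain serial sum.
// The accumulator has the same type as the elements, as suggested above,
// which keeps the loop simple enough for the compiler to vectorize.
double sum_serial(const double* a, std::size_t n) {
    assert(a != nullptr || n == 0);  // an empty input may pass a null pointer
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Calling sum_serial(a, n) on an array of n doubles returns a[0]+a[1]+...+a[n-1]; for n equal to 0 it returns 0.0.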
AndrewC
New Contributor III
As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...

The build log window did say that the OpenMP defined loop was parallelized.


double result=0.0;
const double *data=s.data();
int nEntries=s.rows()*s.cols();
#pragma omp parallel for reduction(+:result)
for (int i=0;i<nEntries;i++){
    result+=data[i];
}
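For readers who want to try the reduction outside the original matrix class, here is a hedged, self-contained sketch of the same idea. The function name sum_parallel and its raw-pointer interface are assumptions for illustration; the matrix object s from the post is not reproduced. If the compiler is not given an OpenMP option (such as /Qopenmp in the post), the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (name chosen for illustration): sum with an OpenMP reduction.
// Each thread accumulates a private partial sum; OpenMP combines them at the end.
double sum_parallel(const double* data, int n) {
    assert(data != nullptr || n == 0);  // an empty input may pass a null pointer
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < n; ++i) {
        result += data[i];
    }
    return result;
}
```

Because the partial sums are combined in an unspecified order, the parallel result can differ from the serial one by rounding error when the inputs are not exactly representable. For short vectors the thread start-up cost may outweigh any gain, so it is worth timing both versions.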