topic Re: Fast "sum" routine in Intel® oneAPI Math Kernel Library

Fast "sum" routine

AndrewC — Tue, 15 Nov 2005 01:08:24 GMT

I need to efficiently compute the element sum of a double-precision vector (a[0]+a[1]+...+a[n-1]). Is there a routine in MKL for this? Unfortunately, the BLAS ?asum routines compute the sum of the magnitudes instead.

Andrew

Re: Fast "sum" routine

TimP — Tue, 15 Nov 2005 01:20:54 GMT

Intel compiler optimizations do this effectively.

Re: Fast "sum" routine

AndrewC — Tue, 15 Nov 2005 01:55:36 GMT

I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Any specific optimization directives I should use?
It also seems like multithreading/parallelization would help here...

Andrew

Re: Fast "sum" routine

TimP — Tue, 15 Nov 2005 03:21:12 GMT

A loop such as
sum = 0;
for (int i = 0; i < n; ++i) sum += a[i];
(with sum as a local variable declared the same type as a[])
should optimize easily. For example, on Xeon or P4, use options
icc -O -xW
or, for an SSE3 machine -xP.
-O1 may be superior to -O2 for loops of moderate length.
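
For reference, here is a minimal sketch of that loop wrapped in a function. The name sum_vector and the use of std::size_t for the length are illustrative assumptions, not anything from MKL or the compiler documentation. The accumulator is a local of the same type as the array elements, as suggested above. Note that when a compiler vectorizes a floating-point reduction it reorders the additions, so the result can differ in the last bits from a strictly sequential sum.

```cpp
#include <cstddef>

// Hypothetical helper (the name is illustrative): returns a[0] + a[1] + ... + a[n-1].
// The accumulator is a local variable of the same type as the elements,
// which keeps the loop simple for the compiler to optimize.
double sum_vector(const double* a, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}
```

Calling sum_vector(a, n) on a plain array or on the data pointer of a container should work the same way; an empty range (n == 0) returns 0.0.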

Re: Fast "sum" routine

AndrewC — Tue, 15 Nov 2005 05:20:24 GMT

As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...

The build log window did say that the OpenMP-defined loop was parallelized:

double result=0.0;
const double *data=s.data();
int nEntries=s.rows()*s.cols();

#pragma omp parallel for reduction(+:result)
for (int i=0;i<nEntries;++i) {
    result+=data[i];
}
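
For anyone who wants to try this outside of a matrix class, a self-contained sketch of the same OpenMP reduction might look like the following. It assumes the data lives in a std::vector<double> rather than the matrix object s used above, and the function name parallel_sum is made up for illustration. If OpenMP is not enabled at compile time (for example, without /Qopenmp), the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical helper (the name is illustrative): sums the elements of v.
// reduction(+:result) gives each thread a private partial sum and
// combines the partial sums into result when the loop finishes.
double parallel_sum(const std::vector<double>& v) {
    double result = 0.0;
    const double* data = v.data();
    const int nEntries = static_cast<int>(v.size());

    #pragma omp parallel for reduction(+:result)
    for (int i = 0; i < nEntries; ++i) {
        result += data[i];
    }
    return result;
}
```

Because the partial sums are combined in an unspecified order, the result can differ slightly from a serial sum for general floating-point data, and for short vectors the threading overhead may outweigh any gain.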