I need to efficiently compute the element sum of a double-precision vector (a[0]+a[1]+...+a[n-1]). Is there a routine in MKL for this? Unfortunately, the BLAS ?asum routines compute the sum of the magnitudes instead.
Andrew
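To make the distinction in the question concrete, here is a minimal C++ sketch of what ?asum computes compared with the plain element sum that is wanted. The function names are hypothetical and chosen only for illustration; they are not MKL or BLAS routines.

```cpp
#include <cmath>
#include <cstddef>

// What BLAS ?asum (e.g. dasum) computes: the sum of magnitudes |a[i]|.
double sum_of_magnitudes(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += std::fabs(a[i]);
    return s;
}

// What the question asks for: the signed sum a[0] + a[1] + ... + a[n-1].
double signed_sum(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
```

For a = {1.0, -2.0, 3.0}, sum_of_magnitudes returns 6.0 while signed_sum returns 2.0, so the two only agree when no element is negative.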
Intel compiler optimizations do this effectively.
I am using Intel 9.0, so I gather you are suggesting just a simple "for" loop. Any specific optimization directives I should use?
It also seems like multithreading/parallelization would help here...
Andrew
A loop such as
double sum = 0.0;
for (int i = 0; i < n; ++i) sum += a[i];
(with sum as a local variable declared the same type as a[])
should optimize easily. For example, on a Xeon or Pentium 4, use the options
icc -O -xW
or, on an SSE3 machine, -xP.
-O1 may outperform -O2 for loops of moderate length.
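As a rough illustration of why such a loop optimizes well, the sketch below pairs the corrected loop with a hand-unrolled version that keeps four independent partial sums. This is approximately the kind of transformation a vectorizing compiler may apply to a reduction; it is an assumption-based illustration in plain C++, not a description of the code icc actually emits, and the function names are hypothetical.

```cpp
#include <cstddef>

// The plain loop from the reply: a vectorizing compiler may turn this
// into SIMD code that keeps several partial sums in parallel.
double sum_simple(const double* a, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i];
    return sum;
}

// Hand-written version of that idea: four independent accumulators break
// the serial dependency on a single running sum. The addition order differs
// from sum_simple, so results can differ in the last bits.
double sum_unrolled4(const double* a, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i) s0 += a[i];  // leftover elements when n % 4 != 0
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, reordering the sum like this can change the result slightly; compilers typically only do it automatically when the selected floating-point model permits reassociation.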
As a little test, I tried this on a Pentium D with /Qopenmp and OMP_NUM_THREADS=2 and saw 100 percent CPU usage. Very nice...
The build log window confirmed that the OpenMP-defined loop was parallelized:
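double result=0.0;
const double *data=s.data();
int nEntries=s.rows()*s.cols();
// each thread sums into a private copy of result; the copies are added at the end
#pragma omp parallel for reduction(+:result)
for (int i=0;i<nEntries;++i) {
  result+=data[i];
}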
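A self-contained version of the same reduction, as a minimal sketch assuming the data is available as a plain contiguous array. The function name parallel_sum and the pointer/length interface are hypothetical stand-ins for the poster's matrix object s.

```cpp
#include <cstddef>

// OpenMP sum reduction over a contiguous array of doubles.
// Compile with OpenMP enabled (/Qopenmp with the Intel compiler on Windows,
// -fopenmp with GCC/Clang). Without it the pragma is ignored and the loop
// simply runs serially, producing the same sum.
double parallel_sum(const double* data, long n) {
    double result = 0.0;
    #pragma omp parallel for reduction(+:result)
    for (long i = 0; i < n; ++i) {
        result += data[i];  // accumulates into this thread's private copy
    }
    return result;  // private copies have been combined by the reduction
}
```

Since each thread adds its own chunk before the partial sums are combined, the summation order differs from a serial loop, so results may differ in the last bits. For short vectors, the cost of starting the thread team may outweigh the speedup.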