The loop is simple:
void loop(int n, double* a, double const* b)
{
#pragma ivdep
for (int i = 0; i < n; ++i, ++a, ++b)
*a *= *b;
}
I am using the Intel C++ compiler and currently rely on #pragma ivdep for optimization. Is there any way to make it perform better, such as using multicore and vectorization together, or other techniques?
If that loop is long enough (e.g. count > 10000) to benefit from multi-core parallelism as well as vectorization, you could try
#pragma omp parallel for simd
(with the -qopenmp compile option; a bare "omp for simd" only works inside an existing parallel region), or the equivalent auto-parallelization options, including reducing par-threshold, along with setting appropriate OMP_NUM_THREADS and OMP_PLACES.
Ostensibly, cilk_for simd might do it, although it may not improve performance significantly.
In many realistic situations, nested loops with threaded parallel outer and simd vector inner loops are needed to take advantage of multi-core.
Note that the loop you quote is an in-place multiply rather than a copy, so it is not a candidate for the compiler's fast_memcpy substitution (which selects aligned nontemporal stores at run time where possible and which you won't see detailed in opt-report); that substitution applies only to plain copy loops.
MKL vdMul() is an older remedy which should do what you request (also using the OMP or MKL environment variables).
Avoid incrementing pointers. Use [subscripts] instead.
void loop(int n, double* a, double const* b)
{
#pragma ivdep
for (int i = 0; i < n; ++i)
a[i] *= b[i];
}
The compiler optimizer prefers this syntax.
If (when) n is large enough to amortize the thread pool setup, then you might consider using #pragma omp parallel for. The body of your loop has little complexity, so the benefit of parallelization may only show up with n > 10000.
Look at your optimization reports to ensure you attain complete vectorization.
Note, if a and b are known to be aligned, then telling the compiler so opens up additional optimization opportunities.
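A sketch of one portable way to state the alignment, using the aligned clause of OpenMP simd. The 64-byte figure and the name scale_aligned are assumptions for illustration; the caller must really guarantee that alignment, otherwise the behavior is undefined. The Intel compiler also has its own __assume_aligned(ptr, 64) hint, which is not used here so that the code stays compiler-neutral.

```cpp
// Hypothetical name; the caller promises that a and b are 64-byte aligned.
void scale_aligned(int n, double* a, double const* b)
{
    // aligned(...) passes that promise to the vectorizer, which can then
    // skip peel loops and use aligned loads and stores.
    #pragma omp simd aligned(a, b : 64)
    for (int i = 0; i < n; ++i)
        a[i] *= b[i];
}
```

Buffers obtained with alignas(64) or an aligned allocator satisfy the promise; pointers into the middle of such a buffer generally do not.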
Jim Dempsey
If you are sure that the arrays a and b point to do not overlap, then you can add the restrict qualifier. In your example b is declared as a pointer to const, but const alone does not rule out aliasing with a, so I am not sure how much adding restrict to b as well will matter for the compiler's optimization.