Thanks
Joan
Compilation output:
[jpuig@new-host snipets]$ /opt/intel/cc/9.0/bin/icc compilerTest3.cxx -ipo -O3 -xN -vec-report3 -opt-report-levelmax -parallel; ./a.out
IPO: performing single-file optimizations
IPO: generating object file /home/jpuig/tmp/ipo_iccYuQuxV.o
compilerTest3.cxx(101) : (col. 4) remark: loop was not vectorized: unsupported loop structure.
compilerTest3.cxx(107) : (col. 2) remark: LOOP WAS VECTORIZED.
Execution times of the loops:
Elapsed time: 0.907152 seconds
Elapsed time: 0.73452 seconds
Compiler info:
[jpuig@new-host snipets]$ /opt/intel/cc/9.0/bin/icc -V
Intel C Compiler for 32-bit applications, Version 9.0 Build 20050809Z Package ID:
Copyright (C) 1985-2005 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
Message Edited by gte237z@mail.gatech.edu on 01-06-2006 07:17 AM
Dear gte,
This is unfortunately an issue with the conservative assumptions the vectorizer sometimes has to make. One way of enabling vectorization is to make local copies of the upper bound and the pointer, as in:
inline void setAllElements(double value){
  // Local copies of the loop bound and the data pointer
  int lup = numel;
  double *ldat = data;
  for (int i = 0; i < lup; i++)
    ldat[i] = value;
}
};
When I am back from my sabbatical, I hope to have a look at how the compiler analysis itself can be further improved to minimize the amount of rewriting required.
Hope this helps you for now.
Aart Bik
http://www.aartbik.com/
Dear gte,
Because the compiler can statically determine the trip count of the C-style loop, it automatically generates nontemporal stores for that loop, which yields the better performance. If you add #pragma vector nontemporal in front of the C++-style loop, setAllElements() performs equally well.
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 01-09-2006 07:08 PM
Thanks a lot for your help; these little tweaks will have very big performance implications throughout the code I write.
Thanks,
Joan