[ Note 1 ]
Loop blocking is well described in the Intel Software Developer's Manual and in the Intel C++ Compiler User and Reference Guides. After extensive testing, I can say that selecting the right block size for the last (innermost) for-loop is very important, and that its optimal size depends on the size of the CPU's L1 cache line.
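To make the block-size point concrete, here is a minimal sketch in C of a blocked matrix transpose whose tile edge is derived from a cache-line size. It is an illustration only, not code from the Intel documents: the names N, CACHE_LINE_BYTES, BLOCK, transpose_naive, transpose_blocked and transposes_match are all hypothetical, and the 64-byte line size is an assumption that should be checked against the actual CPU.

```c
#include <stddef.h>

#define N 64                 /* assumed matrix edge, for illustration only */
#define CACHE_LINE_BYTES 64  /* assumed L1 line size; verify for your CPU */
#define BLOCK (CACHE_LINE_BYTES / sizeof(double)) /* doubles per assumed line */

/* Naive transpose: dst is written with a stride of N doubles,
   so consecutive inner iterations touch different cache lines. */
static void transpose_naive(double dst[N][N], double src[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            dst[j][i] = src[i][j];
}

/* Blocked transpose: (i, j) selects a BLOCK x BLOCK tile and (ii, jj)
   walks inside it, so the lines touched by one tile can be reused
   before moving on. The extra bound checks handle N not divisible by BLOCK. */
static void transpose_blocked(double dst[N][N], double src[N][N]) {
    for (size_t i = 0; i < N; i += BLOCK)
        for (size_t j = 0; j < N; j += BLOCK)
            for (size_t ii = i; ii < i + BLOCK && ii < N; ii++)
                for (size_t jj = j; jj < j + BLOCK && jj < N; jj++)
                    dst[jj][ii] = src[ii][jj];
}

/* Returns 1 if the blocked and naive versions produce identical output. */
static int transposes_match(void) {
    static double src[N][N], a[N][N], b[N][N];
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            src[i][j] = (double)(i * N + j);
    transpose_naive(a, src);
    transpose_blocked(b, src);
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            if (a[i][j] != b[i][j])
                return 0;
    return 1;
}
```

Calling transposes_match() from a main function only checks that blocking leaves the result unchanged. Whether the blocked version is actually faster, and at which BLOCK value, has to be measured on the target machine.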
By loop blocking, do you mean dividing the data into blocks one cache line (32 bytes) long, with the inner loop iterating over every double or float value inside that cache line?
Here is an example:
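#include <stdio.h>   /* printf, NULL */

/* MAX_SIZE and CACHE_LINE are assumed to be defined elsewhere.
   CACHE_LINE is used here as the tile edge in elements, not in bytes. */
void arrayAdditionTest2(double (*input)[MAX_SIZE], double (*output)[MAX_SIZE]) {
    /* static: a MAX_SIZE x MAX_SIZE array of doubles may not fit on the stack */
    static double result[MAX_SIZE][MAX_SIZE];
    double (*res)[MAX_SIZE] = result;

    if (input == NULL || output == NULL)
        return;

    /* The caller's arrays are used directly; overwriting the parameters with
       uninitialized local arrays would make every read below meaningless. */
    for (int i = 0; i < MAX_SIZE; i++) {
        for (int j = 0; j < MAX_SIZE; j++) {
            printf("array input[%d][%d] = %.17f\n", i, j, input[i][j]);
        }
    }

    /* Blocked traversal: (i, j) selects a tile, (ii, jj) walks inside it.
       The extra bound checks cover MAX_SIZE not divisible by CACHE_LINE. */
    for (int i = 0; i < MAX_SIZE; i += CACHE_LINE) {
        for (int j = 0; j < MAX_SIZE; j += CACHE_LINE) {
            for (int ii = i; ii < i + CACHE_LINE && ii < MAX_SIZE; ii++) {
                for (int jj = j; jj < j + CACHE_LINE && jj < MAX_SIZE; jj++) {
                    res[ii][jj] = output[ii][jj] + input[ii][jj];
                    /* one format specifier per argument */
                    printf("Loop Blocking test2 = %.17f\n", res[ii][jj]);
                }
            }
        }
    }
}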
Thanks.
According to the GCC docs, -fomit-frame-pointer is implied by -O in cases where it is possible (-g would turn it off). It seems it would matter mainly in 32-bit mode.
IIRC, -fprefetch-loop-arrays was designed for 32-bit AMD Athlon CPUs. On any current CPU it could be useful only in specialized cases, such as when the limit on hardware-prefetched streams is exceeded, or when DTLB misses can be mitigated without premature cache eviction.
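For readers who want to see what a software prefetch in a loop looks like, here is a hand-written sketch in C using GCC's __builtin_prefetch. It is not the code that -fprefetch-loop-arrays generates: the function name sum_with_prefetch and the PREFETCH_DISTANCE value are hypothetical, and the distance is an untuned assumption. For a simple sequential loop like this one, the hardware prefetcher most likely already does the job, which is consistent with the point above.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* assumed look-ahead in elements; needs tuning */

/* Sums an array while hinting that data a few iterations ahead will be read.
   __builtin_prefetch(addr, rw, locality): rw = 0 means prefetch for reading,
   locality = 0 means the data need not stay in cache after the access. */
double sum_with_prefetch(const double *data, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Only issue the hint while the target element is inside the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result, so the function returns the same sum with or without it. Any benefit shows up only in timing measurements on the specific CPU.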
>>>fomit-frame-pointer>>>
Does that option free the ebp register to hold arbitrary data, so that call stack frames are accessed through the esp register instead?
Yep, that's true, but FPO complicates debugging.
FPO = frame pointer omission.
Sorry for the off-topic post.
