Hello all,
I am attempting to vectorize a few template functions, but without much success so far. The code below highlights the problem. Essentially, I have a template class that handles images (i.e. large matrices). The template parameter represents the data type held by the image (e.g. uint8, float, ...).
I have many functions that apply certain filters to these images, and I would like to vectorize them. Since the argument of each function is an instance of a template class (the image), the function itself is a template. The problematic function is 'test', defined towards the bottom of the code. 'test' is instantiated in two different ways.
First, in main, I define two images, 'gray' and 'tmp', and call 'test' with them as arguments. When I compile with -QxN, the inner loop within 'test' is vectorized. Good.
However, I cannot always keep the definition of the template function in the same file as main, because there are simply too many template functions. Hence I define the template in another source file, but then I have to explicitly instantiate it for the parameters it is going to be used with, otherwise the linker will not find the code. I tried to reproduce this organization in a single file. Just below the definition of 'test', I inserted a line that instantiates 'test' for the parameters with which it is going to be used. The compiler handles that correctly, but now says that it cannot vectorize the inner loop! The message is: "loop was not vectorized: dereference too complex". Why is the dereference now too complex, when the compiler handled it just fine for the other instantiation in main?
Any help would be much appreciated.
Alex
##### CUT #####
// Header names and template arguments were lost from the posted code;
// the ones below are reconstructed from what the code uses.
#include <malloc.h>   // _aligned_malloc, _aligned_free (Microsoft CRT)
#include <math.h>     // ceil
#include <stddef.h>   // size_t, NULL

typedef unsigned char u8;
typedef float f32;

////////////////////
///// MEMORY ROUTINES
////////////////////
enum { MemoryAlignment = 64 };

void* AllocateMemory(size_t size)
{
    return _aligned_malloc(size, MemoryAlignment);
}

void ReleaseMemory(void* memblock)
{
    _aligned_free(memblock);
}

int ComputeAlignedWidth(int width)
{
    int alignment_needed = MemoryAlignment / sizeof(float);
    return (int)ceil((float)width / (float)alignment_needed) * alignment_needed;
}

////////////////////
///// CLASS DECLARATION
////////////////////
template <typename T>
struct Image
{
public: // members
    // std information
    int width, height, depth;
    // actual width of the buffer
    // buffer holding image data is padded to be a multiple
    // of MemoryAlignment for optimisation purposes
    int width_padded;
    // dimensions helper
    int firstRow, lastRow, firstCol, lastCol;
    // pointer to the image data
    T* data;

public: // methods
    // ctor
    Image() :
        width(0), height(0), depth(0),
        width_padded(0),
        firstRow(0), lastRow(0), firstCol(0), lastCol(0),
        data(NULL)
    {
    }
    // dtor
    ~Image()
    {
    }
    // memory management
    // (bodies of Allocate/Release and the SetDimensions signature are
    //  reconstructed; only the SetDimensions body survived in the post)
    void Allocate() { data = static_cast<T*>(AllocateMemory(GetTotalSize(true))); }
    void Release() { ReleaseMemory(data); data = NULL; }
    void SetDimensions(int h, int w, int d)
    {
        height = h;
        width = w;
        depth = d;
        width_padded = ComputeAlignedWidth(width);
        firstRow = 0;
        firstCol = 0;
        lastRow = height-1;
        lastCol = width-1;
    }
    // size information
    int GetTotalSize(bool padded=false)
    {
        if (padded) return width_padded*height*depth*sizeof(T);
        else        return width       *height*depth*sizeof(T);
    }
    int GetImageSize(bool padded=false)
    {
        if (padded) return width_padded*height*depth;
        else        return width       *height*depth;
    }
    int GetPlaneSize(bool padded=false)
    {
        if (padded) return width_padded*height;
        else        return width       *height;
    }
};

template <typename T>
struct GrayImage : public Image<T>
{
public: // methods
    // ctor
    GrayImage() :
        Image<T>()
    {
        this->depth = 1; // this-> makes the dependent base member visible
    }
    // pixel access
    T& operator() (int row, int col)
    {
        return this->data[row*this->width_padded + col];
    }
};

// (signature, local bounds and outer loop are reconstructed)
template <typename T>
void test(GrayImage<T>& input, GrayImage<T>& output)
{
    int firstR = input.firstRow, lastR = input.lastRow;
    int firstC = input.firstCol, lastC = input.lastCol;
    for(int row=firstR ; row<=lastR ; ++row)
    {
#pragma ivdep
        for(int col=firstC ; col<=lastC ; ++col)
        {
        //for(int row=input.firstRow ; row<=input.lastRow ; ++row){
        //  for(int col=input.firstCol ; col<=input.lastCol ; ++col){
            output(row, col) = input(row, col) + 1;
        }
    }
}

// explicit instantiation (element type lost in the post; f32 assumed)
template void test<f32>(GrayImage<f32>& input, GrayImage<f32>& output);

int main(int argc, char* argv[])
{
    GrayImage<f32> gray, tmp; // element type assumed, see above
    gray.SetDimensions(2000, 2000, 1); gray.Allocate();
    tmp.SetDimensions(gray.height, gray.width, 1); tmp.Allocate();
    test(gray, tmp);
    gray.Release(); tmp.Release();
    return 0;
}
##### CUT #####
Hello,
Please try the "-Qansi_alias" compiler option!
Regards,
Lars Petter Endresen
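A source-level complement to the compiler option can be sketched as follows. In the posted 'test', each pixel access goes through the image object, so the inner loop reloads the data pointer and the padded width on every iteration unless the compiler can prove that the pixel stores do not modify them. Copying those members into locals before the loop removes the member loads from the inner loop. This is only a sketch under assumptions: `GrayView` and `add_one_rows` are hypothetical stand-ins, not names from the original post, and whether a given compiler then vectorizes the loop still depends on the compiler and its options.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, minimal stand-in for the image class in the post:
// a pointer to padded row-major data plus the loop bounds.
template <typename T>
struct GrayView {
    T*  data;
    int width_padded;
    int firstRow, lastRow, firstCol, lastCol;
};

// Adds 1 to every pixel of input and writes the result to output.
// All members are copied into locals first, so the inner loop only
// touches two plain row pointers and a loop counter.
template <typename T>
void add_one_rows(const GrayView<T>& input, GrayView<T>& output)
{
    const int firstR = input.firstRow, lastR = input.lastRow;
    const int firstC = input.firstCol, lastC = input.lastCol;
    const int in_stride  = input.width_padded;
    const int out_stride = output.width_padded;
    const T*  in_base  = input.data;
    T*        out_base = output.data;

    for (int row = firstR; row <= lastR; ++row) {
        // Row pointers are computed once per row, outside the inner loop.
        const T* in_row  = in_base  + static_cast<std::ptrdiff_t>(row) * in_stride;
        T*       out_row = out_base + static_cast<std::ptrdiff_t>(row) * out_stride;
        for (int col = firstC; col <= lastC; ++col)
            out_row[col] = static_cast<T>(in_row[col] + 1);
    }
}

// Explicit instantiation, mirroring the organization described in the post.
template void add_one_rows<float>(const GrayView<float>&, GrayView<float>&);
```

The compiler still has to decide whether the input and output rows may overlap; the `#pragma ivdep` used in the post is the Intel-specific way of asserting that they do not, and it could be placed on the inner loop here as well.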
Again, thanks for the answer. Where can I find more information about this ANSI aliasing "concept"?
Alex
points out how the original C (and C++) standards invalidated a lot of legacy code by disallowing multiple pointers of incompatible types to the same object. For compatibility with Microsoft, Intel compilers require -Qansi_alias to be set before optimization takes advantage of this rule. The rule avoids a lot of obscure cases where modifying the target of a pointer has unpredictable side effects. In principle, the compiler should notify you of visible violations, but many violations can't be detected during normal compiler analysis.
The Microsoft-style option -Oa has even stricter requirements. As in Fortran, it requires that no two references access the same data region, even when the types are compatible.
Much of the discussion you will find in a web search relates to compilers that work the other way: you must set an option like -fno-strict-aliasing in order to compile legacy code that violates the standard.
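The aliasing rule described above can be illustrated with a small sketch. The function names `increment_all` and `float_bits` are made up for this illustration, and the second function assumes that `unsigned int` and `float` are both 32 bits wide with IEEE 754 floats. In the first function the stores go through a `float*` while the loop bound is read through an `int*`; under the type-based rule a `float` store cannot legally change an `int`, so a compiler that relies on the rule is allowed to read the bound once instead of on every iteration. The second function shows a standard-conforming way to look at the bits of a `float`, in place of the pointer cast that legacy code often used.

```cpp
#include <cstring>

// Stores go through float*, the bound is read through const int*.
// With type-based aliasing enabled, the compiler may assume that
// writing dst[i] never changes *count.
void increment_all(float* dst, const float* src, const int* count)
{
    for (int i = 0; i < *count; ++i)
        dst[i] = src[i] + 1.0f;
}

// Legacy code often wrote:  unsigned int bits = *(unsigned int*)&value;
// That reads a float object through an incompatible pointer type and
// violates the aliasing rule. Copying the bytes is the conforming form.
unsigned int float_bits(float value)
{
    static_assert(sizeof(unsigned int) == sizeof(float),
                  "sketch assumes 32-bit unsigned int and float");
    unsigned int bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Note that the character types are an exception in the standard: a pointer to `char` or `unsigned char` may alias an object of any type, so the rule gives the compiler less to work with when the pixel type is an 8-bit character type.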