Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Vectorization of template functions

Intel_C_Intel
Employee
306 Views

Hello all,

I am attempting to vectorize a few template functions, but without much success so far. The code below highlights the problem I have. Essentially, I have a template class that handles images (i.e. large matrices). The template parameter represents the data type held by the image (e.g. uint8, float, ...).

I have many functions that apply certain filters to these images, and I would like to vectorize them. Since the argument of each function is an instance of a template class (the image), the function itself is a template. The problematic function is named 'test' and is defined towards the bottom of the code. 'test' is instantiated in 2 different ways.

First, in main, I define 2 images, 'gray' and 'tmp', and call 'test' with them as arguments. When I compile with -QxN, the inner loop within 'test' is vectorized. Good.

However, I cannot always have the definition of the template function in the same file as main, because there are simply too many template functions. Hence I define the template in another source file, but then I have to instantiate it explicitly with the actual parameters it is going to be used with, otherwise the linker will not find the code. I tried to reproduce this organization in a single file. Just below the definition of 'test', I inserted a line that instantiates 'test' for the parameters with which it is going to be used. The compiler accepts that, but now says that it cannot vectorize the inner loop! The message is: "loop was not vectorized: dereference too complex". Why is the dereference now too complex, when the compiler handled it just fine for the other instantiation in main?!
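
For readers unfamiliar with the layout described above, here is a minimal sketch of it collapsed into one file. The function `add_one` and its signature are hypothetical stand-ins, not the code from this post: the declaration is what would go in a header, and the definition plus the explicit instantiations are what would go in the separate source file.

```cpp
#include <cstddef>

// Header part (hypothetical): declaration only, so callers can compile
// without seeing the body.
template <typename T>
void add_one(const T* in, T* out, std::size_t n);

// Source-file part (hypothetical): the definition of the template...
template <typename T>
void add_one(const T* in, T* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<T>(in[i] + 1);
}

// ...followed by one explicit instantiation per element type that callers
// use. Without these lines no code is emitted for the template in this
// translation unit, and the linker reports an unresolved symbol.
template void add_one<float>(const float*, float*, std::size_t);
template void add_one<unsigned char>(const unsigned char*, unsigned char*, std::size_t);
```

A caller that only sees the declaration can then write `add_one<float>(src, dst, n)` and link against the object file that contains the explicit instantiations.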

Any help would be much appreciated.

Alex

##### CUT #####

// The header names were lost when this post was rendered;
// these are the headers the code below needs.
#include <malloc.h>   // _aligned_malloc, _aligned_free
#include <math.h>     // ceil
#include <stddef.h>   // size_t, NULL

typedef unsigned char u8;
typedef float f32;

////////////////////
///// MEMORY ROUTINES
////////////////////

enum { MemoryAlignment = 64 };

void* AllocateMemory(size_t size)
{
    return _aligned_malloc(size, MemoryAlignment);
}

void ReleaseMemory(void* memblock)
{
    _aligned_free(memblock);
}

int ComputeAlignedWidth(int width)
{
    int alignment_needed = MemoryAlignment / sizeof(float);
    return (int)ceil((float)width / (float)alignment_needed) * alignment_needed;
}

////////////////////
///// CLASS DECLARATION
////////////////////

template <typename T>
struct Image
{
public: // members
    // std information
    int width, height, depth;

    // actual width of the buffer
    // buffer holding image data is padded to be a multiple
    // of MemoryAlignment for optimisation purposes
    int width_padded;

    // dimensions helper
    int firstRow, lastRow, firstCol, lastCol;

    // pointer to the image data
    T* data;

public: // methods
    // ctor
    Image() :
        width(0), height(0), depth(0),
        width_padded(0),
        firstRow(0), lastRow(0), firstCol(0), lastCol(0),
        data(NULL)
    {
    }

    // dtor
    ~Image()
    {
    }

    // memory management
    void Allocate() { data = static_cast<T*>(AllocateMemory(width_padded * height * depth * sizeof(T))); }
    void Release()  { ReleaseMemory(data); }

    // pixel access
    // virtual T& operator() (int row, int col)

    // dimensions management
    void SetDimensions(int h, int w, int d) {
        height = h;
        width = w;
        depth = d;
        width_padded = ComputeAlignedWidth(width);
        firstRow = 0;
        firstCol = 0;
        lastRow = height - 1;
        lastCol = width - 1;
    }

    // size information
    int GetTotalSize(bool padded = false) {
        if (padded) return width_padded * height * depth * sizeof(T);
        else        return width        * height * depth * sizeof(T);
    }

    int GetImageSize(bool padded = false) {
        if (padded) return width_padded * height * depth;
        else        return width        * height * depth;
    }

    int GetPlaneSize(bool padded = false) {
        if (padded) return width_padded * height;
        else        return width        * height;
    }
};

template <typename T>
struct GrayImage : public Image<T>
{
public: // methods
    // ctor
    GrayImage() :
        Image<T>()
    {
        // this-> is needed to name members of a dependent base class
        this->depth = 1;
    }

    // pixel access
    T& operator() (int row, int col)
    {
        return this->data[row * this->width_padded + col];
    }
};

template <typename T>
void test(GrayImage<T>& input, GrayImage<T>& output)
{
    int lastR = input.lastRow, firstR = input.firstRow;
    int lastC = input.lastCol, firstC = input.firstCol;

    for (int row = firstR; row <= lastR; ++row) {
#pragma ivdep
        for (int col = firstC; col <= lastC; ++col) {
            //for(int row=input.firstRow ; row<=input.lastRow ; ++row){
            // for(int col=input.firstCol ; col
            output(row, col) = input(row, col) + 1;
        }
    }
}

// explicit instantiation
// (the template argument was lost when this post was rendered; f32 is assumed)
template void test<f32>(GrayImage<f32>& input, GrayImage<f32>& output);

int main(int argc, char* argv[])
{
    // element type lost in the post as well; f32 is assumed
    GrayImage<f32> gray, tmp;

    gray.SetDimensions(2000, 2000, 1); gray.Allocate();
    tmp.SetDimensions(gray.height, gray.width, 1); tmp.Allocate();

    test(gray, tmp);

    gray.Release(); tmp.Release();

    return 0;
}

##### CUT #####

4 Replies
Lars_Petter_E_1
Beginner

Hello,

Please try the "-Qansi_alias" compiler option!

Regards,

Lars Petter Endresen

Intel_C_Intel
Employee

Again, thanks for the answer. Where can I find more information about this ANSI aliasing "concept"?

Alex

TimP
Honored Contributor III
This article points out how the original C (and C++) standards invalidated a lot of legacy code by disallowing access to the same object through pointers of incompatible types. For compatibility with Microsoft, Intel compilers require setting -Qansi_alias before the optimizer will take advantage of this rule. Relying on the rule avoids a lot of obscure cases where modifying the target of a pointer could have unpredictable side effects. In principle, the compiler should notify you of visible violations, but many violations can't be detected during normal compiler analysis.
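
A small illustration of the kind of code the rule affects, as a sketch rather than anything taken from the article: reading a float object through an integer pointer is the classic violation, and copying the bytes with `memcpy` is the usual conforming replacement. The function name `float_bits` is hypothetical, and the example assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Violates the aliasing rule (shown only as a comment): the float object
// would be read through an lvalue of an unrelated integer type.
//   std::uint32_t bad = *reinterpret_cast<std::uint32_t*>(&f);

// Conforming alternative: copy the object representation byte by byte.
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

Once the optimizer is allowed to assume the rule holds, it may treat a store through a `float*` as unable to change an `int` object, which is what makes the commented-out form unsafe.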
The Microsoft style option -Oa has even stricter requirements. As in Fortran, it requires that no two references access the same data region, even when the types are compatible.
Much of the discussion you will find in a web search relates to compilers which work the other way; you must set an option like -fno-strict-aliasing in order to compile legacy code which violates the standard.
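
As a sketch of how this can bear on a loop like the one in 'test': without type-based alias analysis, the compiler may have to assume that a store through the output's data pointer could modify the images' own int members or data pointers, so they would need to be reloaded on every iteration. One workaround that does not depend on the option is to copy everything the loop needs into locals first. The `Gray` struct and `add_one_hoisted` below are hypothetical stand-ins for the poster's types, and whether this alone removes the diagnostic on a given compiler version is an assumption, not something tested in this thread.

```cpp
#include <cstddef>

// Hypothetical stand-in for the poster's GrayImage<T>.
template <typename T>
struct Gray {
    int firstRow, lastRow, firstCol, lastCol;
    int width_padded;
    T*  data;
};

template <typename T>
void add_one_hoisted(const Gray<T>& input, Gray<T>& output)
{
    // Copy bounds, strides and base pointers into locals. Their addresses
    // are never taken, so no store inside the loop can change them.
    const int firstR = input.firstRow, lastR = input.lastRow;
    const int firstC = input.firstCol, lastC = input.lastCol;
    const int inStride  = input.width_padded;
    const int outStride = output.width_padded;
    const T*  inData  = input.data;
    T*        outData = output.data;

    for (int row = firstR; row <= lastR; ++row) {
        // One row pointer per image keeps the inner loop a plain
        // unit-stride array walk.
        const T* src = inData  + static_cast<std::ptrdiff_t>(row) * inStride;
        T*       dst = outData + static_cast<std::ptrdiff_t>(row) * outStride;
        for (int col = firstC; col <= lastC; ++col)
            dst[col] = static_cast<T>(src[col] + 1);
    }
}
```

The source and destination rows have the same element type, so the compiler still cannot prove from types alone that they do not overlap; that remaining question is what the `#pragma ivdep` in the original code is meant to address.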
Intel_C_Intel
Employee
Thanks!