- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

I have a row-major iterator to a 2D array, with dereference operator as follows:

int& Iterator::operator*()

{

return matrix_[y_][x_]; //matrix_ has type int**

}

The (prefix) increment operator is as follows:

Iterator& Iterator::operator++()

{

//ys_, xs_ are dimensions of matrix

if((++x_ == xs_) && (++y_ != ys_)) x_ = 0;

return *this;

}

I can use this iterator with an optimised version of std::transform (mine doesn't return an un-needed result, in order to save a few instructions)

template < class InputIterator, class OutputIterator, class UnaryOperator >

inline void MyTransform( InputIterator first1, InputIterator last1,OutputIterator result,

UnaryOperator op )

{

for (; first1 != last1; ++first1, ++result)

*result = op(*first1);

}

calling it thus:

MyTransform(matrix1.begin(),matrix1.end(),matrix2.begin(), MyFunctor());

However, when I compare the performance to a classic, nested for-loop, to wit:

MyFunctor() f;

for (int y=0; y

the iterator-based solution is approx. 25% slower than the nested for-loop solution.

Now the problem does not seem to be with the iterator increment operator, as if I do the following (ugly) hybrid solution combining iterator-traversal and raw array access (the arrays being indexed using the iterators' internal counts):

MyFunctor func;

for (; mat1Begin != mat1End; ++mat1Begin, ++mat2Begin)

{

//mat1 and mat2 are type int**

mat2[mat2Begin.y_][mat2Begin.x_] =

func(mat1[mat1Begin.y_][mat1Begin.x_]);

}

this hybrid is actually a little faster than the nested for-loop solution, suggesting the actual iterator traversal is fast. This suggests to me that the performance hit is in the iterator's dereferencing when doing the assignment.

My question is, why does dereferencing the iterators in the assignment

*result = op(*first1);

incur such a massive performance hit, relative to raw array access? Is there any technique I can use for this simple design, to get performance (almost) equivalent to the raw array version? I much prefer to use iterators where possible but this is for a very performance-sensitive piece of code, for which such a large hit is not tolerable.

Any help much appreciated.

(I'm using evaluation version of Intel Parallel Studio XE 2011 with Visual Studio 10, with default release configuration optimisations.)

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

...

for (int y=0; y

the iterator-based solution is approx. 25% slower than the nested for-loop solution...

That performance hit is not a surprise for me because if you really want to achieve as better as possible

performance it is notrecommended to use any C++ operators. I usetwo pointers in order to access

2-D data sets and they are initialized as follows:

*= ptData; ptData += m_uiCols; } ... [/cpp]*

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Hi, Arthur,

The code in your response short-circuits if the first Boolean evaluates to false, so actually is fairly optimal. In any case, that code is from the incrementing operator.As stated in my original post, that is not where the problem lies, it is with the peformance of the dereferencing operator.

I have modified the code so that the outer counter of the loop is cached, so the code now looks as follows:

int& Iterator::operator*()

{

return column_[x_];

}

//prefix incr.

Iterator& Iterator::operator++()

{

if(++x_ == xs_)

{

if(++y_ != ys_)

{

x_ = 0;

column_ = matrix_[y_];

}

}

return *this;

}

This improves performance to ~85% of the raw 2D array performance.

When I convert the code to using pointer arithmetic (again caching the column) as follows, the performance is significantly worse than the raw 2D array (~ 55%):

//dereference

int& Image32Iterator::operator*()

{

return *ptr_;

}

//prefix

Image32Iterator& Image32Iterator::operator++()

{

if(++ptr_ == ptrEnd_)

{

if(++row_ != rowEnd_)

{

ptrEnd_ = (ptr_ = *row_) + xs_;

}

}

return *this;

}

I am surprised that the pointer arithmetic solution performs so much worse, and don't quite understand. I was trying to use pointer arithmetic to see if I could get > 85%!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Thanks, that is actually the structure I use internally for my 2D array, however I wanted to use a row-majoriterator pattern as then it can be used with STL algorithms such as std::transform. As stated in my original post, the C++ increment operator I implemented performed well, it was just the dereferencing operator which underperformed. I am prepared to sacrifice a little performance, to make code moregeneric and typesafe, the question is how much should I expect to have to sacrifice, which is what my investigations are aiming at finding out.

As discussed in my response to Arthur's post, I improved performance by caching the outer row counter, which improved performance to ~85%.

Strangely, however, when I use pointer arithmetic (plus caching outer counter) in my iterator, the performance is actually worse, at around 55 - 60%.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

class Array2D{int* pArray;int size, width;public:Array2D(int xs, int ys): width(xs), size(xs*ys), pArray(new int[xs*ys]){}. . .class iterator{int* ptr;public:iterator(Array2D& arr, int offs=0): ptr( arr.pArray + offs ){}inline iterator& operator++() { ++ptr; return *this; }inline int& operator*() { return *ptr; }};friend class iterator;inline iterator begin() { return iterator(*this, 0); }inline iterator end() { return iterator(*this, size); }};

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

**IPP forum**: Reformatting data using strides

http://software.intel.com/en-us/forums/showpost.php?p=174994

The C++ class has a different kind of 'transform' methodcompared to STL 'transform' method and it simply

demonstrates how a data set is initialized, transformed, released, etc.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

*A whileago I've provided a prototype of C++ class that allows to do transforms with data sets...*

// **Simple 2D Data Set class** ( allows transforms )

//

// **Notes:**

// - This is a prototype I used for a template based 2Ddata set class ( it's avery different but idea is the same )

// - An underlying 1D array for a 2D array is a CONTIGUOUS

// - A Transform functionality assumes that the underlying 1D array is not reallocated

// - You could easily add methods like 'SetValue', 'Clear', 'LoadData', C++ operators, etc

class **CDataSet2D**

{

public:

**CDataSet2D**()

{

Init();

};

virtual ~**CDataSet2D**()

{

Free();

};

private:

void **Init**( void )

{

m_iRows = 0;

m_iCols = 0;

m_piData1D = NULL;

m_piData2D = NULL;

};

public:

int **Allocate**( int iRows, int iCols )

{

if( iRows <= 0 || iCols <= 0 )

return ( int )0;

m_iRows = iRows;

m_iCols = iCols;

m_piData1D = ( int * )malloc( ( m_iRows * m_iCols ) * sizeof( int ) );

if( m_piData1D == NULL )

return ( int )0;

memset( m_piData1D, 0x0, ( m_iRows * m_iCols ) * sizeof( int ) );

m_piData2D = ( int ** )malloc( m_iRows * sizeof( int * ) );

if( m_piData2D == NULL )

return ( int )0;

int *piData = m_piData1D;

for( int i = 0; i < m_iRows; i++ )

{

m_piData2D* = piData; piData += m_iCols; }*

return ( int )1;

};

void **Free**( void )

{

if( m_piData2D != NULL )

{

free( m_piData2D );

m_piData2D = NULL;

}

if( m_piData1D != NULL )

{

free( m_piData1D );

m_piData1D = NULL;

}

};

int **Transform**( int iRows, int iCols )

{

if( iRows <= 0 || iCols <= 0 )

return ( int )0;

if( m_iRows <= 0 || m_iCols <= 0 )

return ( int )0;

if( ( m_iRows * m_iCols ) != ( iRows * iCols ) )

return ( int )0;

if( m_piData1D == NULL )

return ( int )0;

if( m_piData2D == NULL )

return ( int )0;

m_iRows = iRows;

m_iCols = iCols;

if( m_piData2D != NULL )

{

free( m_piData2D );

m_piData2D = NULL;

}

m_piData2D = ( int ** )malloc( m_iRows * sizeof( int * ) );

if( m_piData2D == NULL )

return ( int )0;

int *piData = m_piData1D;

for( int i = 0; i < m_iRows; i++ )

{

m_piData2D* = piData; piData += m_iCols; }*

return ( int )1;

};

protected:

int m_iRows;

int m_iCols;

int *m_piData1D;

int **m_piData2D;

};

void **main**( void )

{

int iRetCode = -1;

CDataSet2D ds2D;

iRetCode = ds2D.Allocate( 5, 5 ); // Initialized as Matrix 5x5

iRetCode = ds2D.Transform( 1, 25 ); // Transformed to Array 1x25

iRetCode = ds2D.Transform( 25, 1 ); // Transformed to Vector 25x1

iRetCode = ds2D.Transform( 7, 7 ); // Attempt to Transform to Matrix 7x7 - Invalid input

}

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page