<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Performance of C++ 2D array iterator dereferencing  in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792516#M570</link>
    <description></description>
    <pubDate>Sat, 07 Jul 2012 00:20:56 GMT</pubDate>
    <dc:creator>SergeyKostrov</dc:creator>
    <dc:date>2012-07-07T00:20:56Z</dc:date>
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792509#M563</link>
      <description>&lt;P&gt;I have a row-major iterator over a 2D array, with the dereference operator as follows:&lt;/P&gt;&lt;P&gt;int&amp;amp; Iterator::operator*()&lt;BR /&gt;{ &lt;BR /&gt; return matrix_[y_][x_]; //matrix_ has type int**&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;The (prefix) increment operator is as follows:&lt;/P&gt;&lt;P&gt;Iterator&amp;amp; Iterator::operator++()&lt;BR /&gt;{&lt;BR /&gt; //ys_, xs_ are dimensions of matrix&lt;BR /&gt; if((++x_ == xs_) &amp;amp;&amp;amp; (++y_ != ys_)) x_ = 0;&lt;BR /&gt; return *this;&lt;BR /&gt;} &lt;/P&gt;&lt;P&gt;I can use this iterator with an optimised version of std::transform (mine doesn't return an unneeded result, in order to save a few instructions):&lt;/P&gt;&lt;P&gt;template &amp;lt; class InputIterator, class OutputIterator, class UnaryOperator &amp;gt;&lt;BR /&gt;inline void MyTransform( InputIterator first1, InputIterator last1, OutputIterator result,&lt;BR /&gt; UnaryOperator op )&lt;BR /&gt;{&lt;BR /&gt; for (; first1 != last1; ++first1, ++result)&lt;BR /&gt; *result = op(*first1);&lt;BR /&gt;} &lt;/P&gt;&lt;P&gt;calling it thus:&lt;/P&gt;&lt;P&gt;MyTransform(matrix1.begin(),matrix1.end(),matrix2.begin(), MyFunctor());&lt;/P&gt;&lt;P&gt;However, when I compare the performance to a classic nested for-loop, to wit:&lt;/P&gt;&lt;P&gt; MyFunctor f;&lt;/P&gt;&lt;P&gt; for (int y=0; y &amp;lt; YSIZE; ++y)&lt;BR /&gt; for (int x=0; x &amp;lt; XSIZE; ++x)&lt;BR /&gt; matrix2[y][x] = f(matrix1[y][x]);&lt;/P&gt;&lt;P&gt;the iterator-based solution is approx. 25% slower than the nested for-loop solution. &lt;/P&gt;
&lt;P&gt;Now, the problem does not seem to be with the iterator increment operator. If I use the following (ugly) hybrid solution, which combines iterator traversal with raw array access (the arrays being indexed using the iterators' internal counters):&lt;/P&gt;&lt;P&gt;MyFunctor func;&lt;/P&gt;&lt;P&gt;for (; mat1Begin != mat1End; ++mat1Begin, ++mat2Begin)&lt;BR /&gt;{ &lt;BR /&gt; //mat1 and mat2 are type int**&lt;BR /&gt; mat2[mat2Begin.y_][mat2Begin.x_] =&lt;BR /&gt; func(mat1[mat1Begin.y_][mat1Begin.x_]);&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;it is actually a little faster than the nested for-loop solution. This suggests to me that the iterator traversal itself is fast and that the performance hit is in dereferencing the iterators when doing the assignment. &lt;/P&gt;&lt;P&gt;My question is, why does dereferencing the iterators in the assignment &lt;/P&gt;&lt;P&gt; *result = op(*first1);&lt;/P&gt;&lt;P&gt;incur such a massive performance hit relative to raw array access? Is there any technique I can use with this simple design to get performance (almost) equivalent to the raw array version? I much prefer to use iterators where possible, but this is a very performance-sensitive piece of code, for which such a large hit is not tolerable.&lt;/P&gt;&lt;P&gt;Any help much appreciated.&lt;/P&gt;&lt;P&gt;(I'm using the evaluation version of Intel Parallel Studio XE 2011 with Visual Studio 10, with the default release configuration optimisations.) &lt;/P&gt;</description>
      <pubDate>Fri, 29 Jun 2012 16:49:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792509#M563</guid>
      <dc:creator>tj1</dc:creator>
      <dc:date>2012-06-29T16:49:21Z</dc:date>
    </item>
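    <!--
    A minimal, self-contained sketch of the iterator described in the question
    above, so the two code paths can be compared side by side. Only operator*,
    operator++ and MyTransform follow the post; the constructor, operator!=,
    the Twice functor (standing in for MyFunctor) and the matchesNestedLoop
    helper are assumptions added for illustration. It checks correctness only
    and makes no claim about timing.

```cpp
#include <cassert>
#include <vector>

// Row-major iterator over an int** matrix. operator* and operator++ follow
// the post; the constructor and operator!= are assumptions for illustration.
class Iterator {
public:
    Iterator(int** matrix, int xs, int ys, int x, int y)
        : matrix_(matrix), xs_(xs), ys_(ys), x_(x), y_(y) {}

    // Two dependent loads per access: the row pointer, then the element.
    int& operator*() { return matrix_[y_][x_]; }

    Iterator& operator++() {
        // Wrap to the next row; past the last element x_ == xs_ and y_ == ys_.
        if ((++x_ == xs_) && (++y_ != ys_)) x_ = 0;
        return *this;
    }

    bool operator!=(const Iterator& o) const { return x_ != o.x_ || y_ != o.y_; }

private:
    int** matrix_;
    int xs_, ys_, x_, y_;
};

// std::transform without the returned iterator, as in the post.
template <class InputIterator, class OutputIterator, class UnaryOperator>
inline void MyTransform(InputIterator first1, InputIterator last1,
                        OutputIterator result, UnaryOperator op) {
    for (; first1 != last1; ++first1, ++result)
        *result = op(*first1);
}

// Hypothetical functor standing in for MyFunctor.
struct Twice {
    int operator()(int v) const { return 2 * v; }
};

// Transforms a 3x4 matrix with the iterator and compares against nested loops.
bool matchesNestedLoop() {
    const int ys = 3, xs = 4;
    std::vector<int> a(xs * ys), b(xs * ys, 0);
    std::vector<int*> rowsA(ys), rowsB(ys);
    for (int y = 0; y < ys; ++y) {
        rowsA[y] = &a[y * xs];
        rowsB[y] = &b[y * xs];
    }
    for (int i = 0; i < xs * ys; ++i) a[i] = i;

    MyTransform(Iterator(rowsA.data(), xs, ys, 0, 0),
                Iterator(rowsA.data(), xs, ys, xs, ys),
                Iterator(rowsB.data(), xs, ys, 0, 0), Twice());

    Twice f;
    for (int y = 0; y < ys; ++y)
        for (int x = 0; x < xs; ++x)
            if (rowsB[y][x] != f(rowsA[y][x])) return false;
    return true;
}
```

    Calling matchesNestedLoop() from a test or from main confirms that the
    iterator path and the nested loops write the same values.
    -->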
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792510#M564</link>
      <description>I suppose the performance loss is here:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;if((++x_ == xs_) &amp;amp;&amp;amp; (++y_ != ys_)) x_ = 0;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;You do this check twice in each loop pass, once for each iterator.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I would suggest representing the 2D array as a simple int* pointing to xs_*ys_*sizeof(int) bytes of allocated memory. Then, to increment, you just increment the iterator's (aggregated) pointer. You save the time spent on the checks and on calculating the offset twice.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 02 Jul 2012 08:41:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792510#M564</guid>
      <dc:creator>Arthur_Moroz</dc:creator>
      <dc:date>2012-07-02T08:41:07Z</dc:date>
    </item>
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792511#M565</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1341273226421="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=568809" href="https://community.intel.com/en-us/profile/568809/" class="basic"&gt;tj1&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;&lt;P&gt;...&lt;BR /&gt; for (int y=0; y &amp;lt; YSIZE; ++y)&lt;BR /&gt; for (int x=0; x &amp;lt; XSIZE; ++x)&lt;BR /&gt; matrix2[y][x] = f(matrix1[y][x]);&lt;/P&gt;&lt;P&gt;the iterator-based solution is approx. 25% slower than the nested for-loop solution...&lt;/P&gt;&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;That performance hit is not a surprise to me, because if you really want the best possible&lt;BR /&gt;performance, using C++ operators is not recommended. I use two pointers to access&lt;BR /&gt;2-D data sets, and they are initialized as follows:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;[cpp]	...
	T *m_ptData1D;
	T **m_ptData2D;
	...

	...
	m_ptData1D = ( T * )CrtMalloc( m_uiSize * sizeof( T ) );
	if( m_ptData1D == RTnull )
		return ( RTbool )RTfalse;

	m_ptData2D = ( T ** )CrtMalloc( m_uiRows * sizeof( T * ) );
	if( m_ptData2D == RTnull )
		return ( RTbool )RTfalse;

	T *ptData = m_ptData1D;

	for( RTuint i = 0; i &amp;lt; m_uiRows; i++ )
	{
		m_ptData2D[i] = ptData;
		ptData += m_uiCols;
	}
	...
[/cpp]</description>
      <pubDate>Tue, 03 Jul 2012 00:03:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792511#M565</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-03T00:03:52Z</dc:date>
    </item>
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792512#M566</link>
      <description>&lt;P&gt;Hi, Arthur,&lt;/P&gt;&lt;P&gt;The condition in your response short-circuits if the first Boolean evaluates to false, so it is actually fairly optimal. In any case, that code is from the increment operator. As stated in my original post, that is not where the problem lies; it is with the performance of the dereferencing operator. &lt;BR /&gt;&lt;BR /&gt;I have modified the code so that the row selected by the outer counter is cached, so the code now looks as follows:&lt;/P&gt;&lt;P&gt;int&amp;amp; Iterator::operator*()&lt;BR /&gt;{&lt;BR /&gt; return column_[x_];&lt;BR /&gt;} &lt;/P&gt;&lt;P&gt;//prefix incr.&lt;BR /&gt;Iterator&amp;amp; Iterator::operator++()&lt;BR /&gt;{&lt;BR /&gt; if(++x_ == xs_)&lt;BR /&gt; {&lt;BR /&gt; if(++y_ != ys_)&lt;BR /&gt; { &lt;BR /&gt; x_ = 0;&lt;BR /&gt; column_ = matrix_[y_];&lt;BR /&gt; }&lt;BR /&gt; }&lt;BR /&gt; return *this;&lt;BR /&gt;} &lt;BR /&gt;&lt;BR /&gt;This improves performance to ~85% of the raw 2D array performance. &lt;BR /&gt;&lt;BR /&gt;When I convert the code to use pointer arithmetic (again caching the column) as follows, the performance is significantly worse than the raw 2D array (~55%):&lt;/P&gt;&lt;P&gt;//dereference&lt;BR /&gt;int&amp;amp; Image32Iterator::operator*()&lt;BR /&gt;{&lt;BR /&gt; return *ptr_;&lt;BR /&gt;} &lt;/P&gt;&lt;P&gt;//prefix&lt;BR /&gt;Image32Iterator&amp;amp; Image32Iterator::operator++()&lt;BR /&gt;{&lt;BR /&gt; if(++ptr_ == ptrEnd_)&lt;BR /&gt; {&lt;BR /&gt; if(++row_ != rowEnd_)&lt;BR /&gt; { &lt;BR /&gt; ptrEnd_ = (ptr_ = *row_) + xs_;&lt;BR /&gt; }&lt;BR /&gt; }&lt;BR /&gt; return *this;&lt;BR /&gt;} &lt;BR /&gt;&lt;BR /&gt;I am surprised that the pointer arithmetic solution performs so much worse, and I don't quite understand why. I was trying pointer arithmetic to see if I could get &amp;gt; 85%!&lt;/P&gt;</description>
      <pubDate>Tue, 03 Jul 2012 07:17:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792512#M566</guid>
      <dc:creator>tj1</dc:creator>
      <dc:date>2012-07-03T07:17:45Z</dc:date>
    </item>
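    <!--
    The row-caching variant from the post above can be sketched as a complete
    class. operator* and operator++ follow the post; the constructor,
    operator!= and the addOneToAll helper are assumptions added so the sketch
    can be compiled. It demonstrates the traversal only and says nothing about
    the measured 85% figure.

```cpp
#include <cassert>
#include <vector>

// Iterator that caches the current row pointer, following the second version
// in the post. The constructor and operator!= are assumptions.
class RowCachedIterator {
public:
    RowCachedIterator(int** matrix, int xs, int ys, int x, int y)
        : matrix_(matrix), column_(y < ys ? matrix[y] : nullptr),
          xs_(xs), ys_(ys), x_(x), y_(y) {}

    int& operator*() { return column_[x_]; }  // one load: row pointer is cached

    RowCachedIterator& operator++() {
        if (++x_ == xs_) {
            if (++y_ != ys_) {
                x_ = 0;
                column_ = matrix_[y_];  // refresh the cache once per row
            }
        }
        return *this;
    }

    bool operator!=(const RowCachedIterator& o) const {
        return x_ != o.x_ || y_ != o.y_;
    }

private:
    int** matrix_;
    int* column_;  // name taken from the post; it points at the current row
    int xs_, ys_, x_, y_;
};

// Adds one to every element of an xs-by-ys matrix filled with 0, 1, 2, ...
// and returns the flat result. The helper name is hypothetical.
std::vector<int> addOneToAll(int xs, int ys) {
    std::vector<int> in(xs * ys), out(xs * ys, 0);
    std::vector<int*> rowsIn(ys), rowsOut(ys);
    for (int y = 0; y < ys; ++y) {
        rowsIn[y] = &in[y * xs];
        rowsOut[y] = &out[y * xs];
    }
    for (int i = 0; i < xs * ys; ++i) in[i] = i;

    RowCachedIterator src(rowsIn.data(), xs, ys, 0, 0);
    RowCachedIterator end(rowsIn.data(), xs, ys, xs, ys);
    RowCachedIterator dst(rowsOut.data(), xs, ys, 0, 0);
    for (; src != end; ++src, ++dst) *dst = *src + 1;
    return out;
}
```

    For a 4x3 matrix, addOneToAll(4, 3) returns the twelve values 1 through 12
    in row-major order.
    -->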
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792513#M567</link>
      <description>Hi, Sergey,&lt;BR /&gt;Thanks, that is actually the structure I use internally for my 2D array. However, I wanted to use a row-major iterator pattern, as it can then be used with STL algorithms such as std::transform. As stated in my original post, the C++ increment operator I implemented performed well; it was just the dereferencing operator that underperformed. I am prepared to sacrifice a little performance to make the code more generic and typesafe. The question is how much I should expect to sacrifice, which is what my investigations aim to find out.&lt;BR /&gt;&lt;BR /&gt;As discussed in my response to Arthur's post, I improved performance to ~85% by caching the outer row counter. &lt;BR /&gt;&lt;BR /&gt;Strangely, however, when I use pointer arithmetic (plus caching the outer counter) in my iterator, the performance is actually worse, at around 55 - 60%.</description>
      <pubDate>Tue, 03 Jul 2012 07:55:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792513#M567</guid>
      <dc:creator>tj1</dc:creator>
      <dc:date>2012-07-03T07:55:36Z</dc:date>
    </item>
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792514#M568</link>
      <description>Actually, my main idea was to use plain contiguous memory; you don't need to allocate an array per column (or row). Just allocate Xs * Ys elements.&lt;DIV&gt;&lt;/DIV&gt;&lt;BLOCKQUOTE&gt;&lt;DIV&gt;class Array2D&lt;/DIV&gt;&lt;DIV&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int* pArray;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;int size, width;&lt;/DIV&gt;&lt;DIV&gt;public:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;Array2D(int xs, int ys)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;: pArray(new int[xs*ys]), size(xs*ys), width(xs) // same order as the member declarations&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{}&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;. . .&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;class iterator&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;{&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;int* ptr;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;public:&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;iterator(Array2D&amp;amp; arr, int offs=0)&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;			&lt;/SPAN&gt;: ptr( arr.pArray + offs )&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;{}&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;inline iterator&amp;amp; operator++() { ++ptr; return *this; }&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;		&lt;/SPAN&gt;inline int&amp;amp; operator*() { return *ptr; }&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;};&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;friend class iterator;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;inline iterator begin() { return iterator(*this, 0); }&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN style="white-space: pre;"&gt;	&lt;/SPAN&gt;inline iterator end() { return iterator(*this, size); }&lt;/DIV&gt;};&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;It's just a sketch; you will need to add ctors, dtors, a comparison operator for the iterator, and validity checks. But I hope it explains better what I tried to say in words.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Fri, 06 Jul 2012 09:32:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792514#M568</guid>
      <dc:creator>Arthur_Moroz</dc:creator>
      <dc:date>2012-07-06T09:32:01Z</dc:date>
    </item>
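    <!--
    A compilable variation on the Array2D sketch above, for readers who want
    to try the contiguous layout. It is not the poster's code: storage through
    std::vector, the at() accessor, the iterator's operator!= and the
    elementAfterFill helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Contiguous 2D array in the spirit of the Array2D sketch above. Using
// std::vector for storage and the at() accessor are assumptions.
class Array2D {
public:
    Array2D(int xs, int ys) : data_(xs * ys, 0), width_(xs) {}

    class iterator {
    public:
        explicit iterator(int* p) : ptr_(p) {}
        iterator& operator++() { ++ptr_; return *this; }  // no row check needed
        int& operator*() { return *ptr_; }
        bool operator!=(const iterator& o) const { return ptr_ != o.ptr_; }
    private:
        int* ptr_;
    };

    iterator begin() { return iterator(data_.data()); }
    iterator end() { return iterator(data_.data() + data_.size()); }

    // Row-major element access: row y starts at offset y * width_.
    int& at(int x, int y) { return data_[y * width_ + x]; }

private:
    std::vector<int> data_;
    int width_;
};

// Fills a 4x3 array with 0..11 via the iterator and reads one element back.
int elementAfterFill(int x, int y) {
    Array2D a(4, 3);
    int next = 0;
    for (Array2D::iterator it = a.begin(); it != a.end(); ++it) *it = next++;
    return a.at(x, y);
}
```

    Because the storage is one block, the increment is a single pointer bump
    and the element at (x, y) sits at offset y * width + x.
    -->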
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792515#M569</link>
      <description>A while ago I provided a prototype of a C++ class that performs transforms on data sets. Please take a look:&lt;BR /&gt;&lt;BR /&gt; &lt;STRONG&gt;IPP forum&lt;/STRONG&gt;: Reformatting data using strides&lt;BR /&gt;&lt;BR /&gt; &lt;A href="http://software.intel.com/en-us/forums/showpost.php?p=174994"&gt;http://software.intel.com/en-us/forums/showpost.php?p=174994&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;The C++ class has a different kind of 'transform' method from the STL 'transform' algorithm; it simply&lt;BR /&gt;demonstrates how a data set is initialized, transformed, released, etc.&lt;BR /&gt;</description>
      <pubDate>Sat, 07 Jul 2012 00:18:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792515#M569</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-07T00:18:51Z</dc:date>
    </item>
    <item>
      <title>Performance of C++ 2D array iterator dereferencing</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792516#M570</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1341620662812="60" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=353541" href="https://community.intel.com/en-us/profile/353541/" class="basic"&gt;Sergey Kostrov&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;A while ago I provided a prototype of a C++ class that performs transforms on data sets...&lt;/I&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;P&gt;// &lt;STRONG&gt;Simple 2D Data Set class&lt;/STRONG&gt; ( allows transforms )&lt;BR /&gt;//&lt;BR /&gt;// &lt;STRONG&gt;Notes:&lt;/STRONG&gt;&lt;BR /&gt;// - This is a prototype I used for a template-based 2D data set class ( it's very different but the idea is the same )&lt;BR /&gt;// - The underlying 1D array for the 2D array is CONTIGUOUS&lt;BR /&gt;// - The Transform functionality assumes that the underlying 1D array is not reallocated&lt;BR /&gt;// - You could easily add methods like 'SetValue', 'Clear', 'LoadData', C++ operators, etc&lt;/P&gt;&lt;P&gt;#include &amp;lt;stdlib.h&amp;gt; // malloc, free&lt;BR /&gt;#include &amp;lt;string.h&amp;gt; // memset&lt;/P&gt;&lt;P&gt;class &lt;STRONG&gt;CDataSet2D&lt;/STRONG&gt;&lt;BR /&gt;{&lt;BR /&gt;public:&lt;BR /&gt; &lt;STRONG&gt;CDataSet2D&lt;/STRONG&gt;()&lt;BR /&gt; {&lt;BR /&gt; Init();&lt;BR /&gt; };&lt;/P&gt;&lt;P&gt; virtual ~&lt;STRONG&gt;CDataSet2D&lt;/STRONG&gt;()&lt;BR /&gt; {&lt;BR /&gt; Free();&lt;BR /&gt; };&lt;/P&gt;&lt;P&gt;private:&lt;BR /&gt; void &lt;STRONG&gt;Init&lt;/STRONG&gt;( void )&lt;BR /&gt; {&lt;BR /&gt; m_iRows = 0;&lt;BR /&gt; m_iCols = 0;&lt;/P&gt;&lt;P&gt; m_piData1D = NULL;&lt;BR /&gt; m_piData2D = NULL;&lt;BR /&gt; };&lt;/P&gt;&lt;P&gt;public:&lt;BR /&gt; int &lt;STRONG&gt;Allocate&lt;/STRONG&gt;( int iRows, int iCols )&lt;BR /&gt; {&lt;BR /&gt; if( iRows &amp;lt;= 0 || iCols &amp;lt;= 0 )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; m_iRows = iRows;&lt;BR /&gt; m_iCols = iCols;&lt;/P&gt;&lt;P&gt; // One contiguous block holds all elements&lt;BR /&gt; m_piData1D = ( int * )malloc( ( m_iRows * m_iCols ) * sizeof( int ) );&lt;BR /&gt; if( m_piData1D == NULL )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; memset( m_piData1D, 0x0, ( m_iRows * m_iCols ) * sizeof( int ) );&lt;/P&gt;&lt;P&gt; // Table of row pointers into the contiguous block&lt;BR /&gt; m_piData2D = ( int ** )malloc( m_iRows * sizeof( int * ) );&lt;BR /&gt; if( m_piData2D == NULL )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; int *piData = m_piData1D;&lt;/P&gt;&lt;P&gt; for( int i = 0; i &amp;lt; m_iRows; i++ )&lt;BR /&gt; {&lt;BR /&gt; m_piData2D[i] = piData;&lt;BR /&gt; piData += m_iCols;&lt;BR /&gt; }&lt;/P&gt;&lt;P&gt; return ( int )1;&lt;BR /&gt; };&lt;/P&gt;
&lt;P&gt; void &lt;STRONG&gt;Free&lt;/STRONG&gt;( void )&lt;BR /&gt; {&lt;BR /&gt; if( m_piData2D != NULL )&lt;BR /&gt; {&lt;BR /&gt; free( m_piData2D );&lt;BR /&gt; m_piData2D = NULL;&lt;BR /&gt; }&lt;/P&gt;&lt;P&gt; if( m_piData1D != NULL )&lt;BR /&gt; {&lt;BR /&gt; free( m_piData1D );&lt;BR /&gt; m_piData1D = NULL;&lt;BR /&gt; }&lt;BR /&gt; };&lt;/P&gt;&lt;P&gt; int &lt;STRONG&gt;Transform&lt;/STRONG&gt;( int iRows, int iCols )&lt;BR /&gt; {&lt;BR /&gt; if( iRows &amp;lt;= 0 || iCols &amp;lt;= 0 )&lt;BR /&gt; return ( int )0;&lt;BR /&gt; if( m_iRows &amp;lt;= 0 || m_iCols &amp;lt;= 0 )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; // The new shape must hold exactly the same number of elements&lt;BR /&gt; if( ( m_iRows * m_iCols ) != ( iRows * iCols ) )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; if( m_piData1D == NULL )&lt;BR /&gt; return ( int )0;&lt;BR /&gt; if( m_piData2D == NULL )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; m_iRows = iRows;&lt;BR /&gt; m_iCols = iCols;&lt;/P&gt;&lt;P&gt; if( m_piData2D != NULL )&lt;BR /&gt; {&lt;BR /&gt; free( m_piData2D );&lt;BR /&gt; m_piData2D = NULL;&lt;BR /&gt; }&lt;/P&gt;&lt;P&gt; // Only the row-pointer table is rebuilt; the data is not moved&lt;BR /&gt; m_piData2D = ( int ** )malloc( m_iRows * sizeof( int * ) );&lt;BR /&gt; if( m_piData2D == NULL )&lt;BR /&gt; return ( int )0;&lt;/P&gt;&lt;P&gt; int *piData = m_piData1D;&lt;/P&gt;&lt;P&gt; for( int i = 0; i &amp;lt; m_iRows; i++ )&lt;BR /&gt; {&lt;BR /&gt; m_piData2D[i] = piData;&lt;BR /&gt; piData += m_iCols;&lt;BR /&gt; }&lt;/P&gt;&lt;P&gt; return ( int )1;&lt;BR /&gt; };&lt;/P&gt;
&lt;P&gt;protected:&lt;BR /&gt; int m_iRows;&lt;BR /&gt; int m_iCols;&lt;/P&gt;&lt;P&gt; int *m_piData1D;&lt;BR /&gt; int **m_piData2D;&lt;BR /&gt;};&lt;/P&gt;&lt;P&gt;int &lt;STRONG&gt;main&lt;/STRONG&gt;( void )&lt;BR /&gt;{&lt;BR /&gt; int iRetCode = -1;&lt;/P&gt;&lt;P&gt; CDataSet2D ds2D;&lt;/P&gt;&lt;P&gt; iRetCode = ds2D.Allocate( 5, 5 ); // Initialized as Matrix 5x5&lt;BR /&gt; iRetCode = ds2D.Transform( 1, 25 ); // Transformed to Array 1x25&lt;BR /&gt; iRetCode = ds2D.Transform( 25, 1 ); // Transformed to Vector 25x1&lt;BR /&gt; iRetCode = ds2D.Transform( 7, 7 ); // Attempt to Transform to Matrix 7x7 - Invalid input&lt;/P&gt;&lt;P&gt; return 0;&lt;BR /&gt;}&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Sat, 07 Jul 2012 00:20:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Performance-of-C-2D-array-iterator-dereferencing/m-p/792516#M570</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-07-07T00:20:56Z</dc:date>
    </item>
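    <!--
    The reshape idea in the post above (keep one contiguous buffer and rebuild
    only the table of row pointers) can also be sketched with standard
    containers. This is a hedged illustration and not the poster's class: the
    DataSet2D name, the use of std::vector instead of malloc/free, the bool
    return values and operator[] are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reshapable 2D view over one contiguous buffer, mirroring CDataSet2D.
// Using std::vector instead of malloc/free is an assumption made here so
// that cleanup is automatic.
class DataSet2D {
public:
    bool Allocate(int rows, int cols) {
        if (rows <= 0 || cols <= 0) return false;
        data_.assign(static_cast<std::size_t>(rows) * cols, 0);
        BuildRowTable(rows, cols);
        return true;
    }

    // Re-views the same data with new dimensions; element count must match.
    bool Transform(int rows, int cols) {
        if (rows <= 0 || cols <= 0 || data_.empty()) return false;
        if (static_cast<std::size_t>(rows) * cols != data_.size()) return false;
        BuildRowTable(rows, cols);
        return true;
    }

    int* operator[](int row) { return rows_[row]; }

private:
    // Points each row entry at the start of its slice of the flat buffer.
    void BuildRowTable(int rows, int cols) {
        rows_.resize(rows);
        for (int i = 0; i < rows; ++i) rows_[i] = data_.data() + i * cols;
    }

    std::vector<int> data_;   // contiguous 1D storage
    std::vector<int*> rows_;  // one pointer per row into data_
};
```

    A value written at [2][3] in the 5x5 view is found at [0][13] after
    Transform(1, 25) and at [13][0] after Transform(25, 1), since the flat
    offset 2 * 5 + 3 = 13 does not change; Transform(7, 7) is rejected because
    49 elements do not match the 25 allocated.
    -->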
  </channel>
</rss>

