Can anyone please advise on what approach should be taken in this situation?
Thanks
HG
Please give an example.
This will give us an idea of the size of the array and the number of dimensions.
Jim Dempsey
( single-precision ) as input.
In case of a similar approach for a 'char' type the biggest dimension will be 16x16, and for 'short' type it will be 8x8.
I'd like to repeat the same question:
How big are your 'char' / 'short' matrices?
Best regards,
Sergey
Does it make sense to use an SSE-based transpose for a 4x4 matrix instead of a Classic algorithm?
Please take a look at the results of a test:
DEBUG configuration
> Test1028 Start <
Sub-Test 5 - 200,000,000 calls to [ CLASSIC 4x4 Matrix Transpose ] - 19657 ticks
Sub-Test 6 - 200,000,000 calls to [ SSE 4x4 Matrix Transpose ] - 8640 ticks // 2.28x faster
> Test1028 End <
RELEASE configuration
> Test1028 Start <
Sub-Test 5 - 200,000,000 calls to [ CLASSIC 4x4 Matrix Transpose ] - 18563 ticks
Sub-Test 6 - 200,000,000 calls to [ SSE 4x4 Matrix Transpose ] - 5843 ticks // 3.18x faster
> Test1028 End <
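For readers who want to reproduce this kind of comparison, here is a minimal sketch of what the two 4x4 variants might look like. The tested code itself was not posted, so the function names and the flat row-major `float[16]` layout are assumptions made for illustration only.

```c
#include <xmmintrin.h>  /* SSE intrinsics and the _MM_TRANSPOSE4_PS macro */

/* Classic out-of-place transpose of a row-major 4x4 matrix:
   two nested loops, one copy per element. */
void transpose4x4_classic(const float *src, float *dst)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            dst[j * 4 + i] = src[i * 4 + j];
}

/* SSE transpose: load the four rows into registers, let the macro
   shuffle them, then store the rows back. Unaligned loads and stores
   are used so the caller does not have to align the buffers. */
void transpose4x4_sse(const float *src, float *dst)
{
    __m128 row0 = _mm_loadu_ps(src + 0);
    __m128 row1 = _mm_loadu_ps(src + 4);
    __m128 row2 = _mm_loadu_ps(src + 8);
    __m128 row3 = _mm_loadu_ps(src + 12);

    _MM_TRANSPOSE4_PS(row0, row1, row2, row3);

    _mm_storeu_ps(dst + 0, row0);
    _mm_storeu_ps(dst + 4, row1);
    _mm_storeu_ps(dst + 8, row2);
    _mm_storeu_ps(dst + 12, row3);
}
```

Both functions should produce identical output for the same input; any timing difference will depend on the compiler, the optimization settings and the CPU.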
I could provide you with the performance numbers for two Matrix Transpose algorithms, applied to a
1K x 1K matrix, that I've implemented for my current project. That is,
- a Classic ( Two-For-Loops / Non-Inplace )
and
- a Diagonal Based ( Two-For-Loops / Inplace )
The Diagonal Based algorithm doesn't need a second output matrix and has a reduced number of
exchanges. It never "touches" values along the diagonal line from the top-left corner to the bottom-right corner of the matrix.
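The Diagonal Based idea described above can be sketched roughly as follows. This is not the project code, only an illustration; the `float` element type, the row-major layout and the function name are assumptions.

```c
#include <stddef.h>

/* In-place transpose of an n x n row-major matrix.
   Each element above the main diagonal is swapped with its mirror
   element below it, so the diagonal itself is never read or written.
   Returns the number of swaps performed, which is n * (n - 1) / 2. */
size_t transpose_diagonal_inplace(float *m, size_t n)
{
    size_t swaps = 0;
    for (size_t i = 0; i + 1 < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            float tmp = m[i * n + j];
            m[i * n + j] = m[j * n + i];
            m[j * n + i] = tmp;
            ++swaps;
        }
    }
    return swaps;
}
```

For an 8x8 matrix this performs 28 swaps, and for 1,024 x 1,024 it performs 523,776, compared with n * n element copies in the Classic version.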
Please take a look at performance results.
Matrix size: 1,024 x 1,024
Classic Transpose - ( 128 transposes in 10.015 sec ) = 0.0782421875 sec per transpose
Diagonal Transpose - ( 128 transposes in 5.609 sec ) = 0.0438203125 sec per transpose => ~1.79x faster
Please take a look at results of another test.
If four __m128 variables:
...
__m128 row1 = { 0x0 };
__m128 row2 = { 0x0 };
__m128 row3 = { 0x0 };
__m128 row4 = { 0x0 };
...
are initialized with characters as follows:
...
row1.m128_u8[ 0] = '0'; row1.m128_u8[ 1] = '1'; row1.m128_u8[ 2] = '2'; row1.m128_u8[ 3] = '3';
row1.m128_u8[ 4] = '4'; row1.m128_u8[ 5] = '5'; row1.m128_u8[ 6] = '6'; row1.m128_u8[ 7] = '7';
row1.m128_u8[ 8] = '8'; row1.m128_u8[ 9] = '9'; row1.m128_u8[10] = 'A'; row1.m128_u8[11] = 'B';
row1.m128_u8[12] = 'C'; row1.m128_u8[13] = 'D'; row1.m128_u8[14] = 'E'; row1.m128_u8[15] = 'F';
...
< the same for rows row2, row3 and row4 >
...
a Source Matrix ( as characters ) will look like:
0123456789ABCDEF
0123456789ABCDEF
0123456789ABCDEF
0123456789ABCDEF
and after a call to:
...
_MM_TRANSPOSE4_PS( row1, row2, row3, row4 );
...
a Transposed Matrix will look like:
0123012301230123
4567456745674567
89AB89AB89AB89AB
CDEFCDEFCDEFCDEF
This is not a correct transpose of the characters, but there is nothing unusual here. The _MM_TRANSPOSE4_PS macro cannot be used for
transposing a 4x16 matrix of characters because it was designed to transpose a 4x4 matrix of floats: it moves each group of four bytes as a single 32-bit element.
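The effect can be reproduced without the MSVC-specific `m128_u8` member. The sketch below copies four 16-character rows into float lanes, applies the macro, and copies the bytes back; a plain byte-wise 4x16 to 16x4 transpose is included for contrast. Both function names and the flat buffer layout are assumptions made for this illustration.

```c
#include <string.h>
#include <xmmintrin.h>

/* Applies _MM_TRANSPOSE4_PS to a 4x16 block of characters (64 bytes,
   row-major), in place. Each row is treated as four 32-bit lanes,
   so bytes move in groups of four and are not transposed one by one. */
void transpose_ps_on_chars(char *rows)
{
    float lanes[4][4];
    memcpy(lanes, rows, sizeof lanes);      /* 64 bytes, bit-for-bit copy */

    __m128 r0 = _mm_loadu_ps(lanes[0]);
    __m128 r1 = _mm_loadu_ps(lanes[1]);
    __m128 r2 = _mm_loadu_ps(lanes[2]);
    __m128 r3 = _mm_loadu_ps(lanes[3]);

    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(lanes[0], r0);
    _mm_storeu_ps(lanes[1], r1);
    _mm_storeu_ps(lanes[2], r2);
    _mm_storeu_ps(lanes[3], r3);

    memcpy(rows, lanes, sizeof lanes);
}

/* Byte-wise transpose for contrast: src is 4 rows x 16 columns,
   dst is 16 rows x 4 columns, both row-major. */
void transpose_chars_4x16(const char *src, char *dst)
{
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 16; ++c)
            dst[c * 4 + r] = src[r * 16 + c];
}
```

With every source row set to 0123456789ABCDEF, the first function yields the four rows shown above, while the second yields sixteen rows of four identical characters ( 0000, 1111, ... FFFF ).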
It would be interesting to see the results of your R&D. Please provide some technical details and performance
numbers if you can.
Did you consider the Eklundh method of Matrix Transpose?
Best regards,
Sergey
It would be nice to see a performance comparison of your SSE-based algorithm with a Classic algorithm.
The Eklundh method for a matrix transpose makes more iterations and more exchanges compared to a
Diagonal based algorithm.
Here is a comparison of the number of exchanges for different algorithms. In case of an 8x8 matrix:
Classic - 64 exchanges
Diagonal - 28 exchanges
Eklundh - 48 exchanges
Take into account that for the Diagonal and Eklundh algorithms an input matrix must be Square and both
algorithms are Inplace ( don't need an output matrix ).
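The counts above can be checked with a small sketch. The Eklundh routine below is one common formulation of the method (log2(n) stages, each swapping elements whose row and column indices differ in one bit) and assumes n is a power of two; it is not the code that was measured, and the `int` element type and all function names are assumptions.

```c
#include <stddef.h>

/* Classic out-of-place transpose: one copy per element. */
size_t classic_copy_count(size_t n)  { return n * n; }

/* Diagonal in-place transpose: one swap per pair above the diagonal. */
size_t diagonal_swap_count(size_t n) { return n * (n - 1) / 2; }

/* Eklundh-style in-place transpose of an n x n row-major matrix,
   n a power of two. In the stage for bit b, every element whose row
   index has bit b clear and whose column index has bit b set is
   swapped with the element at (row + b, column - b).
   Returns the number of swaps: (n * n / 4) per stage, log2(n) stages. */
size_t transpose_eklundh_inplace(int *m, size_t n)
{
    size_t swaps = 0;
    for (size_t b = 1; b < n; b <<= 1) {
        for (size_t i = 0; i < n; ++i) {
            if (i & b)
                continue;               /* row bit must be clear */
            for (size_t j = 0; j < n; ++j) {
                if (!(j & b))
                    continue;           /* column bit must be set */
                size_t p = i * n + j;
                size_t q = (i + b) * n + (j - b);
                int tmp = m[p];
                m[p] = m[q];
                m[q] = tmp;
                ++swaps;
            }
        }
    }
    return swaps;
}
```

For n = 8 the three functions give 64, 28 and 48, matching the figures above.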
A couple of days ago I tested the '_MM_TRANSPOSE4_PS' macro vs. 'No-For-Loops' code ( just exchanges )
for a 4x4 matrix of floats, and the 'No-For-Loops' code outperforms the macro by a couple of times. I'll post results for comparison later.
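The 'No-For-Loops' code was not posted, so the following is only a guess at what "just exchanges" might mean for a 4x4 matrix: the six off-diagonal pairs swapped explicitly, with no loops. The function names and the flat row-major `float[16]` layout are assumptions.

```c
/* Swap two floats through a temporary. */
static void swap_float(float *a, float *b)
{
    float tmp = *a;
    *a = *b;
    *b = tmp;
}

/* In-place transpose of a row-major 4x4 matrix with the loops fully
   unrolled: element (i, j) lives at index i * 4 + j, and each of the
   six pairs above the diagonal is swapped with its mirror element. */
void transpose4x4_unrolled(float *m)
{
    swap_float(&m[1],  &m[4]);   /* (0,1) <-> (1,0) */
    swap_float(&m[2],  &m[8]);   /* (0,2) <-> (2,0) */
    swap_float(&m[3],  &m[12]);  /* (0,3) <-> (3,0) */
    swap_float(&m[6],  &m[9]);   /* (1,2) <-> (2,1) */
    swap_float(&m[7],  &m[13]);  /* (1,3) <-> (3,1) */
    swap_float(&m[11], &m[14]);  /* (2,3) <-> (3,2) */
}
```

Whether this beats the macro will depend on the compiler and on whether the matrix already sits in registers or in memory, so it is worth measuring in both DEBUG and RELEASE configurations.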