*** Oversubscription of OpenMP threads for processing small data sets ***
[ Abstract ]
Oversubscription of OpenMP threads is not a new processing technique; it has been known for a long time.
Are there any benefits to processing small data sets, which fit into the L3 or L2 cache, with a number of
OpenMP threads that is greater than the number of hardware threads of a multi-core CPU?
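A minimal sketch of how such an experiment can be driven, assuming the thread count is requested with `omp_set_num_threads()` and compared with `omp_get_num_procs()`. The functions `logical_cpus` and `run_with_threads` and the toy workload are hypothetical illustrations, not the author's test code. The `#ifdef _OPENMP` guards let it build without `/Qopenmp`, in which case it simply runs serially.

```cpp
#include <cassert>
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of logical CPUs visible to the OpenMP runtime (1 without OpenMP).
int logical_cpus() {
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

// Hypothetical harness: runs a simple parallel loop over `data` with
// `threads` OpenMP threads and returns the elapsed wall time in seconds.
// threads > logical_cpus() means the CPU is oversubscribed.
double run_with_threads(std::vector<float>& data, int threads) {
    assert(threads >= 1);
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // serial build: the requested count is ignored
#endif
    const long n = static_cast<long>(data.size());
    const auto start = std::chrono::steady_clock::now();
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        data[i] = data[i] * 2.0f + 1.0f;  // toy workload on a cache-sized buffer
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `run_with_threads` with 1, 2, 4, 8 and 16 threads reproduces the thread counts used in the test cases; any count above `logical_cpus()` oversubscribes the CPU.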
[ Computer Systems used for performance evaluations ]
** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit SP1
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / unified for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
** Dell Dimension 4400 **
Intel Pentium 4 ( 1.60 GHz / 1 core )
1GB RAM
Seagate 20GB HDD ( * )
Seagate 3TB HDD ( ** )
EVGA GeForce 6200 512MB DDR2 AGP 8x Video Card
Windows XP Professional 32-bit SP3
Size of L2 Cache = 256KB
Size of L1 Cache = 8KB
Display resolution: 1440 x 990
( * ) Seagate Barracuda 20GB IDE Hard Disk Drive
ST320011A
3.5" 7200 Rpm 2MB Cache IDE Ultra ATA100 / ATA-iV/6
Average Rotational Latency : 4.17 ms
Average Seek Times Read : 9.0ms
Average Seek Times Write : 10.0ms
Maximum Internal Transfer Rate : 69.4MB/sec
Average External Transfer Rate : 100MB/sec ( Read and Write )
Maximum External Transfer Rate : 150MB/sec ( Read )
Note: Barracuda ATA IV Family
( ** ) Seagate Barracuda 3TB SATA Hard Disk Drive
ST3000DM001
3.5" 7200 Rpm 64MB Cache SATA III ( 6Gb/sec )
Average Rotational Latency : 4.16 ms
Average Seek Times Read : 8.5ms
Average Seek Times Write : 9.5ms
Maximum Internal Transfer Rate : 268MB/sec
Average External Transfer Rate : 156MB/sec ( Read and Write )
Maximum External Transfer Rate : 210MB/sec ( Read )
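To judge which test sizes actually fit into a given cache level, the working set of C = A * B can be estimated as 3 * n * n * (element size). The sketch below assumes three square, contiguously stored matrices; the element type is not stated in the logs, so both 4-byte and 8-byte elements are considered. The function and constant names are hypothetical; the cache sizes are the ones listed above.

```cpp
#include <cassert>
#include <cstddef>

// Working set of C = A * B for three square n x n matrices whose elements
// are `elem_bytes` wide (4 for float, 8 for double).
constexpr std::size_t working_set_bytes(std::size_t n, std::size_t elem_bytes) {
    return 3 * n * n * elem_bytes;
}

// True if the working set fits entirely in a cache of `cache_bytes`.
// Associativity conflicts, code and unrelated data are ignored, so this
// is an optimistic estimate.
constexpr bool fits_in_cache(std::size_t n, std::size_t elem_bytes,
                             std::size_t cache_bytes) {
    return working_set_bytes(n, elem_bytes) <= cache_bytes;
}

// Cache sizes taken from the system specifications.
constexpr std::size_t kKB    = 1024;
constexpr std::size_t kP4L2  = 256 * kKB;   // Pentium 4 ( Dell Dimension 4400 )
constexpr std::size_t kIvbL2 = 256 * kKB;   // per core, Core i7-3840QM
constexpr std::size_t kIvbL3 = 8192 * kKB;  // shared, Core i7-3840QM
```

With 4-byte elements the three 128 x 128 matrices occupy 192KB, which fits in a 256KB L2, while the 256 x 256 (768KB) and 512 x 512 (3MB) cases do not. With 8-byte elements none of the three sizes fits in a 256KB L2, but all of them fit in the 8MB L3 of the Core i7.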
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121
/Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ]
[ Compiler ]
/c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl
[ Linker ]
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
[ List of Abbreviations ]
MM - Matrix Multiplication
C - Classic
LPS - Loop Processing Schema
1D - One Dimensional Input Matrices
2D - Two Dimensional Input Matrices
LB - Loop Blocking ( OT )
LBOT - Loop Blocking Optimization Technique
F - Fused ( OT )
T - Transposed ( OT )
OT - Optimization Technique
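For readers unfamiliar with the abbreviations, here is a minimal sketch of a classic matrix multiplication using the IKJ Loop Processing Schema, plus a transposed variant. It is not the author's MxMultA1 or MxMultB1 code: the names `mxmult_ikj` and `mxmult_transposed` are hypothetical, storage is assumed to be flat and row-major (closer to the "1D" layout), and "Transposed" is assumed to mean that B is transposed once so the inner loop reads both operands with unit stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat, row-major n x n storage.
using Matrix = std::vector<float>;

// Classic multiplication with the IKJ loop order: the innermost loop walks
// a row of B and a row of C with unit stride. Rows of C are split across
// OpenMP threads, so no element is written by more than one thread.
void mxmult_ikj(const Matrix& a, const Matrix& b, Matrix& c, long n) {
    const std::size_t nn = static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
    assert(n > 0 && a.size() == nn && b.size() == nn);
    c.assign(nn, 0.0f);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        for (long k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (long j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

// Transposed variant (assumed meaning of "T"): B is transposed once, so
// each element of C is a dot product of a row of A and a row of B^T.
void mxmult_transposed(const Matrix& a, const Matrix& b, Matrix& c, long n) {
    const std::size_t nn = static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
    assert(n > 0 && a.size() == nn && b.size() == nn);
    Matrix bt(nn);
    for (long r = 0; r < n; ++r)
        for (long s = 0; s < n; ++s)
            bt[s * n + r] = b[r * n + s];
    c.assign(nn, 0.0f);
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        for (long j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (long k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }
}
```

Both functions compute the same product; they differ only in memory access pattern, which is what the timings of the Classic and Transposed sub-tests compare.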
[ Test Case 1 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ]
[ Matrix Size: 128 x 128 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 128 x 128
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.00147 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 128x128 elements
Completed: 0.00170 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.00391 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 128x128 elements
Completed: 0.00391 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.00172 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00195 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.00439 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00416 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.00903 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 128x128 elements
Completed: 0.00269 secs
> Test1099 End <
Tests: Completed
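The log pairs "Loop Blocking Divider : 1" with an LBOT size equal to the full matrix (128x128 here, 256x256 and 512x512 in later cases), which suggests the block edge is the matrix size divided by the divider. Under that assumption, a loop-blocked IKJ multiplication could look like the sketch below; `mxmult_ikj_blocked` is a hypothetical name and this is not the author's MxMultA2 implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // hypothetical flat, row-major n x n storage

// Loop-blocked IKJ multiplication. The block edge is assumed to be
// n / divider, so divider must divide n; divider == 1 gives a single
// n x n block, i.e. the same work as the unblocked loop nest.
void mxmult_ikj_blocked(const Matrix& a, const Matrix& b, Matrix& c,
                        long n, long divider) {
    const std::size_t nn = static_cast<std::size_t>(n) * static_cast<std::size_t>(n);
    assert(n > 0 && divider >= 1 && n % divider == 0);
    assert(a.size() == nn && b.size() == nn);
    const long bs = n / divider;
    c.assign(nn, 0.0f);
    // Each thread owns whole row bands [ii, ii + bs) of C, so there are no
    // write conflicts; there are only `divider` bands to share out.
    #pragma omp parallel for
    for (long ii = 0; ii < n; ii += bs)
        for (long kk = 0; kk < n; kk += bs)
            for (long jj = 0; jj < n; jj += bs)
                for (long i = ii; i < ii + bs; ++i)
                    for (long k = kk; k < kk + bs; ++k) {
                        const float aik = a[i * n + k];
                        for (long j = jj; j < jj + bs; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

Because this sketch distributes row bands, a divider of 1 leaves a single band and therefore a single busy thread, whatever thread count is requested; a version meant to use several threads at divider 1 would have to distribute a finer-grained loop.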
[ Test Case 2 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ]
[ Matrix Size: 128 x 128 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 128 x 128
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.00147 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 128x128 elements
Completed: 0.00170 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.00391 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 128x128 elements
Completed: 0.00441 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.00170 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00195 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.00439 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00439 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.00903 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 128x128 elements
Completed: 0.00269 secs
> Test1099 End <
Tests: Completed
[ Test Case 3 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ]
[ Matrix Size: 128 x 128 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 128 x 128
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.00170 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 128x128 elements
Completed: 0.00172 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.00414 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 128x128 elements
Completed: 0.00391 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.00170 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00195 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.00441 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00463 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.00903 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 128x128 elements
Completed: 0.00269 secs
> Test1099 End <
Tests: Completed
[ Test Case 4 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ]
[ Matrix Size: 128 x 128 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 128 x 128
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.00170 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 128x128 elements
Completed: 0.00172 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.00391 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 128x128 elements
Completed: 0.00464 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.00195 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00220 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.00439 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00464 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.00928 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 128x128 elements
Completed: 0.00269 secs
> Test1099 End <
Tests: Completed
[ Test Case 5 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ]
[ Matrix Size: 128 x 128 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 128 x 128
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.00184 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 128x128 elements
Completed: 0.00195 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.00537 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 128x128 elements
Completed: 0.00415 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.00305 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00220 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.00488 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 128x128 elements
Completed: 0.00464 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.00916 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 128x128 elements
Completed: 0.00280 secs
> Test1099 End <
Tests: Completed
[ Test Case 6 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ]
[ Matrix Size: 256 x 256 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 256 x 256
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.01172 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 256x256 elements
Completed: 0.01172 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.03392 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 256x256 elements
Completed: 0.03589 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.01197 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.01245 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.05053 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.05055 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.07006 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 256x256 elements
Completed: 0.02514 secs
> Test1099 End <
Tests: Completed
[ Test Case 7 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ]
[ Matrix Size: 256 x 256 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 256 x 256
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.01172 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 256x256 elements
Completed: 0.01172 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.03492 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 256x256 elements
Completed: 0.03711 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.01197 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.01245 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.05127 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.05127 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.07008 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 256x256 elements
Completed: 0.02514 secs
> Test1099 End <
Tests: Completed
[ Test Case 8 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ]
[ Matrix Size: 256 x 256 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 256 x 256
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.01197 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 256x256 elements
Completed: 0.01195 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.03516 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 256x256 elements
Completed: 0.03736 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.01220 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.01270 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.05127 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.05127 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.07031 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 256x256 elements
Completed: 0.02564 secs
> Test1099 End <
Tests: Completed
[ Test Case 9 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ]
[ Matrix Size: 256 x 256 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 256 x 256
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.01270 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 256x256 elements
Completed: 0.01245 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.03614 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 256x256 elements
Completed: 0.03784 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.01270 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.01319 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.05248 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.05273 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.07006 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 256x256 elements
Completed: 0.02539 secs
> Test1099 End <
Tests: Completed
[ Test Case 10 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ]
[ Matrix Size: 256 x 256 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 256 x 256
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.01416 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 256x256 elements
Completed: 0.01416 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.03833 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 256x256 elements
Completed: 0.04102 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.01392 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.01416 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.05517 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 256x256 elements
Completed: 0.05519 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.07103 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 256x256 elements
Completed: 0.02613 secs
> Test1099 End <
Tests: Completed
[ Test Case 11 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ]
[ Matrix Size: 512 x 512 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 512 x 512
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.08741 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 512x512 elements
Completed: 0.08738 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.33644 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 512x512 elements
Completed: 0.34522 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.13475 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.14013 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.42044 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.41991 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.56444 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 512x512 elements
Completed: 0.30078 secs
> Test1099 End <
Tests: Completed
[ Test Case 12 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ]
[ Matrix Size: 512 x 512 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 512 x 512
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.13088 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 512x512 elements
Completed: 0.13084 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.33644 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 512x512 elements
Completed: 0.34472 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.13475 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.14016 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.42138 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.42091 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.56444 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 512x512 elements
Completed: 0.30028 secs
> Test1099 End <
Tests: Completed
[ Test Case 13 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ]
[ Matrix Size: 512 x 512 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 512 x 512
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.13037 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 512x512 elements
Completed: 0.13134 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.33303 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 512x512 elements
Completed: 0.34081 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.13575 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.14013 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.41797 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.41797 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.56541 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 512x512 elements
Completed: 0.30078 secs
> Test1099 End <
Tests: Completed
[ Test Case 14 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ]
[ Matrix Size: 512 x 512 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 512 x 512
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.13328 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 512x512 elements
Completed: 0.13428 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.33447 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 512x512 elements
Completed: 0.34325 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.13819 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.14306 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.42041 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.41994 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.56397 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 512x512 elements
Completed: 0.30078 secs
> Test1099 End <
Tests: Completed
[ Test Case 15 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ]
[ Matrix Size: 512 x 512 ]
Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 512 x 512
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.13475 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 512x512 elements
Completed: 0.13575 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.33056 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 512x512 elements
Completed: 0.33788 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 0.13869 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.14306 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 0.41897 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 512x512 elements
Completed: 0.41897 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.56447 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 512x512 elements
Completed: 0.30078 secs
> Test1099 End <
Tests: Completed
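The single-thread runs make a natural baseline for the oversubscribed ones. As a worked example, the sketch below computes the percentage change for Sub-Test 1.1 (MxMultA1, Classic 2D) using the times reported in Test Cases 1 to 15; only this one sub-test is tabulated, and the array and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sub-Test 1.1 ( MxMultA1 - Classic 2D ) times in seconds from Test Cases
// 1-15; indices 0..4 correspond to 1, 2, 4, 8 and 16 OpenMP threads.
constexpr std::array<double, 5> kTimes128 = {0.00147, 0.00147, 0.00170, 0.00170, 0.00184};
constexpr std::array<double, 5> kTimes256 = {0.01172, 0.01172, 0.01197, 0.01270, 0.01416};
constexpr std::array<double, 5> kTimes512 = {0.08741, 0.13088, 0.13037, 0.13328, 0.13475};

// Percentage change of `t` relative to the single-thread time `t1`;
// positive values mean the run with more threads was slower.
double percent_change(double t1, double t) {
    assert(t1 > 0.0);
    return (t - t1) / t1 * 100.0;
}

// Percentage change for every thread count of one matrix-size series.
std::array<double, 5> series_change(const std::array<double, 5>& times) {
    std::array<double, 5> out{};
    for (std::size_t i = 0; i < times.size(); ++i)
        out[i] = percent_change(times[0], times[i]);
    return out;
}
```

For the 16-thread runs this gives roughly +25% ( 128 x 128 ), +21% ( 256 x 256 ) and +54% ( 512 x 512 ), so for this sub-test the oversubscribed runs on the single-core Pentium 4 were slower than the single-thread runs.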