Software Archive
Read-only legacy content
17061 Discussions

Oversubscription of OpenMP threads for processing small data sets

SergeyKostrov
Valued Contributor II
2,932 Views
*** Oversubscription of OpenMP threads for processing small data sets ***
0 Kudos
87 Replies
SergeyKostrov
Valued Contributor II
1,297 Views
[ Abstract ] Oversubscription of OpenMP threads is not a new processing technique and it is known for a long time. Are there any benefits of processing small data sets, which fit into L3 or L2 cache lines, with a number of OpenMP threads that is greater than a number of hardware threads of a multi-core CPU?
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Computer Systems used for performance evaluations ] ** Dell Precision Mobile M4700 ** Intel Core i7-3840QM ( 2.80 GHz ) Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846 32GB RAM 320GB HDD NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory ) Windows 7 Professional 64-bit SP1 Size of L3 Cache = 8MB ( shared between all cores for data & instructions ) Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions ) Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions ) Display resolution: 1366 x 768 ** Dell Dimension 4400 ** Intel Pentium 4 ( 1.60 GHz / 1 core ) 1GB RAM Seagate 20GB HDD ( * ) Seagate 3TB HDD ( ** ) EVGA GeForce 6200 Video Card 512MB DDR2 AGP 8x Video Card Windows XP Professional 32-bit SP3 Size of L2 Cache = 256KB Size of L1 Cache = 8KB Display resolution: 1440 x 990 ( * ) Seagate Barracuda 20GB IDE Hard Disk Drive ST320011A 3.5" 7200 Rpm 2MB Cache IDE Ultra ATA100 / ATA-iV/6 Average Rotational Latency : 4.17 ms Average Seek Times Read : 9.0ms Average Seek Times Write : 10.0ms Maximum Internal Transfer Rate : 69.4MB/sec Average External Transfer Rate : 100MB/sec ( Read and Write ) Maximum External Transfer Rate : 150MB/sec ( Read ) Note: Barracuda ATA IV Family ( ** ) Seagate Barracuda 3TB IDE Hard Disk Drive ST3000DM001 3.5" 7200 Rpm 64MB Cache SATA III ( 6GB/sec ) Average Rotational Latency : 4.16 ms Average Seek Times Read : 8.5ms Average Seek Times Write : 9.5ms Maximum Internal Transfer Rate : 268MB/sec Average External Transfer Rate : 156MB/sec ( Read and Write ) Maximum External Transfer Rate : 210MB/sec ( Read )
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Intel C++ compiler v12.1.7 ( u371 ) 32-bit ] [ Compiler ] /c /O3 /Ob1 /Oi /Ot /Oy /Qipo /I "..\..\Include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE121_300" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"Release\IccTestApp.pch" /Fo"Release/" /W5 /nologo /Wp64 /Zi /Gd /TP /Qdiag-disable:2012 /Qdiag-disable:2013 /Qdiag-disable:2014 /Qdiag-disable:2015 /Qdiag-disable:2017 /Qdiag-disable:2021 /Qdiag-disable:2022 /Qdiag-disable:2304 /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qparallel /Qstd=c++0x /Qrestrict /Qdiag-disable:111,673,10121 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 /Wport /Qeffc++ /QxSSE2 /Qansi-alias /Qvec-report=0 /Qfma /Qunroll:8 /Qunroll-aggressive /Qopt-streaming-stores:always /Qopt-block-factor:128 /Qopt-mem-layout-trans:2 [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"Release/IccTestApp.exe" /INCREMENTAL:NO /nologo /MANIFEST /MANIFESTFILE:"Release\IccTestApp.exe.intermediate.manifest" /NODEFAULTLIB:"../../Bin/Release/ScaLib.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:268435456 /LARGEADDRESSAWARE /MACHINE:X86 /qdiag-disable:111,673,10121
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Intel C++ compiler v13.1.0 ( u149 ) 64-bit ] [ Compiler ] /c /O3 /Ob1 /Oi /Ot /Qipo /I "..\..\Include" /I "C:\WorkLib\ICC2013\Composer XE 2013\ipp\include" /D "WIN32" /D "_CONSOLE" /D "NDEBUG" /D "_WIN32_ICC" /D "INTEL_SUITE_VERSION=PE130_149" /D "_IPP_PARALLEL_DYNAMIC" /D "IPP_USE_CUSTOM" /D "_VC80_UPGRADE=0x0710" /D "_UNICODE" /D "UNICODE" /GF /MT /GS- /arch:AVX /fp:fast=2 /GR- /Yu"Stdphf.h" /Fp"x64\Release\IccTestApp64.pch" /Fo"x64/Release/" /Fd"x64/Release/" /W5 /nologo /Wp64 /Zi /TP /U "_WIN32_MSC" /U "_WINCE_MSC" /U "WIN32_PLATFORM_PSPC" /U "WIN32_PLATFORM_WFSP" /U "WIN32_PLATFORM_WM50" /U "_WIN32_MGW" /U "_WIN32_BCC" /U "_COS16_TCC" /U "_WIN32_WCC" /Qopenmp /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /Qansi-alias /Qdiag-disable:111,673,2012,2015,2960,10121 /Wport /Qeffc++ /QxAVX /Qansi-alias /Qvec-report=0 /Qfma /Qunroll /Qunroll-aggressive /Qopt-streaming-stores:always /Qipp /Qipp-link:dynamic /Qmkl [ Linker ] kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /OUT:"x64\Release/IccTestApp64.exe" /INCREMENTAL:NO /nologo /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\ipp\lib\intel64" /LIBPATH:"C:\WorkLib\ICC2013\Composer XE 2013\compiler\lib\intel64" /MANIFEST /MANIFESTFILE:"x64\Release\IccTestApp64.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /NODEFAULTLIB:"../../Bin/Release/ScaLib64.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /STACK:1000000000 /LARGEADDRESSAWARE /DYNAMICBASE /NXCOMPAT /MACHINE:X64 /qdiag-disable:111,673,2012,2015,2960,10121 /qdiag-sc-dir:"My Inspector XE Results - IccTestApp"
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ List of Abbreviations ] MM - Matrix Multiplication C - Classic LPS - Loop Processing Schema 1D - One Dimensional Input Matrices 2D - Two Dimensional Input Matrices LB - Loop Blocking ( OT ) LBOT - Loop Blocking Optimization Technique F - Fused ( OT ) T - Transposed ( OT )
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 1 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ] [ Matrix Size: 128 x 128 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 128 x 128 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.00147 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 128x128 elements Completed: 0.00170 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.00391 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 128x128 elements Completed: 0.00391 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.00172 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 128x128 elements Completed: 0.00195 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.00439 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 128x128 elements Completed: 0.00416 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.00903 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 128x128 elements Completed: 0.00269 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 2 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ] [ Matrix Size: 128 x 128 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 128 x 128 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.00147 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 128x128 elements Completed: 0.00170 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.00391 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 128x128 elements Completed: 0.00441 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.00170 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 128x128 elements Completed: 0.00195 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.00439 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 128x128 elements Completed: 0.00439 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.00903 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 128x128 elements Completed: 0.00269 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 3 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ] [ Matrix Size: 128 x 128 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 128 x 128 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.00170 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 128x128 elements Completed: 0.00172 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.00414 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 128x128 elements Completed: 0.00391 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.00170 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 128x128 elements Completed: 0.00195 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.00441 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 128x128 elements Completed: 0.00463 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.00903 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 128x128 elements Completed: 0.00269 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 4 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ] [ Matrix Size: 128 x 128 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 128 x 128 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.00170 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 128x128 elements Completed: 0.00172 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.00391 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 128x128 elements Completed: 0.00464 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.00195 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 128x128 elements Completed: 0.00220 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.00439 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 128x128 elements Completed: 0.00464 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.00928 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 128x128 elements Completed: 0.00269 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 5 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ] [ Matrix Size: 128 x 128 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 128 x 128 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.00184 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 128x128 elements Completed: 0.00195 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.00537 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 128x128 elements Completed: 0.00415 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.00305 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 128x128 elements Completed: 0.00220 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.00488 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 128x128 elements Completed: 0.00464 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.00916 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 128x128 elements Completed: 0.00280 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 6 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ] [ Matrix Size: 256 x 256 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 256 x 256 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.01172 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 256x256 elements Completed: 0.01172 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.03392 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 256x256 elements Completed: 0.03589 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.01197 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 256x256 elements Completed: 0.01245 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.05053 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 256x256 elements Completed: 0.05055 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.07006 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 256x256 elements Completed: 0.02514 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 7 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ] [ Matrix Size: 256 x 256 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 256 x 256 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.01172 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 256x256 elements Completed: 0.01172 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.03492 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 256x256 elements Completed: 0.03711 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.01197 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 256x256 elements Completed: 0.01245 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.05127 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 256x256 elements Completed: 0.05127 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.07008 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 256x256 elements Completed: 0.02514 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 8 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ] [ Matrix Size: 256 x 256 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 256 x 256 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.01197 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 256x256 elements Completed: 0.01195 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.03516 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 256x256 elements Completed: 0.03736 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.01220 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 256x256 elements Completed: 0.01270 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.05127 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 256x256 elements Completed: 0.05127 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.07031 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 256x256 elements Completed: 0.02564 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 9 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ] [ Matrix Size: 256 x 256 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 256 x 256 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.01270 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 256x256 elements Completed: 0.01245 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.03614 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 256x256 elements Completed: 0.03784 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.01270 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 256x256 elements Completed: 0.01319 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.05248 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 256x256 elements Completed: 0.05273 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.07006 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 256x256 elements Completed: 0.02539 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 10 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ] [ Matrix Size: 256 x 256 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 256 x 256 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.01416 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 256x256 elements Completed: 0.01416 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.03833 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 256x256 elements Completed: 0.04102 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.01392 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 256x256 elements Completed: 0.01416 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.05517 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 256x256 elements Completed: 0.05519 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.07103 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 256x256 elements Completed: 0.02613 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 11 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 1 ] [ Matrix Size: 512 x 512 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 512 x 512 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.08741 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 512x512 elements Completed: 0.08738 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.33644 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 512x512 elements Completed: 0.34522 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.13475 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 512x512 elements Completed: 0.14013 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.42044 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 512x512 elements Completed: 0.41991 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.56444 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 512x512 elements Completed: 0.30078 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 12 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 2 ] [ Matrix Size: 512 x 512 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 512 x 512 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.13088 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 512x512 elements Completed: 0.13084 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.33644 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 512x512 elements Completed: 0.34472 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.13475 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 512x512 elements Completed: 0.14016 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.42138 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 512x512 elements Completed: 0.42091 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.56444 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 512x512 elements Completed: 0.30028 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 13 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 4 ] [ Matrix Size: 512 x 512 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 512 x 512 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.13037 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 512x512 elements Completed: 0.13134 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.33303 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 512x512 elements Completed: 0.34081 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.13575 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 512x512 elements Completed: 0.14013 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.41797 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 512x512 elements Completed: 0.41797 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.56541 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 512x512 elements Completed: 0.30078 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,297 Views
[ Test Case 14 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 8 ] [ Matrix Size: 512 x 512 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 512 x 512 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.13328 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 512x512 elements Completed: 0.13428 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.33447 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 512x512 elements Completed: 0.34325 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.13819 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 512x512 elements Completed: 0.14306 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.42041 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 512x512 elements Completed: 0.41994 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.56397 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 512x512 elements Completed: 0.30078 secs > Test1099 End < Tests: Completed
0 Kudos
SergeyKostrov
Valued Contributor II
1,225 Views
[ Test Case 15 - 32-bit / Cores: 1 / CPUs: 1 - Number of OpenMP threads - 16 ] [ Matrix Size: 512 x 512 ] Application - IccTestApp - WIN32_ICC ( 32-bit ) - Release Tests: Start > Test1099 Start < Matrix A, B and C Sizes : 512 x 512 Loop Processing Schema ( LPS ): IKJ Loop Blocking Divider : 1 Sub-Test 1.1 - MxMultA1 - Classic 2D LBOT size: N/A Completed: 0.13475 secs Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT LBOT size: 512x512 elements Completed: 0.13575 secs Sub-Test 1.3 - MxMultA3 - Classic 2D Fused LBOT size: N/A Completed: 0.33056 secs Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT LBOT size: 512x512 elements Completed: 0.33788 secs Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed LBOT size: N/A Completed: 0.13869 secs Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT LBOT size: 512x512 elements Completed: 0.14306 secs Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed LBOT size: N/A Completed: 0.41897 secs Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT LBOT size: 512x512 elements Completed: 0.41897 secs Sub-Test 5.1 - MxMultD1 - Classic 1D LBOT size: N/A Completed: 0.56447 secs Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT LBOT size: 512x512 elements Completed: 0.30078 secs > Test1099 End < Tests: Completed
0 Kudos
Reply