SergeyKostrov
Valued Contributor II
87 Views

Performance Evaluation of MinGW v6.1.0 C++ compiler ( OpenMP Scalability )


[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **
Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768

[ MinGW v6.1.0 C++ compiler command line options ]

-DNDEBUG -O3 -mavx -mprfchw -mhard-float -ffast-math
-fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model
-fomit-frame-pointer -fwhole-program -fopenmp -fopenmp-simd
-falign-functions -falign-jumps -falign-labels -falign-loops
-freorder-blocks -freorder-functions
--param l1-cache-line-size=64 --param l1-cache-size=262144 --param l2-cache-size=1048576
-w -Xlinker --stack=1073741824

[ Matrix Dimensions 1024 x 1024 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 0.25000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.26500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 0.53000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.53100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 9.39100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 9.37600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 9.65600 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 9.64100 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 0.29600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.34300 secs
> Test1099 End <
Tests: Completed
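
The logs name the loop order ( LPS: IKJ ) but the source of the MxMult functions is not posted in this thread. As a rough illustration only, an IKJ matrix multiplication parallelized with OpenMP might look like the sketch below. The function name matmul_ikj, the float element type, and the flat row-major std::vector storage are assumptions made for this example, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the author's MxMultA1): C = A * B for square
// n x n row-major matrices, using the I-K-J loop order.
void matmul_ikj(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c, int n, int num_threads)
{
    const std::size_t elems = static_cast<std::size_t>(n) * n;
    assert(a.size() == elems && b.size() == elems && c.size() == elems);

    // Each row of C depends only on one row of A, so the outer i loop
    // can be divided among threads without any synchronization.
    #pragma omp parallel for num_threads(num_threads) schedule(static)
    for (int i = 0; i < n; ++i) {
        float* c_row = &c[static_cast<std::size_t>(i) * n];
        for (int j = 0; j < n; ++j)
            c_row[j] = 0.0f;

        for (int k = 0; k < n; ++k) {
            const float a_ik = a[static_cast<std::size_t>(i) * n + k];
            const float* b_row = &b[static_cast<std::size_t>(k) * n];
            // The innermost j loop reads a row of B and updates a row of C
            // with unit stride, which is friendly to the cache and to
            // auto-vectorization.
            for (int j = 0; j < n; ++j)
                c_row[j] += a_ik * b_row[j];
        }
    }
}
```

Built with -fopenmp the outer loop runs on the requested number of threads; without it the pragma is ignored and the same code runs serially.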

[ Matrix Dimensions 1024 x 1024 ]
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 0.12500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.14100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 0.28000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.28100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 4.69600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 4.69600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 4.88300 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 4.86700 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 0.14100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.15600 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 1024 x 1024 ]
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 0.07800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.06200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 0.15600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.14100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 2.35600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 2.88600 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 2.57400 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 1024x1024 elements
    Completed: 2.90200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 0.10900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 1024x1024 elements
    Completed: 0.14000 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 2048 x 2048 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 2.68400 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 2048x2048 elements
    Completed: 2.66700 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 5.10100 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 2048x2048 elements
    Completed: 5.10100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 88.56100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 88.56200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 102.00900 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 102.07101 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 3.10400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 2048x2048 elements
    Completed: 3.05800 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 2048 x 2048 ]
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 1.68500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 2048x2048 elements
    Completed: 1.73100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 3.57300 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 2048x2048 elements
    Completed: 3.57200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 46.89300 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 46.91000 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 50.48100 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 50.42000 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 1.62200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 2048x2048 elements
    Completed: 1.60700 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 2048 x 2048 ]
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 0.78000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 2048x2048 elements
    Completed: 0.76400 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 1.54400 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 2048x2048 elements
    Completed: 1.54400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 29.17200 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 28.93800 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 32.80700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 2048x2048 elements
    Completed: 32.88500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 1.03000 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 2048x2048 elements
    Completed: 0.92000 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 4096 x 4096 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 21.91800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 4096x4096 elements
    Completed: 21.88700 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 44.11700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 4096x4096 elements
    Completed: 44.02400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 844.13702 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 844.07501 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 1027.29810 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 1027.62500 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 25.77100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 4096x4096 elements
    Completed: 24.86600 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 4096 x 4096 ]
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 14.04000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 4096x4096 elements
    Completed: 14.78900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 28.37700 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 4096x4096 elements
    Completed: 27.56500 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 430.07901 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 431.88901 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 502.87003 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 502.76001 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 14.27400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 4096x4096 elements
    Completed: 14.22700 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 4096 x 4096 ]
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 6.24000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 4096x4096 elements
    Completed: 6.22400 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 12.69800 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 4096x4096 elements
    Completed: 12.82400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 312.65802 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 315.32501 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 336.08902 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 4096x4096 elements
    Completed: 333.76401 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 9.70300 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 4096x4096 elements
    Completed: 9.36000 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 8192 x 8192 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 208.41701 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 8192x8192 elements
    Completed: 208.35501 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 352.32800 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 8192x8192 elements
    Completed: 352.21902 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 8496.01855 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 8495.95508 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 8963.55273 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 8964.23828 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 242.39500 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 8192x8192 elements
    Completed: 230.89801 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 8192 x 8192 ]
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 114.75500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 8192x8192 elements
    Completed: 114.13000 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 238.68102 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 8192x8192 elements
    Completed: 238.10400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 4389.88428 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 4391.36621 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 4776.03320 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 4763.30322 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 124.28600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 8192x8192 elements
    Completed: 119.43401 secs
> Test1099 End <
Tests: Completed

[ Matrix Dimensions 8192 x 8192 ]
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
    LBOT size: N/A
    Completed: 55.88000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
    LBOT size: 8192x8192 elements
    Completed: 56.81500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
    LBOT size: N/A
    Completed: 105.02000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
    LBOT size: 8192x8192 elements
    Completed: 107.12601 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
    LBOT size: N/A
    Completed: 2753.40210 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 2728.22314 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
    LBOT size: N/A
    Completed: 2736.28809 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
    LBOT size: 8192x8192 elements
    Completed: 2733.94922 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
    LBOT size: N/A
    Completed: 65.91000 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
    LBOT size: 8192x8192 elements
    Completed: 64.41200 secs
> Test1099 End <
Tests: Completed
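
Since the thread is about OpenMP scalability, the timings can be turned into speedup and parallel efficiency figures. The helpers below are a small sketch of that arithmetic; the names speedup and efficiency are chosen for this example and are not part of MgwTestApp.

```cpp
#include <cassert>

// Speedup of an N-thread run relative to the 1-thread run of the same
// sub-test and matrix size: T(1) / T(N).
double speedup(double t1_secs, double tn_secs)
{
    assert(t1_secs > 0.0 && tn_secs > 0.0);
    return t1_secs / tn_secs;
}

// Parallel efficiency: speedup divided by the thread count.
// A value of 1.0 would mean ideal linear scaling.
double efficiency(double t1_secs, double tn_secs, int threads)
{
    assert(threads > 0);
    return speedup(t1_secs, tn_secs) / threads;
}
```

For example, Sub-Test 1.1 at 8192 x 8192 took 208.41701 secs with 1 thread and 55.88000 secs with 4 threads, so speedup( 208.41701, 55.88000 ) is about 3.73 and efficiency( 208.41701, 55.88000, 4 ) is about 0.93. The same sub-test with 2 threads ( 114.75500 secs ) gives a speedup of about 1.82.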