
Performance Evaluation of MinGW v5.1.0 C++ compiler ( OpenMP Scalability )

SergeyKostrov
Valued Contributor II
543 Views
I really like the latest versions of the MinGW C++ compiler ( versions 5.1.0 and 6.1.0 ), so I decided to evaluate their OpenMP ( version 4 ) scalability.
SergeyKostrov
[ Computer System used for performance evaluations ]

** Dell Precision Mobile M4700 **

Intel Core i7-3840QM ( 2.80 GHz )
Ivy Bridge / 4 cores / 8 logical CPUs / ark.intel.com/products/70846
32GB RAM
320GB HDD
NVIDIA Quadro K1000M ( 192 CUDA cores / 2GB memory )
Windows 7 Professional 64-bit
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
Size of L1 Cache = 256KB ( 32KB per core for data & 32KB per core for instructions )
Display resolution: 1366 x 768
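To relate the cache sizes listed above to the matrix dimensions used in the tests, the memory footprint of the three matrices can be estimated with a few lines of C++. This is only a sketch: the posts do not state the element type, so the element size is a parameter, and the helper names `working_set_bytes` and `fits_in_cache` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes needed to hold `matrices` dense n x n matrices
// whose elements are `elem_bytes` bytes each (4 for float, 8 for double).
// The element type used by the test application is not stated in the posts.
std::size_t working_set_bytes(std::size_t n, std::size_t elem_bytes,
                              std::size_t matrices = 3)
{
    assert(n > 0 && elem_bytes > 0 && matrices > 0);
    return n * n * elem_bytes * matrices;
}

// True when the given footprint fits in a cache of `cache_bytes` bytes.
bool fits_in_cache(std::size_t bytes, std::size_t cache_bytes)
{
    return bytes <= cache_bytes;
}
```

Assuming 4-byte elements, `working_set_bytes(1024, 4)` is 12 MiB for matrices A, B and C together, which is already larger than the 8MB L3 cache, and `working_set_bytes(8192, 4)` is 768 MiB. With 8-byte elements both figures double.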
SergeyKostrov
[ MinGW v5.1.0 C++ compiler options ]

-DNDEBUG -O3 -mavx -mprfchw -mhard-float -ffast-math
-fpeel-loops -ftree-vectorizer-verbose=0 -ftree-vectorize -fvect-cost-model
-fomit-frame-pointer -fwhole-program -fopenmp -fopenmp-simd
-falign-functions -falign-jumps -falign-labels -falign-loops
-freorder-blocks -freorder-functions
--param l1-cache-line-size=64 --param l1-cache-size=262144 --param l2-cache-size=1048576
-w -Xlinker --stack=1073741824
SergeyKostrov
[ Matrix Dimensions 1024 x 1024 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.24900 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 0.26500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.54600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 0.56200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 9.39100 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 9.39200 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 9.65700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 9.65600 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.31200 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 0.25000 secs
> Test1099 End <
Tests: Completed
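The log names the IKJ loop processing schema, but the source of the MxMult routines is not included in the thread. As a rough illustration of what an IKJ-ordered multiply parallelized with OpenMP can look like, here is a minimal sketch. The function name `multiply_ikj`, the `float` element type, the row-major 1D storage and the static schedule are all assumptions, not the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the author's MxMultA1): C = A * B for square
// n x n row-major matrices, using the IKJ loop order.
void multiply_ikj(const std::vector<float>& a, const std::vector<float>& b,
                  std::vector<float>& c, int n, int threads)
{
    assert(n > 0 && threads > 0);
    const std::size_t dim = static_cast<std::size_t>(n);
    assert(a.size() == dim * dim && b.size() == dim * dim);
    c.assign(dim * dim, 0.0f);

    // Each thread works on a disjoint set of rows of C, so the loop nest
    // needs no synchronization. Without -fopenmp the pragma is ignored and
    // the loops run serially.
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (int i = 0; i < n; ++i) {
        const std::size_t row_c = static_cast<std::size_t>(i) * dim;
        for (int k = 0; k < n; ++k) {
            const float aik = a[row_c + k];
            const std::size_t row_b = static_cast<std::size_t>(k) * dim;
            // The innermost loop walks row k of B and row i of C
            // contiguously, which is what makes IKJ cache friendly.
            for (int j = 0; j < n; ++j) {
                c[row_c + j] += aik * b[row_b + j];
            }
        }
    }
}
```

Built with `-fopenmp`, the `threads` argument plays the role of the "Number of OpenMP threads" setting shown in the log headers.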
SergeyKostrov
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.14100 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 0.12500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.29600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 0.29600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 4.69600 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 4.71100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 4.86700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 4.88300 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.15600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 0.12500 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 1024 x 1024
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.06300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 1024x1024 elements
Completed: 0.06200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 0.15600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 1024x1024 elements
Completed: 0.15600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 2.41800 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 2.44900 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 2.85500 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 1024x1024 elements
Completed: 2.73000 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.10900 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 1024x1024 elements
Completed: 0.09400 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Matrix Dimensions 2048 x 2048 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 2.60600 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 2048x2048 elements
Completed: 2.62100 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 5.19500 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 2048x2048 elements
Completed: 5.30400 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 87.90601 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 87.95300 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 98.71800 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 98.70200 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 2.96400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 2048x2048 elements
Completed: 2.59000 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 1.63800 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 2048x2048 elements
Completed: 1.51300 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 3.47900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 2048x2048 elements
Completed: 3.54100 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 44.33500 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 44.28900 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 49.07700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 49.09400 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 1.60600 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 2048x2048 elements
Completed: 1.49800 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 2048 x 2048
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 0.78000 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 2048x2048 elements
Completed: 0.76400 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 1.56000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 2048x2048 elements
Completed: 1.62200 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 25.53700 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 25.10100 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 28.01700 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 2048x2048 elements
Completed: 30.49800 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 0.90500 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 2048x2048 elements
Completed: 0.85800 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Matrix Dimensions 4096 x 4096 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 21.79300 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 4096x4096 elements
Completed: 22.83900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 44.60000 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 4096x4096 elements
Completed: 44.86600 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 812.54706 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 812.71906 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 1000.90204 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 1000.23102 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 24.36700 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 4096x4096 elements
Completed: 21.70000 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 14.80400 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 4096x4096 elements
Completed: 14.92900 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 29.26600 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 4096x4096 elements
Completed: 29.07900 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 423.32401 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 423.10602 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 513.69501 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 514.03900 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 14.27400 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 4096x4096 elements
Completed: 14.71100 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 4096 x 4096
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 6.67700 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 4096x4096 elements
Completed: 6.67700 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 12.97900 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 4096x4096 elements
Completed: 12.85500 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 281.31702 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 279.27301 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 324.51401 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 4096x4096 elements
Completed: 323.20300 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 8.61100 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 4096x4096 elements
Completed: 8.25200 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Matrix Dimensions 8192 x 8192 ]
[ Number of OpenMP threads: 1 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 209.88400 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 8192x8192 elements
Completed: 213.12801 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 355.44803 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 8192x8192 elements
Completed: 356.69601 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 8047.92041 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 8047.04736 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 9423.58301 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 9422.35156 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 232.16101 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 8192x8192 elements
Completed: 206.85701 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 2 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 113.56901 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 8192x8192 elements
Completed: 114.16200 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 238.90001 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 8192x8192 elements
Completed: 240.02301 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 4357.93506 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 4355.20508 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 4844.87646 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 4843.70605 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 122.60201 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 8192x8192 elements
Completed: 112.74200 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
[ Number of OpenMP threads: 4 ]

Application - MgwTestApp - WIN64_MGW ( 64-bit ) - Release
Tests: Start
> Test1099 Start <
Matrix A, B and C Sizes : 8192 x 8192
Loop Processing Schema ( LPS ): IKJ
Loop Blocking Divider : 1
Sub-Test 1.1 - MxMultA1 - Classic 2D
LBOT size: N/A
Completed: 57.45500 secs
Sub-Test 1.2 - MxMultA2 - Classic 2D LBOT
LBOT size: 8192x8192 elements
Completed: 67.31500 secs
Sub-Test 1.3 - MxMultA3 - Classic 2D Fused
LBOT size: N/A
Completed: 105.11301 secs
Sub-Test 1.4 - MxMultA4 - Classic 2D Fused LBOT
LBOT size: 8192x8192 elements
Completed: 106.73601 secs
Sub-Test 2.1 - MxMultB1 - Classic 2D Transposed
LBOT size: N/A
Completed: 2779.17407 secs
Sub-Test 2.2 - MxMultB2 - Classic 2D Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 2756.69312 secs
Sub-Test 2.3 - MxMultB3 - Classic 2D Fused Transposed
LBOT size: N/A
Completed: 2904.25513 secs
Sub-Test 2.4 - MxMultB4 - Classic 2D Fused Transposed LBOT
LBOT size: 8192x8192 elements
Completed: 2912.24316 secs
Sub-Test 5.1 - MxMultD1 - Classic 1D
LBOT size: N/A
Completed: 81.30701 secs
Sub-Test 5.2 - MxMultD2 - Classic 1D LBOT
LBOT size: 8192x8192 elements
Completed: 83.77300 secs
> Test1099 End <
Tests: Completed
SergeyKostrov
All Tests above are for MinGW C++ compiler version 5.1.0. Tests for MinGW C++ compiler version 6.1.0 will be completed some time later.
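The "Completed" times reported above can be turned into speedup and parallel-efficiency figures with the usual definitions: speedup is the 1-thread time divided by the N-thread time, and efficiency is the speedup divided by N. The sketch below assumes nothing beyond those definitions. The names `Scaling` and `scaling` are hypothetical and not part of the test application.

```cpp
#include <cassert>

// Speedup and parallel efficiency of one sub-test.
struct Scaling {
    double speedup;     // t1 / tn
    double efficiency;  // speedup / threads, 1.0 means ideal scaling
};

// Hypothetical helper: t1 is the 1-thread time and tn the time measured
// with `threads` OpenMP threads, both in seconds.
Scaling scaling(double t1, double tn, int threads)
{
    assert(t1 > 0.0 && tn > 0.0 && threads > 0);
    const double s = t1 / tn;
    return Scaling{s, s / threads};
}
```

For Sub-Test 1.1 at 8192 x 8192, `scaling(209.884, 57.455, 4)` gives a speedup of about 3.65 and an efficiency of about 0.91. For Sub-Test 2.1 at the same size, `scaling(8047.92041, 2779.17407, 4)` gives a speedup of about 2.90 and an efficiency of about 0.72.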
SergeyKostrov
Performance Evaluation of MinGW v6.1.0 C++ compiler ( OpenMP Scalability ) is complete and will be posted soon.