> grep GHz /proc/cpuinfo
model name : Intel(R) Xeon(R) CPU E31240 @ 3.30GHz

> ifort --version
ifort (IFORT) 14.0.2 20140120
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.

[Phi: this compiler uses avx instructions by default in parallel mode]

> ifort -O3 -parallel -par_threshold90 -DARDIM=4000 matmul.F
> OMP_NUM_THREADS=1 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.120 real time: 0.134 init
dtime: 4.700 real time: 4.690 ikj
dtime: 4.670 real time: 4.683 jki
Sum of elements: 105312418747995.250
> OMP_NUM_THREADS=4 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.290 real time: 0.073 init
dtime: 5.360 real time: 1.345 ikj
dtime: 5.340 real time: 1.336 jki
Sum of elements: 105312418747995.031

> ifort -O3 -parallel -par_threshold90 -DSIZEARGS -DARDIM=4000 matmul.F
> OMP_NUM_THREADS=1 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.130 real time: 0.137 init
dtime: 11.650 real time: 11.651 ikj
dtime: 11.610 real time: 11.615 jki
Sum of elements: 105312418747995.297
> OMP_NUM_THREADS=4 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.220 real time: 0.058 init
dtime: 12.900 real time: 3.225 ikj
dtime: 12.880 real time: 3.223 jki
Sum of elements: 105312418747995.016

> ifort -O3 -mavx -DARDIM=4000 matmul.F
> ./a.out
Running with array sizes 4000 by 4000
dtime: 0.120 real time: 0.132 init
dtime: 7.160 real time: 7.156 ikj
dtime: 7.090 real time: 7.107 jki
Sum of elements: 105312418747994.937

> ifort -O3 -mavx -DSIZEARGS -DARDIM=4000 matmul.F
> ./a.out
Running with array sizes 4000 by 4000
dtime: 0.120 real time: 0.128 init
dtime: 8.710 real time: 8.710 ikj
dtime: 35.370 real time: 35.400 jki
Sum of elements: 105312418747995.000

> ifort --version
ifort (IFORT) 13.1.2 20130514
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.

[Phi: this compiler uses avx instructions in parallel mode only via mkl]

> ifort -O3 -parallel -par_threshold90 -mkl -DARDIM=4000 matmul.F
> OMP_NUM_THREADS=1 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.130 real time: 0.134 init
dtime: 5.440 real time: 5.442 ikj
dtime: 4.540 real time: 4.548 jki
Sum of elements: 105312418747995.250
> OMP_NUM_THREADS=4 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.220 real time: 0.059 init
dtime: 6.730 real time: 1.877 ikj
dtime: 5.170 real time: 1.292 jki
Sum of elements: 105312418747995.031

> ifort -O3 -parallel -par_threshold90 -mkl -DSIZEARGS -DARDIM=4000 matmul.F
> OMP_NUM_THREADS=1 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.130 real time: 0.133 init
dtime: 5.660 real time: 5.658 ikj
dtime: 4.660 real time: 4.663 jki
Sum of elements: 105312418747995.250
> OMP_NUM_THREADS=4 ./a.out
Running with array sizes 4000 by 4000
dtime: 0.210 real time: 0.055 init
dtime: 7.250 real time: 2.304 ikj
dtime: 5.290 real time: 1.323 jki
Sum of elements: 105312418747995.031

> ifort -O3 -mavx -DARDIM=4000 matmul.F
> ./a.out
Running with array sizes 4000 by 4000
dtime: 0.120 real time: 0.132 init
dtime: 7.230 real time: 7.234 ikj
dtime: 7.270 real time: 7.269 jki
Sum of elements: 105312418747994.937

> ifort -O3 -mavx -DSIZEARGS -DARDIM=4000 matmul.F
> ./a.out
Running with array sizes 4000 by 4000
dtime: 0.130 real time: 0.136 init
dtime: 8.500 real time: 8.503 ikj
dtime: 35.530 real time: 35.562 jki
Sum of elements: 105312418747995.016

[Phi: in single threaded mode both compilers require -mavx flag]
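
The parallel speedup implied by the real times in the ifort 14.0.2 runs can be worked out directly. The following is a minimal sketch in Python, not part of the original session: the numbers are copied from the log, while the build labels and the speedup helper are names made up here for illustration.

```python
# Real (wall-clock) times in seconds, copied from the ifort 14.0.2 runs in the log.
# The build labels are illustrative only; the benchmark itself does not print them.
REAL_TIME = {
    ("parallel", 1, "ikj"): 4.690,
    ("parallel", 4, "ikj"): 1.345,
    ("parallel", 1, "jki"): 4.683,
    ("parallel", 4, "jki"): 1.336,
    ("parallel+SIZEARGS", 1, "ikj"): 11.651,
    ("parallel+SIZEARGS", 4, "ikj"): 3.225,
    ("parallel+SIZEARGS", 1, "jki"): 11.615,
    ("parallel+SIZEARGS", 4, "jki"): 3.223,
}


def speedup(build, loop, threads=4):
    """Ratio of 1-thread to N-thread real time for one build and loop order."""
    return REAL_TIME[(build, 1, loop)] / REAL_TIME[(build, threads, loop)]


if __name__ == "__main__":
    for build in ("parallel", "parallel+SIZEARGS"):
        for loop in ("ikj", "jki"):
            s = speedup(build, loop)
            # Efficiency is the speedup divided by the thread count (4 here).
            print(f"{build:18s} {loop}: {s:.2f}x on 4 threads, "
                  f"efficiency {s / 4:.0%}")
```

On these numbers the 4-thread runs come out roughly 3.5 to 3.6 times faster in real time than the 1-thread runs, while the summed dtime (CPU time) rises slightly.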