[bash]export OMP_NUM_THREADS=12
export OMP_NESTED=true
export OMP_MAX_ACTIVE_LEVELS=4
export OMP_DYNAMIC=true
export MKL_NUM_THREADS=12
export MKL_DYNAMIC=false[/bash]
...and here are the Fortran linker command line options:
[bash]-L/cluster/intel/mkl/lib/intel64/ /cluster/intel/mkl/lib/intel64/libmkl_solver_lp64.a -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -openmp -lpthread[/bash]
[bash]export OMP_NUM_THREADS=4
export OMP_NESTED=true
export OMP_DYNAMIC=true[/bash]
Then I ran it on my four-core desktop PC. At times "top -H" was giving:
[bash] 3008 dewhurst 20  0 435m 165m 2744 R   72  2.1 1:01.07 elk
 3010 dewhurst 20  0 435m 165m 2744 R   26  2.1 0:42.09 elk
 3011 dewhurst 20  0 435m 165m 2744 S   25  2.1 0:07.60 elk
 3023 dewhurst 20  0 435m 165m 2744 S   23  2.1 0:18.94 elk
 3038 dewhurst 20  0 435m 165m 2744 R   11  2.1 0:04.82 elk
 3029 dewhurst 20  0 435m 165m 2744 R    9  2.1 0:03.80 elk
 3036 dewhurst 20  0 435m 165m 2744 R    9  2.1 0:02.96 elk
 3039 dewhurst 20  0 435m 165m 2744 R    9  2.1 0:04.82 elk
 3031 dewhurst 20  0 435m 165m 2744 R    9  2.1 0:04.10 elk
 3016 dewhurst 20  0 435m 165m 2744 R    7  2.1 0:03.88 elk
 3030 dewhurst 20  0 435m 165m 2744 R    7  2.1 0:02.62 elk
 3037 dewhurst 20  0 435m 165m 2744 R    7  2.1 0:03.66 elk
 3035 dewhurst 20  0 435m 165m 2744 R    7  2.1 0:03.96 elk
 3032 dewhurst 20  0 435m 165m 2744 R    6  2.1 0:04.64 elk
 3040 dewhurst 20  0 435m 165m 2744 R    6  2.1 0:04.72 elk
 3013 dewhurst 20  0 435m 165m 2744 R    5  2.1 0:04.36 elk
 3033 dewhurst 20  0 435m 165m 2744 R    5  2.1 0:04.66 elk
 3034 dewhurst 20  0 435m 165m 2744 R    5  2.1 0:04.62 elk
 3020 dewhurst 20  0 435m 165m 2744 S    3  2.1 0:18.54 elk
 3014 dewhurst 20  0 435m 165m 2744 S    2  2.1 0:04.32 elk
 3017 dewhurst 20  0 435m 165m 2744 S    2  2.1 0:03.76 elk
 3018 dewhurst 20  0 435m 165m 2744 S    2  2.1 0:04.86 elk
 3019 dewhurst 20  0 435m 165m 2744 S    2  2.1 0:18.52 elk
 3024 dewhurst 20  0 435m 165m 2744 S    2  2.1 0:12.82 elk[/bash]
[bash]  PID USER     PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11435 dewhurst 20  0 365m 226m 2452 R  394  2.9 0:53.31 elk[/bash]
In other words, one task running at 394 %CPU.
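To compare a per-thread "top -H" listing with a single-process figure like the one above, it can help to total the per-thread %CPU values. Below is a minimal sketch, not part of the original run: the function name `summarize_threads` is made up for illustration, and it assumes %CPU is the ninth column, as in the header printed by top here. The three sample rows are copied from the first per-thread listing in this post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count thread rows and sum their %CPU (column 9).
# Rows whose first field is not a numeric PID (e.g. the header) are skipped.
summarize_threads() {
  awk '$1 ~ /^[0-9]+$/ { n++; cpu += $9 }
       END { printf "%d threads, %d %%CPU total\n", n, cpu }'
}

# Sample input: three rows taken from the per-thread listing in this post.
summarize_threads <<'EOF'
  PID USER     PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 3008 dewhurst 20  0 435m 165m 2744 R   72  2.1 1:01.07 elk
 3010 dewhurst 20  0 435m 165m 2744 R   26  2.1 0:42.09 elk
 3011 dewhurst 20  0 435m 165m 2744 S   25  2.1 0:07.60 elk
EOF
```

In practice the listing could be piped in from a batch-mode snapshot of top instead of a here-document; the totals are only as good as the single sampling interval they come from.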
[bash]11527 dewhurst 20  0 587m 239m 2584 R   17  3.1 0:00.70 elk
11552 dewhurst 20  0 587m 239m 2584 R   17  3.1 0:02.74 elk
11505 dewhurst 20  0 587m 239m 2584 R   16  3.1 0:07.08 elk
11533 dewhurst 20  0 587m 239m 2584 R   16  3.1 0:02.74 elk
11540 dewhurst 20  0 587m 239m 2584 R   16  3.1 0:03.06 elk
11541 dewhurst 20  0 587m 239m 2584 R   16  3.1 0:02.88 elk
11542 dewhurst 20  0 587m 239m 2584 R   16  3.1 0:02.94 elk
11502 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:21.72 elk
11504 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:07.34 elk
11521 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:00.62 elk
11531 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:02.78 elk
11553 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:02.70 elk
11506 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:06.28 elk
11519 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:01.72 elk
11520 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:00.62 elk
11532 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:02.66 elk
11554 dewhurst 20  0 587m 239m 2584 R   15  3.1 0:02.74 elk
11518 dewhurst 20  0 587m 239m 2584 R   14  3.1 0:01.74 elk
11539 dewhurst 20  0 587m 239m 2584 R    8  3.1 0:01.36 elk
11523 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:02.50 elk
11530 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:00.24 elk
11537 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.36 elk
11538 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.34 elk
11545 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.36 elk
11546 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.30 elk
11547 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.28 elk
11548 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.26 elk
11551 dewhurst 20  0 587m 239m 2584 R    7  3.1 0:01.24 elk[/bash]
[bash]  PID USER     PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
11617 dewhurst 20  0 365m 193m 2452 R  100  2.5 0:22.22 elk
11621 dewhurst 20  0 365m 193m 2452 R  100  2.5 0:06.00 elk
11620 dewhurst 20  0 365m 193m 2452 R   98  2.5 0:07.74 elk
11619 dewhurst 20  0 365m 193m 2452 R   97  2.5 0:08.88 elk[/bash]