After migrating my code from ifort15 to ifort17, I observe a substantial slowdown caused by the pow() intrinsic. The runtime profile shows __bwr_pow with about 2.6x the sample count of the previous pow.L (240081 vs. 92752 samples). The SVML version is also notably slower (__svml_pow4_br_e9 at 43250 samples vs. __svml_pow4_l9 at 31915). The other routines have not changed much.
Build and profiling were done on a Haswell running RH6. Relevant FP compile switches are:
-qopenmp -O3 -fp-model fast=1 -r8 -xAVX -fp-model fast=2 -no-prec-div -no-prec-sqrt -ftz -fast-transcendentals
Profile with ifort15 exe:
samples   %         symbol name
158163    12.9953   __kmp_hyper_barrier_release
118068     9.7009   bicgstab_solv_mp_sparse_matvec_
 92752     7.6209   pow.L
 88174     7.2447   xschem_mod_mp_xschem_part2_continuity_energy_
 80151     6.5855   bicgstab_solv_mp_bicgstab_kahan_dp_
 57120     4.6932   intfr_IP_intfr_chan_
 39386     3.2361   heat_mp_heat_loop_a_
 31915     2.6223   __svml_pow4_l9
 31106     2.5558   xschem_mod_mp_xschem_part1_momentum_
 29114     2.3921   post3d_
Profile with ifort17 exe:
samples   %         symbol name
240081    17.4939   __bwr_pow
132852     9.6805   _INTERNAL_25_______src_ ... __kmp_hyper_barrier_release
116221     8.4686   bicgstab_solv_mp_sparse_matvec_
 80016     5.8305   xschem_mod_mp_xschem_part2_continuity_energy_
 78456     5.7168   bicgstab_solv_mp_bicgstab_kahan_dp_
 57502     4.1900   intfr_IP_intfr_chan_
 43250     3.1515   __svml_pow4_br_e9
 39836     2.9027   heat_mp_heat_loop_a_
 34180     2.4906   __kmp_yield
 32767     2.3876   __bwr_floor
 26206     1.9095   xschem_mod_mp_xschem_part1_momentum_
 21549     1.5702   post3d_
Any idea how this runtime regression can be resolved?
It turned out that this issue was caused by a messy set of compile switches. If the option list contains "-fp-model consistent" and "-fp-model fast=2" appears somewhere after it, the conservative option still overrides the aggressive one. The last option does not win. I am not sure whether this is a bug or a feature.
The conservative option selects different versions of the math intrinsics, in this case __bwr_pow.
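
Since the later option cannot be relied on to win, it may help to check a build's option list for mixed -fp-model settings before compiling. The following is only a minimal sketch in Python: the function names are hypothetical, it handles only the space-separated "-fp-model value" form used in this thread, and it does not try to reproduce the compiler's actual precedence rules. It merely reports when more than one distinct value is requested.

```python
import shlex


def fp_model_settings(flags: str) -> list[str]:
    """Return every -fp-model value in the order it appears in the flag string.

    Hypothetical helper: only the space-separated "-fp-model value" form
    is recognized.
    """
    tokens = shlex.split(flags)
    values = []
    for i, tok in enumerate(tokens):
        if tok == "-fp-model" and i + 1 < len(tokens):
            values.append(tokens[i + 1])
    return values


def has_mixed_fp_models(flags: str) -> bool:
    """True when the flag string requests more than one distinct -fp-model value."""
    return len(set(fp_model_settings(flags))) > 1


# Example option list modeled on the situation described above.
flags = "-O3 -fp-model consistent -xAVX -fp-model fast=2"
print(fp_model_settings(flags))    # ['consistent', 'fast=2']
print(has_mixed_fp_models(flags))  # True
```

A check like this could be run over the flags collected from a makefile or build script, so that a stray conservative setting is noticed before it silently changes which math library routines get linked.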
