Intel® Fortran Compiler

Many regressions with 2024.2 version

MehdiChinoune
New Contributor I

I tried to build CASTEP-24.1 (commercial, but free for academic use) with the Intel Fortran Compiler.

 

With 2023.2.4, it gives these results:

 

Test project /home/runner/work/castep/castep/build
      Start  1: build_castep
 1/18 Test  #1: build_castep .....................   Passed    0.68 sec
      Start  2: quick
 2/18 Test  #2: quick ............................   Passed   81.09 sec
      Start  3: spe
 3/18 Test  #3: spe ..............................   Passed  142.53 sec
      Start  4: bs
 4/18 Test  #4: bs ...............................   Passed   71.37 sec
      Start  5: phonon
 5/18 Test  #5: phonon ...........................   Passed  467.34 sec
      Start  6: geom
 6/18 Test  #6: geom .............................   Passed  252.06 sec
      Start  7: md
 7/18 Test  #7: md ...............................   Passed  158.84 sec
      Start  8: pair-pot
 8/18 Test  #8: pair-pot .........................   Passed    5.88 sec
      Start  9: magres
 9/18 Test  #9: magres ...........................   Passed   53.44 sec
      Start 10: tddft
10/18 Test #10: tddft ............................   Passed   92.81 sec
      Start 11: XC
11/18 Test #11: XC ...............................   Passed  130.84 sec
      Start 12: NLXC
12/18 Test #12: NLXC .............................   Passed  175.83 sec
      Start 13: misc
13/18 Test #13: misc .............................   Passed  118.33 sec
      Start 14: otf
14/18 Test #14: otf ..............................***Failed  911.56 sec
.........................................................F...................................................................................................................................................... [207/208]

      Start 15: pseudo
15/18 Test #15: pseudo ...........................   Passed   31.81 sec
      Start 16: d3
16/18 Test #16: d3 ...............................   Passed    5.76 sec
      Start 17: d4
17/18 Test #17: d4 ...............................   Passed    6.25 sec
      Start 18: solvation
18/18 Test #18: solvation ........................   Passed   73.78 sec

94% tests passed, 1 tests failed out of 18

Total Test time (real) = 2780.28 sec
Errors while running CTest

The following tests FAILED:
	 14 - otf (Failed)

 

As you can see, 94% of the tests pass, with otf the only failure (207 of 208 sub-tests passed).

 

With 2024.0, it started to regress:

 

Test project /home/runner/work/castep/castep/build
      Start  1: build_castep
 1/18 Test  #1: build_castep .....................   Passed    0.71 sec
      Start  2: quick
 2/18 Test  #2: quick ............................***Failed   61.58 sec
Using executable: /home/runner/work/castep/castep/build/bin/castep.mpi.
Test id: 280724-1.
Benchmark: castep-23.1.castep-python-1.0.python-3.6.9.

Dispersion/HeNe_D3 - HeNe_D3_PBE-geom.param: Passed.
Dispersion/HeNe_D4 - HeNe_D4_PBE-geom.param: Passed.
Electronic/Si2-spe - Si2.param: Passed.
Excitations/GaAs-SOC-spectral - GaAs.param: Passed.
Geometry/H2-geom - H2_Geom.param: Passed.
Magres/CH4-shield-xal-mag - ch4.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-3.1.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-4.1.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-5.0.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-5.5.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-6.0.param: Passed.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-6.1.param: Passed.
Pair-Pot/PP-BUCK - pair-pot-buck.param: Passed.
Pair-Pot/PP-COUL - pair-pot-coul.param: Passed.
Pair-Pot/PP-DZ - pair-pot-dz.param: Passed.
Pair-Pot/PP-LJ - pair-pot-lj.param: Passed.
Pair-Pot/PP-MORS - pair-pot-mors.param: Passed.
Pair-Pot/PP-POL - pair-pot-pol.param: Passed.
Pair-Pot/PP-SHO - pair-pot-sho.param: Passed.
Pair-Pot/PP-SW - pair-pot-sw.param: Passed.
Pair-Pot/PP-WALL - pair-pot-walls-sho.param: Passed.
Pseudo/Realspace - Si8-realpot.param: Passed.
forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007F3C9BE45320  Unknown               Unknown  Unknown
castep.mpi         0000000001761F15  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         0000000001761FFD  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         0000000001753644  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C792F6  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417CE8  castep_prepare_mo         887  castep.f90
castep.mpi         000000000041284C  Unknown               Unknown  Unknown
castep.mpi         00000000004125AD  Unknown               Unknown  Unknown
libc.so.6          00007F3C9BE2A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F3C9BE2A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004124C5  Unknown               Unknown  Unknown
Exited with exit code 71
Solvation/H2_fixed - H2.fixed.param: **FAILED**.

All done. ERROR: only 22 out of 23 tests passed.
Failed test in:
	/home/runner/work/castep/castep/Test/Solvation/H2_fixed

      Start  3: spe
 3/18 Test  #3: spe ..............................   Passed  148.52 sec
      Start  4: bs
 4/18 Test  #4: bs ...............................   Passed   76.87 sec
      Start  5: phonon
 5/18 Test  #5: phonon ...........................   Passed  466.32 sec
      Start  6: geom
 6/18 Test  #6: geom .............................   Passed  252.56 sec
      Start  7: md
 7/18 Test  #7: md ...............................   Passed  156.34 sec
      Start  8: pair-pot
 8/18 Test  #8: pair-pot .........................   Passed    5.88 sec
      Start  9: magres
 9/18 Test  #9: magres ...........................   Passed   53.44 sec
      Start 10: tddft
10/18 Test #10: tddft ............................   Passed  100.81 sec
      Start 11: XC
11/18 Test #11: XC ...............................   Passed  139.84 sec
      Start 12: NLXC
12/18 Test #12: NLXC .............................   Passed  177.33 sec
      Start 13: misc
13/18 Test #13: misc .............................   Passed  118.33 sec
      Start 14: otf
14/18 Test #14: otf ..............................   Passed  891.55 sec
      Start 15: pseudo
15/18 Test #15: pseudo ...........................   Passed   31.80 sec
      Start 16: d3
16/18 Test #16: d3 ...............................   Passed    5.76 sec
      Start 17: d4
17/18 Test #17: d4 ...............................   Passed    6.26 sec
      Start 18: solvation
18/18 Test #18: solvation ........................***Failed   21.28 sec
forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007F07A5A45320  Unknown               Unknown  Unknown
Errors while running CTest
castep.mpi         0000000001761F15  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         0000000001761FFD  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         0000000001753644  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C792F6  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417CE8  castep_prepare_mo         887  castep.f90
castep.mpi         000000000041284C  Unknown               Unknown  Unknown
castep.mpi         00000000004125AD  Unknown               Unknown  Unknown
libc.so.6          00007F07A5A2A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F07A5A2A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004124C5  Unknown               Unknown  Unknown
Exited with exit code 71
Fforrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007F9C28C45320  Unknown               Unknown  Unknown
castep.mpi         0000000001761F15  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         0000000001761FFD  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         0000000001753644  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C792F6  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417CE8  castep_prepare_mo         887  castep.f90
castep.mpi         000000000041284C  Unknown               Unknown  Unknown
castep.mpi         00000000004125AD  Unknown               Unknown  Unknown
libc.so.6          00007F9C28C2A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F9C28C2A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004124C5  Unknown               Unknown  Unknown
Exited with exit code 71
F [0/2]


89% tests passed, 2 tests failed out of 18

Total Test time (real) = 2715.19 sec

The following tests FAILED:
	  2 - quick (Failed)
	 18 - solvation (Failed)

 

The same results occur with 2024.1.

 

But with 2024.2, it's a disaster:

 

Test project /home/runner/work/castep/castep/build
      Start  1: build_castep
 1/18 Test  #1: build_castep .....................   Passed    0.65 sec
      Start  2: quick
 2/18 Test  #2: quick ............................***Failed   51.07 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Using executable: /home/runner/work/castep/castep/build/bin/castep.mpi.
Test id: 280724-7.
Benchmark: castep-23.1.castep-python-1.0.python-3.6.9.

Dispersion/HeNe_D3 - HeNe_D3_PBE-geom.param: **FAILED**.
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Dispersion/HeNe_D4 - HeNe_D4_PBE-geom.param: **FAILED**.
Electronic/Si2-spe - Si2.param: **FAILED**.
Excitations/GaAs-SOC-spectral - GaAs.param: Passed.
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Geometry/H2-geom - H2_Geom.param: **FAILED**.
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Magres/CH4-shield-xal-mag - ch4.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-3.1.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-4.1.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-5.0.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-5.5.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-6.0.param: **FAILED**.
Misc/Continuation-Oldversion - Si2-phonon-cont-from-6.1.param: **FAILED**.
Pair-Pot/PP-BUCK - pair-pot-buck.param: Passed.
Pair-Pot/PP-COUL - pair-pot-coul.param: Passed.
Pair-Pot/PP-DZ - pair-pot-dz.param: Passed.
Pair-Pot/PP-LJ - pair-pot-lj.param: Passed.
Pair-Pot/PP-MORS - pair-pot-mors.param: Passed.
Pair-Pot/PP-POL - pair-pot-pol.param: Passed.
Pair-Pot/PP-SHO - pair-pot-sho.param: Passed.
Pair-Pot/PP-SW - pair-pot-sw.param: Passed.
Pair-Pot/PP-WALL - pair-pot-walls-sho.param: Passed.
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Pseudo/Realspace - Si8-realpot.param: **FAILED**.
forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007F4847645320  Unknown               Unknown  Unknown
castep.mpi         000000000176FEC5  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         000000000176FFB2  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         000000000176118B  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C7F676  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417ACB  castep_prepare_mo         880  castep.f90
castep.mpi         000000000041274E  Unknown               Unknown  Unknown
castep.mpi         000000000041249D  Unknown               Unknown  Unknown
libc.so.6          00007F484762A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F484762A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004123B5  Unknown               Unknown  Unknown
Exited with exit code 71
Solvation/H2_fixed - H2.fixed.param: **FAILED**.

All done. ERROR: only 10 out of 23 tests passed.
Failed tests in:
	/home/runner/work/castep/castep/Test/Dispersion/HeNe_D3
	/home/runner/work/castep/castep/Test/Dispersion/HeNe_D4
	/home/runner/work/castep/castep/Test/Electronic/Si2-spe
	/home/runner/work/castep/castep/Test/Geometry/H2-geom
	/home/runner/work/castep/castep/Test/Magres/CH4-shield-xal-mag
	/home/runner/work/castep/castep/Test/Misc/Continuation-Oldversion
	/home/runner/work/castep/castep/Test/Pseudo/Realspace
	/home/runner/work/castep/castep/Test/Solvation/H2_fixed

      Start  3: spe
 3/18 Test  #3: spe ..............................***Failed  168.02 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F..Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFF [2/21]

      Start  4: bs
 4/18 Test  #4: bs ...............................***Failed   91.86 sec
.Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [1/16]

      Start  5: phonon
 5/18 Test  #5: phonon ...........................***Failed  326.29 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F.FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F.FFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [2/46]

      Start  6: geom
 6/18 Test  #6: geom .............................***Failed  281.56 sec
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [0/32]

      Start  7: md
 7/18 Test  #7: md ...............................***Failed  112.32 sec
.Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [1/13]

      Start  8: pair-pot
 8/18 Test  #8: pair-pot .........................   Passed    5.38 sec
      Start  9: magres
 9/18 Test  #9: magres ...........................***Failed   39.93 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFF [0/12]

      Start 10: tddft
10/18 Test #10: tddft ............................***Failed    8.28 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [0/12]

      Start 11: XC
11/18 Test #11: XC ...............................***Failed  132.34 sec
FFFFFFFFFFFFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFFFFFFFFFFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [0/45]

      Start 12: NLXC
12/18 Test #12: NLXC .............................***Failed  162.32 sec
FFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFFFFFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F.F...Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F.....Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F....... [16/33]

      Start 13: misc
13/18 Test #13: misc .............................***Failed  246.35 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
Fforrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
libc.so.6          00007F9746845320  Unknown               Unknown  Unknown
castep.mpi         00000000018A62E3  Unknown               Unknown  Unknown
castep.mpi         00000000016F779B  trace_branch_entr        1290  trace.f90
castep.mpi         00000000016F70D0  trace_section_ent        1201  trace.f90
castep.mpi         0000000001332AC7  wave_initialise_w        1755  wave.f90
castep.mpi         000000000053DB33  geom_xvec_to_mdl         4347  geometry.f90
castep.mpi         0000000000593570  geom_tpsd               12441  geometry.f90
castep.mpi         000000000051C6C0  geometry_optimise         742  geometry.f90
castep.mpi         000000000041A52D  castep_run_task          1314  castep.f90
castep.mpi         00000000004128EA  Unknown               Unknown  Unknown
castep.mpi         000000000041249D  Unknown               Unknown  Unknown
libc.so.6          00007F974682A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F974682A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004123B5  Unknown               Unknown  Unknown
Exited with exit code 174
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFFFFFF...FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [3/34]

      Start 14: otf
14/18 Test #14: otf ..............................   Passed  747.51 sec
      Start 15: pseudo
15/18 Test #15: pseudo ...........................***Failed   37.30 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
FFAbort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F..Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [2/10]

      Start 16: d3
16/18 Test #16: d3 ...............................***Failed    1.25 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [0/1]

      Start 17: d4
17/18 Test #17: d4 ...............................***Failed    1.25 sec
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Exited with exit code 1
F [0/1]

      Start 18: solvation
18/18 Test #18: solvation ........................***Failed   20.77 sec
forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007F3939A45320  Unknown               Unknown  Unknown
castep.mpi         000000000176FEC5  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         000000000176FFB2  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         000000000176118B  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C7F676  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417ACB  castep_prepare_mo         880  castep.f90
castep.mpi         000000000041274E  Unknown               Unknown  Unknown
castep.mpi         000000000041249D  Unknown               Unknown  Unknown
libc.so.6          00007F3939A2A1CA  Unknown               Unknown  Unknown
libc.so.6          00007F3939A2A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004123B5  Unknown               Unknown  Unknown
Exited with exit code 71
Fforrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source             
libc.so.6          00007FCE5F045320  Unknown               Unknown  Unknown
castep.mpi         000000000176FEC5  set_mg_levels             112  dl_mg_grids.F90
castep.mpi         000000000176FFB2  set_mg_grids              149  dl_mg_grids.F90
castep.mpi         000000000176118B  dl_mg_init                340  dl_mg.F90
castep.mpi         0000000000C7F676  multigrid_dlmg_in         438  multigrid_dlmg.f90
castep.mpi         0000000000417ACB  castep_prepare_mo         880  castep.f90
castep.mpi         000000000041274E  Unknown               Unknown  Unknown
castep.mpi         000000000041249D  Unknown               Unknown  Unknown
libc.so.6          00007FCE5F02A1CA  Unknown               Unknown  Unknown
libc.so.6          00007FCE5F02A28B  __libc_start_main     Unknown  Unknown
castep.mpi         00000000004123B5  Unknown               Unknown  Unknown
Exited with exit code 71
F [0/2]

Errors while running CTest

17% tests passed, 15 tests failed out of 18

Total Test time (real) = 2434.49 sec

The following tests FAILED:
	  2 - quick (Failed)
	  3 - spe (Failed)
	  4 - bs (Failed)
	  5 - phonon (Failed)
	  6 - geom (Failed)
	  7 - md (Failed)
	  9 - magres (Failed)
	 10 - tddft (Failed)
	 11 - XC (Failed)
	 12 - NLXC (Failed)
	 13 - misc (Failed)
	 15 - pseudo (Failed)
	 16 - d3 (Failed)
	 17 - d4 (Failed)
	 18 - solvation (Failed)

 

 

The project was configured with:

 

cmake . \
    -B build \
    -GNinja \
    -DCMAKE_Fortran_COMPILER=ifx \
    -DCMAKE_C_COMPILER=icx \
    -DMPI_Fortran_COMPILER=mpiifx \
    -DMPI_C_COMPILER=mpiicx \
    -DMATHLIBS=mkl \
    -DFFT=mkl

 

Installed packages:

 

sudo apt install --no-install-recommends -y \
    intel-oneapi-compiler-fortran-<version> \
    intel-oneapi-compiler-dpcpp-cpp-<version> \
    intel-oneapi-mkl-devel \
    intel-oneapi-mpi-devel \
    ninja-build

 

Remark: Tests pass 100% with gfortran.

 

TobiasK
Moderator

@MehdiChinoune 

That's strange; usually older versions of IFX are less stable than newer ones. Without source-code access, I doubt we can do much to investigate this issue. Are you able to provide the sources?

MehdiChinoune
New Contributor I

I'm not able to provide the source code.

I thought you already had the source code, as CASTEP is one of the most widely used codes in physics/chemistry.

I think you could get it if you contact them.

TobiasK
Moderator

@MehdiChinoune


At the moment, it looks like we do not have access to the same code as you do. We are eager to help, but we need some source code to work with.


MehdiChinoune
New Contributor I

After I contacted the CASTEP developers, they responded:

 


Dear Mehdi,
 
The Intel ifx (OneAPI) compiler team already has access to CASTEP, but unfortunately they are not very quick to fix either bugs or performance issues. I will draw their attention to your bug report though, and we'll see what happens.
 
All the best,


 

Ron_Green
Moderator

Perhaps you can share your compiler and link options?

And what MPI are you using? 

Have you tried options

-fp-model source  -prec-div

 

It's an integer divide by zero. You might read this and see if it helps; this is something fairly new and different from older ifx and ifort:

 

-fstrict-overflow (Linux) /Qstrict-overflow (Windows)

Integer overflow in arithmetic expressions is not permitted by the Fortran standard. However, some legacy programs rely on integer overflow. When overflow occurs, as much of the value as can fit into the result is assigned, which may also result in a change in the sign bit.

By default, the ifx compiler assumes integer arithmetic does not overflow. ifx defaults to -fstrict-overflow (Linux) or /Qstrict-overflow (Windows). When strict-overflow is enabled, the compiler assumes that integer operations can never overflow, which allows for better optimizations. But if overflow does occur, the resulting behavior is undefined and may not be compatible with the default ifort behavior.

ifort allowed integer overflow. Therefore, programs that rely on the ifort behavior for integer overflow should use -fno-strict-overflow (Linux) or /Qstrict-overflow- (Windows) with ifx. These options allow the compiler to assume that integers may overflow and are allowed to overflow. Allowing overflow in ifx may result in less optimized code.
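To make the note above concrete, here is a minimal sketch (hypothetical code, not taken from CASTEP or DL_MG) of the kind of legacy pattern that deliberately relies on integer wrap-around; code like this needs -fno-strict-overflow (or /Qstrict-overflow-) under ifx to keep the old ifort-style behaviour:

program overflow_demo
  ! Hypothetical legacy-style pseudo-random generator that relies on
  ! default-integer wrap-around. The standard does not permit this overflow,
  ! so under ifx's default -fstrict-overflow the behaviour is undefined;
  ! -fno-strict-overflow restores the ifort-like wrap-around.
  implicit none
  integer :: seed, i
  seed = 123456789
  do i = 1, 5
     seed = seed*69069 + 1   ! intentionally overflows the default integer range
     print *, seed
  end do
end program overflow_demo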

 

MehdiChinoune
New Contributor I

The compiler points to a line of code that could never produce a divide by zero:

 

https://bitbucket.org/dlmgteam/dl_mg_code_public/src/564b8f3e4574d32596e345a8b507582267b227dd/src/dl_mg_grids.F90#lines-112

 

One CASTEP developer told me:


  Dear Both,
 
  I saw your conversation on the CASTEP mailing list regarding a spurious divide by zero error in DL_MG's
 
gridc(:) = gridc(:)/2
 
  I diagnosed the same problem in ONETEP (which also uses DL_MG) a few months back. It's a code generation bug in ifx. We couldn't find the time to distill an MWE to send to Intel, but Lucian Anton found a workaround, which is to drop the optimisation level to -O1 for this source file.
 
  Newer versions of DL_MG do this automatically via a tweak to the Makefile. If you have means of updating the DL_MG version in CASTEP, you will be able to work around the problem.

Applying his workaround only fixes the issue with 2024.1; 2024.2 is still a disaster, as I mentioned in the first post.
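For context, the crash site quoted above is an array assignment whose divisor is the literal constant 2. The following stand-alone sketch (not the actual DL_MG source and not a reproducer from Intel, just an illustration of the pattern) shows the kind of multigrid-level counting loop involved and why a genuine divide by zero is impossible there:

program halving_demo
  ! Illustration only: repeatedly halve the grid dimensions to count
  ! multigrid levels. The divisor on the marked line is the literal 2, so a
  ! real "integer divide by zero" cannot occur; a crash reported there points
  ! at code generation rather than at the source.
  implicit none
  integer :: gridc(3), nlevels
  gridc = [64, 72, 80]
  nlevels = 1
  do while (all(mod(gridc, 2) == 0) .and. all(gridc/2 >= 2))
     gridc(:) = gridc(:)/2      ! same shape of statement as the quoted DL_MG line
     nlevels = nlevels + 1
  end do
  print *, 'levels:', nlevels, 'coarsest grid:', gridc
end program halving_demo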

MehdiChinoune
New Contributor I

The error occurs at a line of code where a `divide by zero` could never happen:

https://bitbucket.org/dlmgteam/dl_mg_code_public/src/564b8f3e4574d32596e345a8b507582267b227dd/src/dl_mg_grids.F90#lines-112

 

I got this response from a CASTEP developer:


  I saw your conversation on the CASTEP mailing list regarding a spurious divide by zero error in DL_MG's
 
gridc(:) = gridc(:)/2

  I diagnosed the same problem in ONETEP (which also uses DL_MG) a few months back. It's a code generation bug in ifx. We couldn't find the time to distill an MWE to send to Intel, but Lucian Anton found a workaround, which is to drop the optimisation level to -O1 for this source file.
 
  Newer versions of DL_MG do this automatically via a tweak to the Makefile. If you have means of updating the DL_MG version in CASTEP, you will be able to work around the problem.

Applying his suggested workaround only fixes the issue with 2024.1; 2024.2 is still a disaster.

MehdiChinoune
New Contributor I

My reply was deleted twice; I don't know why.

TobiasK
Moderator

@MehdiChinoune

I see three posts; maybe it was just a temporary hiccup?


I can now reproduce the divide by zero with a very simple reproducer; I will escalate it to the developers.

Thanks for bringing that to our attention!


Best

Tobias


TobiasK
Moderator

@MehdiChinoune we implemented a fix for the divide by zero; it made it into the next compiler release. As for the CASTEP source, I did not find anyone with the correct source code version available.

Keith_R_1
Novice

CASTEP developer here. 

 

I have been investigating this, and indeed both ifort- and ifx-compiled executables crash early in the initialisation, a regression from previous versions of each. I have managed to isolate the origin of the error to an array-valued assignment operation:

  den%charge=cmplx(tmp_data(:,1),0.0_dp,dp)

which appears to generate a use of an undefined variable, detectable with Valgrind or with ifort -check=all. I managed to boil it down to a demo test case, which you can find at

d_r_to_c.f90

With ifort 2021.13.1 

ifort -g  -traceback -check all d_r_to_c.f90
./a.out
forrtl: severe (194): Run-Time Check Failure. The variable 'var$225' is being used in 'd_r_to_c.f90(58,5)' without being defined
Image              PC                Routine            Line        Source              
a.out              000000000040D133  MAIN__                    387  d_r_to_c.f90
a.out              000000000040369D  Unknown               Unknown  Unknown
libc-2.31.so       00007F7C9055E24D  __libc_start_main     Unknown  Unknown
a.out              00000000004035CA  Unknown               Unknown  Unknown

or, without "-check all", under Valgrind:

==11294== Use of uninitialised value of size 8 
==11294==    at 0x403EA3: density_mp_density_real_to_complex_ (d_r_to_c.f90:58)
==11294==    by 0x409CF6: MAIN__ (d_r_to_c.f90:387)
==11294==    by 0x40369C: main (in /home/kr/CASTEP/Compiler_BUGS/ifort-2024/a.out)
==11294==  
==11294== Use of uninitialised value of size 8
==11294==    at 0x403FC1: density_mp_density_real_to_complex_ (d_r_to_c.f90:66)
==11294==    by 0x409CF6: MAIN__ (d_r_to_c.f90:387)
==11294==    by 0x40369C: main (in /home/kr/CASTEP/Compiler_BUGS/ifort-2024/a.out

 

With ifx, -check=all does not detect an undefined variable, but Valgrind still does. I'm reasonably confident that this is the origin of the trouble, as the optimised executable of the full CASTEP crashes only a dozen lines later. Do let me know if you are able to reproduce from my example.

 

Ron_Green
Moderator

I am investigating the reproducer. So far, no conclusion, but I do have a note:

I trust the LLVM uninitialized-variable check more than Valgrind. LLVM keeps a shadow copy of all variables - think of it as similar to a cache table with a DIRTY bit. On every memory access to every variable: if it is a WRITE, the dirty bit is set; if it is a READ, the dirty bit is checked, and if it is not set, an uninitialized use is flagged. And it works for each and every element of an array - so if you write every array element but one and then read that one, it gets flagged. Valgrind, while useful, sometimes does not understand that Fortran RTL allocations and deallocations go through a memory manager. So I always treat Valgrind as an advisement, not an absolute.

Also, ifx and the LLVM optimizer, even at -O0, do super aggressive dead-code elimination. For this case, I added this:

program real_to_complex_test

  use density

  type(electron_density) :: myden

  call density_allocate(myden, den_type='R')
  myden%real_charge = 0.0_dp
  call density_real_to_complex(myden)
  print*, "myden%real_charge ", myden%real_charge

The print will prevent LLVM from possibly removing the entire program (you don't reuse anything you set in this program), even at -O0. Now, that said, in this case it does not help, as the Fortran runtime library function CMPLX aborts BEFORE LLVM can catch any possible uninitialized use. This, to me, is key to understanding the fault.

 

But I think you are correct - something in the line

den%charge=cmplx(tmp_data(:,1),0.0_dp,dp)

is at fault.  The CMPLX Runtime lib function is pretty simple.  So I suspect the bug is in the array descriptor it is passed by our front-end parser.  I have not isolated this, but I do believe you may have found the source.  I need some time to walk the debugger through this to confirm before I throw it over to my front-end team.  Thank you for isolating such a clean and succinct reproducer.  I appreciate the time you put into this.

 

Ron

 

 

 

Ron_Green
Moderator

An interesting data point: I tested this example code with the NAG 7.2 compiler. I had to change dp=2 for it to work with NAG. Other than that, there was no change apart from the PRINT statement I mentioned above.

 

Interestingly, the NAG-compiled code is crashing at the same point as IFX. Here is the gdb traceback:

a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000000000042176f in real_to_complex_test_ (__NAGf90_main_args=0x0) at /nfs/site/home/rwgreen/d_r_to_c.f90:389
389	  print*, "myden%real_charge ", myden%real_charge 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-28.el9_0.x86_64
(gdb) where
#0  0x000000000042176f in real_to_complex_test_ (__NAGf90_main_args=0x0) at /nfs/site/home/rwgreen/d_r_to_c.f90:389
#1  0x0000000000421672 in main (argc=1, argv=0x7fffffffdfa8) at /nfs/site/home/rwgreen/d_r_to_c.f90:380

  So I need to dig into the code more thoroughly.  I'll use NAG for the debug.

 

With the PRINT statement, gfortran 14 runs BUT it does not print any value for myden%real_charge!  Perhaps because the data is undefined??  

gfortran --version
GNU Fortran (GCC) 14.0.1 20240411 (Red Hat 14.0.1-0)
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rwgreen@orcsle153:~/quad/triage$ 
rwgreen@orcsle153:~/quad/triage$ gfortran --check=all d_r_to_c.f90 
rwgreen@orcsle153:~/quad/triage$ ./a.out
 myden%real_charge 
Keith_R_1
Novice

I suggest replacing

print*, "myden%real_charge ", myden%real_charge

with

print*, "myden%real_charge ", sum(myden%charge)

The output data is in myden%charge as the function of the routine is to allocate this, copy data from myden%real_charge and deallocate that.  The sum() makes the output easier to read!

 

gfortran 13 is the latest I have access to:

gfortran-13 --check=all d_r_to_c.f90 -o a.out-gf13
./a.out-gf13  
myden%real_charge                (0.0000000000000000,0.0000000000000000)

 Another data point is nvfortran

nvfortran -g -Mbounds -Mchkptr -o a.out-nv d_r_to_c.f90
./a.out-nv  
myden%real_charge   (0.000000000000000,0.000000000000000)
 

Unfortunately the machine I usually test NAG on appears to be down at the moment, so I can't check that.

 

One further remark I would like to make is that in the context of the full CASTEP code, the run proceeds slightly beyond this point and then aborts or crashes in a manner which appears to indicate some kind of random memory corruption. This occurs with both ifort and ifx and seems consistent with your suggestion that possibly a bad array descriptor was being passed to the library routine.

 

Aha, the NAG machine appears to be back up:

$ nagfor 
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7200
...
$ nagfor -C=all d_r_to_c.f90  
NAG Fortran Compiler Release 7.2(Shin-Urayasu) Build 7200
Extension(F2023): d_r_to_c.f90, line 15: Comment extends past column 132
Extension(F2023): d_r_to_c.f90, line 17: Comment extends past column 132
Extension(F2023): d_r_to_c.f90, line 18: Comment extends past column 132
Warning: d_r_to_c.f90, line 80: Unused local variable I
Warning: d_r_to_c.f90, line 358: Unused local variable STATUS
Warning: d_r_to_c.f90, line 375: Unused local variable STATUS
[NAG Fortran Compiler normal termination, 6 warnings]
pc16:~/Tmp$ ./a.out
myden%real_charge  (0.0000000000000000,0.0000000000000

so no sign of trouble there either.

Ron_Green
Moderator

An unforced error on my part when I added the print of myden%real_charge: by the time the code reaches that print, real_charge is undefined by design.

 

So, looking into this more: yes, a regression happened in 2024.2.0. 2024.1.x and some older compilers work as expected (I mean the ifort 2021.x versions bundled in those packages as well).

What I see is a regression in the reallocation of the LHS of this expression:

den%charge=cmplx(tmp_data(:,1),0.0_dp,dp)

For some reason the compiler thinks that the RHS and LHS do not match in shape. They do match, so no realloc should occur. However, 2024.2.0 thinks the LHS should be realloced, and it messes up the bounds for the new target on the LHS. I am trying to isolate whether the runtime team changed cmplx() (I don't think we did) OR whether it is our heuristics for determining if a realloc is needed (this is most probably the root cause). What I see is that after the realloc of den%charge, the copy of the RHS to the LHS is trying to start with ZERO as the lower bound, i.e. den%charge(0:) is the target. Which is obviously just plain WRONG.

 

Workaround:  

-assume norealloc_lhs

 

This assumes that the array assignments in your code conform in shape. From what I see, they do. This option will prevent the code from trying to realloc den%charge.
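For readers unfamiliar with the flag, the following is a minimal stand-alone sketch (generic code, not from CASTEP) of the Fortran 2003 allocation-on-assignment behaviour that -assume norealloc_lhs disables; the first assignment below is the conforming case that the 2024.2.0 regression mishandles:

program realloc_lhs_demo
  implicit none
  real, allocatable :: a(:)

  allocate(a(4))
  ! Shapes conform: the standard requires no reallocation here.
  a = [1.0, 2.0, 3.0, 4.0]
  ! Shapes differ: F2003 semantics automatically reallocate a to size 2.
  ! Under -assume norealloc_lhs the programmer must guarantee conformance,
  ! so a mismatched assignment like this one is not supported.
  a = [10.0, 20.0]
  print *, size(a), a
end program realloc_lhs_demo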

Keith_R_1
Novice

Interestingly, we removed -assume norealloc_lhs from the CASTEP compile flags in 2022, with the change log message:

Date:   Mon Nov 28 12:39:01 2022 +0000

   Dropped all flags which disabled allocate-on-assignment.
     These were included 10 years ago purely as safeguard against compiler bugs/slowdown for new feature.

I'll need to check the codebase to see if allocate on assignment is used in any newer code.

In the meantime, a couple of further data points. An accidental syntax error on my part,

     den%charge%re(:)=tmp_data(:,1)

results in an ICE from both ifort and ifx.  The corrected version

     den%charge(:)%re=tmp_data(:,1)

also seems to work around the issue, while

     den%charge%re=tmp_data(:,1)

does not; the last one still gives an invalid write past the end of the array (shown in Valgrind), with no call to the CMPLX intrinsic in sight.
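For anyone wanting to try the workaround form outside CASTEP, here is a self-contained sketch (hypothetical names, not CASTEP code) of the F2008 complex-part designator in the (:)%re spelling mentioned above:

program re_designator_demo
  implicit none
  complex(kind=8), allocatable :: z(:)
  real(kind=8) :: r(4)

  r = [1.0_8, 2.0_8, 3.0_8, 4.0_8]
  allocate(z(4))
  z = (0.0_8, 5.0_8)
  z(:)%re = r        ! assign only the real parts; the imaginary parts stay 5.0
  print *, z
end program re_designator_demo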

Ron_Green
Moderator

@Keith_R_1 the bug ID for this regression is CMPLRLLVM-61704

 

The good news is that it does not affect ifx 2025.0.0, which will be released in Q4 2024. The bad news is that this release will not contain ifort.

 

What I am working on is to backport a fix into the 2024.2.x code branch. Then, when and if there is another patch release for 2024.2, we can possibly see a fix for ifx and maybe ifort. An ifort fix is questionable, as that code is deprecated and soon to be removed, and it has an older code branch than ifx 2024.2. We'll have to see if a fix is possible for ifort. Since the code diverged between ifx and ifort, it's unclear if a backport to ifort is possible. I think it should be possible, but we'll have to see.

Workaround is to use option

-assume norealloc_lhs

for ifx 2024.2.0 and .1, and ifort 2021.13.0 and 2021.13.1.  Or use 2024.1.0 ifx or 2021.12.x ifort or older.  Or use the upcoming ifx 2025.0.0 when it is released.

The simplified reproducer is this:

module density
implicit none

contains

  subroutine lhs(dens)
  implicit none
    ! Arguments
    complex (kind=8), allocatable, intent(inout)::dens(:)
    ! locals
    real(kind=8), dimension(:,:), allocatable::tmp_data
 
    allocate( tmp_data(4,2) )
    tmp_data(:,1) = 1.0
    dens=cmplx(tmp_data(:,1), 1.0_8, 8 ) 
     
  end subroutine lhs
end module density

program reall
use density
implicit none
complex (kind=8), allocatable :: dens(:)

allocate( dens(4) )
call lhs(dens)
print*, "expected answer is  sum of dens  (4.00000000000000,4.00000000000000)"
print*, "we have "
print*, "sum of dens ", sum(dens(:) )
end program reall

 

Keith_R_1
Novice

Getting back to the original posting in this topic, I have confirmed that CASTEP-24.1 compiles and passes the test suite correctly with oneAPI 2024.2.1 (ifort 2021.13.1), provided you add the "-assume norealloc_lhs" compile flag. This answers the original question at least, and provides a solution for the poster.

I'll update the CASTEP release notes with this information too.

 

MehdiChinoune
New Contributor I

The issue was fixed with 2025.0
