Intel® Fortran Compiler

ifx linking memory issue

cxp484
Beginner
1,603 Views

Hi,

We are using the latest IFX compiler to compile the Fire Dynamics Simulator (FDS) codebase (https://github.com/firemodels/fds). Occasionally, we encounter the following errors during the linking stage:

-------Error#1--------------------------

free(): corrupted unsorted chunks
ifx: error #10105: ld: core dumped
ifx: warning #10102: unknown signal(867702016)
ifx: error #10106: Fatal error in ld, terminated by unknown
make: *** [../makefile:242: impi_intel_linux] Error 1
 
-------Error#2--------------------------

ifx: error #10105: ld: core dumped
ifx: warning #10102: unknown signal(682815504)
ifx: error #10106: Fatal error in ld, terminated by unknown
make: *** [../makefile:242: impi_intel_linux] Error 1
 
 

The issue is sporadic and difficult to reproduce. If we rebuild after encountering the error, the build completes successfully without any modifications. This happens both on our local cluster (RedHat Linux) and on GitHub Actions (Ubuntu).

We are using a local Linux system with 500 GB of RAM, so it’s unlikely that memory limitations are causing the problem. You can find our build and compilation commands at the following links:

 
 
 

Has anyone encountered similar errors? Could this be a potential bug in the LTO (Link-Time Optimization) stage of the compiler?

 

Best regards,
CP

     

8 Replies
cxp484
Beginner
1,487 Views

Additional information: We ran a script that builds the FDS repository (https://github.com/firemodels/fds) 100 times. With the -ipo -O2 compile (and link) options, the build failed 11 times. With -ipo removed, all 100 builds passed without any error message. This suggests that the error in my earlier post is related to the IPO (-flto) implementation.
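For anyone who wants to measure the failure rate on their own system, a repeated-build harness along these lines could be used. This is only a minimal sketch, not the script used above: the function name `count_failures` and the example commands in the comments are assumptions, and the real build and clean steps of the project have to be substituted.

```python
import subprocess
import sys


def count_failures(build_cmd, runs, clean_cmd=None):
    """Run build_cmd `runs` times, one after another.

    Returns the 1-based indices of the runs that exited with a non-zero status.
    """
    failed = []
    for i in range(1, runs + 1):
        if clean_cmd is not None:
            # Start each run from scratch (for example, delete .o and .mod files).
            subprocess.run(clean_cmd, check=False)
        result = subprocess.run(build_cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(i)
            # Show the tail of stderr so the linker message is visible.
            print(f"run {i}: exit code {result.returncode}", file=sys.stderr)
            print(result.stderr[-500:], file=sys.stderr)
    return failed


# Hypothetical usage; both commands are placeholders for the real ones:
#   failures = count_failures(["make", "impi_intel_linux"], 100, clean_cmd=["make", "clean"])
#   print(f"{len(failures)} of 100 builds failed: {failures}")
```

Running the builds strictly one after another, as here, rules out interference between concurrent builds as a cause of the failures.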

Ron_Green
Moderator
1,482 Views

This one is an unusual error. Web searches turn up hits for it going back over 10 years and across many languages, not just Fortran. Are you using a VM? Which OS distro and version are you using, and which ld version?

I'll ask our compiler driver team if anyone else has reported this issue.

 

One other question - does the script run the 100 builds serially or in parallel ( how many builds at once ?)

Kevin_McGrattan
1,467 Views

I work with cxp484. I have submitted this issue to priority support but have had no response. We use RedHat 9.3 on a Linux server. The compilation failure occurs both on our server and on GitHub Actions. We are using ifx 2025.0.4 20241205, but the error also occurred with earlier versions of ifx.

 

$ hostnamectl
Static hostname: spark-login
Icon name: computer-server
Chassis: server 🖳
Machine ID: 88943d6e51ae4d928bbbbfec99611388
Boot ID: fd256dfeb68847448807fe1b4263b192
Operating System: Red Hat Enterprise Linux 9.3 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:9::baseos
Kernel: Linux 5.14.0-503.15.1.el9_5.x86_64
Architecture: x86-64
Hardware Vendor: GIGABYTE
Hardware Model: R182-M80-00
Firmware Version: F25

cxp484
Beginner
1,462 Views

In addition to Kevin's reply: on our Linux server, the output of 'ld -v' is 'GNU ld version 2.35.2-42.el9_3.1'.

 

The 100-build script I mentioned before runs one build after another, i.e. sequentially. After each build we clean the obj and mod files and start afresh.

Mark_Lewy
Valued Contributor I
1,184 Views

We are also having similar issues with ld sometimes failing with "free(): corrupted unsorted chunks".

This started when we deployed a new build agent with Ubuntu 22.04 and OneAPI 2024.2.1.

The older agent with Ubuntu 22.04 and OneAPI 2024.0.2 is fine.

 

I also get intermittent build failures doing local builds on my Ubuntu 22.04 WSL instances, again with ifx 2024.2.1 20240711.

 

In both cases, a rebuild usually succeeds; the project is using ifx and LTO.
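Since a rebuild usually succeeds, a build agent could retry the failing step a few times as a stop-gap. The following is a minimal sketch of that idea, not something taken from our build system: the function name `run_with_retries` and the example command are assumptions. Note that retrying only hides the linker crash and does not fix it.

```python
import subprocess
import sys


def run_with_retries(cmd, max_attempts=3):
    """Run cmd until it exits with status 0.

    Returns the number of attempts used; raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return attempt
        print(f"attempt {attempt} of {max_attempts} failed", file=sys.stderr)
    raise RuntimeError(f"command failed {max_attempts} times: {cmd}")


# Hypothetical usage; the command is a placeholder for the real link or build step:
#   run_with_retries(["make", "my_target"], max_attempts=3)
```

Logging each failed attempt, as done here, keeps the intermittent failures visible so their frequency can still be tracked.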

 

Mark_Lewy
Valued Contributor I
803 Views

I don't know if this helps @Ron_Green , but here is a backtrace from one of my recent link fails with ifx 2024.2:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140574586906432) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140574586906432) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140574586906432, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fda12573476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fda125597f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fda125ba677 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fda1270cb77 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:156
#6  0x00007fda125d1cfc in malloc_printerr (str=str@entry=0x7fda1270f838 "free(): corrupted unsorted chunks")
    at ./malloc/malloc.c:5664
#7  0x00007fda125d3f16 in _int_free (av=0x7fda1274bc80 <main_arena>, p=0x5569d5a086e0, have_lock=<optimized out>)
    at ./malloc/malloc.c:4630
#8  0x00007fda125d6453 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x00007fda10b1e461 in (anonymous namespace)::IRLinker::computeTypeMapping() ()
   from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#10 0x00007fda10b1ad14 in (anonymous namespace)::IRLinker::run() ()
   from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#11 0x00007fda10b1a607 in llvm::IRMover::move(std::__1::unique_ptr<llvm::Module, std::__1::default_delete<llvm::Module> >, llvm::ArrayRef<llvm::GlobalValue*>, llvm::unique_function<void (llvm::GlobalValue&, std::__1::function<void (llvm::GlobalValue&)>)>, bool) () from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#12 0x00007fda10b0bf5c in llvm::lto::LTO::linkRegularLTO(llvm::lto::LTO::RegularLTOState::AddedModule, bool) ()
   from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#13 0x00007fda10b0d069 in llvm::lto::LTO::runRegularLTO(std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream> > > (unsigned int, llvm::Twine const&)>) ()
   from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#14 0x00007fda10b0cc59 in llvm::lto::LTO::run(std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream> > > (unsigned int, llvm::Twine const&)>, std::__1::function<llvm::Expected<std::__1::function<llvm::Expected<std::__1::unique_ptr<llvm::CachedFileStream, std::__1::default_delete<llvm::CachedFileStream> > > (unsigned int, llvm::Twine const&)> > (unsigned int, llvm::StringRef, llvm::Twine const&)>) ()
   from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#15 0x00007fda10646644 in allSymbolsReadHook() () from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#16 0x00007fda10642f8c in all_symbols_read_hook() () from /opt/intel/oneapi/compiler/2024.2/lib/icx-lto.so
#17 0x00005569c9e0c05d in ?? ()
#18 0x00005569c9e14437 in ?? ()
#19 0x00007fda1255ad90 in __libc_start_call_main (main=main@entry=0x5569c9e13730, argc=argc@entry=155,
    argv=argv@entry=0x7ffdf5453f98) at ../sysdeps/nptl/libc_start_call_main.h:58
#20 0x00007fda1255ae40 in __libc_start_main_impl (main=0x5569c9e13730, argc=155, argv=0x7ffdf5453f98,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdf5453f88)
    at ../csu/libc-start.c:392
#21 0x00005569c9e13665 in ?? ()

 I can attach the core dump file, if needed.

Diehl__Martin
Novice
994 Views

same here:

GNU ld (GNU Binutils for Ubuntu) 2.34

Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.4.0-208-generic
Architecture: x86-64

 

Intel oneAPI 2025.0. Linker fails (reproducibly)  if -ipo and/or -flto are set. Other options are -O3 -fp-model strict -xHost -align array64byte

Ron_Green
Moderator
729 Views

I apologize for not getting back to this issue in a long time.

 

I downloaded the most recent FDS sources for Linux.  I used 

Build/impi_intel_linux/make_fds.sh

 

I started with the 2024.2 compiler. Since the failure is intermittent, I devised a test: I built the code with this compiler to create the .o and .mod files, then ran just the link step in a loop 20 times, figuring that if the failure rate was 1 in 10, these 20 iterations should give a good statistical sample. The server was Ubuntu 22.04:

DISTRIB_CODENAME=jammy

DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"

 

It didn't take long: iteration 2 aborted with the above-mentioned error.

 

For the next test, I removed all .o and .mod files and repeated the test with 2025.1.1.

20 iterations, no link errors.

 

To be forward thinking, I removed all .o and .mod files and used a prerelease build of 2025.2.0, which is due to release sometime between late June and mid July.
20 iterations, no link errors.
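As a rough check on what 20 iterations can show, the arithmetic can be sketched as below. It assumes every link is an independent trial with a fixed failure probability, which is a simplification, and it uses the two rates quoted in this thread (1 in 10, and 11 failures in 100 builds). The function names are made up for illustration.

```python
def prob_at_least_one_failure(p, n):
    """Probability of one or more failures in n independent runs,
    each failing with probability p."""
    return 1.0 - (1.0 - p) ** n


def prob_all_pass(p, n):
    """Probability that all n independent runs pass."""
    return (1.0 - p) ** n


# Failure rates quoted in the thread: "1 in 10" and 11 failures in 100 builds.
for p in (0.10, 0.11):
    print(
        f"p={p:.2f}: "
        f"P(at least one failure in 20) = {prob_at_least_one_failure(p, 20):.3f}, "
        f"P(20 clean runs) = {prob_all_pass(p, 20):.3f}"
    )
```

Under these assumptions, a compiler that still had the bug would show at least one failure in 20 links roughly 88 to 90 percent of the time, so 20 clean runs would occur by chance roughly one time in ten. More iterations would tighten that bound.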

 

A little about LTO/IPO: this is a function of LLVM. Our Fortran compiler has nothing to do with it, other than creating the binaries with IPO information, and that is done by another team in code generation, not the Fortran team. Intel pulls LLVM from upstream frequently, about once a month or more, and that includes the IPO pass logic. So since the 2024.2 build date there have been roughly 12-20 pulldowns, and something in these evidently fixed the issue. At least statistically, from my study, I can conclude this with a reasonable degree of certainty. One other note: this code takes FOREVER to link with IPO! I watched it chewing up 100% of one core for a very long time. I am sure you see the same. Again, that is outside our (Fortran/Intel) control.


2025.1.1 would be a good compiler to recommend for FDS. Or, if people can wait until late June or early July, 2025.2.0. If you can switch compilers easily, use 2025.1.1 today and upgrade to 2025.2.0 when it releases. 2025.2.0 is looking to be a very good build, with a number of performance enhancements and a lot of bug fixes.

Sorry for the delay in this report.
