Hi,
First time here, so I am not sure what details are needed in this message; do not hesitate to make suggestions.
I am compiling NEMO (an ocean model) with MPI inside a Singularity container.
As far as I know, declarations and allocations are done properly, and compilation goes well. The same code compiles and runs as intended on a different computer with an older IFORT compiler (not in a container).
In the Singularity container, when I run the model on one or more CPUs, I get an error when I try to access a specific index of a particular variable (a one-dimensional array).
For example, if I write the whole array to screen ( WRITE(numout,*) gdept_1d ), I see the values, but if I try to access any particular index of this array ( WRITE(numout,*) gdept_1d(1) ), the model crashes:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
nemo.exe 000000000145D99A Unknown Unknown Unknown
libpthread-2.27.s 00007F8ED11C0980 Unknown Unknown Unknown
nemo.exe 00000000008A0C5B iom_mp_iom_init_ 190 iom.f90
nemo.exe 000000000045E4EE step_mp_stp_ 148 step.f90
nemo.exe 000000000041BC9C nemogcm_mp_nemo_g 145 nemogcm.f90
nemo.exe 000000000041BBE7 MAIN__ 18 nemo.f90
nemo.exe 000000000041BB82 Unknown Unknown Unknown
libc-2.27.so 00007F8ED0629BF7 __libc_start_main Unknown Unknown
nemo.exe 000000000041BA6A Unknown Unknown Unknown
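A note on the traceback above: with ifort, rebuilding with runtime checks usually turns a bare SIGSEGV into a message naming the array, subscript, and source line. A sketch of debug flags in the fcm arch-file style used later in this thread (the flags are standard ifort options; /tmp/arch-debug.fcm is just a scratch path for illustration):

```shell
# Sketch: debug-oriented FCFLAGS for the fcm arch file. "-check bounds"
# reports out-of-range subscripts; "-g -traceback" gives source lines.
# Written to a scratch path here purely for illustration.
cat > /tmp/arch-debug.fcm << 'EOF'
%FCFLAGS -r8 -O0 -g -traceback -check bounds %NCDF_INC
%FFLAGS  -r8 -O0 -g -traceback -check bounds %NCDF_INC
EOF
grep -c 'check bounds' /tmp/arch-debug.fcm
```

With such flags an invalid access to gdept_1d(1) would be reported as a subscript error instead of segmentation fault (174).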
Thank you in advance for any suggestions on how to solve this kind of problem.
Simon
Hi,
Thanks for posting in Intel Communities.
Could you please let us know the OS details and the Intel MPI Library and Intel Fortran Compiler versions you are using?
Could you please provide us with the complete steps to build the NEMO application (the GitHub link of NEMO and the steps you followed), and also the steps to reproduce your issue?
>>Compilation goes well. The same code compiles and run as intended on a different computer with an older IFORT compiler
Could you please let us know the older IFORT and MPI versions with which you are able to run successfully?
Thanks & Regards,
Varsha
Hello Varsha,
Here is the information needed to reproduce the problem; I hope I did not forget anything. I think the simplest way is to give you access to the Singularity .sif file. Here is the link to download it with wget:
https://srwebpolr01.uqar.ca/polr/nemo_forum.sif
To get the directory:
singularity build --sandbox nemo_forum nemo_forum.sif
Use the directory:
singularity shell --writable nemo_forum
In singularity:
Singularity> cd /NEMO/NEMOGCM/CONFIG/MY_GYRE_new/EXP00/
There you have the result of running ./opa in this directory. The problem comes from the access
to a specific element of the variable "gdept_1d". The line where it crashes is 606 in the file
"/NEMO/NEMOGCM/CONFIG/MY_GYRE_new/MY_SRC/istate.F90". If you erase the files produced by the executable, you will
be able to run opa and get the same error.
Singularity> rm ocean.output nemo_status output.namelist.dyn mesh_mask.nc layout.dat
Singularity> ./opa
You can recompile the code by going to
Singularity> cd /NEMO/NEMOGCM/CONFIG/
Singularity> ./makenemo -n MY_GYRE_new -m mpiifort_linux
Do not hesitate to tell me if you need any other information, I really appreciate your help.
For the old system, where it works:
[old system ~]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.2.164 Build 20150121
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.

Here are our compilation options (fcm arch file):
%NCDF_INC -I/share/apps/netcdf/ifort/include
%NCDF_LIB -L/share/apps/netcdf/ifort/lib -lnetcdff -lnetcdf
%XIOS_HOME /share/apps/xios/1.0
%XIOS_INC -I%XIOS_HOME/inc
%XIOS_LIB -L%XIOS_HOME/lib -lxios
%FC ifort
%FCFLAGS -r8 -O3 -traceback -openmp %NCDF_INC
%FFLAGS -r8 -O3 -traceback -openmp %NCDF_INC
%LD ifort
%CICE_FPP ${CICECMC_FPP}
%FPPFLAGS -P -C -traditional %CICE_FPP
%LDFLAGS -L/share/apps/intel/impi/5.0.3.048/intel64/lib %XIOS_LIB %NCDF_INC %NCDF_LIB -lstdc++ -openmp -L/usr -L/usr/lib64
%AR ar
%ARFLAGS -r
%MK gmake
%USER_INC %XIOS_INC %NCDF_INC
%USER_LIB %XIOS_LIB %NCDF_LIB
%CPP cpp

[old system : EXP00]# ldd opa
linux-vdso.so.1 => (0x00007fffb35ef000)
libnetcdff.so.6 => not found
libnetcdf.so.7 => not found
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003348000000)
libmpi.so.12 => /share/apps/intel/impi/5.0.3.048/intel64/lib/release_mt/libmpi.so.12 (0x00002ba44812a000)
libmpifort.so.12 => /share/apps/intel/impi/5.0.3.048/intel64/lib/libmpifort.so.12 (0x00002ba4488b6000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003346c00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003347400000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003347000000)
libm.so.6 => /lib64/libm.so.6 (0x0000003346800000)
libiomp5.so => not found
libc.so.6 => /lib64/libc.so.6 (0x0000003346400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003347c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003346000000)
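Incidentally, the ldd listing above already shows several entries as "not found" (libnetcdff, libnetcdf, libiomp5); since the binary runs on that system, these are most likely resolved at run time through the environment (e.g. LD_LIBRARY_PATH), which was not set in the shell where ldd was run. A quick filter makes unresolved libraries stand out; illustrated here on /bin/ls, since opa itself lives on the old system:

```shell
# Sketch: surface unresolved shared libraries from ldd output.
# Run against /bin/ls for illustration; in the thread this would be ./opa.
ldd /bin/ls | grep 'not found' || echo "all libraries resolved"
```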
readelf -h
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x413a00
Start of program headers: 64 (bytes into file)
Start of section headers: 36329544 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 8
Size of section headers: 64 (bytes)
Number of section headers: 33
Section header string table index: 30
Hi,
Thanks for providing the details.
Could you please let us know the OS details and the Intel MPI version with which your application crashed?
Also, could you please confirm whether you are able to get the expected results with Intel MPI but without the Singularity container?
Could you please provide us with the Singularity container file you are using to run the NEMO application, so that we can investigate your issue further?
Thanks & Regards,
Varsha
Hi,
We have not heard back from you. Could you please provide us with the details mentioned in the previous reply to investigate more on your issue?
Thanks & Regards,
Varsha
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need additional information, please post a new question.
Thanks & Regards,
Varsha
Sorry about the delay; we were trying to answer your last questions.
Since we were able to run NEMO on a virtual machine, we tried building a new Singularity container instead of using your oneapi-hpckit container.
NEMO compiles and runs in this new home-made container.
We do not yet know exactly what is not working when we use your container.
Here is the recipe to build our container.
docker run -it rockylinux
yum update
yum -y install cmake pkgconfig
yum -y groupinstall "Development Tools"
which cmake pkg-config make gcc g++
tee > /tmp/oneAPI.repo << EOF
[oneAPI]
name=Intel® oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
cat /tmp/oneAPI.repo
mv /tmp/oneAPI.repo /etc/yum.repos.d
yum -y install intel-hpckit
. /opt/intel/oneapi/setvars.sh
wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.12/hdf5-1.12.2/src/hdf5-1.12.2.tar.gz
tar xzf hdf5-1.12.2.tar.gz && cd hdf5-1.12.2
./configure --enable-hl --enable-parallel FC=mpiifort CXX=mpiicpc CC=mpiicc
make
make install
cd ..
echo "/usr/local/lib" > /etc/ld.so.conf.d/uqar.conf
ldconfig
cp hdf5-1.12/hdf/include/* /usr/local/include
yum install libxml2-devel
wget https://github.com/Unidata/netcdf-c/archive/refs/tags/v4.9.0.tar.gz
tar xzf v4.9.0.tar.gz && cd netcdf-c-4.9.0
./configure FC=mpiifort CXX=mpiicpc CC=mpiicc
make && make install && cd ..
wget https://github.com/Unidata/netcdf-fortran/archive/refs/tags/v4.5.4.tar.gz
tar xzf v4.5.4.tar.gz && cd netcdf-fortran-4.5.4
./configure FC=mpiifort CXX=mpiicpc CC=mpiicc
make && make install && cd ..
yum install perl-URI
yum install perl-Text-Balanced
yum install libcurl-devel
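The docker/yum steps above can equivalently be captured in a Singularity definition file, so the container rebuilds reproducibly from one recipe. A sketch assuming the same Rocky Linux base and package names as the recipe above (not tested against this thread's setup; the HDF5/netCDF builds would be appended in %post):

```
Bootstrap: docker
From: rockylinux:8

%post
    yum -y update
    yum -y install cmake pkgconfig libxml2-devel libcurl-devel \
                   perl-URI perl-Text-Balanced
    yum -y groupinstall "Development Tools"
    tee /etc/yum.repos.d/oneAPI.repo << 'EOF'
[oneAPI]
name=Intel® oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
    yum -y install intel-hpckit
    # HDF5 and netCDF-C/netCDF-Fortran builds from the recipe above go here.

%environment
    . /opt/intel/oneapi/setvars.sh
```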
Thanks for your time, and I hope this can help someone else.
Simon