I am trying to find the cause of a segmentation fault that occurs when running several of the SPEC ACCEL OpenMP benchmarks. Specifically, I have been working with the 552.pep (embarrassingly parallel) benchmark.
OS: x86_64 GNU/Linux
CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Accelerator: Intel Iris Xe MAX
Toolkit: Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)
Compilation commands:
$ ulimit -s unlimited
$ export IGC_EnableDPEmulation=1
$ export OverrideDefaultFP64Settings=1
$ source /opt/intel/oneapi/setvars.sh
$ icx -g -Wall -O3 -I. -fiopenmp -fopenmp-targets=spir64 *.c -o ep -lm
$ gdb ./ep
Here is the result of running gdb (run). I can install the missing debuginfos and update the question if that would help in finding a solution.
(gdb) run
Starting program: /552.pep/src/ep
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-127.el8.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[New Thread 0x155550008700 (LWP 811436)]
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[New Thread 0x15554d9ac700 (LWP 811437)]
Reading from input file ep.input
NAS Parallel Benchmarks (NPB3.3-OPENMP-C) - EP Benchmark
Number of random numbers generated: 67108864
Thread 1 "ep" received signal SIGSEGV, Segmentation fault.
0x000015553f455bca in vISA::G4_SrcRegRegion::computeLeftBound() () from /lib64/libigc.so.1
Missing separate debuginfos, use: yum debuginfo-install intel-gmmlib-20.4.1-i482.el8.x86_64 intel-igc-core-1.0.6410-i509.el8.x86_64 intel-igc-opencl-1.0.6410-i509.el8.x86_64 libedit-3.1-23.20170329cvs.el8.x86_64 libgcc-8.3.1-5.1.el8.x86_64 libstdc++-8.3.1-5.1.el8.x86_64 libxml2-2.9.7-8.el8.x86_64 ncurses-libs-6.1-7.20180224.el8.x86_64 xz-libs-5.2.4-3.el8.x86_64
(gdb) bt
#0 0x000015553f455bca in vISA::G4_SrcRegRegion::computeLeftBound() () from /lib64/libigc.so.1
#1 0x000015553f34ef81 in vISA::IR_Builder::createSrcRegRegion(G4_SrcModifier, G4_RegAccess, vISA::G4_VarBase*, short, short, RegionDesc const*, G4_Type, G4_AccRegSel) () from /lib64/libigc.so.1
#2 0x000015553f360739 in VISAKernelImpl::CreateVISASrcOperand(_VISA_VectorOpnd*&, _VISA_GenVar*, VISA_Modifier, unsigned short, unsigned short, unsigned short, unsigned char, unsigned char) () from /lib64/libigc.so.1
#3 0x000015553f256b97 in IGC::CEncoder::GetSourceOperand(IGC::CVariable*, IGC::SModifier const&) () from /lib64/libigc.so.1
#4 0x000015553f25948f in IGC::CEncoder::DataMov(ISA_Opcode, IGC::CVariable*, IGC::CVariable*) () from /lib64/libigc.so.1
#5 0x000015553f297893 in IGC::EmitPass::UniformCopy(IGC::CVariable*, IGC::CVariable*&, IGC::CVariable*, bool) () from /lib64/libigc.so.1
#6 0x000015553f2b9a7b in IGC::EmitPass::emitVectorStore(llvm::StoreInst*, llvm::Value*, llvm::ConstantInt*) () from /lib64/libigc.so.1
#7 0x000015553f2c0a1c in IGC::EmitPass::runOnFunction(llvm::Function&) () from /lib64/libigc.so.1
#8 0x00001555404265ce in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /lib64/libigc.so.1
#9 0x0000155540426e41 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /lib64/libigc.so.1
#10 0x0000155540427298 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /lib64/libigc.so.1
#11 0x000015553f0a9ef9 in void IGC::CodeGen<IGC::OpenCLProgramContext>(IGC::OpenCLProgramContext*, llvm::MapVector<llvm::Function*, IGC::CShaderProgram*, llvm::DenseMap<llvm::Function*, unsigned int, llvm::DenseMapInfo<llvm::Function*>, llvm::detail::DenseMapPair<llvm::Function*, unsigned int> >, std::vector<std::pair<llvm::Function*, IGC::CShaderProgram*>, std::allocator<std::pair<llvm::Function*, IGC::CShaderProgram*> > > >&) () from /lib64/libigc.so.1
#12 0x000015553f079635 in IGC::CodeGen(IGC::OpenCLProgramContext*) () from /lib64/libigc.so.1
#13 0x000015553ef40e67 in TC::TranslateBuild(TC::STB_TranslateInputArgs const*, TC::STB_TranslateOutputArgs*, TC::TB_DATA_FORMAT, IGC::CPlatform const&, float) [clone .part.314] () from /lib64/libigc.so.1
#14 0x000015553eff261d in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate(unsigned long, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, CIF::Builtins::Buffer<1ul>*, unsigned int, void*) const () from /lib64/libigc.so.1
#15 0x00001555506df59a in NEO::CompilerInterface::build(NEO::Device const&, NEO::TranslationInput const&, NEO::TranslationOutput&) ()
from /lib64/libze_intel_gpu.so.1
#16 0x0000155550672a8e in L0::ModuleTranslationUnit::buildFromSpirV(char const*, unsigned int, char const*, char const*, _ze_module_constants_t const*) () from /lib64/libze_intel_gpu.so.1
#17 0x00001555506740ac in L0::ModuleImp::initialize(_ze_module_desc_t const*, NEO::Device*) () from /lib64/libze_intel_gpu.so.1
#18 0x0000155550674413 in L0::Module::create(L0::Device*, _ze_module_desc_t const*, L0::ModuleBuildLog*, L0::ModuleType) ()
from /lib64/libze_intel_gpu.so.1
#19 0x000015555065fa48 in L0::DeviceImp::createModule(_ze_module_desc_t const*, _ze_module_handle_t**, _ze_module_build_log_handle_t**, L0::ModuleType) () from /lib64/libze_intel_gpu.so.1
#20 0x000015555129748e in __tgt_rtl_load_binary () from /opt/intel/oneapi/compiler/2021.1.2/linux/lib/libomptarget.rtl.level0.so
#21 0x00001555554ec6ba in DeviceTy::load_binary(void*) () from /opt/intel/oneapi/compiler/2021.1.2/linux/lib/libomptarget.so
#22 0x00001555554f8159 in CheckDeviceAndCtors(long) () from /opt/intel/oneapi/compiler/2021.1.2/linux/lib/libomptarget.so
#23 0x00001555554eeb63 in __tgt_target_data_begin_mapper () from /opt/intel/oneapi/compiler/2021.1.2/linux/lib/libomptarget.so
#24 0x0000000400000000 in ?? ()
#25 0x0000000000000000 in ?? ()
I'm wondering if I'm having the same issue as this post, and if so, if there's a workaround without modifying the source code:
Also, I'm intending to use the level0 backend, but I see "OpenCLProgramContext" in the gdb output. Does this mean I may be using the OpenCL backend by mistake?
Thank you!
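On the backend question, one hedged way to read the trace is to tally which shared library each frame comes from. In the backtrace above, frame #20 is in libomptarget.rtl.level0.so and frames #15 to #19 are in libze_intel_gpu.so.1, which suggests the Level Zero plugin is the one loading the binary. The libigc.so.1 frames are reached from `NEO::CompilerInterface::build`, so an `OpenCLProgramContext` frame inside that library does not by itself indicate the OpenCL plugin. The helper name `bt_libs` is made up for this sketch.

```shell
# bt_libs: read a gdb backtrace on stdin and count mentions of each shared
# library. Hypothetical helper; it only looks at the "from <path>" suffix
# that gdb prints after a frame.
bt_libs() {
    grep -o 'from /[^ ]*' | sed 's/^from //' | sort | uniq -c | sort -rn
}
```

A possible invocation is `gdb -batch -ex run -ex bt ./ep 2>&1 | bt_libs`. The counts are approximate, because gdb also prints the faulting frame once before the backtrace.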
Hi,
By default, the backend is set to level0. You can also set the backend explicitly using the environment variable LIBOMPTARGET_PLUGIN=OPENCL (or LEVEL0).
Can you try running on the OpenCL backend and let me know if it works?
Also, could you please attach the debug logs by setting the environment variable LIBOMPTARGET_DEBUG=2?
In the other forum thread that you mentioned, the segmentation fault occurs because the memory limit on the device side is exceeded. Please make sure that your memory allocations stay within the device limits. (You can get the device information from the "clinfo" command.)
Please attach a small reproducer for your issue if possible.
Thanks,
Rahul
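The backend check and debug-log request above could be scripted roughly as follows. This is a sketch, not Intel's procedure: `run_with_backend` and the `run_<backend>.log` file names are made up here, and only the LIBOMPTARGET_PLUGIN and LIBOMPTARGET_DEBUG variables come from the reply.

```shell
# run_with_backend BACKEND CMD [ARGS...]
# Runs CMD with the given offload plugin and debug level 2, saving stdout and
# stderr to run_<BACKEND>.log. Returns CMD's exit status (139 usually means
# the process was killed by SIGSEGV).
run_with_backend() {
    backend="$1"
    shift
    env LIBOMPTARGET_PLUGIN="$backend" LIBOMPTARGET_DEBUG=2 "$@" \
        > "run_${backend}.log" 2>&1
}

# Example: try both plugins on ./ep if it has been built in this directory.
if [ -x ./ep ]; then
    for b in LEVEL0 OPENCL; do
        rc=0
        run_with_backend "$b" ./ep || rc=$?
        echo "$b: exit status $rc (log in run_${b}.log)"
    done
fi
```

Other settings from the original post (`ulimit -s unlimited` and the two FP64 variables) are inherited from the calling shell, so they still need to be set beforehand.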
Thanks for the response! I've attached text files with the output from running with LIBOMPTARGET_DEBUG=2 and with LIBOMPTARGET_PLUGIN set to OPENCL and to LEVEL0.
It seems like I'm hitting the same error with OpenCL (gdb still reports SIGSEGV in vISA::G4_SrcRegRegion::computeLeftBound()).
I'll see if I can create a small code example that reproduces the error. I'll also see if I can temporarily reduce the memory size in the original application and see if that changes the behavior. Here is the information provided by clinfo:
Platform Name                        Intel(R) OpenCL HD Graphics
Device Name                          Intel(R) Graphics [0x4905]
Global memory size                   6811549696 (6.344GiB)
Max memory allocation                3405774848 (3.172GiB)
Max size for global variable         65536 (64KiB)
Preferred total size of global vars  3405774848 (3.172GiB)
Global Memory cache size             1048576 (1024KiB)
Global Memory cache line size        64 bytes
Max constant buffer size             3405774848 (3.172GiB)
Max size of kernel argument          2048 (2KiB)
To see if I'm exceeding the device memory, I'm guessing I need to do the math on the variables in the OpenMP data clauses and compare?
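The comparison described above might look like the following sketch. The two limits are the clinfo values quoted in this post. The array is hypothetical (one double per generated random number, using the count the benchmark prints), since the benchmark's actual map clauses are not shown in this thread.

```shell
MAX_ALLOC=3405774848    # "Max memory allocation" reported by clinfo
GLOBAL_MEM=6811549696   # "Global memory size" reported by clinfo
total=0

# add_array NAME COUNT ELEM_BYTES
# Checks one mapped array against the single-allocation limit and adds its
# size to the running total.
add_array() {
    bytes=$(( $2 * $3 ))
    total=$(( total + bytes ))
    if [ "$bytes" -gt "$MAX_ALLOC" ]; then
        echo "$1: $bytes bytes exceeds max single allocation"
    else
        echo "$1: $bytes bytes fits in a single allocation"
    fi
}

# Hypothetical array: 67108864 doubles at 8 bytes each.
add_array x 67108864 8

if [ "$total" -gt "$GLOBAL_MEM" ]; then
    echo "total: $total bytes exceeds global memory"
else
    echo "total: $total bytes fits in global memory"
fi
```

Each variable in a `map` clause would get its own `add_array` line, with the element count and type size taken from the source.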
Also, this may or may not be related, but I'm having trouble figuring out whether my device supports double precision. According to the output of clinfo, it seems like it could:
Platform Name                                Intel(R) OpenCL HD Graphics
Device Name                                  Intel(R) Graphics [0x4905]
Device Version                               OpenCL 3.0 NEO
Device OpenCL C Version                      OpenCL C 1.2
Double-precision Floating-point support      (cl_khr_fp64)
  Denormals                                  Yes
  Infinity and NANs                          Yes
  Round to nearest                           Yes
  Round to zero                              Yes
  Round to infinity                          Yes
  IEEE754-2008 fused multiply-add            Yes
  Support is emulated in software            No
Device Extensions                            (no cl_khr_fp64)
Device Extensions with Version               cl_khr_fp64 0x400000 (1.0.0)
However, if I try to run my application above without setting the "OverrideDefaultFP64Settings=1" and "IGC_EnableDPEmulation=1" environment variables (from here: https://community.intel.com/t5/Intel-DevCloud/Iris-Xe-MAX-node-is-missing-double-precision-support/td-p/1247876), I get the following errors:
error: double type is not supported on this platform
error: backend compiler failed build.
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
Is there a way to somehow use the "Device Extensions with Version" for cl_khr_fp64 support?
I now realize I ran the above clinfo command after already setting "OverrideDefaultFP64Settings=1" and "IGC_EnableDPEmulation=1", which is why it reported double-precision support. When those variables are not set, clinfo reports no double-precision support, as expected.
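To avoid this mix-up, the clinfo check can be run once with and once without the two variables. This is a minimal sketch, assuming clinfo prints the `Double-precision Floating-point support (cl_khr_fp64)` line quoted earlier whenever the extension is reported. `fp64_status` is a made-up helper name, and the exact text clinfo prints when support is absent is not shown in this thread, so the helper only tests for the positive case.

```shell
# fp64_status: read clinfo output on stdin and say whether the
# double-precision support line names cl_khr_fp64.
fp64_status() {
    if grep -q 'Double-precision Floating-point support.*(cl_khr_fp64)'; then
        echo "fp64 reported"
    else
        echo "fp64 not reported"
    fi
}

if command -v clinfo >/dev/null 2>&1; then
    echo "current environment: $(clinfo 2>/dev/null | fp64_status)"
    # env -u (GNU coreutils) unsets a variable for this one command only.
    echo "without overrides:   $(env -u OverrideDefaultFP64Settings \
        -u IGC_EnableDPEmulation clinfo 2>/dev/null | fp64_status)"
fi
```

Comparing the two lines shows whether the reported support comes from the device itself or from the override variables in the current shell.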
Hi,
Double-precision computation is disabled by default on this GPU. To enable it, both environment variables must be set to 1 (OverrideDefaultFP64Settings=1 and IGC_EnableDPEmulation=1).
>>I'll see if I can create a small code example that reproduces the error. I'll also see if I can temporarily reduce the memory size in the original application and see if that changes the behavior.
Yes, a small reproducer will help. Also, let me know if it works after reducing the memory.
Thanks,
Rahul
Hi,
Do you have any updates on this? Were you able to replicate the issue with a small reproducer?
Let us know if you face any issues.
Thanks,
Rahul
Hi,
I have not heard back from you, so I will go ahead and close this thread from my end. Intel will no longer monitor this thread. Feel free to post a new query if you require further assistance from Intel.
Thanks,
Rahul