Intel® Fortran Compiler

Error 0xc000007b running compiled console programs on AMD Ryzen series

loe__babi
Beginner
1,218 Views

Hi

I tried to compile Dave Frank's TEST_FPU benchmark, which is now included in the Polyhedron benchmark suite ( https://www.fortran.uk/fortran-compiler-comparisons/the-polyhedron-solutions-benchmark-suite/ ).

I tried several switches, including /O2, /O3 and /Qparallel.

Both /O2 and /O3 work fine.

With auto-parallelization (/Qparallel), which should speed up the benchmark through multithreading, the program fails with error 0xc000007b on AMD Ryzen workstations and notebooks (Ryzen 3xxx and Ryzen 2xxxU series).

9 Replies
Steve_Lionel
Honored Contributor III

Here's what I wrote on StackOverflow where you also posted this:

The usual cause for this particular error is that Windows is unable to map all of the various fixed memory requirements, including static code and data, and the stack. Why auto-parallel should matter, I don't know. Do I understand correctly that installing the redistributables changed the behavior? I very much doubt the processor brand has any relation to this.

I note that you don't mention the redistributables here. Are they a factor? You also omitted the compile commands you used - you set large values for the stack reserve size, though not abnormally large.

loe__babi
Beginner

Yes, I posted to StackOverflow because my post here seemed to vanish (a 404, abducted by aliens, or something). Now I know it was being reviewed first; sorry for the double post. I can't edit a post here while it is under review, so I couldn't add the part about the redistributables.

Basically, the program doesn't start: I get error 0xc000007b when I use /Qparallel with Intel Fortran 19 Update 1 on Windows 10 AMD computers, even though libiomp5md.dll is in the same directory, just as on the Windows 10 Intel machine where it runs fine.

With /O3 and no /Qparallel, the program runs on AMD; with /Qparallel it does not.

On Intel processors, /Qparallel cuts about 2 seconds from a total time of about 20 seconds without auto-parallelization. The program runs faster with the parallel switches that Polyhedron suggests: /O3 /Qparallel /link /stack:6400000

Next, I installed the redistributables on the AMD computers, and it got weirder. Whether I double-click the program or run it from a command prompt, it closes immediately, with no Windows error at all.

Since the AMD machines couldn't handle /Qparallel, I tried /Qopenmp instead. Same story: error 0xc000007b.

Any suggestions? Manual parallelization?

loe__babi
Beginner

Tried:

ifort  test_fpu2.f90 /O2 /Qopt-prefetch=2 /assume:buffered_io /Qip /Qopt-matmul /assume:nocc_omp /reentrancy:threaded /Qopenmp-offload- /extend_source:132  /Fd"vc150.pdb" /libs:qwin/threads /link /stack:64000000

This build runs on AMD. It does not use /Qparallel, but there is no performance increase...

Steve_Lionel
Honored Contributor III

Oh my goodness. That's certainly a lot more options than you showed at StackOverflow. The first thing I would suggest is that you DEFINITELY do not want /libs:qwin here - take that out. There's also no reason to be using /Qopenmp, /assume:nocc_omp or /extend_source:132. The /Fd is also inappropriate since you're not debugging.

I would start by taking off ALL of the options. If that works (which it probably will), add /O3. Then add /Qparallel. This source uses no OpenMP syntax so there's no point in adding OpenMP options. Note that /Qparallel uses OpenMP internally.

If you get that far, replace /O3 with /fast. This sets a number of options that should make the program faster, even on AMD.
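
For context on the remark that the source uses no OpenMP syntax: explicit OpenMP in Fortran means directive comments such as `!$omp parallel do`, which the compiler acts on only when an OpenMP option such as /Qopenmp is given. The following is a minimal sketch; the program, array name and size are made up for illustration and are not taken from TEST_FPU.

```fortran
program omp_sketch
  implicit none
  integer, parameter :: n = 100000     ! illustrative size, not from the benchmark
  real(kind(1.0d0)) :: x(n), total
  integer :: i

  call random_number(x)
  total = 0.0d0

  ! Without an OpenMP option the next line is an ordinary comment
  ! and the loop simply runs serially.
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + x(i) * x(i)
  end do
  !$omp end parallel do

  print '(a, es12.5)', 'sum of squares = ', total
end program omp_sketch
```

A source file with no such directives gains nothing from /Qopenmp, which is why only the auto-parallelizer option is relevant for this benchmark.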

John_Campbell
New Contributor II

My opinion is some of the benchmark examples are not very good examples of Fortran.

Test_FPU and Test_FPU2 appear to create very large local arrays which flood the stack, for no apparent benefit.
Other routines also have large automatic arrays, which can also be problematic.

If you add ", SAVE" to the declarations of these large arrays, the program will be much easier to compile and link, e.g.:

REAL(RK8), SAVE :: pool(smallsize,smallsize,smallits), pool3(bigsize,bigsize) ! random numbers to invert
REAL(RK8), SAVE :: a(smallsize,smallsize), a3(bigsize,bigsize)     ! working matrices
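
As a self-contained sketch of the idea, the program below contrasts a SAVE array (static storage) with an ALLOCATABLE array (normally heap storage). Neither should count against the stack reserve. The value of RK8, the size of bigsize and the subroutine names are assumptions for illustration; TEST_FPU defines its own.

```fortran
program save_vs_allocatable
  implicit none
  ! Assumed kind and size, for illustration only.
  integer, parameter :: RK8 = selected_real_kind(15, 300)
  integer, parameter :: bigsize = 1000

  call work_static()
  call work_heap(bigsize)

contains

  subroutine work_static()
    ! SAVE gives the array static storage instead of stack storage.
    real(RK8), save :: a3(bigsize, bigsize)
    call random_number(a3)
    print '(a, f8.5)', 'static a3(1,1) = ', a3(1, 1)
  end subroutine work_static

  subroutine work_heap(n)
    integer, intent(in) :: n
    ! An ALLOCATABLE array is created at run time, normally on the heap.
    real(RK8), allocatable :: a3(:, :)
    integer :: istat

    allocate (a3(n, n), stat=istat)
    if (istat /= 0) stop 'allocation failed'
    call random_number(a3)
    print '(a, f8.5)', 'heap   a3(1,1) = ', a3(1, 1)
    deallocate (a3)
  end subroutine work_heap

end program save_vs_allocatable
```

Temporaries that the compiler creates for array expressions may still land on the stack, so a large /stack value can remain necessary even after such a change.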
 

As for your use of /Qparallel, this is not likely to help the Gauss or Crout solvers, which require a sequential approach.

/Qparallel is good for suitable algorithms, but unfortunately many classical solutions are not suitable.
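
To illustrate what "a sequential approach" means here, the sketch below contrasts a loop with independent iterations against one with a loop-carried dependence, the pattern found in the substitution steps of a triangular solve. It is a made-up example, not code from the benchmark, and whether a compiler actually parallelizes a given loop depends on its own analysis.

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: n = 8          ! tiny illustrative size
  real(kind(1.0d0)) :: a(n), b(n), x(n)
  integer :: i

  a = 2.0d0
  b = 1.0d0

  ! Independent iterations: each x(i) depends only on a(i) and b(i),
  ! so the iterations may run in any order or concurrently.
  do i = 1, n
     x(i) = a(i) * b(i)
  end do

  ! Loop-carried dependence: x(i) needs x(i-1) from the previous
  ! iteration, so the iterations must run in order.
  x(1) = b(1) / a(1)
  do i = 2, n
     x(i) = (b(i) - x(i-1)) / a(i)
  end do

  print '(8f8.4)', x
end program dependence_sketch
```

An auto-parallelizer can split the first loop across threads but has to leave the second one serial.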

loe__babi
Beginner

Steve, post #4 is actually alternative number 4. It doesn't use /Qparallel; it just tests other switches.

Meanwhile, back to the /Qparallel problem (alternative number 3): we used only /Qparallel /O3, both with and without /libs:qwin, and the problem is still there. /fast gives slightly better speed than /O3 on Intel systems. I will try it on AMD systems on Monday.

It would be much faster if I removed the BLAS section of test_fpu.f90 and used the Intel MKL library with /Qmkl:parallel on Intel systems.
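
A rough sketch of what calling MKL instead of a hand-written BLAS section could look like, using the standard BLAS routine dgemm for a matrix product. Whether TEST_FPU's BLAS section maps onto dgemm is an assumption, and the program name and matrix size are made up for illustration.

```fortran
program dgemm_sketch
  implicit none
  integer, parameter :: n = 500        ! illustrative size only
  real(kind(1.0d0)), allocatable :: a(:,:), b(:,:), c(:,:)
  external :: dgemm                    ! standard BLAS routine, supplied by MKL

  allocate (a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0d0

  ! C := 1.0 * A * B + 0.0 * C, using the BLAS level 3 interface
  call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

  print '(a, es12.5)', 'c(1,1) = ', c(1,1)
end program dgemm_sketch
```

Built with something like ifort /O3 /Qmkl:parallel, the dgemm call resolves to MKL's threaded implementation rather than to source compiled from the benchmark file.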

John, thank you for the suggestion. I will try that change. How about allocatable arrays instead, or would that still need the /stack switch on Windows?

UPDATE:

Programs built with ifort /fast run only on the Intel machines. On the AMD machines they report: please verify that your OS and CPU support x87 to AVX.

Steve_Lionel
Honored Contributor III

It is not true that ifort /fast runs only on Intel - at least not for the last three or four major versions. I should know, as I wrote the processor-detection code used for /QxHost, which is what /fast uses.

loe__babi
Beginner

Problem solved. The 0xc000007b error with /Qparallel is fixed in Intel Parallel Studio XE 2020.

The problem exists in Parallel Studio 2019 Update 2.

Now it runs on the latest AMD Ryzen.

Steve, a program built with /fast can run on AMD Ryzen only if it was also compiled on a Ryzen host, where /QxHost applies. If it is compiled on an Intel host with /fast enabled, it uses a /Qx mode, and the resulting program runs only on Intel.

Case closed.

Steve_Lionel
Honored Contributor III

/QxHost (implied by /fast) requires that you run on the host system - that is, the one you compiled on. In early ifort versions, /fast could not be used with non-Intel CPUs but that changed quite a while ago (version 14 or 15, I think). If you run on a different CPU, even a different Intel CPU, the program may not run.
