I tried to compile Dave Frank's TEST_FPU benchmark, which is now included in the zipped Polyhedron benchmark suite ( https://www.fortran.uk/fortran-compiler-comparisons/the-polyhedron-solutions-benchmark-suite/ ).
I tried several switches, including /O2, /O3 and /Qparallel.
Both /O2 and /O3 work fine.
With auto-parallelization, which should speed up the benchmark through multithreading, I get error 0xc000007b on AMD Ryzen workstations and notebooks (Ryzen 3xxx and Ryzen 2xxxU series).
Here's what I wrote on StackOverflow where you also posted this:
The usual cause for this particular error is that Windows is unable to map all of the various fixed memory requirements, including static code and data, and the stack. Why auto-parallel should matter, I don't know. Do I understand correctly that installing the redistributables changed the behavior? I very much doubt the processor brand has any relation to this.
I note that you don't mention the redistributables here. Are they a factor? You also omitted the compile commands you used - you set large values for the stack reserve size, though not abnormally large.
Yes, I posted to StackOverflow because my post here returned a 404, as if it had been abducted by aliens or something. Now I know it was being reviewed first; sorry for the double post. I can't edit a post here while it is under review, so I couldn't add the redistributable story.
Basically, the program doesn't start: I get error 0xc000007b when I use /Qparallel with Intel Fortran 19 Update 1 on Windows 10 AMD computers, even though libiomp5md.dll is in the same directory, just as on the Windows 10 Intel machine, where it runs well.
With /O3 and no /Qparallel it runs on AMD, but not with /Qparallel.
On Intel processors, /Qparallel cuts about 2 seconds from the 20-second total run time without auto-parallelization. It runs faster with parallelization, using the switches Polyhedron suggests: /O3 /Qparallel /link /stack:6400000
Next, I installed the redistributables on the AMD computers, and it got weirder. Whether I double-click the program or run it from a command prompt, it closes immediately. No Windows error at all; it just disappears.
So AMD can't handle /Qparallel, and I tried /Qopenmp instead. Same story: error 0xc000007b.
Any suggestions? Manual parallelization?
ifort test_fpu2.f90 /O2 /Qopt-prefetch=2 /assume:buffered_io /Qip /Qopt-matmul /assume:nocc_omp /reentrancy:threaded /Qopenmp-offload- /extend_source:132 /Fd"vc150.pdb" /libs:qwin/threads /link /stack:64000000
This one runs on AMD (it has no /Qparallel), but there is no performance increase...
Oh my goodness. That's certainly a lot more options than you showed at StackOverflow. The first thing I would suggest is that you DEFINITELY do not want /libs:qwin here - take that out. There's also no reason to be using /Qopenmp, /assume:nocc_omp or /extend_source:132. The /Fd is also inappropriate since you're not debugging.
I would start by taking off ALL of the options. If that works (which it probably will), add /O3. Then add /parallel. This source uses no OpenMP syntax so there's no point in adding OpenMP options. Note that /parallel uses OpenMP internally.
If you get that far, replace /O3 with /fast. This sets a number of options that should make the program faster, even on AMD.
In my opinion, some of the benchmark examples are not very good examples of Fortran.
Test_FPU and Test_FPU2 appear to create very large local arrays which flood the stack, for no apparent benefit.
Other routines also have large automatic arrays, which can also be problematic.
If you add ",SAVE" to these large arrays, the program will be much easier to compile and link, e.g.:
REAL(RK8), SAVE :: pool(smallsize,smallsize,smallits), pool3(bigsize,bigsize) ! random numbers to invert
REAL(RK8), SAVE :: a(smallsize,smallsize), a3(bigsize,bigsize) ! working matrices
As for your use of /Qparallel, this is not likely to help the Gauss or Crout solvers, which require a sequential approach.
/Qparallel is good for suitable algorithms, but unfortunately many classical solutions are not.
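To illustrate the distinction, here is a minimal sketch. It is not code from TEST_FPU; the program name, array names and size are made up for illustration. The first loop has independent iterations, which is the kind of loop an auto-parallelizer can split across threads. The second carries a dependence from one iteration to the next, much like forward substitution in a triangular solve, so it has to run in order.

```fortran
program loop_dependence_sketch
  implicit none
  integer, parameter :: rk8 = selected_real_kind(15, 307)
  integer, parameter :: n = 1000          ! hypothetical problem size
  real(rk8) :: x(n), y(n), z(n)
  integer :: i

  call random_number(x)
  call random_number(y)

  ! Independent iterations: z(i) depends only on x(i) and y(i),
  ! so the iterations can be executed in any order or in parallel.
  do i = 1, n
    z(i) = 2.0_rk8 * x(i) + y(i)
  end do

  ! Loop-carried dependence: y(i) needs the updated y(i-1) from the
  ! previous iteration, so the loop must run sequentially as written.
  do i = 2, n
    y(i) = y(i) - x(i) * y(i-1)
  end do

  print *, z(n), y(n)
end program loop_dependence_sketch
```

Whether the compiler actually parallelizes a given loop also depends on its own cost analysis, so this only shows which loops are candidates at all.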
Steve, post #4 is actually alternative number 4. It doesn't use /Qparallel; it just tests another set of switches.
Meanwhile, back to the /Qparallel problem (alternative number 3): we used only /Qparallel /O3, with and without /libs:qwin, and the problem is still there. /fast gives slightly better speed than /O3 on Intel systems. I will try it on Monday on the AMD systems.
It would be much faster if I removed the BLAS section of test_fpu.f90 and used the Intel MKL library with /Qmkl:parallel on Intel systems.
John, thank you for the suggestion. I will try to change it. How about ALLOCATABLE arrays instead? Although it still needs the /stack switch on Windows.
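For reference, a minimal sketch of the ALLOCATABLE variant, reusing the array names from the SAVE example above. The program name and the value of bigsize are placeholders, not the benchmark's actual values. Allocatable arrays are normally allocated on the heap, so these two arrays should not count against the stack reserve; other large local or automatic arrays elsewhere in the program still would.

```fortran
program heap_arrays_sketch
  implicit none
  integer, parameter :: rk8 = selected_real_kind(15, 307)
  integer, parameter :: bigsize = 2000    ! placeholder, not the benchmark's value
  real(rk8), allocatable :: pool3(:,:), a3(:,:)
  integer :: istat

  ! Request the storage at run time and check that it succeeded.
  allocate(pool3(bigsize,bigsize), a3(bigsize,bigsize), stat=istat)
  if (istat /= 0) stop 'allocation failed'

  call random_number(pool3)   ! random numbers to invert
  a3 = pool3                  ! working matrix
  print *, 'sum = ', sum(a3)

  deallocate(pool3, a3)
end program heap_arrays_sketch
```

If editing every routine is too much work, ifort also has a /heap-arrays option that places automatic arrays and compiler temporaries on the heap; I have not checked how it affects this benchmark's timings.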
ifort /fast only runs on Intel. On the AMD machines the program says to please verify that your OS and CPU support x87 to AVX.
It is not true that ifort /fast runs only on Intel - at least not for the last three or four major versions. I should know, as I wrote the processor-detection code used for /Qxhost, used by /fast.
Problem solved. The 0xc000007b error with /Qparallel is fixed in Intel Parallel Studio XE 2020.
The problem exists in Parallel Studio 2019 Update 2.
Now it runs on the latest AMD Ryzen.
Steve, a program built with /fast can run on AMD Ryzen, provided it was also compiled on a Ryzen host with /Qxhost. If it is compiled on an Intel host with /fast enabled, it uses a /Qx mode, and the resulting program only runs on Intel.
/QxHost (implied by /fast) requires that you run on the host system - that is, the one you compiled on. In early ifort versions, /fast could not be used with non-Intel CPUs but that changed quite a while ago (version 14 or 15, I think). If you run on a different CPU, even a different Intel CPU, the program may not run.