With IVF Compiler 10.1, I run the same program on an IA-32 system (Intel Pentium D CPU / MS Windows XP Home Edition) and on an Intel 64 system (Dual-Core AMD Opteron Processor 2214 / MS Windows Server 2003 Enterprise x64 Edition SP2). Both projects are optimized with '/O3 /Og /QaxN /QxN /Qparallel' etc. Only a few loops in the main program are auto-parallelized or vectorized.
Then I added some '!DEC$ PARALLEL' directives before the appropriate loops. However, no further 'auto-parallelized' messages appear, and the run time does not actually improve. Why does the directive not work?
Moreover, the dual-core or quad-core processors above do not seem to be fully used: the auto-parallelized program runs on almost a single core only. Can auto-parallelization make full use of all CPU cores? Or are there mistakes in my settings? What are the best settings for the two systems above?
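A minimal sketch of the directive placement described above, assuming free-form source. The loop, the array names `a` and `b`, and the size `n` are hypothetical stand-ins for illustration only and are not taken from the actual program.

```fortran
program directive_placement
  implicit none
  ! Hypothetical size and arrays, chosen only for illustration.
  integer, parameter :: n = 1000
  real(8) :: a(n), b(n)
  integer :: i

  b = 1.0d0

  ! The directive is placed immediately before the DO loop it refers to.
  ! Compilers that do not recognize it treat the line as a plain comment.
!DEC$ PARALLEL
  do i = 1, n
     a(i) = 2.0d0 * b(i) + real(i, 8)
  end do

  print *, a(1), a(n)
end program directive_placement
```

Each iteration here writes only its own element `a(i)`, so the loop carries no dependence between iterations. Whether the compiler actually parallelizes it still depends on the options in effect, such as '/Qparallel'.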
Thanks a lot.
Following your suggestion, I reduced the -Qpar-threshold value to 0 and increased the -Qpar-report and -Qvec-report values to 3 and 5, respectively, and the compiler now produces detailed comments. There are many remarks of the form 'loop was not vectorized: unsupported loop structure'. One of the loops is as follows,
where Nx is an integer constant, dx is a real*8 constant, and x is a real*8 array. Why can this kind of loop not be vectorized? Other remarks say 'loop was not vectorized: unsupported data type'. Which data types can be vectorized?
The introduction to IVF Compiler 10 mentions HPO (High Performance, Parallel Optimizer), which combines automatic vectorization, automatic parallelization and loop transformations into a single pass that is faster, more effective and more reliable than the prior discrete phases. How can I use it? It does not seem to be mentioned much in the IVF Compiler documentation...
About the final point in my former post,
'Moreover, the dual-core or quad-core processors above seem not work, while the auto-parallelized program runs almost with only single core. Could the auto-parallelization fully use all cores of CPU?... '.
Can all the cores be used simultaneously while a parallelized program is running, so that the CPU utilization is 100% rather than 25% or 50%? How can I achieve this?