I am using the Intel Fortran compiler version 11.1 release 6 and the accompanying release of MKL. I have an array of around two million elements, which I want to raise to the 0.23 power. When I use the "**" syntax, my program runs in about half the time it takes when I use the vdPow function from the VML library. Does anyone have ideas on how I can speed up the evaluation in VML?
Please ensure you use vdPowx rather than vdPow. Powx is intended for raising vector elements to a constant power (0.23 in your case). That should significantly reduce both the memory footprint and the pressure on the memory subsystem.
I think high pressure on the memory subsystem is the main reason you see worse performance. There was a good suggestion to split the input and output vectors into chunks so that the results fit in the cache. (Chunks of a few thousand elements should be fine.)
The default math library accuracy in the compiler is equivalent to MKL VML_LA. If you use vdPowx and VML_LA plus vector blocking, then I would expect the MKL VML performance to be at least on par with what you see in Fortran.
I ran some tests using gprof. When I use vml, a huge amount of processor time is used by the function powc_scalar, while when I use svml, that function is not run at all. (Running that function basically accounts for the difference in run time). If anyone has suggestions about what that function is and why it is taking so much time, I would appreciate it.
I now see where such a difference may come from. Having powc_scalar in the hotspot suggests that you are probably evaluating the power function on very large arguments. Is that the case?
Relatively recent optimizations in MKL VML and the Fortran compiler's SVML improved power function performance on typical (not very large) arguments at the cost of performance on large arguments. In earlier versions of MKL and the Fortran compiler (including 11.1), very large arguments performed better, but at the cost of slower performance on reasonable arguments. So if you're using a relatively new MKL with an old Fortran compiler, that may be the explanation.
Assuming that I'm correct with this hypothesis, the real question is whether the arguments in your test case represent a real-life workload or whether it is just a synthetic test case. Can you please clarify a bit?
For archival purposes, I am using a Core i7 860 processor.