Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

dot versus dot_product?

j0e
New Contributor I
1,666 Views

Since the Intel compiler now makes it easy to use the MKL library, and hence LAPACK/BLAS routines, are there any pros or cons to using DDOT (a BLAS routine shipped with MKL) versus the F95 intrinsic function DOT_PRODUCT?

For the code I'm working on, it doesn't matter, but I'm just curious.
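For reference, a minimal side-by-side call of the two might look like this (a sketch, not from the original post; it assumes you link against MKL or another BLAS for DDOT):

```fortran
program dot_compare
  implicit none
  double precision, external :: ddot   ! BLAS routine; requires linking MKL or another BLAS
  double precision :: x(4), y(4), a, b
  x = [1d0, 2d0, 3d0, 4d0]
  y = [5d0, 6d0, 7d0, 8d0]
  a = dot_product(x, y)      ! F95 intrinsic
  b = ddot(4, x, 1, y, 1)    ! BLAS: n, x, incx, y, incy
  print *, a, b              ! both are 70.0
end program dot_compare
```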

thanks

-joe

6 Replies
Roman1
New Contributor I

?DOT from MKL works only with real arrays, whereas the DOT_PRODUCT() intrinsic works with real, complex, and integer arrays.
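To illustrate (a small sketch, not from the original post): for complex arguments DOT_PRODUCT conjugates its first argument, i.e. it computes SUM(CONJG(A)*B), so it corresponds to the BLAS ?dotc rather than ?dotu:

```fortran
program mixed_dot
  implicit none
  complex :: u(2), v(2), cd
  integer :: iv(3), jv(3), id
  u = [cmplx(1.0, 2.0), cmplx(3.0, -1.0)]
  v = [cmplx(2.0, 0.0), cmplx(1.0, 1.0)]
  ! For complex arrays DOT_PRODUCT conjugates its first argument:
  ! sum(conjg(u)*v) = (2-4i) + (2+4i) = (4,0)
  cd = dot_product(u, v)
  ! Integer arrays work as well: 1*4 + 2*5 + 3*6 = 32
  iv = [1, 2, 3]
  jv = [4, 5, 6]
  id = dot_product(iv, jv)
  print *, cd, id
end program mixed_dot
```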

 

TimP
Honored Contributor III

MKL BLAS ?dot functions include the ability to introduce their own OpenMP threading. Even with OpenMP 4.0, that is a little more involved to write out in source code. Problems would need to be fairly large (size > 1000) before MKL performs as well as dot_product.
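For comparison, here is roughly what writing it out in source looks like with an OpenMP 4.0 combined construct (a sketch; the directive is ignored unless OpenMP is enabled, e.g. with /Qopenmp):

```fortran
program omp_dot
  implicit none
  double precision :: x(2000), y(2000), s
  integer :: i
  x = [(dble(i), i = 1, 2000)]
  y = 1d0
  s = my_dot(2000, x, y)
  print *, s          ! sum of 1..2000 = 2001000.0
contains
  function my_dot(n, a, b) result(r)
    integer, intent(in) :: n
    double precision, intent(in) :: a(n), b(n)
    double precision :: r
    integer :: k
    r = 0d0
    ! OpenMP 4.0: thread the loop, vectorize each chunk, and reduce into r
    !$omp parallel do simd reduction(+:r)
    do k = 1, n
       r = r + a(k) * b(k)
    end do
  end function my_dot
end program omp_dot
```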

j0e
New Contributor I

Good point. How about performance? Has dot_product been optimized as one might expect for an intrinsic function?

Craig_Dedo
New Contributor I

Since DOT_PRODUCT is an intrinsic function of Fortran 95, Fortran 2003, and Fortran 2008, it is always available in any Fortran compiler that supports any of those standards. Thus, it is guaranteed to be portable between different Fortran compilers, since almost all currently used Fortran compilers are at least at the Fortran 95 level. Although LAPACK is a very widely used library, it may not be available on some platforms.

Portability between different compilers is important for all but the most personal applications. You, or anyone else using the source code you wrote, may not always be using Intel Visual Fortran (IVF).

I'm not sure if the intrinsic function DOT_PRODUCT has been optimized in IVF.  However, it is my experience that most developers place too much emphasis on run-time optimization and not nearly enough emphasis on things like correctness, robustness, clarity, and algorithmic efficiency.  Unless your program is stressing the limits of the hardware, optimization is way down the priority list.

TimP
Honored Contributor III

dot_product with appropriate optimization settings should outperform MKL ?dot up to the point where MKL's automatic introduction of threading becomes useful. Blanket statements, of course, are worth little. When using strided operands in dot_product, you may need to watch the opt-report and tinker with directives.

OpenMP 4.0 and legacy ifort directives offer some choices for a vectorized, threaded implementation of dot products, but they don't work with the dot_product intrinsic.

Apologies if a post I made 12 hours ago eventually makes it through moderation.
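On the strided case: with DOT_PRODUCT the stride is expressed through an array section, whereas ?dot takes it as an increment argument, which is exactly where the compiler may need directive help to vectorize well. A small sketch (not from the original post):

```fortran
program strided_dot
  implicit none
  double precision :: x(8), y(8), s
  integer :: i
  x = [(dble(i), i = 1, 8)]
  y = 2d0
  ! Stride-2 array sections pick elements 1, 3, 5, 7:
  s = dot_product(x(1:7:2), y(1:7:2))
  print *, s          ! 2*(1+3+5+7) = 32.0
  ! The BLAS equivalent passes the stride as an increment: ddot(4, x, 2, y, 2)
end program strided_dot
```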

j0e
New Contributor I

Thanks for everyone's input! Currently, I'm solving intermediate-size problems (vectors/matrices between 1000 and 5000), where the core code has been written with DOT_PRODUCT and MATMUL, or DDOT and DGEMV. The core code also always calls DGETRF and DGETRS (these reside in some third-party code).

So far, with no real tinkering, I have found that the BLAS/LAPACK calls using MKL run a little faster than DOT_PRODUCT and MATMUL. I have also noticed that I need to use /Qparallel for DOT_PRODUCT and MATMUL to generate multiple threads, as they don't do so by default.

I've also played around a bit on a machine running four Xeon Phi 5110P cards with automatic offload for MKL. It's nice in the sense that it's easy to implement, but unfortunately the system size needs to be pretty large (~8000 dim, see http://software.intel.com/en-us/articles/intel-mkl-automatic-offload-enabled-functions-for-intel-xeon-phi-coprocessors) before automatic offloading even kicks in for DGETRF. Working near that lower limit, I unsurprisingly saw no performance benefit.

-joe
