Compiling reference BLAS and LAPACK with Intel 10.1.015

markdixon · 07-31-2008

For various reasons, I'm trying to build the reference implementations of BLAS and LAPACK (3.1.1) from the www.netlib.org website on a 32-bit x86 Linux box (CentOS 5.2). I'm doing this with a variety of compilers, such as Intel 10.1.015 and PGI 7.2-3.

I can happily build these packages, and pass their test suites, using PGI with optimisation turned on.

With the Intel compiler, the only way I can get them to pass their test suites is to use the "-O0" compiler option to turn all optimisation off!

This doesn't strike me as right: is the Intel compiler's floating point really that bad? Does anyone have any idea on what is going on?

markdixon · 08-01-2008

OK, I've made some progress.

The Intel compiler passes the BLAS tests as long as both BLAS *and* the test programs are compiled with either of the following flags:

-mp (or -fltconsistency)
-fp-model precise

I can live with the idea of needing a switch to enforce strict floating-point behaviour, but I still have the following puzzles:

1) Why does the switch need to be applied to both the library and the test program? I would have thought that it would only need to be applied to the library which does the grunt work.

2) Any ideas on which of the switches slows down code the least?

Thanks,

Mark

TimP · 08-01-2008

You might have found some of the previous discussions useful, in both the Linux and Windows Fortran sections.

Tests of numerical accuracy often require observance of parentheses. The best option to specify that is -assume protect_parens. The place in the library most likely to need stricter compilation options is the set of functions which take the place of EPSILON, HUGE, TINY, and the like. You may also need to use the abrupt underflow switches (-ftz, and options which imply it) consistently. SSE and non-SSE (x87) code don't give consistent results.
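As a rough illustration of why parentheses and EPSILON-style routines are so sensitive, here is a minimal Python sketch. It is not taken from BLAS or LAPACK, and the function names are made up for the example. It shows that regrouping a sum changes the result in IEEE double precision, and that a loop which probes for machine epsilon depends on every intermediate being rounded to the working precision. A compiler that reorders operations or keeps intermediates in wider registers could make the equivalent Fortran report something different.

```python
import sys


def sum_as_written(a, b, c):
    # Evaluate exactly as the parentheses say: (a + b) + c.
    return (a + b) + c


def sum_reassociated(a, b, c):
    # What an optimiser may do if it is allowed to ignore parentheses.
    return a + (b + c)


def probe_epsilon():
    # Halve eps until adding half of it to 1.0 no longer changes 1.0.
    # This only finds the true machine epsilon if 1.0 + eps / 2.0 is
    # rounded to working precision before the comparison.
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


print(sum_as_written(1e16, -1e16, 1.0))    # 1.0
print(sum_reassociated(1e16, -1e16, 1.0))  # 0.0
print(probe_epsilon() == sys.float_info.epsilon)
```

The two sums differ because -1e16 + 1.0 is not representable in double precision and rounds back to -1e16, so the 1.0 is lost before the large terms cancel.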

If you use SSE options, such as -xW or newer (or the 64-bit compiler), -fp-model precise has less effect on performance than -mp. If you want -ftz along with -fp-model precise, you must specify it explicitly (-fp-model precise -ftz). -mp (or -fltconsistency) promotes many single-precision expressions to double, interacts unpredictably with -ftz, gives warnings in the newer compilers, and may not work for you with SSE options. Until -fp-model precise is changed so as to always imply -assume protect_parens, you should always set the latter option as well.
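To make the promotion and underflow points concrete, here is a small Python sketch. It only simulates the arithmetic; it does not reproduce what ifort actually generates. The helper names (to_single, sum_single, sum_promoted, flush_to_zero) are hypothetical. Single precision is emulated by rounding through a 4-byte float with the standard struct module.

```python
import struct

# Smallest normal single-precision value (what TINY returns for REAL*4).
SINGLE_TINY = 2.0 ** -126


def to_single(x):
    # Round a Python float (double) to the nearest single-precision value.
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    # Strict single precision: round after every addition.
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total


def sum_promoted(values):
    # Promoted evaluation: accumulate in double, round once at the end.
    total = 0.0
    for v in values:
        total += to_single(v)
    return to_single(total)


def flush_to_zero(x):
    # Simulated abrupt underflow: results below TINY become zero.
    return 0.0 if abs(x) < SINGLE_TINY else x


values = [16777216.0, 1.0, 1.0]  # 2**24, where single-precision spacing is 2
print(sum_single(values))    # 16777216.0
print(sum_promoted(values))  # 16777218.0

half_tiny = to_single(SINGLE_TINY / 2.0)  # a subnormal single
print(half_tiny > 0.0, flush_to_zero(half_tiny) == 0.0)
```

Neither sum is wrong as such, but a test program that computes its reference value one way will disagree with a library compiled the other way, and the same goes for gradual versus abrupt underflow.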

It's unlikely that your code requires adherence to the Fortran standard on the treatment of negative zeros (-assume minus0), as that was not standardized until f95.

As you are no doubt aware, the usual recommendation is to use currently maintained LAPACK/BLAS implementations, such as Intel MKL (included with ifort professional), Goto, or even ACML.

markdixon · 08-01-2008

Many thanks, Tim: that was extremely useful. I'll certainly take a look at the previous discussions.

Incidentally, some of the same issues crop up when compiling ATLAS BLAS/LAPACK, so this is of use even when looking at currently maintained implementations.

Mark