LAPACK from netlib.org is slower with Intel Fortran than with CVF

arypramudito · ‎05-27-2004

Hi

Since I don't have IMSL 5 for Intel Fortran, I compiled LAPACK as my math library. I use Intel Fortran and CVF on the same computer, and when I checked the speed of BLAS1, Intel Fortran 8 turned out slower than CVF 6.6, by as much as 50% in MIPS.

In CVF I use optimize:3 (optimize:4 or beyond gives errors in verification; too much optimizing?).

In IF8 I can't use /Qipo because linking fails when I build a console exe from the LAPACK library. Why?

TimP · ‎05-27-2004

We can't read your mind, as to which CPU you are using, which options, which compiler versions, which tests you are using to compare.

Generally, the emphasis in ifort 8.0 is on optimization for P4 and later (/QxW, /QxN, /QxP). The loops of interest to you should vectorize. With recent versions of 8.0, which allocate arrays on 16-byte boundaries when you permit it, this should give good performance for loop lengths beyond 50 or so.
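As a rough illustration of the 16-byte alignment point, here is a minimal sketch in C. It is not how ifort lays out arrays; `is_aligned` and `alloc_aligned16` are hypothetical helper names, and the only assumption is a C11 library that provides `aligned_alloc`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p sits on a multiple of `alignment` bytes, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: allocate room for n doubles on a 16-byte
   boundary, the alignment that SSE/SSE2 aligned loads expect. */
double *alloc_aligned16(size_t n)
{
    size_t bytes = n * sizeof(double);
    /* aligned_alloc expects a size that is a multiple of the alignment,
       so round the byte count up to the next multiple of 16. */
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}
```

A buffer from `alloc_aligned16` passes `is_aligned(p, 16)`, while the same pointer offset by 8 bytes (one double) does not, which is the case where a vectorized loop has to fall back to unaligned accesses or peel an iteration.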

ipo isn't very relevant for BLAS, unless, possibly, you are testing with short arrays and want to in-line the BLAS functions into your test driver. In that case, there would have been no reason to use CVF /optimize:5, as that would optimize for longer loops.

I don't think anyone would buy IMSL for BLAS, with so many public BLAS versions available, and MKL optimized for IA processors.

g_f_thomas · ‎05-28-2004

"We can't read your mind, as to which CPU you are using, which options, which compiler versions, which tests you are using to compare."

What a mean-mouthed comment, so typical of tcprince! Nobody is impressed, you don't help anyone, least of all yourself, but who's counting?

It looks like IMSL 5 gives you all of MKL, so don't waste anything, especially money, on the latter.

HTH,
Gerry T.

arypramudito · ‎05-28-2004

Well, tcprince, I want IMSL-like functions for matrix subroutines, since I don't have IMSL 5 for IFORT, though I do for CVF.
That's why I tried compiling the LAPACK library. I compiled the library with both CVF and IFORT to test the speed too.

My processor was an AMD, but I will run a benchmark on my office computer, which has a P4.

For CVF I used the same "make" file I downloaded from netlib, but I changed the build option from df -optimize:2 to df -optimize:5, with the processor option set to Pentium 3, to build a static library.
Since LAPACK comes with test and benchmark programs that are built as exe files, I built the exes and ran the tests. The tests reported verification errors.
I rebuilt the static library and the test exes with optimize:4 and optimize:3. The result: optimize:4 still gave errors, and optimize:3 gave none.

For IFORT I changed the "make" file to use ifort /QaxW /Qx /Qipo,
which resulted in an error when linking the lib to create the exe file,
so I changed it to ifort /QaxW /Qx.
That was successful, and when I ran the benchmark exe, the CVF-compiled version was almost 50% faster.

Questions:
For CVF: can too much optimization kill you?
For IFORT 8.0: why is the speed difference so big?

arypramudito · ‎05-28-2004

Sorry, typos:

/Qx should be /Ox

CVF is 6.5 and 6.6B
IFORT version is 8 Standard

Additional options are /architecture:pn3 /tune:pn3 for both builds, CVF and IFORT.

g_f_thomas · ‎05-28-2004

tcprince:

"We can't read your mind, as to which CPU you are using, which options, which compiler versions, which tests you are using to compare."

gft to Steve Lionel re tcprince :

Is this being a representative of Intel? It's not clear, and in any event, Intel ought not to condone his overt contempt for forum users, a behavior you tacitly endorse. His contribution to the forum is of questionable value and his nonparticipation wouldn't be missed.

--
Gerry T.

TimP · ‎05-28-2004

CVF has an option /fltconsistency which removes some optimizations which may be troublesome, and may allow you to remove /optimize. /optimize:5 could unroll loops too much, particularly if they are short, or have already been unrolled in the source code, as you may see in the code you have.
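To make the "already unrolled in the source code" remark concrete, here is a minimal sketch in C of the pattern. It mirrors the general shape of the netlib reference level-1 BLAS loops for unit stride (a short cleanup loop, then a main loop unrolled by 4) but is not a translation of any particular routine; `axpy_plain` and `axpy_unrolled4` are hypothetical names.

```c
#include <stddef.h>

/* Plain y = a*x + y: the form a compiler can unroll or vectorize itself. */
void axpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hand-unrolled by 4. A compiler that unrolls this loop again multiplies
   the code size without doing more useful work per iteration. */
void axpy_unrolled4(size_t n, double a, const double *x, double *y)
{
    size_t m = n % 4;
    size_t i;

    /* Cleanup loop: handle the n mod 4 leftover elements first. */
    for (i = 0; i < m; ++i)
        y[i] += a * x[i];

    /* Main loop: four updates per iteration over the remaining elements. */
    for (i = m; i < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
}
```

Both functions compute the same result. For short vectors most of the time in the unrolled version goes to the cleanup loop and loop setup, which is one reason heavier unrolling options may not pay off there.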

The closest option in ifort 8.0 to :pn3 is /QxK /Qprefetch. Unfortunately, the libraries invoked by /QxK may fail when run on AMD. This should be corrected in the next update. That should make a big improvement on your CPU. /QaxW should show you the improvement on your P4, but when run on a P3 compatible machine, it may not do well, as you have seen. You could try adding the options /Qprefetch /G6; it may make some improvement for the older AMD. There may be tests in lapack which work only with /Op, which has some things in common with the CVF /fltconsistency.

arypramudito · ‎06-04-2004

I have tried ifort /G6 /Qprefetch, resulting in DGEMM 1.4 Gflop, DGEV 0.5 Gflop. So no improvement.

Then I used GOTO BLAS for Pentium 3, resulting in DGEMM 1.6 Gflop, DGEV 1.4 Gflop,

compared to CVF, resulting in DGEMM 1.37 Gflop, DGEV 1.2 Gflop.
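For anyone comparing the figures above, here is a minimal sketch in C of the arithmetic. It assumes the usual convention of counting 2*n^3 floating-point operations for an n-by-n DGEMM; the formula the benchmark program actually used is not stated in this thread, and `dgemm_gflops` and `speedup` are hypothetical helper names.

```c
/* Gflop rate for an n x n x n matrix multiply that took `seconds`,
   assuming the conventional 2*n^3 operation count. */
double dgemm_gflops(double n, double seconds)
{
    return 2.0 * n * n * n / seconds / 1.0e9;
}

/* Ratio of two rates: values above 1 mean `rate` beats `baseline`. */
double speedup(double rate, double baseline)
{
    return rate / baseline;
}
```

With the numbers reported here, `speedup(1.6, 1.37)` puts GOTO BLAS roughly 17% ahead of the CVF build on DGEMM, and `speedup(1.4, 0.5)` puts it at 2.8 times the ifort build on DGEV.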