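Is it a bug? The same dsyevr test program prints different eigenvalues when built with ifort and with gfortran (output and source below).
Thanks!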
=====================================
apple@localhost ~ $ ifort test.f90 -llapack_if -lblas_if
apple@localhost ~ $ ./a.out
eval
0.174584482225417 0.174584482225417 0.174584482225417
apple@localhost ~ $ gfortran test.f90 -llapack -lblas
apple@localhost ~ $ ./a.out
eval
-0.48943056644789246 -0.18635420119952728 0.74131197099260637
test.f90 :
=====================================
program main
  implicit none
  integer(4) :: n, il, iu, m, isuppz(6), iwork(30), lwork, liwork, info
  real(8) :: a(3,3), z(3,3), w(3), work(80), vl, vu, abstol
  n = 3
  ! RANGE='I' below: request eigenvalues il..iu (here all three)
  il = 1
  iu = 3
  abstol = 0.0_8
  lwork = 80    ! dsyevr needs lwork >= 26*n
  liwork = 30   ! dsyevr needs liwork >= 10*n
  a = reshape([0.694444444444445_8, 0.208135487069874_8, 1.069274406890994E-002_8, &
               0.208135487069874_8, -0.211332634659698_8, 0.133515358818456_8, &
               1.069274406890994E-002_8, 0.133515358818456_8, -0.417584606439560_8], [3,3])
  ! vl and vu are not referenced when RANGE='I', so they are left unset
  call dsyevr('V', 'I', 'U', n, a, n, vl, vu, il, iu, abstol, &
              m, w, z, n, isuppz, work, lwork, iwork, liwork, info)
  print *, "eval"
  print *, w
end program main
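A quick sanity check that needs no LAPACK at all: the eigenvalues of a symmetric matrix must sum to its trace. The sketch below hard-codes the matrix from test.f90 and the two eigenvalue sets printed above, then compares each sum against the trace. The program name trace_check and the 1e-12 tolerance are my own choices for illustration, not part of the original test. Worked by hand, the trace is about 0.0655272; the gfortran values reproduce it to rounding level, while the three identical ifort values sum to about 0.5238.

```fortran
program trace_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Assumed tolerance, chosen only for this illustration.
  real(dp), parameter :: tol = 1.0e-12_dp
  real(dp) :: a(3,3), w_ifort(3), w_gfortran(3), tr
  integer :: i

  ! Same matrix as in test.f90.
  a = reshape([ 0.694444444444445_dp, 0.208135487069874_dp, 1.069274406890994e-002_dp, &
                0.208135487069874_dp, -0.211332634659698_dp, 0.133515358818456_dp, &
                1.069274406890994e-002_dp, 0.133515358818456_dp, -0.417584606439560_dp ], &
              [3,3])

  ! Eigenvalues exactly as printed by the two builds.
  w_ifort    = [ 0.174584482225417_dp, 0.174584482225417_dp, 0.174584482225417_dp ]
  w_gfortran = [ -0.48943056644789246_dp, -0.18635420119952728_dp, 0.74131197099260637_dp ]

  ! Trace = sum of the diagonal entries.
  tr = 0.0_dp
  do i = 1, 3
     tr = tr + a(i,i)
  end do

  print *, "trace(A)         =", tr
  print *, "sum(w), ifort    =", sum(w_ifort), " matches trace:", &
           abs(sum(w_ifort) - tr) < tol
  print *, "sum(w), gfortran =", sum(w_gfortran), " matches trace:", &
           abs(sum(w_gfortran) - tr) < tol
end program trace_check
```

This only shows which set of numbers can be eigenvalues of this matrix. It says nothing about where the ifort build goes wrong.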
11 Replies
And where did the lapack library come from?
Of course: I downloaded LAPACK from netlib and compiled it with ifort and gfortran respectively.
I just edited make.inc, then ran make.
I am in the process of trying to reproduce this. Three questions:
1) Was this on Linux or Mac OS X?
2) What did you set in make.inc for OPTS for ifort and gfortran respectively?
3) Did you use the system BLAS, the BLAS source that ships with LAPACK, or some other BLAS?
thanks
ron
I have tried to reproduce this using very aggressive optimizations, thinking that perhaps overly aggressive optimizations might push the algorithm over a stability tipping point. No such luck. I can achieve differences in the last digit of precision, but one expects this at very high optimization levels.
Here is the largest difference I've achieved, using -O3 -xT -ip on both the blas and lapack libs:
./resp_opt
eval
-0.489430566447892 -0.186354201199528 0.741311970992606
Now I'd be quite curious to see your settings in make.inc. Also, are you 100% certain you cleaned up the lapack objs and libs when switching between ifort and gfortran? I do this sequence:
make cleanall
make blaslib
make lib
The 'cleanall' is there to ensure that every bit of lapack/blas is cleaned between builds.
I am using the 10.0.026 compiler on Linux. I still await hearing what you have as a build platform. I just cannot replicate what you are seeing.
ron
I forgot to mention: if you want a mix of performance and accuracy, try this. It gave me results identical to those from unoptimized gfortran and ifort builds:
ifort -O3 -x[your arch] -fp-model source
that is, use those options for OPTS
ron
Thank you very much, ron.
1) My OS is Linux (glibc 2.5, gcc/gfortran 4.3.0 pre-release, binutils 2.18, kernel 2.6.23.1).
2) I always compile BLAS together with LAPACK.
I modified the Makefile, changing
# lib: lapacklib tmglib
to
lib: blaslib lapacklib tmglib
3) The result is correct when calling MKL's LAPACK routine.
4) Can you tell me how to set the fp-model flag for ifort?
5) My make.inc:
=============================
SHELL = /bin/sh
VERSION = 3.1.1
# FORTRAN = gfortran
# FORTRAN_VERSION = 4.3.0pre-release
# OPTS = -msse2 -mfpmath=sse -O3 -ftree-vectorize -funroll-all-loops -fbounds-check
FORTRAN = ifort
FORTRAN_VERSION = 10.0.025
OPTS =
DRVOPTS = $(OPTS)
NOOPT =
LOADER = $(FORTRAN)
LOADOPTS =
PRE = lib
PLAT = linux
TIMER = NONE
ARCH = ar
ARCHFLAGS= cr
RANLIB = ranlib
BLASLIB = ../../$(PRE)blas_$(VERSION)_$(PLAT)_$(FORTRAN)-$(FORTRAN_VERSION).a
LAPACKLIB = $(PRE)lapack_$(VERSION)_$(PLAT)_$(FORTRAN)-$(FORTRAN_VERSION).a
TMGLIB = $(PRE)tmglib_$(VERSION)_$(PLAT)_$(FORTRAN)-$(FORTRAN_VERSION).a
EIGSRCLIB = $(PRE)eigsrc_$(VERSION)_$(PLAT)_$(FORTRAN)-$(FORTRAN_VERSION).a
LINSRCLIB = $(PRE)linsrc_$(VERSION)_$(PLAT)_$(FORTRAN)-$(FORTRAN_VERSION).a
For your OPTS variable in make.inc, you can keep it very simple and add:
OPTS = -fp-model source
The default optimization for the compiler, since you are not specifying -O[ 0 | 1 | 2 | 3 ], is -O2. With the 10.0 compiler, this default optimization level provides very good performance for most codes on Intel architecture.
You'll note that I stepped it up a notch with -O3 and -x[architecture]. There is another thread on this forum regarding compiler options. Take a look at this article for the -x architecture-specific vectorization settings: http://www.intel.com/support/performancetools/sb/CS-009787.htm
As for the fp-model option, I recommend reading the documentation on the various settings. This option controls how strictly the floating-point operations follow the IEEE 754 standard. This is always a tradeoff between performance and accuracy, so there are no general guidelines I can offer. Iterative solvers tend to be more sensitive to numerical precision, direct solvers not so much. For any given code, I usually try the default first, which is fp-model fast=1. If I find the results are not within an 'acceptable' tolerance, I slowly increase the compliance, starting with fp-model source. If this is not sufficient, the next step is 'precise', and finally, for those extreme cases where every bit must match, fp-model strict. Note that you give up performance with each step towards complete IEEE compliance.
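To illustrate why the evaluation order matters at all, here is a minimal sketch. It is not taken from LAPACK, and the program name and values are made up for the example. Floating-point addition is not associative, so a compiler that is allowed to regroup a sum can change the result. With IEEE double arithmetic in SSE registers and the parentheses honored, I would expect the first line to print 1.0 and the second 0.0. With x87 extended-precision intermediates, or with value-unsafe optimizations enabled, the two may come out equal.

```fortran
program reassociation_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! volatile keeps the compiler from folding the sums at compile time.
  real(dp), volatile :: a, b, c
  real(dp) :: left, right

  a =  1.0e16_dp
  b = -1.0e16_dp
  c =  1.0_dp

  left  = (a + b) + c   ! the large terms cancel first, then c is added
  right = a + (b + c)   ! c is added to b first, where it can be rounded away

  print *, "(a + b) + c =", left
  print *, "a + (b + c) =", right
end program reassociation_demo
```

Building this once with the default options and once with -fp-model source (or strict) is a cheap way to see whether a given flag set changes the grouping on your machine.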
I hope this helps.
ron
Thanks for your great suggestion.
As a test, today I put the "-O3 -xN -ipo" flags into LAPACK's make.inc, and I also modified the Makefile to build LAPACK as .so shared libraries. That succeeded, but:
With the -O3 flag alone, a.out runs (but the result is still wrong).
With the "-O3 -xN -ipo" flags, linking the test program fails:
===================================================
apple@localhost ~/linux/lapack-lite-3.1.1 $ ifort tes.f90 -L. -llapack -lblas
./liblapack.so: undefined reference to `__svml_cosf4'
./liblapack.so: undefined reference to `__svml_log2'
./liblapack.so: undefined reference to `__svml_roundf4'
./liblapack.so: undefined reference to `__svml_logf4'
./liblapack.so: undefined reference to `__svml_cos2'
What are those symbols?
Maybe I must use the MKL library with ifort.
If you build lapack with -xN you need to also specify this when compiling/linking your source. Or you can add -lsvml.
Another, unrelated question: how can I recompile glibc's libm.so to use the SSE family of instruction sets instead of the x87 FPU, to gain more efficiency?
It passed. Thank you.
apple@localhost ~/linux/lapack-lite-3.1.1 $ ifort tes.f90 -L. -llapack -lblas -xN -O3 -lsvml
apple@localhost ~/linux/lapack-lite-3.1.1 $ ./a.out
eval
0.174584482225417 0.174584482225417 0.174584482225417