Good evening. I recently registered on this forum and I need some help. For the past year I have used Compaq Visual Fortran to develop and compile my programs. Recently I became interested in Intel Visual Fortran because I read in various magazines and on Internet sites that this compiler is more efficient and produces better-optimized programs than Compaq's. However, I'm a bit frustrated: I spent some effort converting my Compaq programs to Intel and found that they run much more slowly with Intel. For some of them the overall computing time increased by a factor of 4. What worries me most is how large the differences are: not an increase of 5 or 10%, but a factor of 4. If somebody can help me I will be very grateful.
Pedro
1 Solution
Hi Pedro,
Does your program contain any REAL*16 or COMPLEX*32 code, or calls to routines beginning with DQ or ZQ?
The only area I am aware of where the latest IMSL is substantially slower than the one shipped with CVF is in the use of quad precision (REAL*16) arithmetic. That is because CVF did not support REAL(16), so the IMSL authors had implemented an approximate form of extended precision. The Intel compiler does support full IEEE quad precision, but does so in software, so it is rather slow: significantly slower than the approximate extended precision implemented in the IMSL shipped with CVF. Certain double precision routines in IMSL use quad precision internally to improve accuracy. If you have a tool such as the Intel VTune Performance Analyzer, you might be able to generate a call graph that would show calls to DQ or ZQ functions. I have seen an application calling the linear solver DLSACG slow down by a factor of between 2X and 4X because of this effect.
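Martyn does not say how the CVF-era IMSL implemented its "approximate form of extended precision", and the thread gives no further detail. One well-known way to get roughly twice double precision out of ordinary double arithmetic is to use error-free transformations (Knuth's TwoSum and Dekker's TwoProduct), as in the compensated dot product "Dot2" of Ogita, Rump and Oishi. The sketch below is in Python, whose float is an IEEE double, purely to illustrate that idea; it is an assumption about the general technique, not IMSL code.

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a: float) -> tuple[float, float]:
    """Veltkamp split of a double into two halves with hi + lo == a."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_product(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and p + e == a * b exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e


def dot_naive(x, y) -> float:
    """Plain double precision accumulation."""
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def dot2(x, y) -> float:
    """Compensated dot product: the rounding errors of every product and
    every addition are collected in c and added back at the end."""
    s, c = 0.0, 0.0
    for a, b in zip(x, y):
        p, ep = two_product(a, b)
        s, es = two_sum(s, p)
        c += ep + es
    return s + c


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    print(dot_naive(x, y))  # 0.0: the 1.0 is lost when added to 1e16
    print(dot2(x, y))       # 1.0: the lost part is carried along and restored
```

Because everything stays in hardware double arithmetic, this kind of scheme is typically much cheaper than a full software IEEE quad implementation, which fits Martyn's explanation of the slowdown.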
If this is indeed the explanation of your observations, you might consider trying the single precision version of the solver you are using, since I believe that in this case, the intermediate sums would still be accumulated in double precision. Perhaps the accuracy would be sufficient for your needs.
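Martyn's suggestion relies on a single precision routine accumulating its intermediate sums in double. A small experiment shows why that matters: add the single precision value nearest 0.1 a hundred thousand times, once rounding the running sum to single after every step and once keeping it in double. This is a Python sketch that emulates IEEE single rounding with the standard `struct` module; it says nothing about how IMSL's single precision solvers are actually written.

```python
import struct


def to_f32(v: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def sum_single(xs) -> float:
    """Running sum kept in single precision (rounded after every addition).
    Adding two singles exactly in double and then rounding once to single
    gives the correctly rounded single result, since double has more than
    twice the 24-bit single significand."""
    s = 0.0
    for x in xs:
        s = to_f32(s + x)
    return s


def sum_double(xs) -> float:
    """Same single precision inputs, running sum kept in double."""
    s = 0.0
    for x in xs:
        s += x
    return s


if __name__ == "__main__":
    q = to_f32(0.1)        # single precision value nearest 0.1
    xs = [q] * 100_000
    exact = 100_000 * q    # exact in double for this q and count
    print(sum_single(xs) - exact)  # noticeably nonzero
    print(sum_double(xs) - exact)  # 0.0 here
```

Whether the reduced input precision is acceptable still depends on the conditioning of your system, as Martyn notes.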
Otherwise, you might consider posting to an IMSL forum and asking whether VNI/Rogue Wave would consider reverting to the older, less accurate but faster implementation of quad precision for accumulation routines in future versions of IMSL, if the precision seems sufficient.
And as Steve has said, perhaps you could find a different solver that meets your needs in another library such as MKL. For solvers in either IMSL or MKL, you may be able to get a significant speedup on multi-core systems if threading is enabled.
Martyn Corden
Intel Developer Support
Can you show us a program that has this behavior? Are you sure you are building with comparable options?
Quoting - Steve Lionel (Intel)
Can you show us a program that has this behavior? Are you sure you are building with comparable options?
Pedro
Without seeing the application, there's little I can suggest. What options are you using? If you are in Visual Studio, go to the Fortran > Command Line page and copy-paste the options shown there.
Yes, I'm using Visual Studio 2008. I'm also using the f90 version of the IMSL libraries; that is the only difference between the Intel code and the Compaq code, since in my Compaq codes I used the f77 version of IMSL. The compiler options are the following:
/nologo /debug:full /QaxSSE3 /QxHost /Qipo /assume:nocc_omp
/arch:SSE3 /warn:unused /Qopt-report:2 /Qsave /iface:cvf
/module:"Debug/" /object:"Debug/" /traceback /check:bounds
/libs:static /threads /dbglibs /c
Thank you for your help.
Pedro
/debug:full implies /Od (no optimization), since you don't specify an optimization level. /check:bounds is likely to compound the slowness. If you had enabled optimization, it would have been better to choose just one /arch option; if the compiler takes you at your word that you want separate code paths for 4 architectures, the code size expansion may hurt. If you have worked to make your code standard-conforming, you shouldn't need /Qsave or /iface:cvf.
As Tim notes, you're using a Debug configuration, which is unoptimized. Please switch to a Release configuration. I suggest leaving /QxHost and removing the /QaxSSE3 and /arch:SSE3. If you plan on running the application on other systems, then remove /QxHost and read the documentation to see which /Qx option makes sense for you.
Thank you!
I made all the changes you suggested. My command line now looks like this:
/nologo /debug:minimal /O3 /Qipo /arch:SSE2 /warn:unused /Qopt-report:2 /module:"Debug/"
/object:"Debug/" /libs:dll /threads /c
However, my code is still slow compared with Compaq. Other suggestions will be appreciated!
Pedro
Quoting - Steve Lionel (Intel)
As Tim notes, you're using a Debug configuration, which is unoptimized. Please switch to a Release configuration. I suggest leaving /QxHost and removing the /QaxSSE3 and /arch:SSE3. If you plan on running the application on other systems, then remove /QxHost and read the documentation to see which /Qx option makes sense for you.
/nologo /O3 /QxHost /module:"Release/" /object:"Release/" /libs:static /threads /c
And... the program is still slow. I'm very confused and can hardly believe it. One question: which run-time library should I use? I'm currently using /libs:static /threads. Is this correct? Thank you for your help.
Pedro
The library type you selected is correct. Does your program do a lot of I/O? Try adding /assume:buffered_io
If that does not help, please ZIP the project ("build > clean" it first) and attach it to a reply here. Include any data files it needs.
Quoting - Steve Lionel (Intel)
The library type you selected is correct. Does your program do a lot of I/O? Try adding /assume:buffered_io
If that does not help, please ZIP the project ("build > clean" it first) and attach it to a reply here. Include any data files it needs.
Pedro
Pedro,
Thanks, but I'm very confused about something. How did you build this with CVF when it relies on IMSL 6 that was not provided for CVF? Your CVF project's settings point to the newer IMSL modules and libraries which would not work with CVF, and there are references to the "Fortran 90" interfaces for IMSL which CVF did not support.
Is your real CVF project available for me to look at?
Quoting - Steve Lionel (Intel)
Pedro,
Thanks, but I'm very confused about something. How did you build this with CVF when it relies on IMSL 6 that was not provided for CVF? Your CVF project's settings point to the newer IMSL modules and libraries which would not work with CVF, and there are references to the "Fortran 90" interfaces for IMSL which CVF did not support.
Is your real CVF project available for me to look at?
Pedro
Pedro,
It's not the compiler, it's IMSL. Your program spends most of its time calling IMSL routines, so even if the compiler generates better code for your sources, the execution time is completely dominated by the calls to IMSL.
First, I took your CVF sources and compiled them with CVF and IVF. I did see the slowdown you mentioned. (You had made a lot of changes in the IVF version, so I wanted to compare the same sources.) I then took the CVF project and forced it to link to the newer IMSL. When I ran this, the execution time was actually a bit slower than when compiled with IVF (not by much).
There have been many changes made to IMSL since the IMSL 4 days (what came with CVF), and it's evident that at least one of the routines you're calling has slowed down a lot. Perhaps that was needed to get better accuracy or reliability, I don't know. It would take significant further investigation to identify the routine(s) responsible, but I'm doubtful that they would do anything about it in the near term.
I was able to cut the time down by about 1/3 by specifying "link_fnl_static_hpc.h" as the library selection, but it was still twice the time of the CVF build.
Sorry I don't have better news for you here. The only thing I can suggest at the moment is to see if any of the IMSL calls can be replaced with calls to Intel Math Kernel Library routines. I do have one side-comment: in your newer version where you used the "F90" interfaces to IMSL, please call the generic names of the routines and not D_ or S_ variants.
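Steve's point that library time dominates is the same arithmetic as Amdahl's law: if most of the old run time was spent inside IMSL, then a better compiler for your own code can barely move the total, while a slower library moves it almost one for one. A minimal Python sketch follows; the fractions and factors are hypothetical inputs, not measurements from this thread.

```python
def overall_ratio(lib_fraction: float, lib_slowdown: float,
                  own_speedup: float) -> float:
    """New run time divided by old run time.

    lib_fraction: share of the old run time spent in library calls (0..1)
    lib_slowdown: factor by which the library calls became slower
    own_speedup:  factor by which your own compiled code became faster
    """
    own = (1.0 - lib_fraction) / own_speedup
    lib = lib_fraction * lib_slowdown
    return own + lib


if __name__ == "__main__":
    # Hypothetical: 90% of the time in library calls.
    # Own code twice as fast, library unchanged: about 0.95 (5% gain).
    print(overall_ratio(0.9, 1.0, 2.0))
    # Library 4x slower: the total cannot drop below 3.6x however fast
    # the rest of the program becomes.
    print(overall_ratio(0.9, 4.0, 1.5))
```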
Quoting - Steve Lionel (Intel)
Pedro,
It's not the compiler, it's IMSL. Your program spends most of its time calling IMSL routines, so even if the compiler generates better code for your sources, the execution time is completely dominated by the calls to IMSL.
First, I took your CVF sources and compiled them with CVF and IVF. I did see the slowdown you mentioned. (You had made a lot of changes in the IVF version so I wanted to compare same sources.) I then took the CVF project and forced it to link to the newer IMSL. When I ran this, the execution time was actually a bit slower than when compiled with IVF (not by much.)
There have been many changes made to IMSL since the IMSL 4 days (what came with CVF), and it's evident that at least one of the routines you're calling has slowed down a lot. Perhaps that was needed to get better accuracy or reliability, I don't know. It would take significant further investigation to identify the routine(s) responsible, but I'm doubtful that they would do anything about it in the near term.
I was able to cut the time down by about 1/3 by specifying "link_fnl_static_hpc.h" as the library selection, but it was still twice the time of the CVF build.
Sorry I don't have better news for you here. The only thing I can suggest at the moment is to see if any of the IMSL calls can be replaced with calls to Intel Math Kernel Library routines. I do have one side-comment: in your newer version where you used the "F90" interfaces to IMSL, please call the generic names of the routines and not D_ or S_ variants.
Gratefully,
Pedro
Try AMD CodeAnalyst. It can show you how much time is spent in each called routine (I hope it will be able to identify the IMSL routines one by one). It is free: http://developer.amd.com/cpu/codeanalyst/Pages/default.aspx
Jakub
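If no sampling profiler is at hand, a cruder alternative is to wrap each suspect library call with a timer and accumulate per-routine totals; in Fortran that would mean bracketing the calls with the SYSTEM_CLOCK or CPU_TIME intrinsics. The Python sketch below only shows the bookkeeping idea; the routine names in it are hypothetical stand-ins for real library calls.

```python
import time
from collections import defaultdict
from functools import wraps

totals = defaultdict(float)  # accumulated wall-clock seconds per routine
calls = defaultdict(int)     # number of calls per routine


def timed(func):
    """Wrap func so every call adds its elapsed time to totals[name]."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            totals[func.__name__] += time.perf_counter() - start
            calls[func.__name__] += 1
    return wrapper


def report() -> None:
    """Print routines sorted by total time, largest first."""
    for name, secs in sorted(totals.items(), key=lambda kv: kv[1],
                             reverse=True):
        print(f"{name:20s} {calls[name]:8d} calls {secs:10.4f} s")


if __name__ == "__main__":
    @timed
    def solve_system(n):      # hypothetical stand-in for a solver call
        return sum(i * i for i in range(n))

    @timed
    def read_input(n):        # hypothetical stand-in for I/O
        return list(range(n))

    for _ in range(50):
        solve_system(100_000)
        read_input(1_000)
    report()
```

Unlike a sampling profiler, this only sees the calls you wrap, and the timer itself adds a small overhead per call.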
Steve and Martyn, I replaced the IMSL routine LSARG with a set of MKL solver routines (decomposition + solve + iterative refinement) and the performance is astonishing. For the same test case my old program takes about 2 minutes and the new one with MKL takes 2 seconds. What a speed improvement! Thank you very much for the help. This forum is simply brilliant.
Gratefully,
Pedro
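Pedro's replacement for LSARG was a factorization, a solve and an iterative refinement step from MKL. In LAPACK terms, which MKL implements, those steps correspond to DGETRF, DGETRS and DGERFS, and the expert driver DGESVX bundles them; Pedro did not name the routines he used, so that mapping is an assumption. The Python sketch below only shows what the three steps do. It is not MKL's code, it omits the error bounds (FERR, BERR) that DGERFS also returns, and computing the residual with `math.fsum` is a choice of this sketch.

```python
import math


def lu_factor(a):
    """LU factorization with partial pivoting (the idea behind DGETRF).
    Returns (lu, piv): lu holds unit-lower L below the diagonal and U on and
    above it; piv[k] is the row exchanged with row k at step k."""
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k to limit error growth.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors (the idea behind DGETRS)."""
    n = len(lu)
    x = list(b)
    for k in range(n):            # apply the row interchanges to b
        x[k], x[piv[k]] = x[piv[k]], x[k]
    for i in range(n):            # forward substitution, unit diagonal
        x[i] -= sum(lu[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (x[i] - s) / lu[i][i]
    return x


def solve_refined(a, b, steps=3):
    """Factor once, solve, then apply iterative refinement (the idea behind
    DGERFS). Each step reuses the factors, so it costs O(n^2) compared with
    O(n^3) for the factorization itself."""
    lu, piv = lu_factor(a)
    x = lu_solve(lu, piv, b)
    for _ in range(steps):
        # Residual r = b - A x, summed with math.fsum to limit cancellation.
        r = [math.fsum([bi] + [-aij * xj for aij, xj in zip(row, x)])
             for row, bi in zip(a, b)]
        d = lu_solve(lu, piv, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
    print(solve_refined(A, [7.0, -8.0, 18.0]))  # close to [1.0, 2.0, 3.0]
```

Factoring once and reusing the factors for every refinement step (and for any further right-hand sides) is the design point that keeps this cheap.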
Wonderful news! Many of our customers are finding how easy it is to get better performance by calling MKL.
