Is the situation different for the Intel Compiler?
Many thanks!
For the Intel compilers you have more choices, and they make much more difference. If you are going to run on both Xeon and Opteron, I would suggest /QxW /O3. You'll get better performance on the Xeon with /QxN (or /QxP if it's a Nocona-type Xeon), but code built with those switches won't run on the Opteron.
Incidentally, if I set "optimise for host" on the Pentium M, will it pick Pentium 4 or will it default to blend?
Thanks.
The Intel compilers have two different processor extension modes. "Require" means that you promise that the program will be run on a processor with those extensions. The command line syntax is /Qx followed by a single letter (B, W, N, P, etc.) designating the required set of processor extensions. If you run a program compiled in this mode on a processor other than the indicated type, the program may behave unpredictably or get a run-time error. If you choose code B, N or P, the compiler also adds a run-time check to the main program that looks for a supported Intel processor and exits with an error if it doesn't find one. The other codes do not add such a check.
The other mode is "use processor extensions if available, otherwise use generic code". These are the switches starting with /Qax followed by one or two letters. The compiler generates up to three code paths, two processor-specific and one generic, and the program detects the processor type at startup and selects the appropriate path. Non-Intel processors take the generic path in this mode.
I don't see the word "allow" in the text I am looking at, so I don't know what you're referring to.
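The two modes described above can be summarised as a small decision model. This is a simplified Python sketch of the behaviour as described in this post, not the compiler's actual startup code; the function names, the string results and the idea of describing the CPU with a few flags are all assumptions made for illustration.

```python
# Hypothetical model of the /Qx ("require") and /Qax ("use") modes as
# described in this thread. It is NOT the compiler's real dispatch code.

# Per the post above, only these /Qx codes add a start-up processor check.
CHECKED_REQUIRE_CODES = {"B", "N", "P"}


def run_require_mode(code, cpu_is_intel, cpu_has_extensions):
    """Outcome of running a program built with /Qx<code> on a given CPU."""
    if code in CHECKED_REQUIRE_CODES and not (cpu_is_intel and cpu_has_extensions):
        # The added run-time check fails, so the program exits with an error.
        return "exits with error at startup"
    if not cpu_has_extensions:
        # No check for this code: unsupported instructions may be executed.
        return "unpredictable behavior or run-time error"
    return "runs specialized code"


def run_use_mode(codes, cpu_is_intel, cpu_features):
    """Code path chosen at startup by a program built with /Qax<codes>.

    codes: up to two processor-specific paths, in order of preference.
    cpu_features: set of extension letters the (hypothetical) CPU supports.
    """
    if cpu_is_intel:
        for code in codes:
            if code in cpu_features:
                return code + "-specific path"
    # Non-Intel CPUs, and Intel CPUs matching no specific path, run generic code.
    return "generic path"


# An Opteron modelled as a non-Intel CPU that has the W extensions:
print(run_require_mode("W", cpu_is_intel=False, cpu_has_extensions=True))  # runs specialized code
print(run_require_mode("N", cpu_is_intel=False, cpu_has_extensions=True))  # exits with error at startup
print(run_use_mode(["P", "N"], cpu_is_intel=False, cpu_features={"W"}))   # generic path
```

In this model /QxW is the only "require" option that still runs on the Opteron, which matches the advice elsewhere in this thread to use /QxW for a combined Xeon and Opteron build.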
Dear forall,
You may want to browse through the online article at http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm as well, since automatic vectorization has the potential to boost the performance of FP-intensive Fortran codes.
Aart
I don't see any "W" in the Fortran optimisation options, only K, N, B and P (i.e., it seems I have K instead of W).
When I said "allow" in my last email I meant "use"; sorry for the confusion.
It seems that "use" is the best way to go, since I assume it will generate the best code for an Intel Xeon and generic code for the Opteron. I assume the code size grows somewhat, but that's no big deal. Am I correct?
Now, regarding "optimise for Intel processor" (options /GB, /G5, etc.): what are the best options for a dual Xeon and a dual Opteron? Blend for the Opteron and P-III for the Xeon (which, from what I understand, is closer to the P-III than to the P-4)? How does this option interact with the "use" and "require" extensions?
Thanks.
Dear forall,
Code that has been automatically parallelized queries the runtime to determine how many threads are best for the architecture it is actually running on. On a dual-core processor, a processor with Hyper-Threading Technology, or both, this should typically yield a speedup (unless nothing was automatically parallelized, of course). Multithreaded code may, however, run slightly slower on a single core, even though our team tries to minimize this overhead.
You may want to consider adding OpenMP directives to make the parallelism in the program explicit if the implicit parallelism is not extracted automatically by the compiler.
Aart
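To picture what a parallelized version of a simple element-wise loop does, here is a Python sketch of a static schedule: the iteration space is split into one contiguous block per thread, and the thread count comes from os.cpu_count() as a stand-in for the runtime query described above. The loop body is the expression ex(:)=exp(-arg(:)**2) from forall's later post. The function names are hypothetical, and the sketch only shows how the work is divided: in CPython the GIL keeps these threads from speeding up pure-Python arithmetic, and the real runtime may choose a different schedule.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def static_chunks(n, nthreads):
    """Split iterations 0..n-1 into nthreads contiguous, near-equal blocks."""
    base, extra = divmod(n, nthreads)
    chunks, start = [], 0
    for t in range(nthreads):
        size = base + (1 if t < extra else 0)  # the first `extra` threads get one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


def gauss_parallel(arg, nthreads=None):
    """Compute ex[i] = exp(-arg[i]**2) with one block of iterations per thread."""
    # Stand-in for the runtime deciding how many threads to use.
    nthreads = nthreads or os.cpu_count() or 1
    ex = [0.0] * len(arg)

    def work(chunk):
        for i in chunk:  # each thread writes only its own indices
            ex[i] = math.exp(-arg[i] ** 2)

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(work, static_chunks(len(arg), nthreads)))  # re-raises worker errors
    return ex


print([list(c) for c in static_chunks(10, 4)])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Because every thread writes a disjoint block of ex, the parallel result is identical to the serial loop; that independence between iterations is what lets a loop like this be parallelized automatically or with an explicit OpenMP directive.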
If you are going to run on Opteron and Xeon, use /QxW and not any of the other processor options.
I don't mind generating three different sets of executables (for the Pentium M, the Xeons and the Opterons) if it's worth it in terms of speed (e.g., if the differences hover around 10-20%, I wouldn't worry about it).
But I am somewhat confused:
If I go to Project Properties > Configuration Properties > Fortran > Optimisation, I have three pull-down options for processor-dependent optimisations:
1. "Optimise for Intel processor". This sets the flags /GB, /G5, etc. I've been using Pentium 4 (/G7) for the Pentium M runs, but I'm not sure what the best choice is for the Xeon and Opteron.
2. "Use extensions". This has the flags /QaxK, /QaxN, /QaxB and /QaxP. If I understood you correctly there should be a /QaxW option, but it's definitely not there. I am using Intel Fortran 8.1 Standard Edition (not EM64T; should I be?). I was going to use /QaxB for the Pentium M code, /QaxW for the Opteron and /QaxN (or /QaxP) for the Xeon. (How do I find out if it's a Nocona chip?)
3. "Require extensions". Same flags as above but without the "a", e.g., /QxK. Again, I definitely can't see /QxW. I am not going to use these flags, in case an executable accidentally ends up on the wrong processor and generates bad results.
Did I understand your suggestions correctly?
Also, I am going to use /O3 optimisation as recommended.
Thanks.
Dear forall,
If the size of the application and the compile-time requirements allow, also consider the /Qipo switch, which enables inter-procedural optimizations across the complete program. In fact, the shorthand /fast currently expands into /QxP /O3 /Qipo /Qprec-div-. For your "use extensions" setting, you may want to start with /QaxP /O3 /Qipo and leave out /Qprec-div-, since it only helps if your application performs a lot of FP divisions and is numerically stable enough to allow a few additional optimizations on them.
Aart
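To see why /Qprec-div- only suits numerically tolerant code, consider one typical example of the kind of optimization it allows: replacing x/y with x*(1/y), so that the reciprocal is computed once and reused, for instance hoisted out of a loop. That replaces one correctly rounded operation with two rounded ones, so results can differ in the last bit. The Python sketch below shows the effect in IEEE double precision; the function names are illustrative only and the exact transformations the compiler applies are not shown here.

```python
def scale_precise(xs, y):
    """Correctly rounded IEEE division for every element."""
    return [x / y for x in xs]


def scale_reciprocal(xs, y):
    """Divide via one hoisted reciprocal: fewer divisions, but two roundings per element."""
    r = 1.0 / y  # computed once, outside the loop
    return [x * r for x in xs]


print(scale_precise([3.0], 10.0))     # [0.3]
print(scale_reciprocal([3.0], 10.0))  # [0.30000000000000004]
```

The difference here is one unit in the last place, which most codes can absorb; when y is a power of two the reciprocal is exact and both versions agree.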
The /QxW option is not available from the property page. Officially, it has been superseded by /QxN, but if you are compiling for Opteron, you can't use that. So click on Command Line and type in /QxW manually.
The "Optimize for" option is similar to CVF's /tune switch. It has a smaller effect: it adjusts for the fact that some processors "prefer" certain instructions over others, even when both processors support both instructions. You should set "Optimize for" to the processor you expect to run on most often; it may make other processors run a bit slower, but the program will still run. This is something you need to test for yourself.
There is a lot of information on these topics in "Volume 2" of the Intel Fortran Programmer's Manual.
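Pulling together the switches suggested in this thread for forall's three targets, a small build helper could record which options go with which executable. This is a hypothetical Python sketch: the target names, the table layout, the source file name and the assumption that the compiler driver is invoked as ifort are illustrative only. The processor switches and /O3 and /Qipo are the ones suggested above; the "Optimize for" (/G) setting is left out because it needs testing on the actual machines.

```python
# Hypothetical per-target option table assembled from the advice in this thread.
TARGET_OPTIONS = {
    "pentium-m": ["/QxB"],      # Pentium M only; B adds a start-up processor check
    "xeon": ["/QxN"],           # Intel only; N adds a start-up processor check
    "xeon-nocona": ["/QxP"],    # for a Nocona-type Xeon
    "opteron": ["/QxW"],        # W has no start-up check, so it runs on Opteron
    "xeon+opteron": ["/QxW"],   # one executable for both machines
}
COMMON_OPTIONS = ["/O3"]


def compile_command(target, sources, ipo=False):
    """Return a compiler command line (as a list) for one target executable.

    Assumes the Intel Fortran driver is called "ifort".
    """
    opts = TARGET_OPTIONS[target] + COMMON_OPTIONS  # new list, table is not modified
    if ipo:
        opts = opts + ["/Qipo"]  # whole-program inter-procedural optimization
    return ["ifort"] + opts + list(sources)


print(" ".join(compile_command("xeon+opteron", ["main.f90"], ipo=True)))
# ifort /QxW /O3 /Qipo main.f90
```

Keeping the table in one place makes it easy to build the same static library and application with matching processor switches for each target.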
I should point out that I have the following setup: a static library compiled separately with /QxW, linked into a Fortran Windows application that is also compiled with /QxW (the other options are the same as far as I can see; I have checked them). I definitely typed /QxW on the Fortran command line. The compilation is done on a Pentium M (Dothan) 735, and the executables will be moved to a Xeon and an Opteron (I was trying /QxN for the Xeons and /QxW for the Opteron, as Steve suggested earlier).
I rebuilt the projects from scratch just in case, and now I get the following error when building the Fortran Windows application with /QxW: "error LNK2019: unresolved external symbol _vmldExp2 referenced in function _sumgauss". The function sumgauss is rather simple and contains the line "ex(:)=exp(-arg(:)**2)", which I assume is the cause of the problem (Exp2?). There were no problems building the static library.
Now, when I use /QxN (or /QxP) instead of /QxW (in both the library and the Fortran Windows application), compilation of the library aborts at one particular file (which compiled without problems under other settings). There is no internal error, just "compilation aborted (code 1)". It is the same file in both cases.
Message Edited by forall on 02-19-2005 07:00 PM
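For context on the _vmldExp2 symbol: when the compiler vectorizes an element-wise expression such as ex(:)=exp(-arg(:)**2), it can replace the scalar exp calls with calls to a vector math routine that evaluates several elements per call, and that routine must then be resolved at link time. Reading the trailing "2" as "two doubles per call" (one 128-bit SSE2 register) is an assumption. The Python sketch below mimics that loop structure with a hypothetical vexp2 helper: a main loop over pairs followed by a scalar remainder loop.

```python
import math


def vexp2(a, b):
    """Hypothetical stand-in for a vector routine that evaluates exp on two doubles."""
    return math.exp(a), math.exp(b)


def gauss_vectorized(arg):
    """ex(:) = exp(-arg(:)**2), strip-mined into pairs plus a scalar remainder."""
    n = len(arg)
    ex = [0.0] * n
    main = n - n % 2                   # largest even element count
    for i in range(0, main, 2):        # "vector" loop: two elements per call
        ex[i], ex[i + 1] = vexp2(-arg[i] ** 2, -arg[i + 1] ** 2)
    for i in range(main, n):           # remainder loop for an odd-length array
        ex[i] = math.exp(-arg[i] ** 2)
    return ex
```

The result matches the plain element-wise loop; the point is that the vectorized version calls a separate routine, which is why a symbol that never appears in the source can show up as an unresolved external.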
The additional library invoked by /Qparallel or /Qopenmp is libguide.
