Running the example given here - http://www.polyhedron.com/openmp - which claims it is for Intel Fortran I get the serial version to run OK, but when I then enable OpenMP I get a message that the timing routine CLOCKX cannot be found. Does OpenMP inhibit the use of the portability module IFLIB? What alternate should I try?
9 Replies
I tried this with 11.0.072 and had no problems building the program. What is the exact error you get? Can you attach the build log (see below for attach instructions)?
------ Build started: Project: OpenMPTest, Configuration: Debug|Win32 ------
Deleting intermediate files and output files for project 'OpenMPTest', configuration 'Debug|Win32'.
Compiling with Intel Fortran 9.1 C:\Program Files\Intel\Compiler\Fortran\9.1\IA32\...
ifort /nologo /Zi /Od /Qparallel /fpscomp:nolibs /module:"Debug/" /object:"Debug/" /traceback /check:bounds /libs:static /dbglibs /c /Qvc7.1 /Qlocation,link,"D:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\bin" "d:\My Documents\Visual Studio Projects\OpenMPTest\OpenMPTest\OpenMPTest.f90"
Linking...
Link /OUT:"Debug/OpenMPTest.exe" /INCREMENTAL:NO /NOLOGO /DEBUG /PDB:"Debug/OpenMPTest.pdb" /SUBSYSTEM:CONSOLE "Debug/OpenMPTest.obj"
Link: executing 'link'
OpenMPTest.obj : error LNK2019: unresolved external symbol _CLOCKX referenced in function _MAIN__
Debug/OpenMPTest.exe : fatal error LNK1120: 1 unresolved externals
OpenMPTest build failed.
Source code:
[plain]
program OpenMPTest
  USE IFPORT
  implicit none

  ! Variables
  integer, parameter:: NumSteps = 20000000 ! 2E7
  double precision:: StartTime, StopTime
  double precision:: e, pi, factorial, product
  integer :: i
  character :: a*1

  call clockx(StartTime)

  !$OMP PARALLEL SECTIONS SHARED (e, pi)
  !$OMP SECTION
  print *, 'Calculation of e begun ...'
  e = 1d0
  factorial = 1d0
  do i = 1, NumSteps
    factorial = factorial * i
    e = e + 1d0 / factorial
  end do
  print *, 'Calculation of e completed ... ', e
  !$OMP SECTION
  print *, 'Calculation of PI started ...'
  pi = 0d0
  do i = 0, NumSteps * 10
    pi = pi + 1d0 / (4d0 * i + 1d0) - 1d0 / (4d0 * i + 3d0)
  end do
  pi = pi * 4d0
  print *, 'PI calculated ...', pi
  product = e * pi
  !$OMP END PARALLEL SECTIONS

  call clockx(StopTime)
  print *, 'Reached result ', product, ' in ', (StopTime - StartTime) / 1d6, " seconds"
  print *, 'Press Q and [RETURN] to quit'
  read *, a
end program OpenMPTest
[/plain]
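One detail in the program above may be worth checking: `product = e * pi` sits inside the second SECTION, so it can execute while the first section is still accumulating `e`. The sketch below is one possible restructuring, not the original poster's code. It moves the multiplication after the parallel region, where both sections are guaranteed to have finished. As further assumptions, it times the region with `omp_get_wtime` from `omp_lib` instead of CLOCKX (so it needs OpenMP enabled, e.g. `/Qopenmp` with ifort on Windows), makes `i` and `factorial` explicitly private, and uses a smaller `NumSteps` to keep the run short.

```fortran
program sections_sketch
  use omp_lib                      ! provides omp_get_wtime
  implicit none

  ! Smaller than the value in the post, purely to keep the run short.
  integer, parameter :: NumSteps = 2000000
  double precision :: t0, t1
  double precision :: e, pi, factorial, product
  integer :: i

  t0 = omp_get_wtime()             ! wall-clock time in seconds

  !$OMP PARALLEL SECTIONS SHARED(e, pi) PRIVATE(i, factorial)
  !$OMP SECTION
  ! Only this section writes e.
  e = 1d0
  factorial = 1d0
  do i = 1, NumSteps
    ! factorial overflows to +Infinity after about 170 terms; with default
    ! (non-trapping) floating-point settings 1d0/factorial is then 0.
    factorial = factorial * i
    e = e + 1d0 / factorial
  end do
  !$OMP SECTION
  ! Only this section writes pi.
  pi = 0d0
  do i = 0, NumSteps * 10
    pi = pi + 1d0 / (4d0 * i + 1d0) - 1d0 / (4d0 * i + 3d0)
  end do
  pi = pi * 4d0
  !$OMP END PARALLEL SECTIONS

  ! The construct ends with an implicit barrier, so e and pi are both final.
  product = e * pi

  t1 = omp_get_wtime()
  print *, 'Reached result ', product, ' in ', t1 - t0, ' seconds'
end program sections_sketch
```

With only two sections, at most two threads have work to do, so this layout cannot use more than two cores regardless of how many are available.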
Explicitly adding libifport.lib on the linker page resolved the problem. I've no idea why the library was found under one setting (parallelisation disabled) and not the other. I didn't need to specify a path, so that wasn't the root cause.
You have /fpscomp:nolibs set in the project. This will prevent the portability library from being linked in.
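If the aim is simply to time the run without depending on the portability library at all, the standard Fortran intrinsic SYSTEM_CLOCK is one alternative to CLOCKX. The following is a minimal sketch only; the program name, the variable names and the placeholder loop are illustrative and do not come from the original post.

```fortran
program timer_sketch
  implicit none
  integer :: count_start, count_stop, count_rate
  double precision :: elapsed, total
  integer :: i

  ! count_rate is the number of clock ticks per second on this system.
  call system_clock(count_start, count_rate)

  ! Placeholder workload standing in for the code being timed.
  total = 0d0
  do i = 1, 1000000
    total = total + 1d0 / dble(i)
  end do

  call system_clock(count_stop)

  ! Elapsed wall-clock time in seconds (assumes the counter did not wrap).
  elapsed = dble(count_stop - count_start) / dble(count_rate)
  print *, 'Sum ', total, ' in ', elapsed, ' seconds'
end program timer_sketch
```

SYSTEM_CLOCK reports wall-clock time, which is usually what you want when comparing serial and threaded runs; CPU_TIME may instead add up the time used by all threads.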
Quoting - Steve Lionel (Intel)
You have /fpscomp:nolibs set in the project. This will prevent the portability library from being linked in.
Ah! I thought that the portability library was an Intel thing, not a PowerStation thing.
Thank you.
Now for some curious things.
On an AMD Sempron the serial version takes c. 13 seconds; the parallel version c. 6 seconds. (XP as OS)
On an Intel Q6600 the serial version takes 17 seconds, the parallel version 11 seconds (Vista as OS)
Why do I see a better speedup on a machine that is single core? Is this a reflection of how bad Vista is as an OS?
(A different serial program, using VB6, saw the Q6600 running c. 8 times faster than the Sempron.)
Why are you comparing Fortran debug mode with VB6 optimized mode? Did you set KMP_AFFINITY? Windows 7 should be less dependent on KMP_AFFINITY than Vista, if that's what you mean by "how bad Vista is." Vista-64 usually performs OK otherwise.
True, it's usually relatively easy to get threaded performance scaling when you choose options to make your serial version as slow as possible.
The VB6 program is not being compared with the Fortran. It is being compared for the performance differences I can expect between the Sempron & the Q6600. After a week on the Sempron I ran the same job on the Q6600, & since it was VB6 I dedicated a core to the program. (Easy to do using Task Manager when a program runs for days; not so easy with a small test that runs for 6 seconds.) After 5 days the Q6600 had completed, while the Sempron was still around half-way through. The program is not linear in compute time with progress, as it is Monte-Carlo based, but nevertheless the Q6600 was considerably faster than the Sempron. So I was expecting a similar result with my first foray into OpenMP: that the Sempron, a single core, would show little improvement with parallelisation, that the Q6600 would, and that the Q6600 would be faster. None of these expectations materialised. The availability of SSE2 may explain the Sempron results - I don't know enough about the Sempron & the Intel compiler to be sure. As for the Q6600 I had noticed that Vista seems to task switch between the cores excessively, resulting in inefficient usage. I was told that Vista does this to improve responsiveness for typical GUI-oriented applications, rather than numerically-intensive programs.
BUT given the screen shots on the web page from which I took the code that I have used I was, again, expecting that I would see better utilisation of the cores. As for KMP_AFFINITY, I haven't encountered that - I didn't see it in the Intel documentation, nor in what little I have read on OpenMP. I typed KMP_AFFINITY into the compiler help file & got a page on the MOD() function. My guess would be that KMP_AFFINITY came a little later than my version of the compiler.
I have the basics of what I wanted - my first experiment with OpenMP, and the results on two processors showing that there is a useful benefit from enabling parallelisation. What remains is curiosity about specific results.
As for Vista - I have noticed that my 32-bit Vista often behaves sluggishly DESPITE the performance meters indicating the CPU activity is low & physical free RAM is plentiful. It may be that the sluggishness is IO-bound; but opening a small Adobe document while using a newsreader is not commonly regarded as a high-IO activity. This is all an aside. If there is some way of indicating programmatically that I want cores to be dedicated to a program, then I am here to learn.
Fortran and OpenMP don't provide a direct way to lock threads to cores, because supporting the various possible platforms and allowing other applications to coexist effectively becomes impractical. C/C++ programmers often advocate that.
According to the docs, the latest ifort has the /Qpar-affinity=compact option, which you would set, if you don't intend to SET KMP_AFFINITY=. This would set the behavior of ifort OpenMP or auto-parallel similar to what it is with PGI OpenMP. It will work to keep threads 0 and 1 together on one cache, and 2 and 3 on the other. The docs say the command line option over-rides the run-time environment variable, which seems the wrong way around to me. In fact, I'll submit an issue asking for an explanation. If you also set the verbose modifier, it will give some information on what it is doing. You may not like what it tells you on the AMD platform, but at least it does no harm.
The KMP_AFFINITY environment variable (with no par-affinity) goes back to ifort 10.1. You can't rely on good quad core performance with older versions.
With past ifort versions, if you wanted SSE code to run on AMD, you would have to specify it at compile time (e.g. /QxW), as well as building in release mode. It's possible that your AMD executes your debug x87 code more efficiently than the Intel CPU does.
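To check what the program actually sees at run time, a small diagnostic along the following lines may help. This is only a sketch under stated assumptions: it needs a compiler that supports the Fortran 2003 intrinsic `get_environment_variable` and must be built with OpenMP enabled so that `omp_lib` is available. The program name and the 256-character buffer are illustrative choices. It reports the settings only; it does not set or change any thread affinity.

```fortran
program affinity_report
  use omp_lib                      ! omp_get_max_threads, omp_get_num_procs
  implicit none
  character(len=256) :: value
  integer :: length, status

  ! status: 0 = found, 1 = variable does not exist, -1 = value truncated.
  call get_environment_variable('KMP_AFFINITY', value, length, status)

  if (status <= 0 .and. length > 0) then
    print *, 'KMP_AFFINITY = ', trim(value)
  else
    print *, 'KMP_AFFINITY is not set (or is empty)'
  end if

  ! What the OpenMP run-time library reports for this process.
  print *, 'Maximum OpenMP threads: ', omp_get_max_threads()
  print *, 'Processors available:   ', omp_get_num_procs()
end program affinity_report
```

Running it once from a console where KMP_AFFINITY has been set and once where it has not confirms whether the variable is reaching the executable.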
Another curio: on the Q6600, in release mode, the OpenMP version takes 2.7 seconds, but the non-OpenMP version takes 0.7. The only difference between them is whether I have the OpenMP setting set to Disable or to Generate parallel code.