The OpenMP parallel sections are indicated by the blue arrows in the master thread. The other threads should have all the same parallel sections, but as can be seen, each one stops showing them at some point, with the rest of the time being reported as waiting for the barrier at its last blue arrow. The same is true if I do a Concurrency analysis, but if I do a Hotspots analysis it shows all of the OpenMP parallel sections correctly in all threads. Does anyone have any idea what is going on here?
Intel VTune Amplifier XE 2011.4 build 176374
no kernel driver, installed as non-root
ifort 11.1 20100806
Linux CentOS 5.5 kernel 2.6.18-238.19.1.el5
Dell Precision M4500 with quad core Core i7 and hyperthreading (8 virtual cores)
You can also use "Add Files" to attach a JPEG file.
It would be better if you could provide test code, so that others can reproduce this problem and investigate the cause.
[root@NHM02 peter]# source /opt/intel/compilerpro-12.0.0.048/bin/compilervars.sh intel64
[root@NHM02 peter]# ifort -g -openmp -openmp-report -fpp openmp_sample.f90 -o openmp_sample.ifort
openmp_sample.f90(82) (col. 7): remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
openmp_sample.f90(73) (col. 7): remark: OpenMP DEFINED REGION WAS PARALLELIZED.
[root@NHM02 peter]# source /opt/intel/vtune_amplifier_xe_2011/amplxe-vars.sh
Copyright (C) 2009-2011 Intel Corporation. All rights reserved.
Intel VTune Amplifier XE 2011 (build 176374)
[root@NHM02 peter]# amplxe-cl -collect locksandwaits -- ./openmp_sample.ifort
Warning: Symbol file is not found. The call stack passing through the module /opt/intel/composerxe-2011.0.048/compiler/lib/intel64/libiomp5.so may be incorrect
Range to check for Primes: 1 10000000
We are using 8 thread(s)
Number of primes found: 664579
Number of 4n+1 primes found: 332181
Number of 4n-1 primes found: 332398
Using result path `/home/peter/r000lw'
Executing actions 74 % Generating a report
Average Concurrency: 6.841
Elapsed Time: 0.540
CPU Time: 3.200
Wait Time: 0.686
Executing actions 99 % done
It seems that all OpenMP* parallel sections are displayed, and the wait times and counts for barriers and joins (sync objects) are displayed as well.
PMU event counts are not part of the Locks and Waits analysis.
Is it an issue specific to your application?
What do you mean by upload a jpeg file? How do I do that, or where do I upload it to?
The "insert/edit image" button in the text editor asks for a URL for the image. I put in "file:///local/..." giving the full path name of the file expecting it would then attach the file the same as other programs, like my email client. But it just inserted an html link based on that address, which of course won't work for a file on my local disk.
The "add files" button brings up a window that says "ALL folders" in the upper left, has a box in which I can type and "search folder" in the upper right, and a box in which I can type followed by "create folder" and "delete" in what appears to be the main window area. Beats me how I use this to add a file. I typed in the full path to the directory containing the file in the box in front of "search folder" and clicked on "search folder", which only cleared the box.
I'm using Firefox 3.6.18 on Linux as my browser.
Thanks for any further assistance to educate me on this text editor,
The adaptive finite element code is over 100,000 lines, so I can't just post it here, and it will take a while to reduce it down to something reasonable. But if you want to try the full code, you can download it from http://math.nist.gov/phaml and build it as follows:
tar -xzf phaml-1.9.1.tar.gz
./mkmkfile.sh F90 intel PARALLEL openmp PARLIB none GRAPHICS none
edit src/Makefile to add -g to FFLAGS and CFLAGS
edit Makefile to add -g to FFLAGS
then use phaml as the target for the VTune analysis
After doing the above, you can use "Add to Editor" or "Add as Attachment".
I'm using IE8 and am not sure about Firefox, but you should at least be able to add the file as an attachment, am I right?
I still had to go to the directory examples/simple and run "make" to generate phaml for a simple test.
Then I ran amplxe-cl -collect locksandwaits -- ./phaml, and everything seemed OK.
The difference in OMP parallel regions between my example and your phaml makes sense: the OMP parallel code in my example runs continuously, whereas the OMP parallel code in your phaml runs intermittently.
Given that it works correctly on your computer, I must have something wrong in my installation or environment or something. Any clues on how to track that down? How closely can you approximate my actual environment, given in the original post? For the Locks and Waits analysis, it shouldn't matter that I don't have a kernel driver, right?
If you have concerns about your results (say, at around 0.15 s to 0.2 s all the worker threads stop showing "running" or "OpenMP regions" pretty early), please zip and attach the result directory; I would like to look into it.
To collect additional information about your system and environment, run the feedback tool,
I finally got access to a Windows machine so I could try posting with IE -- same thing. Above the text editor window there are two buttons, "Toggle HTML/Visual Editor" and "Add Files". When I click on "Add Files" a window pops up. Across the top in a blue field, on the left is a label "Add Files", which does nothing, and on the right is "close" which closes the window when clicked. Below that is a white area with "ALL Files" on the left, which is just text that does nothing when clicked, and on the right is a white text box followed by a grey button "search folder". If I type text in the box and click "search folder" it just clears the box. Below that is a blue area with a white text box on the left followed by two buttons "create folder" and "Delete". That's it. I'm seeing exactly the same thing in Windows/IE and Linux/Firefox.
Maybe we could take this offline where I can just email you the images and requested files.
I think I'm experiencing the same issue.
I'm running VTune XE 2011 (build 186533) to try and determine why I'm not getting the speedup I'd hope for with OpenMP.
I've tried using both STATIC and DYNAMIC schedulers on my parallel loops but with both, VTune is showing OMP Worker Thread 1 # (I'm only using 2 threads) to only occasionally be taking part in the work. When it's not shown as working, it's marked as Waiting.
It would be useful to know if this is:
a) A problem with my application (great - let's fix it and my code runs faster) or...
b) A bug in VTune (not so great)
Please use the latest release, Update 5, which fixes some OMP issues.
Thanks for your response.
What scheduler would you recommend to use on a Core 2 Duo processor?
I will update my compiler and VTune to the latest versions and re-try.
Thanks for the suggestions,
The choice of OMP schedule, STATIC or DYNAMIC, doesn't depend on the processor type.
As far as I know, STATIC suits loops where you can estimate the workload of each iteration: the OMP scheduler divides the iterations among the threads up front, so they execute as you expect. DYNAMIC means that if a thread finishes its work quickly, the OMP scheduler assigns it more work to run.
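One way to compare the two schedules without recompiling is to let the OpenMP runtime pick the schedule from the environment. The following is only a rough sketch under some assumptions: it assumes the parallel loop in the application is declared with SCHEDULE(RUNTIME), so that the standard OMP_SCHEDULE environment variable takes effect, and the executable name ./my_openmp_app and the chunk size 4 are hypothetical placeholders, not anything taken from this thread.

```shell
#!/bin/sh
# Sketch: run one program under two OpenMP schedules chosen at run time.
# Assumption: the parallel loop uses SCHEDULE(RUNTIME), so the OpenMP
# runtime reads the schedule from OMP_SCHEDULE.
# APP is a hypothetical placeholder; point it at your own executable.
APP="${APP:-./my_openmp_app}"

# Two threads, matching the two-thread runs described above.
export OMP_NUM_THREADS=2

for sched in "static" "dynamic,4"; do
    export OMP_SCHEDULE="$sched"
    echo "OMP_SCHEDULE=$OMP_SCHEDULE OMP_NUM_THREADS=$OMP_NUM_THREADS"
    if [ -x "$APP" ]; then
        # Replace this plain run with a timed or profiled run if desired.
        "$APP"
    else
        echo "skipping run: $APP not found or not executable"
    fi
done
```

If the elapsed times differ noticeably between the two settings, the loop iterations probably have uneven cost and DYNAMIC is worth keeping; if they are about the same, STATIC usually has the lower scheduling overhead. Each run can also be wrapped in the collection command used earlier in this thread (amplxe-cl -collect locksandwaits -- followed by the executable) to see how the worker threads behave under each schedule.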