Locks and Waits analysis seems to lose data

wfmitchell
Beginner
1,228 Views
I'm new to VTune Amplifier XE 2011. I attempted to do a Locks and Waits analysis on my Fortran OpenMP code and find that the worker threads seem to be missing a lot of events, as in this screenshot.

The OpenMP parallel sections are indicated by the blue arrows in the master thread. The other threads should have all the same parallel sections, but as can be seen, each one stops showing them at some point, with the rest of the time being reported as waiting for the barrier at its last blue arrow. The same is true if I do a Concurrency analysis, but if I do a Hotspots analysis it shows all of the OpenMP parallel sections correctly in all threads. Does anyone have any idea what is going on here?

Intel VTune Amplifier XE 2011.4 build 176374
no kernel driver, installed as non-root
ifort 11.1 20100806
Linux CentOS 5.5 kernel 2.6.18-238.19.1.el5
Dell Precision M4500 with quad core Core i7 and hyperthreading (8 virtual cores)
OMP_NUM_THREADS 4
0 Kudos
15 Replies
wfmitchell
Beginner
OK, can someone tell me how to insert a jpeg image? The "insert/edit image" button in this text composer didn't work.

Thanks,
wfmitchell
0 Kudos
Peter_W_Intel
Employee
After you have uploaded the JPEG file, please insert it into the text.

You can also use "Add Files" to add the JPEG file as an attachment.

It would be better if you could provide test code, so that others can reproduce this problem and investigate the cause.

Thanks, Peter
0 Kudos
Peter_W_Intel
Employee
Here is an example program (compute primes in OMP parallel...). See the attached files (built with ifort 12.0, using VTune Amplifier XE Update 4).

[root@NHM02 peter]# source /opt/intel/compilerpro-12.0.0.048/bin/compilervars.sh intel64

[root@NHM02 peter]# ifort -g -openmp -openmp-report -fpp openmp_sample.f90 -o openmp_sample.ifort
openmp_sample.f90(82) (col. 7): remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
openmp_sample.f90(73) (col. 7): remark: OpenMP DEFINED REGION WAS PARALLELIZED.

[root@NHM02 peter]# source /opt/intel/vtune_amplifier_xe_2011/amplxe-vars.sh
Copyright (C) 2009-2011 Intel Corporation. All rights reserved.
Intel VTune Amplifier XE 2011 (build 176374)

[root@NHM02 peter]# amplxe-cl -collect locksandwaits -- ./openmp_sample.ifort
Warning: Symbol file is not found. The call stack passing through the module /opt/intel/composerxe-2011.0.048/compiler/lib/intel64/libiomp5.so may be incorrect
Range to check for Primes: 1 10000000
We are using 8 thread(s)
Number of primes found: 664579
Number of 4n+1 primes found: 332181
Number of 4n-1 primes found: 332398
Using result path `/home/peter/r000lw'
Executing actions 74 % Generating a report

Summary
-------

Average Concurrency: 6.841
Elapsed Time: 0.540
CPU Time: 3.200
Wait Time: 0.686
Executing actions 99 % done

It seems that all OpenMP* parallel sections are displayed, and the wait time and wait counts of barriers and joins (sync objects) are displayed as well.

PMU event counts are not part of Locks and Waits analysis.

Is this an issue specific to your application?

Regards, Peter

0 Kudos
wfmitchell
Beginner
I guess I'm just completely clueless about the text editor used by this forum. Please forgive my ignorance.

What do you mean by upload a jpeg file? How do I do that, or where do I upload it to?

The "insert/edit image" button in the text editor asks for a URL for the image. I put in "file:///local/..." giving the full path name of the file expecting it would then attach the file the same as other programs, like my email client. But it just inserted an html link based on that address, which of course won't work for a file on my local disk.

The "add files" button brings up a window that says "ALL folders" in the upper left, has a box in which I can type and "search folder" in the upper right, and a box in which I can type followed by "create folder" and "delete" in what appears to be the main window area. Beats me how I use this to add a file. I typed in the full path to the directory containing the file in the box in front of "search folder" and clicked on "search folder", which only cleared the box.

I'm using firefox 3.6.18 on Linux as my browser.

Thanks for any further assistance to educate me on this text editor,
wfmitchell
0 Kudos
wfmitchell
Beginner
With openmp_sample I get similar results to what you show, except that the master thread shows up as Running. I also get expected results with the Locks and Waits tutorial. So it is probably application specific. One difference with my code from your sample and the tutorial is that I have many OpenMP parallel regions interspersed with sequential code rather than a single parallel region.
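
For readers who want to picture that pattern, here is a minimal sketch in C rather than Fortran. It is not taken from the PHAML sources; the function name alternate_regions and the work it does are hypothetical, chosen only to show many short parallel regions with sequential code between them. It assumes an OpenMP-capable compiler; without OpenMP enabled the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from PHAML): runs n_regions OpenMP parallel
   regions over data[0..n-1], with a sequential step after each region.
   Returns the sum of the array after the last region. */
double alternate_regions(double *data, size_t n, int n_regions)
{
    assert(data != NULL && n_regions > 0);
    double total = 0.0;

    for (int r = 0; r < n_regions; ++r) {
        /* Parallel region: the iterations are shared among the threads,
           with an implicit barrier at the end of the loop. */
        #pragma omp parallel for
        for (long i = 0; i < (long)n; ++i)
            data[i] += 1.0;

        /* Sequential section: only the master thread executes this;
           the worker threads are idle until the next parallel region. */
        total = 0.0;
        for (size_t i = 0; i < n; ++i)
            total += data[i];
    }
    return total;
}
```

Calling alternate_regions on a large array with a few hundred regions should give a timeline in which parallel and sequential phases alternate, which is the shape described above rather than one long parallel region.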

The adaptive finite element code is over 100,000 lines, so I can't just post it here, and it will take a while to reduce it down to something reasonable. But if you want to try the full code, you can download it from http://math.nist.gov/phaml and build it as follows:

tar -xzf phaml-1.9.1.tar.gz
cd phaml-1.9.1
./mkmkfile.sh F90 intel PARALLEL openmp PARLIB none GRAPHICS none
edit src/Makefile to add -g to FFLAGS and CFLAGS
make
cd examples/simple
edit Makefile to add -g to FFLAGS
make

then use phaml as the target for the VTune analysis

Thanks,
wfmitchell
0 Kudos
Peter_W_Intel
Employee
Just use the "Add Files" button to browse to the file to be uploaded (you may create a folder first).

After doing the above, you can choose "Add to Editor" or "Add as Attachment".

I'm using IE8 and am not sure about Firefox, but at least you can add the file as an attachment, am I right?

Regards, Peter

0 Kudos
Peter_W_Intel
Employee
Thanks for the URL of the example and the build instructions.

I still had to go to the directory examples/simple and run "make" to generate phaml for a simple test.

Then I ran: amplxe-cl -collect locksandwaits -- ./phaml, and everything seemed OK.

The difference in OMP parallel regions between my example and your phaml does make sense: the OMP parallel code in my example ran continuously, whereas the OMP parallel code in your phaml ran intermittently.



Regards, Peter



0 Kudos
wfmitchell
Beginner
Nope. The add files button in the text editor opens a window similar to what you show here, but there is no way to navigate through directories with it. It just has the "search folder" and "create folder" text boxes, and clicking on search folder just clears whatever I typed in the text box. Maybe it just doesn't work with Linux/Firefox. I'll give it a try with IE but it will be a few days before I have access to a Windows computer.
0 Kudos
wfmitchell
Beginner
Thanks for running my program and showing the result. That looks like what I expect. Wish I could show you what mine looks like :( All the worker threads stop showing "running" or "OpenMP Regions" pretty early, say around .15s to .2s on the image above, and show a "wait" state for the rest of the timeline.

Given that it works correctly on your computer, I must have something wrong in my installation or environment or something. Any clues on how to track that down? How closely can you approximate my actual environment, given in the original post? For the Locks and Waits analysis, it shouldn't matter that I don't have a kernel driver, right?

Thanks,
Bill
0 Kudos
Peter_W_Intel
Employee
For Locks and Waits analysis and other user-mode sampling collections, no VTune driver is required.

If you have concerns about your results (all the worker threads stop showing "running" or "OpenMP Regions" pretty early, say around .15s to .2s), please zip and attach the result directory; I would like to look into it.

To collect additional information about your system and environment, run the feedback tool:
amplxe-feedback -create-bug-report

Regards, Peter
0 Kudos
wfmitchell
Beginner

I finally got access to a Windows machine so I could try posting with IE -- same thing. Above the text editor window there are two buttons, "Toggle HTML/Visual Editor" and "Add Files". When I click on "Add Files" a window pops up. Across the top in a blue field, on the left is a label "Add Files", which does nothing, and on the right is "close" which closes the window when clicked. Below that is a white area with "ALL Files" on the left, which is just text that does nothing when clicked, and on the right is a white text box followed by a grey button "search folder". If I type text in the box and click "search folder" it just clears the box. Below that is a blue area with a white text box on the left followed by two buttons "create folder" and "Delete". That's it. I'm seeing exactly the same thing in Windows/IE and Linux/Firefox.

Maybe we could take this offline where I can just email you the images and requested files.

Bill

0 Kudos
mostlyAtNight
Beginner
Hi Guys,

I think I'm experiencing the same issue.

I'm running VTune XE 2011 (build 186533) to try and determine why I'm not getting the speedup I'd hope for with OpenMP.

I've tried using both STATIC and DYNAMIC schedulers on my parallel loops, but with both, VTune shows OMP Worker Thread #1 (I'm only using 2 threads) only occasionally taking part in the work. When it's not shown as working, it's marked as Waiting.

It would be useful to know if this is:

a) A problem with my application (great - let's fix it and my code runs faster) or...
b) A bug in VTune (not so great)

Kind regards,

Pete
0 Kudos
Peter_W_Intel
Employee
It doesn't make sense to use 2 threads for STATIC and DYNAMIC schedulers; please use more threads, for example "export OMP_NUM_THREADS=8".

Please use the latest release, Update 5, which fixes some issues with OMP.

Regards, Peter
0 Kudos
mostlyAtNight
Beginner
Hi Peter,

Thanks for your response.

What scheduler would you recommend to use on a Core 2 Duo processor?

I will update my compiler and VTune to the latest versions and re-try.

Thanks for the suggestions,

Kind regards,

Pete
0 Kudos
Peter_W_Intel
Employee
Hi Pete,

The OMP scheduler you use, STATIC or DYNAMIC, doesn't depend on the processor type.

As far as I know, STATIC suits loops where you can estimate the workload of each iteration, so the OMP scheduler divides the iterations among the threads up front and they execute as you expect; DYNAMIC means that if a thread finishes its short piece of work quickly, the OMP scheduler assigns it more work to run.
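
To make the difference concrete, here is a small sketch in C (not from any code in this thread). The function names work, sum_static and sum_dynamic, the chunk size of 4, and the deliberately uneven workload are all assumptions made for illustration. Both loops compute the same result; only the way iterations are handed to threads differs. Without OpenMP enabled the pragmas are ignored and both loops run serially.

```c
#include <assert.h>

/* Hypothetical uneven workload: iteration i performs about i inner steps,
   so later iterations cost more than earlier ones. */
static long work(int i)
{
    long acc = 0;
    for (int k = 0; k <= i; ++k)
        acc += k % 3;
    return acc;
}

/* STATIC: iterations are divided among the threads before the loop runs.
   With an uneven workload, some threads may finish early and wait. */
long sum_static(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}

/* DYNAMIC: each thread takes a chunk of 4 iterations, and asks for the
   next chunk when it finishes, which can even out an unbalanced load. */
long sum_dynamic(int n)
{
    assert(n >= 0);
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

Profiling the two variants with the same thread count is one way to see whether waiting in the worker threads comes from load imbalance in the loop or from something outside it.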

Regards, Peter
0 Kudos