Dear All,
I am using VTune (2015.1.1.380310) to run bandwidth analysis on MIC. Sometimes I end up with large profile data, and then VTune takes a long time. I have two questions:
- Today I ran VTune with 120 threads on a single MIC card for a simulation time of 160 seconds. After the program ended, I saw no progress or messages from VTune for three hours. Is this normal, or has VTune hung? I can see the amplxe-runss and amplxe-python processes running on the host. How can I check whether it has hung?
- This is important: if I try to kill, suspend, or cancel the VTune analysis with Ctrl+C or by killing the process, the sep server has to be restarted. I have to ask the sysadmins to do this every time.
Are these known issues? Is there a workaround to safely cancel a VTune analysis on MIC? (I am familiar with the pause/resume API and other ways to reduce the size of the profile collection.)
Hi Pramod:
Are you using the command line or the GUI to start the collection? If the GUI, try pressing the "Cancel" button. If the command line, try 'amplxe-cl -C cancel -r <resultsdir>'. Canceling will lose the results, but may terminate gracefully. You could also try the 'stop' command.
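The two commands mentioned above could be wrapped in a small helper, roughly as sketched here. The names `end_collection` and `VTUNE_CMD` are hypothetical and introduced only for this example, and the sketch assumes that `-C` is the short form of the `-command` option and that `stop` is passed through it in the same way as `cancel`.

```shell
# Hypothetical wrapper around the stop/cancel commands from this reply.
# VTUNE_CMD can be overridden, for example to do a dry run.
VTUNE_CMD=${VTUNE_CMD:-amplxe-cl}

end_collection() {
    # $1 = action: "stop" ends the collection, "cancel" also discards the results
    # $2 = result directory of the running collection
    case "$1" in
        stop|cancel) ;;
        *) echo "usage: end_collection stop|cancel <resultsdir>" >&2; return 2 ;;
    esac
    "$VTUNE_CMD" -command "$1" -r "$2"
}
```

For instance, `end_collection stop <resultsdir>` would be tried first, since it does not throw the data away, with `end_collection cancel <resultsdir>` as the fallback.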
I don't think three hours without any progress is normal. Are you writing your project and result files to a local drive or an NFS drive? Performance of the NFS drive can impact results finalization time.
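One rough way to tell a slow finalization from a hang is to sample the result directory size and the CPU time of the collector process twice and compare. The sketch below uses only standard `du` and `ps`; the function names `snapshot` and `is_progressing` are hypothetical, and an unchanged sample is only a hint, not proof, that the tool has hung.

```shell
# Hypothetical helpers: compare two samples of directory size and process CPU time.
snapshot() {
    # $1 = result directory, $2 = PID of the collector (e.g. amplxe-runss)
    size=$(du -sk "$1" 2>/dev/null | cut -f1)          # size in KB
    cpu=$(ps -o time= -p "$2" 2>/dev/null | tr -d ' ')  # accumulated CPU time
    echo "${size:-?} ${cpu:-?}"
}

is_progressing() {
    # $1 = result directory, $2 = PID, $3 = seconds between samples (default 60)
    before=$(snapshot "$1" "$2")
    sleep "${3:-60}"
    after=$(snapshot "$1" "$2")
    if [ "$before" != "$after" ]; then
        echo "progressing: $before -> $after"
    else
        echo "no change: $before"
        return 1
    fi
}
```

Calling `is_progressing <resultsdir> <pid> 300` a few times in a row gives a rough picture of whether anything is still being written or computed.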
Thanks for the quick response! I am writing to a local drive, which is sufficiently fast. In fact, the analysis with 60 threads finishes quickly with 300 MB of profile data, but the one with 120 threads is stuck.
I am using the command line. I tried your command but got the following error:
$ amplxe-cl -C cancel -r vtune_paper_hpcopt_120t_120f_bandwidth/
amplxe: Fatal error: The specified result directory "...path../vtune_paper_hpcopt_120t_120f_bandwidth" does not provide a path to the running collection. Please specify a valid path.
I see that VTune is still running, with the following processes:
$ ps -aux | grep amplx
kumbhar 22846 0.0 0.0 3590036 40832 pts/4 Sl+ 14:49 0:00 amplxe-cl -data-limit=0 -resume-after=120 -collect bandwidth -r vtune_paper_hpcopt_120t_120f_bandwidth --target-system=mic-native:0 --search-dir=. -- KMP_AFFINITY=verbose,balanced KMP_PLACE_THREADS=60c,2t OMP_NUM_THREADS=120 LD_LIBRARY_PATH=/opt/..../lib /opt/intel/impi/4.1.2.040/mic/bin/mpiexec.hydra -n 1 hpcopt.prof-none.mic_linux -e 2
kumbhar 22863 0.1 0.0 1292072 23100 pts/4 Sl+ 14:49 0:24 /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/amplxe-python /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/amplxe-runss.py --target-system=mic-native:0 --no-modules --log-folder=/tmp/amplxe-log-kumbhar/2015-04-13-Mon-14-49-41-446011.amplxe-cl/ --ui-output-format xml --option-file /home/k......./vtune_paper_hpcopt_120t_120f_bandwidth/config/runsa.options
kumbhar 22867 0.0 0.0 11173328 31724 pts/4 Sl+ 14:49 0:15 /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/../bin64/amplxe-runss --ui-output-format xml --result-dir /home/kumbhar/workarena/systems/jknc/repos/bbp/coreneuron/paper/results/vtune_manual/vtune_paper_hpcopt_120t_120f_bandwidth --option-file /home/kum............/vtune_paper_hpcopt_120t_120f_bandwidth/config/runsa.options
Any suggestions?
The result directory is:
$ ls vtune_paper_hpcopt_120t_120f_bandwidth/
config  data.0  runtool.22867.ipc  sqlite-db  vtune_paper_hpcopt_120t_120f_bandwidth.amplxe
It looks like you should reduce the sampling rate when increasing the number of threads, either by increasing the sample-after values or by increasing the expected run time in the advanced section of the GUI menu.
I too have had to ring up the sysadmin after hanging VTune.
The command line also supports the "estimated duration" setting. See '-target-duration-type' and its possible values. The default is 'short', meaning one to 15 minutes.
-target-duration-type=veryshort | short | medium | long
Pramod said he already knew how to "limit data collection", so I focused on his request to "stop data collection."
Ah! But, I see Pramod has *disabled* the data limit ('-data-limit=0')!! Pramod, this is highly discouraged exactly for the reason you are experiencing!! You can try *raising* the limit, but if you remove the limit and collect a lot of data, bad things will happen! :(
What happens if you don't remove the limit (i.e., let it default to the 500 MB limit) and profile your app? Does it work? Does it not collect data for the entire run? What "elapsed time" is reported by VTune Amplifier? Do you *need* to profile the entire run, or is there initialization processing that can be skipped? You can control when data collection starts from the GUI or with command-line options, as well as the API, which you already mentioned.
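As a sketch of the advice in this reply, a collection could be launched with an explicit (raised but finite) data limit and a longer duration class instead of `-data-limit=0`. The function name `collect_bandwidth` and the `VTUNE_CMD` variable are hypothetical; the option names are the ones quoted in this thread, and the data limit is assumed to be given in MB, in line with the 500 MB default mentioned above.

```shell
# Hypothetical wrapper; VTUNE_CMD can be overridden, for example to do a dry run.
VTUNE_CMD=${VTUNE_CMD:-amplxe-cl}

collect_bandwidth() {
    # $1 = result directory
    # $2 = data limit, assumed to be in MB (0 removes the limit, which is discouraged)
    # $3 = expected duration class: veryshort | short | medium | long
    # remaining arguments = target command line
    [ $# -ge 4 ] || { echo "usage: collect_bandwidth <dir> <limit_mb> <duration> <command...>" >&2; return 2; }
    result_dir=$1 limit_mb=$2 duration=$3
    shift 3
    case "$duration" in
        veryshort|short|medium|long) ;;
        *) echo "unknown duration type: $duration" >&2; return 2 ;;
    esac
    "$VTUNE_CMD" -collect bandwidth \
        -data-limit="$limit_mb" \
        -target-duration-type="$duration" \
        --target-system=mic-native:0 \
        -r "$result_dir" -- "$@"
}
```

A call such as `collect_bandwidth <resultsdir> 1000 medium <command...>` keeps a cap on the collected data while lowering the sampling rate for a longer run.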
Thanks again for all the info! Our application has an initialisation phase, which was taking 140 seconds, followed by a solver phase of 10 seconds. Initially I wasn't able to see the solver in the profile, so I set -data-limit=0 (i.e. unlimited) and -resume-after=120. This is where I made a mistake: I wanted to resume after 120 seconds, not 120 milliseconds.
I could easily try the options suggested above or add the pause/resume API, but the problem is that I can't kill VTune for the reason mentioned earlier: if I kill the VTune analysis, I have to wait until tomorrow for the sysadmins to restart the sep server :)
Has VTune collected too much data for the above run, and is that why it is slow or not responding? I'm not sure, though, as I see only a few MB in the result directory after 7 hours:
$ du -h vtune_paper_hpcopt_120t_120f_bandwidth
1.6M vtune_paper_hpcopt_120t_120f_bandwidth/sqlite-db
524K vtune_paper_hpcopt_120t_120f_bandwidth/data.0
68K vtune_paper_hpcopt_120t_120f_bandwidth/config
2.2M vtune_paper_hpcopt_120t_120f_bandwidth
In short, it would be great to have a clean way to terminate the VTune analysis.
Thank you for all your quick help!
Thank you all! Today I removed -data-limit=0 and used -resume-after to skip the first 140 seconds of the initialisation phase. I was able to collect the profiling results with 120 and 240 threads.
Tim Prince wrote:
In principle you can start paused and resume after 120 seconds but if you need to resume at a repeatable point adding the vtune API call is better. As Mr a said you should limit collection data set to avoid hang.
Tim is right! The VTune Pause/Resume API lets you collect precisely the data you want.
Another approach is to start the collection in paused mode ("-start-paused") in a first console, then open a second console and run "amplxe-cl -command resume -r r000?", specifying the result directory generated in the first console. The benefit is that you don't need to insert VTune API calls into your code.
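The two-console approach could look roughly like the sketch below. The function names `start_paused` and `resume_collection` and the `VTUNE_CMD` variable are hypothetical; the `-start-paused` and `-command resume` options are the ones quoted in this reply, and the bandwidth analysis type is taken from earlier in the thread.

```shell
# Hypothetical wrappers; VTUNE_CMD can be overridden, for example to do a dry run.
VTUNE_CMD=${VTUNE_CMD:-amplxe-cl}

# Console 1: launch the target with data collection paused.
start_paused() {
    # $1 = result directory, remaining arguments = target command line
    [ $# -ge 2 ] || { echo "usage: start_paused <resultsdir> <command...>" >&2; return 2; }
    result_dir=$1
    shift
    "$VTUNE_CMD" -collect bandwidth -start-paused -r "$result_dir" -- "$@"
}

# Console 2: resume collection for the same result directory.
resume_collection() {
    # $1 = result directory used in console 1
    "$VTUNE_CMD" -command resume -r "$1"
}
```

The resume point then depends on when the second command is typed, so it is less repeatable than an API call placed at a fixed point in the code.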