I am using VTune (2015.1.1.380310) to run bandwidth analysis on MIC. Sometimes I end up with a large amount of profile data, and then VTune takes a long time. I have two questions:
Are these known issues? Is there any workaround to safely cancel a VTune analysis on MIC? (I am familiar with the pause/resume API and other ways to reduce the size of the profile collection.)
Are you using the command line or the GUI to start the collection? If GUI, try pressing the "Cancel" button. If command line, try 'amplxe-cl -C cancel -r <resultsdir>'. Canceling discards the results, but may terminate the collection gracefully. You could also try the "stop" command.
I don't think three hours without any progress is normal. Are you writing your project and result files to a local drive or an NFS drive? Performance of the NFS drive can impact results finalization time.
Thanks for the quick response! I am writing to a local drive, which is sufficiently fast. Actually, the analysis with 60 threads finishes quickly with 300 MB of profile data, but the run with 120 threads is stuck.
I am using the command line. I tried your command but got the following error:
amplxe-cl -C cancel -r vtune_paper_hpcopt_120t_120f_bandwidth/
amplxe: Fatal error: The specified result directory "...path../vtune_paper_hpcopt_120t_120f_bandwidth" does not provide a path to the running collection. Please specify a valid path.
I see that VTune is still running with the following result directory, as shown by the running processes:
$ ps -aux | grep amplx
kumbhar 22846 0.0 0.0 3590036 40832 pts/4 Sl+ 14:49 0:00 amplxe-cl -data-limit=0 -resume-after=120 -collect bandwidth -r vtune_paper_hpcopt_120t_120f_bandwidth --target-system=mic-native:0 --search-dir=. -- KMP_AFFINITY=verbose,balanced KMP_PLACE_THREADS=60c,2t OMP_NUM_THREADS=120 LD_LIBRARY_PATH=/opt/..../lib /opt/intel/impi/4.1.2.040/mic/bin/mpiexec.hydra -n 1 hpcopt.prof-none.mic_linux -e 2
kumbhar 22863 0.1 0.0 1292072 23100 pts/4 Sl+ 14:49 0:24 /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/amplxe-python /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/amplxe-runss.py --target-system=mic-native:0 --no-modules --log-folder=/tmp/amplxe-log-kumbhar/2015-04-13-Mon-14-49-41-446011.amplxe-cl/ --ui-output-format xml --option-file /home/k......./vtune_paper_hpcopt_120t_120f_bandwidth/config/runsa.options
kumbhar 22867 0.0 0.0 11173328 31724 pts/4 Sl+ 14:49 0:15 /opt/intel/vtune_amplifier_xe_2015.1.1.380310/bin64/../bin64/amplxe-runss --ui-output-format xml --result-dir /home/kumbhar/workarena/systems/jknc/repos/bbp/coreneuron/paper/results/vtune_manual/vtune_paper_hpcopt_120t_120f_bandwidth --option-file /home/kum............/vtune_paper_hpcopt_120t_120f_bandwidth/config/runsa.options
It looks like you should reduce the sampling rate when increasing the number of threads, either by increasing the sample-after values or by increasing the expected run time in the advanced section of the GUI menu.
I too have had to ring up the sysadmin after hanging vtune.
The command line also supports the "estimated duration". See '-target-duration-type' and possible values. Default is 'short', meaning, one to 15 minutes.
-target-duration-type=veryshort | short | medium | long
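As a rough illustration of how this option might be passed on the command line, here is a minimal dry-run sketch. The helper name `duration_flag`, the result directory `my_result_dir`, and the application `./your_app` are placeholders invented for this example; the script only prints the command and does not start a collection.

```shell
#!/bin/sh
# Hypothetical helper (not part of VTune): accept only the four
# values listed above and print the corresponding option.
duration_flag() {
    case "$1" in
        veryshort|short|medium|long)
            printf '%s\n' "-target-duration-type=$1" ;;
        *)
            echo "unknown duration type: $1" >&2
            return 1 ;;
    esac
}

# Dry run: print the command rather than launching a collection.
# "my_result_dir" and "./your_app" are placeholders.
printf '%s\n' "amplxe-cl -collect bandwidth $(duration_flag long) -r my_result_dir -- ./your_app"
```

Choosing a longer duration type is one way to lower the sampling rate for long or heavily threaded runs, as suggested earlier in the thread.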
Pramod said he already knew how to "limit data collection", so I focused on his request to "stop data collection."
Ah! But, I see Pramod has *disabled* the data limit ('-data-limit=0')!! Pramod, this is highly discouraged exactly for the reason you are experiencing!! You can try *raising* the limit, but if you remove the limit and collect a lot of data, bad things will happen! :(
What happens if you don't remove the limit (i.e., let it default to the 500 MB limit) and profile your app? Does it work? Does it not collect data for the entire run? What "elapsed time" is reported by VTune Amplifier? Do you *need* to profile the entire run, or is there initialization processing that can be skipped? You can control when data collection starts from the GUI or with command-line options, as well as the API, which you already mentioned.
Thanks again for all the info! Our application has an initialisation phase that takes 140 seconds, followed by a solver phase of 10 seconds. Initially I wasn't able to see the solver in the profile, so I set -data-limit=0 (i.e. unlimited) and -resume-after=120. This is where I made a mistake! I wanted to resume after 120 seconds, not 120 milliseconds.
I could easily try the options suggested above or add the pause/resume API, but the problem is that I can't kill VTune for the aforementioned reason: if I kill the VTune analysis, I have to wait for the sysadmins tomorrow to restart the sep server :)
Has VTune collected too much data for the above run, and is that the reason it's slow / not responding? I'm not sure, though, as I see only a few MB in the result directory after 7 hours:
$ du -h vtune_paper_hpcopt_120t_120f_bandwidth
1.6M vtune_paper_hpcopt_120t_120f_bandwidth/sqlite-db
524K vtune_paper_hpcopt_120t_120f_bandwidth/data.0
68K vtune_paper_hpcopt_120t_120f_bandwidth/config
2.2M vtune_paper_hpcopt_120t_120f_bandwidth
In short, it would be great to have a clean way to terminate a VTune analysis.
Thank you for all your quick help!
Thank you all! Today I removed -data-limit=0 and set -resume-after to skip the first 140 seconds of the initialisation phase. I was able to collect profiling results with 120 and 240 threads.
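For anyone hitting the same unit mix-up, the fix can be sketched as a small dry-run script. This is only an illustration: the helper name `resume_after_ms`, the result directory `my_result_dir`, and the application `./your_app` are placeholders, and the conversion assumes, as reported in this thread, that this VTune version reads -resume-after in milliseconds. Check the help output of your own version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: convert seconds to milliseconds.
# ASSUMPTION (taken from this thread, not verified here): this VTune
# version interprets -resume-after in milliseconds.
resume_after_ms() {
    echo $(( $1 * 1000 ))
}

INIT_SECONDS=140            # initialisation phase length reported above
RESULT_DIR=my_result_dir    # placeholder result directory

# Dry run: print the command instead of launching a collection.
# The default data limit is kept (no -data-limit=0).
printf '%s\n' "amplxe-cl -collect bandwidth -resume-after=$(resume_after_ms "$INIT_SECONDS") -r $RESULT_DIR --target-system=mic-native:0 -- ./your_app"
```

Keeping the conversion in one place makes it harder to pass a value in seconds by accident.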
Tim Prince wrote:
In principle you can start paused and resume after 120 seconds, but if you need to resume at a repeatable point, adding the VTune API call is better. As Mr a said, you should limit the collection data set to avoid a hang.
Tim is right! Using the VTune Pause/Resume API is a precise way to collect exactly the data you want.
Another approach is to start the collection in paused mode ("-start-paused") in a first console, then open a second console and run "amplxe-cl -command resume -r r000?"; you need to specify the VTune result directory generated in the first console. The benefit is that you don't need to insert VTune API calls into your code.
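The two-console approach above can be sketched as follows. This is a dry run that only prints the two commands; the helper names `start_paused_cmd` and `resume_cmd`, the result directory `my_result_dir`, and the application `./your_app` are placeholders for illustration, and the bandwidth analysis type is simply carried over from the original question.

```shell
#!/bin/sh
# Hypothetical helpers that print the commands for each console.

# Console 1: start the collection in paused mode.
# $1 = result directory, $2 = application to profile
start_paused_cmd() {
    printf '%s\n' "amplxe-cl -collect bandwidth -start-paused -r $1 -- $2"
}

# Console 2: resume the collection attached to the same result directory.
# $1 = result directory created by console 1
resume_cmd() {
    printf '%s\n' "amplxe-cl -command resume -r $1"
}

RESULT_DIR=my_result_dir    # placeholder; must match in both consoles

echo "console 1: $(start_paused_cmd "$RESULT_DIR" ./your_app)"
echo "console 2: $(resume_cmd "$RESULT_DIR")"
```

The result directory passed to the resume command must be the one the first console is writing to.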