I am having endless problems getting the VTune 2016 Beta to profile my application. The application seg faults when run in VTune with a completely unhelpful stack trace despite debugging symbols being available. The application runs fine both through my debugger and on the command line.
I am currently working through its warnings to figure out what the problem might be, and one thing that sticks out is the following:
strace: /opt/intel_beta/vtune_amplifier_xe_2016.0.1.414512/lib64/pinruntime/glibc/libc.so.6: version `GLIBC_2.4' not found (required by strace)
Checking the folder above, libc.so.6 is a symbolic link to libc-2.3.4.so in the same folder. What worries me is that VTune is using the system strace tool and my system is on libc-2.12.so.
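A quick way to confirm which files the two libc.so.6 links resolve to is sketched below. The paths are the ones quoted in this post; the helper name resolve_lib is made up for illustration, and readlink -f assumes GNU coreutils (as shipped with CentOS).

```shell
#!/bin/sh
# resolve_lib PATH: print the real file that a (possibly chained) symlink points to.
resolve_lib() {
    readlink -f "$1"
}

# Paths quoted in the post: the libc bundled with VTune's pin runtime and the system libc.
vtune_libc=/opt/intel_beta/vtune_amplifier_xe_2016.0.1.414512/lib64/pinruntime/glibc/libc.so.6
system_libc=/lib64/libc.so.6

for lib in "$vtune_libc" "$system_libc"; do
    if [ -e "$lib" ]; then
        echo "$lib -> $(resolve_lib "$lib")"
    else
        echo "$lib: not found on this machine"
    fi
done
```

If the first line reports libc-2.3.4.so and the second libc-2.12.so, that matches what is described above: the system strace is being resolved against the older bundled libc.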
For reference, I am running on CentOS 6.6. VTune supports CentOS 6.5, but glibc hasn't changed between these versions.
Help desperately needed!
We are going to need a little more information:
And, just as a reminder, this *is* a beta. I'm sorry it didn't work out of the box for you, but that is part of the point of a beta program. ;) If you work with us, we will try to figure out why it won't profile your app.
I have tried the nqueens sample program (parallel version) in Fortran, and with the sample program both analysis modes work fine.
Coincidentally, the Basic Hotspots mode now seems to work with my application. The code has been developed since my earlier test in the week, but I haven't resolved any bugs or other issues. What I have done though is reduce the test case. The application I am running is a time-based simulation with statistical samples, and I have reduced both the statistical samples and the length of time the simulation is to simulate.
Running the Advanced Hotspots (AH) analysis still causes VTune to crash, and any attempt to reopen the AH results folder after VTune restart causes VTune to crash again.
I did make sure to specify that the application I am profiling in VTune has a run period of greater than 15mins (as was the original test case), thereby setting the collection limit to 0 which I believe is unlimited. I haven't changed this in the updated test case which runs for less than 15mins.
Did you install VTune with administrative privileges, so that our driver was built and installed on the system for HW-based collections like advanced-hotspots, or was the installation done under a user account (so VTune uses perf)?
It is easy to check:
>lsmod | grep sep
Thanks & Regards, Dmitry
Yes, I installed VTune as root and made sure that the hardware driver is running and installed. The screen output from the command you suggested is:
lsmod | grep sep
sep3_15 547790 0
Okay, you've shared some more info that is important. #1, you've confirmed that both analysis types work on the sample. We know the driver is installed correctly and that the instrumentation-based Basic Hotspots works on your system.
#2, you've told us that you are collecting data for more than 15 minutes. How long, exactly, are you collecting data? If you are collecting data for an hour or more, that is going to be a problem. What type of processing is your app doing? If it is computationally intensive, you should be able to sample for a shorter period of time (during the computational phase of your app) and get a "representative" profile of what is happening during the compute time. There are several ways to limit when you collect data, including: the Start Paused button in the GUI along with the Resume button after starting it (so you can "resume" collection once your app is in the computational phase, then press Stop after a couple of minutes of data collection); the Collection Control API that you can insert into your source code so that data is only collected between calls to __itt_resume() and __itt_pause(); or using a smaller workload that completes within a few minutes. In general, the longer the data collection, the longer finalization and results display are going to take (no surprise, right?). So try to pick a workload that causes the app to perform its typical processing but for a short duration.
#3, if you have results that regularly crash the VTune Amplifier app when you attempt to open them, please zip/tar them up and submit an issue with them attached (perhaps you have already done that?). We can analyze the results and possibly determine the cause.
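The Start Paused / Resume / Stop workflow from #2 also has a command-line form. The sketch below is a dry run that only prints the amplxe-cl commands. The result directory r_paused and the application ./my_sim are placeholders that do not come from this thread, and the option spellings (-start-paused, -command, -result-dir) are given from memory of the amplxe-cl help, so please check them against amplxe-cl -help for your build.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a machine where amplxe-cl is on PATH to really run them.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_paused ANALYSIS RESULT_DIR APP [ARGS...]
# Launch the app with collection paused so initialisation is not profiled.
start_paused() {
    analysis=$1; result_dir=$2; shift 2
    run amplxe-cl -collect "$analysis" -start-paused -result-dir "$result_dir" -- "$@"
}

# control ACTION RESULT_DIR: send pause, resume or stop to a running collection.
control() {
    run amplxe-cl -command "$1" -result-dir "$2"
}

# Placeholder names: r_paused and ./my_sim are assumptions for illustration.
# In a real run, issue the control commands from a second terminal while
# the first command is still running.
start_paused advanced-hotspots r_paused ./my_sim
control resume r_paused    # once the compute phase has started
control stop r_paused      # after a couple of minutes of data
```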
OK, I will look at profiling only the computational part of my code. Do you happen to have a guide or example for using the Collection Control API? Might I find more info in the help documentation somewhere?
The original simulation should have taken 15-20 mins, and certainly not hours! The problem is that the application is very RAM and computationally intensive. Typical RAM usage is around 5G.
The results I have that crash the VTune app are extremely large, and compressed amount to 1.5G of data. The runtime that generated this data was under 10mins. I have a premier support issue open with a similar request.
And thank you for your help so far :)
You are welcome. And, yes, if you do a web search (and I am saying this for the benefit of all readers) on "VTune Amplifier XE collection control API" you will find this page in the online help and possibly this article (though it is directed at Windows* users and written for the 2011 version, it is still applicable). Our help documentation is installed with the product AND available online. Using a web search that starts with "VTune Amplifier" will usually yield good results.
If you can reproduce the crash with a smaller data set, that would be great. If you "stop" data collection (use the Stop button in the GUI or see the help for info on how to stop collection from the command line, if you are using that) say after 1 min, does it still cause the crash?
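For the one-minute experiment, a command-line collection can also be time-limited rather than stopped by hand. The following is a dry-run sketch that only prints the command; it assumes the -duration option of amplxe-cl (a value in seconds), and r_60s and ./my_sim are placeholder names.

```shell
#!/bin/sh
# timed_collect_cmd ANALYSIS SECONDS RESULT_DIR APP [ARGS...]
# Print (not run) an amplxe-cl command line that collects for SECONDS seconds.
timed_collect_cmd() {
    analysis=$1; seconds=$2; result_dir=$3; shift 3
    echo "amplxe-cl -collect $analysis -duration $seconds -result-dir $result_dir -- $*"
}

# Placeholder names: r_60s and ./my_sim are assumptions for illustration.
timed_collect_cmd advanced-hotspots 60 r_60s ./my_sim
```

A result directory produced this way should be much smaller than a full run, which makes it easier to attach to a support issue if it still triggers the crash.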
Thanks, I will have a go with the collection control API which will really help with the profiling. My application has a large overhead in initialisation which I am not interested in profiling at the moment.
For reference, I found the following page which gives an example of use within Fortran which is the language I am developing in.
I will also try a smaller dataset for the Advanced HotSpot analysis and see if this is more successful for me.
I have made use of the collection control API in my Fortran code and have reduced the profiling section to a smaller set where I want to focus my efforts. This is really helpful; I will make more use of this fine-grained control!
The Basic Hotspots analysis works fine with the ittnotify module, and I can pause and resume collection at the points I want during code execution.
Now I can also run the Advanced Hotspots to completion without VTune crashing although it doesn't seem to respond to the resume calls in the ittnotify module. At first I wondered why after the analysis VTune reported no data, but after some more investigation it's because the resume call hasn't been intercepted by the Advanced Hotspots analysis. The test case I am using works fine with the Basic analysis.
When I run the Advanced Hotspots analysis and resume the collection manually, it works and reports results correctly.
Is there a different call for the Advanced Hotspots analysis within ittnotify?
No, there is nothing special about Advanced Hotspots vs. Basic Hotspots with respect to the Collection Control API. It should work. You are using the same workload for both analysis types, correct?
I just tested it on my Windows* laptop with the 2016 Beta Update 1 and didn't have any problems. I will test on Linux when I can get access to a system.
> strace: /opt/intel_beta/vtune_amplifier_xe_2016.0.1.414512/lib64/pinruntime/glibc/libc.so.6: version `GLIBC_2.4' not found (required by strace)
I don't know if this is a PIN relocation error. Please have a look at this article. (Both VTune Amplifier XE and Inspector XE use PIN technology.)
MrAnderson: Yes, I was using the same workload for both analysis types.
Peter Wang: I checked to see if '/etc/ld.so.preload' existed on my system, but it doesn't. As far as I am aware, I am using the system libc located in '/lib64/libc-2.12.so'. In the '/lib64' folder there is a symbolic link from 'libc.so.6' to '/lib64/libc-2.12.so'.
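For anyone repeating this check, a minimal sketch is below. The function name preload_status is hypothetical; it only reports whether the preload file exists and is non-empty, and it also prints the LD_PRELOAD environment variable, which has a similar effect for a single shell session.

```shell
#!/bin/sh
# preload_status [FILE]: report whether a preload file lists any libraries.
# Defaults to the system-wide /etc/ld.so.preload.
preload_status() {
    f=${1:-/etc/ld.so.preload}
    if [ -s "$f" ]; then
        echo "preload entries in $f"
    else
        echo "no preload entries in $f"
    fi
}

preload_status
# LD_PRELOAD is the per-session counterpart of the preload file.
echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"
```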
OK, I tried the Concurrency analysis with a workload (test case in my terminology) that works with Basic Hotspots. In this analysis mode VTune keeps crashing. Reloading the results file after restarting VTune also causes it to crash.
From what I can gather, the workload test has completed and VTune is collecting the results.
Is this perhaps another case where my collection results are too large for VTune? I am still confused as to why VTune crashes!
We are confused, too, since we didn't design it to crash. ;)
Basic Hotspots, Concurrency, and Locks and Waits all use PIN instrumentation to collect results (and, results are collected *during* the application run, not at the end). Advanced Hotspots without stacks only uses hardware-based sampling via the processor's PMU (performance monitoring unit). Thus, the two separate crashes are not related.
Have you uploaded at least one set of results that causes a crash to the issue you submitted at Premier Support?
Haha, at least we are in the same boat!
I haven't been able to upload a result set that caused VTune to crash previously because the examples I had were 5GB uncompressed, 1.5GB compressed! I presume I probably overloaded VTune. (That case was VTune crashing under Advanced Hotspots).
I now have a case with VTune crashing under Concurrency analysis, and this results folder is only 15MB. I will upload this to the issue I have open with Premier Support. I will also run this case with TRACE logging enabled and upload the last 3 folders from /tmp.
I understand now that the analysis modes are different, but in both cases it appears that the crash occurs after collection has completed. Let's hope it is a common issue :)
@ EWan T
1. Is it possible that your application or its libraries were built with static linking? If so, PIN cannot dynamically instrument those modules.
2. The result directory would be helpful, as well as a dump file and stack trace.
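For question 1, one way to check how a binary was linked is to look at what the file utility reports. The sketch below classifies a line of file output; the function name linkage_of and the script name in the usage comment are made up for illustration, and ./my_sim is a placeholder for the real executable.

```shell
#!/bin/sh
# linkage_of DESCRIPTION: classify one line of `file` output.
linkage_of() {
    case "$1" in
        *"statically linked"*|*"static-pie linked"*) echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *) echo unknown ;;
    esac
}

# Usage: sh check_linkage.sh ./my_sim   (both names are placeholders)
# Only runs when a path is given and the `file` utility is installed.
if [ -n "${1:-}" ] && command -v file >/dev/null 2>&1; then
    echo "$1: $(linkage_of "$(file -L "$1")")"
fi
```

Running ldd on the executable gives similar information, and also shows whether individual libraries were resolved dynamically.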