The system I'm currently profiling is a Linux AWS instance running a Scala app (so on the JVM) inside a Docker container. When I profile using a command line like:
vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -finalization-mode=full
The majority of the recorded time is listed as 'Outside any known module', and this covers the code that I'm mainly interested in profiling. I initially suspected this was because the code was built without debug symbols. However, I built a new version with both the Java and Scala compilers set to record full debug symbols, and this didn't improve things.
Can you advise on what I'm missing that could enable me to get full stack info via VTune? I could remove the application from the docker container if that would help, but this is slightly more complicated than it sounds in a prod-like environment and so I'd prefer to avoid it at this stage unless I know it's got a good chance of success.
Could you please check the kptr_restrict setting? If kernel pointer information was explicitly hidden by setting kptr_restrict to a non-zero value, hardware event-based analysis results may not contain functions from kernel modules. As a result, you may see the CPU time associated with the [Outside any known module] item. To work around this problem for the current session, set the contents of the /proc/sys/kernel/kptr_restrict file to 0 before starting the VTune Profiler as follows:
sysctl -w kernel.kptr_restrict=0
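Before changing the setting, it can help to confirm what value is currently in effect. The following is a minimal sketch, not part of VTune: the function name check_kptr_restrict and its optional path argument are assumptions introduced here for illustration (the argument exists only so the check can be pointed at a different file).

```shell
# Report whether kernel pointer hiding is active.
# check_kptr_restrict is a hypothetical helper, not a VTune or kernel tool.
# The optional first argument overrides the procfs path (useful for testing).
check_kptr_restrict() {
    file="${1:-/proc/sys/kernel/kptr_restrict}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file"
        return 2
    fi
    value="$(cat "$file")"
    if [ "$value" = "0" ]; then
        echo "kptr_restrict=0: kernel pointers are not hidden"
        return 0
    fi
    # Any non-zero value hides kernel pointers from the profiler.
    echo "kptr_restrict=$value: kernel pointers are hidden"
    return 1
}
```

Calling check_kptr_restrict with no argument inspects the live setting; a return status of 1 means the sysctl command above still needs to be run.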
In addition to the post above, please refer to this article https://software.intel.com/en-us/vtune-cookbook-profiling-in-docker-container to learn about profiling a Java app running inside a Docker container.
Hi Arun and Denis,
Apologies for the delayed response; I was away on Friday and busy with other tasks on Monday.
I had already set kernel.kptr_restrict to 0 (and baked it into the image being used for the profiling runs). I have rechecked that it is actually applied in a running instance before profiling, and it was set correctly.
I had read through your cookbook article, Denis (this was what I was basing my profiling on). Obviously there are some differences, such as the actual application running in the container. I also had to run VTune on the command line on the AWS instance, because the way our images are set up means I need to run VTune with sudo, which I can't do via the GUI from a remote machine. The command line I posted above should be equivalent to the setup described in your article (I generated the command line from my workstation). Is that correct?
Another thing on the permissions side with Docker: I initially set up Docker to run with '--cap-add=SYS_PTRACE', and I have also just done a run using '--privileged', but I get the same results, where all the interesting data is 'Outside any known module'.
Do you know of anything else I should be looking out for? If nothing else springs to mind, then I think my final attempt to get it working will be to break down the docker image and run the app directly on the AWS instance - I'm not sure whether that will make a difference, but it does at least remove a layer.
Could you please try the following and let us know if you still see [Outside any known module]:
1) Profile your Scala app using "Attach to Process" mode
2) Remove your Scala app from the docker container and profile it with VTune using "Attach to Process" and "Launch" modes
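On the command line, "Attach to Process" corresponds to passing the process ID of the running JVM to VTune. The sketch below builds such a command from the options already used earlier in this thread. The helper name vtune_attach_cmd and the default result directory r_attach are assumptions made for illustration, and the function only prints the command so that it can be reviewed before being run.

```shell
# Print a VTune command line that attaches to an already running process.
# vtune_attach_cmd and the r_attach default are hypothetical names.
# $1: PID of the target JVM; $2: optional result directory.
vtune_attach_cmd() {
    pid="$1"
    result_dir="${2:-r_attach}"
    printf '%s\n' "vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -target-pid=$pid -result-dir=$result_dir"
}

# Example (assumes a single JVM is running on the machine):
#   pid="$(pgrep -o java)"
#   sudo $(vtune_attach_cmd "$pid")
```

When attaching, collection continues until it is stopped or the target exits, so it is worth deciding in advance how long the run should last.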
What does "java --version" say?
Sorry for the delay in getting back to you. Just before your message, I had finally removed the app from the Docker container; this yielded slightly more information when running system profiling, however I was still getting the majority of information 'Outside any known module'. Subsequently running VTune with 'Attach to Process' seemed to eliminate 'Outside any known module' (or the parts that were, were too small to notice in the overall profiling results). So we do have some useful results now.
One thing I do get is 'Skipped stack frames', which accounts for a reasonable amount of time in the profiling sessions. If I understand correctly, this comes about because we're not collecting stacks deep enough. I did set the stack size to 4096 bytes, but this is the maximum allowed without the VTune Sampling Driver installed. Am I correct in thinking that, given that I'm running on a non-metal (virtualised) AWS instance, it isn't possible to use the Sampling Drivers?
On your other question, the Java version is:
OpenJDK Runtime Environment Zulu11.29+3-CA (build 11.0.2+7-LTS)
OpenJDK 64-Bit Server VM Zulu11.29+3-CA (build 11.0.2+7-LTS, mixed mode)
VTune supports "Profile System" for Java only if it runs inside a Docker container. It is expected that you see [Outside any known module] if you run "Profile System" for a Java app that runs outside Docker and consumes most of the CPU.
What type of analysis for Java are you interested in: User-mode Hotspots, Hardware event-based Hotspots, Threading or something else? Could you please try User-mode Hotspots, which is not limited by stack size? Also, for better stack-walking quality, please use the -Xcomp flag for java.
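As a rough sketch of these two suggestions combined, the helper below prints a command that launches the JVM under VTune with user-mode sampling (sampling-mode=sw) and the -Xcomp flag. The helper name vtune_user_mode_cmd, the jar path app.jar and the result directory r_user are placeholders assumed for illustration; the real launch command for the Scala service will differ.

```shell
# Print a VTune command line for User-mode Hotspots on a JVM started with -Xcomp.
# vtune_user_mode_cmd, app.jar and r_user are hypothetical names.
# $1: optional path to the application jar; $2: optional result directory.
vtune_user_mode_cmd() {
    jar="${1:-app.jar}"
    result_dir="${2:-r_user}"
    # sampling-mode=sw selects user-mode sampling instead of hardware events.
    printf '%s\n' "vtune -collect hotspots -knob sampling-mode=sw -result-dir=$result_dir -- java -Xcomp -jar $jar"
}

# Example:
#   $(vtune_user_mode_cmd /opt/service/service.jar)
```

Note that -Xcomp makes the JVM compile methods on first invocation, so start-up will be slower than in a normal run.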
Running the system profile while the service was running in Docker was the first thing I tried, but I found I was getting [Outside any known module] there too. The app actually runs fairly light on the target system (~50% of available resources), as we see an increase in latency if we go much above that for any individual instance.
We were looking at Hardware event-based Hotspots and also wanted to check Threading (not done yet). I am going to try user-mode as, for the instance type we usually run on, we don't have a full board or even socket to ourselves and so user-mode is the only option. Based on what you've said, it sounds like it would be beneficial. I'll try the -Xcomp flag also as you suggested.
Thanks again for the advice!