I've been using VTune for years and find it invaluable for profiling native code. The ability to profile Python, either "purely" or in a mixed setting, is also extremely useful. Unfortunately, this capability has always been a bit temperamental, and in more recent versions of VTune, I haven't been able to get it to work at all. In particular, although I can usually manage to extract native-level information from a run using Python, I am completely unable to get that information to associate with Python source code.
I note that my difficulties are shared by another forum user. Indeed, that post mirrors my experiences quite closely. Unfortunately, that post contained no clear resolution to the problem.
To be clear, I realise that there is a galaxy of possible system configurations, and of course I don't expect the VTune developers to anticipate all of them. But it would be very helpful if we could establish which configuration patterns SHOULD work, so that (again, as suggested by the other poster in the previous thread) I can focus my efforts on finding and eliminating potential environment problems at my end.
So I'll give a MWE for my particular case, but I would like to first ask a series of somewhat open-ended questions, and maybe start a discussion that could be really valuable to others in the future. Here goes:
1) Roughly speaking, what mechanism is VTune using to sample the Python runtime and attribute those samples to .py source lines? In particular, does this mechanism rely on any of the standard (Python) libraries? (This assumes, of course, that there is some kind of explicit interaction with the Python runtime during sampling.) I ask so that I can better troubleshoot in case particular code, C/C++ extensions, etc. are somehow interfering with that mechanism.
2) Precisely which versions (at least up to minor version, i.e. 3.7, 3.8, etc.) of the standard CPython environment are known to be compatible with modern versions of VTune? (Let's say VTune 2020 and later.) If there are modern versions of Python which are NOT expected to work with one or all versions of VTune, why not? Is it due to versioned incompatibilities in some underlying protocol (perhaps relevant to Q1 above), or is it simply that VTune uses APIs in the Python shared library that might change between Python versions (e.g. from 3.7 to 3.8)?
3) Are there any known incompatibilities with particular build options for standard CPython? For example, I usually compile my Python executables with the LTO and profile-guided optimisation options exposed through the standard configure script.
4) Are there any anticipated issues using symlinks to binaries, specifically those created in virtualenvs? For example, if I have a virtualenv in /home/person/.virtualenvs/main_venv, and run VTune with the binary path given as /home/person/.virtualenvs/main_venv/bin/python3, which is actually a symlink to /usr/bin/python3, should this work? (I would expect so, but if not, it would be good to know, since virtualenvs are so central to modern Python development workflows.)
5) It seems like the use of AMPLXE_RUNTOOL_OPTIONS=--no-altstack is effectively mandatory for profiling scripts run in modern Python executables. Are there any potential downsides or failure cases associated with this option, or is it safe to use "everywhere, everywhen"? (Also, if it is mandatory, could this perhaps be exposed clearly through the GUI? I don't think it currently is, so the prospective user has no way of knowing about it until they first encounter problems and do the requisite search.)
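Only the VTune team can answer these questions, but when reporting a problem like this it helps to state the interpreter facts behind questions 2) to 4) precisely. The following is a minimal sketch, assuming a POSIX CPython build (where `sysconfig` exposes `CONFIG_ARGS`). `interpreter_report` is a hypothetical helper, and its output says nothing about which configurations VTune actually supports. It should be run with the exact binary passed to VTune, e.g. the virtualenv's `bin/python3`.

```python
import os
import platform
import sys
import sysconfig


def interpreter_report():
    """Collect interpreter facts relevant to questions 2) to 4) above."""
    # CONFIG_ARGS holds the ./configure arguments on POSIX builds; None elsewhere.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        # Q2: exact interpreter implementation and version.
        "implementation": platform.python_implementation(),
        "version": "%d.%d.%d" % sys.version_info[:3],
        "compiler": platform.python_compiler(),
        # Q3: --enable-optimizations turns on PGO; --with-lto turns on LTO.
        "pgo": "--enable-optimizations" in config_args,
        "lto": "--with-lto" in config_args,
        "shared_libpython": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # Q4: the path handed to the profiler vs. the binary it resolves to.
        "executable": sys.executable,
        "resolved_executable": os.path.realpath(sys.executable),
        "executable_is_symlink": os.path.islink(sys.executable),
        # venv and virtualenv >= 20 set base_prefix; legacy virtualenv set real_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```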
Now to my particular case. I'll give a minimal example demonstrating the problems I'm having, with as much information as possible, so please forgive me if this is a lot of text to parse. If further information is required, please don't hesitate to let me know.
I am running Ubuntu 18.04 LTS, with Intel Parallel Studio 2020 Update 2 (so VTune 2020 Update 2). I am running with the relevant environment variables set in vtune-vars.sh -- specifically, the PATH and VTUNE_PROFILER_2020_DIR variables are updated and set according to that script. I have built and installed the hardware sampling drivers without problems. The output from the vtune-self-checker.sh script indicates that all profiling types are functioning normally, except for Linux kernel analysis, which I don't need.
I'm using a stock version of CPython 3.7.9, obtained from the Python website in tar.gz format and compiled with LTO and profile-guided optimisation. Compilation was done using the version of GCC 7.5.0 available through apt, with precise version string "gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0".
The Python installation doesn't live in the standard /usr location, but is rather stored in /home/myusername/packages/python-3.7.9 and exposed via a modulefile, which sets (among others) PATH and LD_LIBRARY_PATH appropriately.
I have created a virtualenv called test_venv using this binary, activated it, and installed numpy 1.21.1 into it. This setup is similar to our standard development setup, and works flawlessly in normal day-to-day operation.
I have tried to profile the following toy script. This script runs for a few seconds on my machine, which should be sufficient (I expect) to generate enough useful sampling information.
```python
import numpy as np


def do_work(n):
    arr = np.random.random(size=(n, n))
    for i in range(100):
        arr = np.dot(arr, arr)


def main():
    do_work(2000)


if __name__ == '__main__':
    main()
```
When considering this script, please note that I have tried to follow the instructions found elsewhere on this forum regarding "sufficient stack depth" such that the script can be usefully profiled. The stack here should be at least three levels deep -- the top-level module code calling the main() function, which in turn calls the do_work() function. (If this isn't a sufficiently deep stack, exactly how much deeper must it be?)
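One way to check the Python-level stack depth the toy script produces is to print the call stack from inside the innermost function. The following is a minimal sketch using the standard `traceback` module. `python_stack_names` is a hypothetical helper, `do_work` and `main` mirror the toy script's structure without the NumPy work, and the result reflects only the interpreter's view of the stack, not whatever depth threshold VTune applies.

```python
import traceback


def python_stack_names():
    """Return the function names on the current Python call stack, outermost first."""
    # extract_stack() includes this helper's own frame last, so drop it.
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def do_work():
    # Stands in for the NumPy-heavy do_work(n) of the toy script.
    return python_stack_names()


def main():
    return do_work()


if __name__ == "__main__":
    print(" -> ".join(main()))
```

Run as a script, this should print `<module> -> main -> do_work`, i.e. the three levels described above.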
Using the VTune GUI, my workflow is as follows:
1) Launch the GUI using "AMPLXE_RUNTOOL_OPTIONS=--no-altstack vtune-gui"
2) Create a new project, which is placed in /home/my_username/vtune_bug/test_project
3) Launch a Hotspots analysis using user-mode sampling on localhost. The application given is the full (NOT relative) path to the virtualenv binary, i.e. /home/my_username/.virtualenvs/test_venv/bin/python3. The application parameters are just the full path to the script: /home/my_username/vtune_bug/script.py. I have unchecked "Use application directory as working directory", and set the value of "Working directory:" as /home/my_username/vtune_bug/ (i.e. the location of the script). Under Advanced, I select Mixed for code profiling mode. All other values are left as default.
4) Run the analysis by clicking the Play button. The analysis runs, my cores spin up, the script finishes, and the analysis is finalised. The finalisation contains warnings about debugging information for some numpy libraries, but the end result is "Finalization complete with warnings".
5) I go to the Bottom-up analysis window, which is set to the default "Function / Call Stack" view. Here, I see the standard native function profiling information, but absolutely nothing in terms of Python source.
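The GUI steps above can presumably also be expressed as a single command-line collection, which is easier to share and repeat. The following sketch only builds the command and environment and does not run anything. The `vtune` CLI with `-collect hotspots` and `-result-dir` is standard, but the `sampling-mode=sw` knob (user-mode sampling) and the `-mrte-mode` option (code profiling mode) are assumptions here and should be checked against the CLI help for your VTune version. `build_vtune_command` and the `r000hs` result directory are hypothetical.

```python
import os
import shlex


def build_vtune_command(python_bin, script, result_dir, mrte_mode="mixed"):
    """Build a hypothetical CLI equivalent of the GUI Hotspots setup (steps 1 to 4)."""
    cmd = [
        "vtune",
        "-collect", "hotspots",
        "-knob", "sampling-mode=sw",   # user-mode sampling (assumed knob name)
        "-mrte-mode=" + mrte_mode,     # code profiling mode (assumed option name)
        "-result-dir", result_dir,
        "--", python_bin, script,
    ]
    env = dict(os.environ)
    # Overwrites any existing value; append instead if other runtool options are set.
    env["AMPLXE_RUNTOOL_OPTIONS"] = "--no-altstack"
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_vtune_command(
        "/home/my_username/.virtualenvs/test_venv/bin/python3",
        "/home/my_username/vtune_bug/script.py",
        "/home/my_username/vtune_bug/test_project/r000hs",
    )
    # shlex.join needs Python 3.8+, so quote manually to stay 3.7-compatible.
    print("AMPLXE_RUNTOOL_OPTIONS=--no-altstack " + " ".join(shlex.quote(p) for p in cmd))
    # To run it: subprocess.run(cmd, env=env, cwd="/home/my_username/vtune_bug", check=True)
```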
I have repeated the above for all of the four available code profiling modes (Auto, Native, Mixed, and Managed). Obviously, I wouldn't expect anything from Native, but it seems not to matter which I choose -- the results from all are identically devoid of Python-level information.
So. Is there anything here that I am doing which is clearly incorrect? I'm happy to experiment with different scripts, values, etc. -- please just let me know. I can also provide screenshots if that helps.
One final comment: in general, I cannot use the Intel Distribution for Python (IDP) in my work, primarily due to various unavoidable conflicts between the MKL version it bundles and the version(s) I use in other software. However, in case my own Python build is to blame, I have attempted to perform the above workflow using the IDP binary from Parallel Studio 2020 Update 2, with the optimised NumPy 1.18.5 that comes with it. In this case, the only difference is that I'm passing the full path to that binary rather than the virtualenv binary. The output in the native profiling is slightly different, as is to be expected, but there is still no attributed Python source information.
Thank you very much in advance for any assistance you can provide!
I wondered if there was any update on this?
I know it's quite a broad question and might take some time, so I don't want to be pushy.
We are sorry for the delay in response.
> Could you please share with us the reason for using VTune 2020?
> Could you please try the latest version of VTune and check whether you are able to get the profiling information associated with Python source code?
We will try to answer the 2nd and 4th questions you raised in this thread.
2) Precisely which versions (at least up to minor version, i.e. 3.7, 3.8, etc.) of the standard CPython environment are known to be compatible with modern versions of VTune?
A: As mentioned in the solution of the thread you linked, VTune supports stock Python 3.8 and IDP 3.7 in Launch mode only; Attach mode may cause problems. The User-Mode Hotspots, Threading, and Memory Consumption analyses are supported. You will not see Python user code if you run the hardware event-based Hotspots analysis.
3) We were able to profile a sample application built using CPython 3.11 (https://github.com/python/cpython).
We are checking whether there are any known incompatibilities with build options.
4) Are there any anticipated issues using symlinks to binaries, specifically those created in virtualenvs?
A: There are no issues using symlinks to binaries in virtualenvs. We confirmed this on our end using the following steps:
To create the virtualenv:

```shell
pip install virtualenv
virtualenv virtualenv_name
# or, to choose the interpreter explicitly:
virtualenv -p <python path> virtualenv_name
source virtualenv_name/bin/activate
```

To create a symlink:

```shell
ln -s target_path link_path
```
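The steps above use the third-party `virtualenv` tool. As a stdlib-only alternative (a swapped-in tool, not what was used in the test described above), the following minimal sketch creates a symlink-based environment with the `venv` module and reports where its `bin/python3` actually resolves, which is the binary a profiler ends up loading. It assumes a Linux/POSIX layout (`bin/` rather than Windows' `Scripts/`), and `make_venv_and_resolve` is a hypothetical helper.

```python
import os
import sys
import tempfile
import venv


def make_venv_and_resolve(env_dir):
    """Create a symlink-based venv and report where its python3 actually points."""
    # symlinks=True mirrors the Linux default; with_pip=False keeps this fast and offline.
    venv.create(env_dir, symlinks=True, with_pip=False)
    venv_python = os.path.join(env_dir, "bin", "python3")
    return {
        "venv_python": venv_python,
        "is_symlink": os.path.islink(venv_python),
        "resolves_to": os.path.realpath(venv_python),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info = make_venv_and_resolve(os.path.join(tmp, "virtualenv_name"))
        print(info)
        print("same binary as current interpreter:",
              info["resolves_to"] == os.path.realpath(sys.executable))
```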
We are discussing questions 1), 3), and 5) with the internal team and will get back to you as soon as we have an update.
Hello, apologies for the delayed response. I have reviewed the issue and filed a bug directly with the VTune development team. I will provide an update as soon as they are able to review it. Thanks!
Hello, as I mentioned, I filed this issue directly with the Intel VTune development team. Please note that we no longer support VTune 2020. Can you please try again with the latest version of VTune (2021.8.0) and let us know whether the issue still appears? Thanks!
Thanks for the response. I apologise for being slow to reply myself.
I have tried this with VTune 2021.5.0 and Python 3.7.9. It seems that the source attribution works for the test script I posted above, but it's still pretty unreliable for more complicated pieces of code. Nevertheless, since I don't have time to engineer a minimal working example, I guess you can close this ticket with my thanks.
I do think that it would be very helpful to the community to have more information available about how VTune actually interfaces with Python to do source code attribution, and I would politely suggest that the documentation regarding explicitly supported versions of Python could be improved. I'm still completely unclear as to why this wasn't working before. In particular, I'm not sure whether this was an actual bug in VTune, something to do with my environment or setup, or something in the dependency path that changed in the meantime.