We have a simple Go and DPDK application that listens only for interrupt signals. I'm trying to set up VTune analysis via CLI/remote on a target Linux machine running the app. I'm using vtune_amplifier_2019.4.0.597835.
For the command
./amplxe-cl -collect hotspots \
-result-dir ~/vtune_results/run1 \
-target-process nff-go-nat \
-knob sampling-mode=sw \
-search-dir ~/go/src/github.com/intel-go/nff-go-nat/
I see the following error, despite using the workaround "-run-pass-thru=--profiling-signal=40":
amplxe: Error: Application sets its own handler for signal 38 that is used for internal needs of the tool. Collection cannot continue. Refer to the Troubleshooting section of the online help for possible workarounds.
amplxe: Collection failed.
amplxe: Internal Error
Any suggestions will be much appreciated.
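For anyone hitting the same error, one way to see which signals the target process already handles is the SigCgt mask in /proc/<pid>/status, where bit N-1 corresponds to signal N. The Go runtime normally installs its own handlers for most signals at startup, which may be what the collector is detecting here; that is an assumption, not something confirmed in this thread. The helper name signal_caught is hypothetical. A minimal sketch:

```shell
# signal_caught MASK SIGNUM
# Exits 0 when signal SIGNUM is set in MASK, a hex signal mask in the
# format of the SigCgt field of /proc/<pid>/status.
# Hypothetical helper for illustration; it is not part of VTune.
signal_caught() {
    # Left-pad the mask with zeros to 16 hex digits so it can be split
    # into two 32-bit halves (avoids 64-bit overflow in shell arithmetic).
    _mask=$(printf '%16s' "$1" | tr ' ' '0')
    if [ "$2" -ge 1 ] && [ "$2" -le 32 ]; then
        _half=$(printf '%s' "$_mask" | cut -c9-16)   # signals 1..32
        _bit=$(( $2 - 1 ))
    elif [ "$2" -ge 33 ] && [ "$2" -le 64 ]; then
        _half=$(printf '%s' "$_mask" | cut -c1-8)    # signals 33..64
        _bit=$(( $2 - 33 ))
    else
        return 2                                     # not a valid signal number
    fi
    [ $(( (0x$_half >> $_bit) & 1 )) -eq 1 ]
}

# Example use against the running app (process name taken from the thread):
#   pid=$(pidof nff-go-nat)
#   mask=$(awk '/^SigCgt:/ {print $2}' "/proc/$pid/status")
#   signal_caught "$mask" 38 && echo "signal 38 has a handler installed"
```

If the check succeeds for 38 and also for the replacement signal passed via --profiling-signal, that would be consistent with the workaround not helping.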
My suggestion is to switch to hw-based hotspots analysis. An example command line looks like this:
./amplxe-cl -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -target-process nff-go-nat
Thank you for the response. Is there a way I can still get true software call stack analysis? Also, I read that VTune provides extended support for the Go runtime. Any suggestions on how to enable that? Currently the configuration seems to assume that the code is Java/Python based.
What do you mean by true software call stack? The command line that I wrote previously contains a knob to collect stacks. If the stack quality is not good enough (which can happen when the stack size parameter is too small), then my suggestion is to install the Intel drivers and set the stack size to unlimited (this option is available in recent releases). Also, please check the part of the following topic that covers current limitations: https://software.intel.com/en-us/vtune-amplifier-help-go-applications-support.
>>Also, I read that VTune provides extended support for the Go runtime. Any suggestions on how to enable that? Currently the configuration seems to assume that the code is Java/Python based.
Please give me some context on what kind of extended support you've read about. I don't remember any special setting for the Go runtime; everything we have should work by default.
Thank you. The command line worked for me. Some questions which may be outside of this thread.
>> Please give me some context on what kind of extended support you've read about. I don't remember any special setting for the Go runtime; everything we have should work by default.
1. I was referring to the same link, https://software.intel.com/en-us/vtune-amplifier-help-go-applications-support, and assumed there is a Golang option for the "mrte-type" knob setting.
2. Will the call stack info from user-mode sampling be different from the hw sampling data? I assume hw sampling profiles the system and uses PMU counters. Is there any other difference?
3. Also, any suggestions as to why the Golang app is crashing? Are there any changes required in the app itself?
4. Can you point me to the additional Intel drivers for call stack analysis?
5. The app makes CGO calls to DPDK libraries. Is there a way to include source code/binaries to get more call stack info? I used source-search-dir and search-dir, and I hope they are recursive and can find the DPDK sources.
Question 2 is a discussion question. The short answer is that the stack info can differ, because different collectors use different stack collection mechanisms. A difference in quality should look like [skipped frames] at the top of the stack (this relates to hw-based sampling and it can be solved by increasing the stack size parameter, up to unlimited); a difference in the middle or at the bottom of the stack is considered a bug. Differences can also be observed for runtimes like OpenMP and TBB: the sw-based (user-mode) stack sampling collector has a special feature for the Intel parallel runtimes known as stack stitching. Another well-known user-visible difference is that hw-based sampling can gather stacks from ring 0.
I don't think any of these differences apply to your case.
Sorry, I don't have any ideas about question 3. Do you observe the crash under collection? If not, the best advice is to just try to debug it; as far as I know, gdb supports the Go runtime. If the crash reproduces only under collection, then you'll have to give us additional details.
Question 4 is quite simple: you have a sepdk directory in the VTune package. The instructions to build it are here: https://software.intel.com/en-us/vtune-amplifier-help-building-and-installing-the-sampling-drivers-for-linux-targets . As a short instruction, go to <vtune>/sepdk/src and use:
./insmod-sep -g <user_group>
In most cases this is enough. And as I said previously, to use sampling with the drivers you have to set the stack size to unlimited.
Regarding 5, using search-dir is the right approach. If it is too tedious to set it each time, my suggestion is to use the GUI, where you can set the directories once for a project and then just collect the required results.
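Since the thread does not settle whether search-dir is recursive, one conservative option on the command line is to pass every relevant directory explicitly. The sketch below assumes that -search-dir and -source-search-dir may each be given more than once; the helper name search_dir_flags and the DPDK path in the usage comment are hypothetical.

```shell
# search_dir_flags DIR...
# Prints one -search-dir and one -source-search-dir flag per directory,
# ready to be appended to an amplxe-cl command line.
# Hypothetical helper; paths containing spaces are not handled.
search_dir_flags() {
    for _dir in "$@"; do
        printf ' -search-dir %s -source-search-dir %s' "$_dir" "$_dir"
    done
    printf '\n'
}

# Example use (the DPDK location is an assumption, adjust to your build tree):
#   flags=$(search_dir_flags ~/go/src/github.com/intel-go/nff-go-nat "$HOME/dpdk")
#   ./amplxe-cl -collect hotspots -knob sampling-mode=hw \
#       -knob enable-stack-collection=true -target-process nff-go-nat $flags
```

Listing the directories that hold the DPDK shared libraries and their sources this way avoids relying on any recursive lookup.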
Thank you Vladimir.
I could sync up with you internally to work on the issues related to user mode sampling, if needed. We observe the crash only upon collection.
We assume the bottleneck is in some of the cgo calls in the app: part of the code is implemented in C and part involves the DPDK libraries. So from our profiling, we would like to understand whether the bottleneck is in the code implemented in C after the cgo calls. I tried search-dir and source-search-dir, but the stack trace ends at the entry point of the cgo calls. I see them as runtime.asmcgocall. Can this be refined further?
Also I see some skipped stack frames. For stack size it says the default value is 0 which indicates the stack size is unlimited. Should I provide this explicitly? https://software.intel.com/en-us/vtune-amplifier-help-stack-size
>>I could sync up with you internally to work on the issues related to user mode sampling, if needed. We observe the crash only upon collection.
Yes, let's discuss it offline. Regarding the runtime.asmcgocall question, I think we have to discuss it offline too; I'll bring in one of the Go experts if you're still interested in this.
>>Also I see some skipped stack frames. For stack size it says the default value is 0 which indicates the stack size is unlimited. Should I provide this explicitly? https://software.intel.com/en-us/vtune-amplifier-help-stack-size.
The default stack size is 1024, so yes, you have to set it explicitly. (Depending on the VTune update, you have to set either 0 or "unlimited" as the value.)
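Putting the pieces of this thread together, the hw-based command with an explicit stack size might look like the sketch below. The knob name stack-size is an assumption based on the help page linked above, so please check it against the knob list for your VTune update before relying on it. The helper hw_hotspots_cmd is hypothetical and only prints the command so it can be reviewed first.

```shell
# hw_hotspots_cmd TARGET_PROCESS RESULT_DIR [STACK_SIZE]
# Prints (does not run) an amplxe-cl command for hw-based hotspots with
# stack collection. STACK_SIZE defaults to 0; per the reply above, some
# updates expect "unlimited" instead.
# Assumption: the stack size is exposed as "-knob stack-size=<value>".
hw_hotspots_cmd() {
    printf './amplxe-cl -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -knob stack-size=%s -result-dir %s -target-process %s\n' \
        "${3:-0}" "$2" "$1"
}

# Example use:
#   hw_hotspots_cmd nff-go-nat ~/vtune_results/run2
#   hw_hotspots_cmd nff-go-nat ~/vtune_results/run2 unlimited
```

If [skipped frames] still show up at the top of the stacks after this, it would be worth confirming that the sampling drivers are actually loaded.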