I'm evaluating VTune 9.1 for checking the performance of modules under Linux.
We wrote a component that runs in IBM WebSphere Message Broker (WMB) under Red Hat Linux. The component is a shared library that is activated by WMB. Sampling the code was not sufficient: I could see that the problem is in one of the lowest-level functions, but sampling did not provide me with a call graph (or the relevant path to the suspected function).
Hence I tried to work with the call graph. Unfortunately, running WMB under call graph caused the parent program that calls my module to function improperly.
The launch chain is as follows:
Using a script: ./mqsistart WMB_BK2 - to start the process ...
The script actually launches ./mqsistart.bin WMB_BK2 ...
This mqsistart.bin launches DataFlowEngine process which actually calls my module
One more point to add is that the DataFlowEngine internally uses a Java VM.
When running this code under call graph, the DataFlowEngine refuses to work correctly; it starts reporting an error to stdout: `Faild to find VM - aborting`
I could see the DataFlowEngine being restarted a few times (probably by some watchdog process that was also launched). During the short episodes in which the DataFlowEngine was running, I could see that call graphs were created, but they were useless to me since my code did not get the chance to start during these short periods.
I have made call graph use the same Java as WMB, run call graph in batch mode (vtl), and disabled all function instrumentation ... nothing helped.
As a last resort I tried another call graph profiler, Valgrind (open source), and surprisingly Valgrind worked fine ...
Any clue as to what I am doing wrong?
Sorry, I'm not familiar with IBM* WebSphere, and I don't know which JVM version you used.
Below is some info from the release notes - hope it helps.
1. Supported JVMs
On IA-32 systems:
- Sun* J2SE 5.0 and 6
- IBM* JDK 1.4.2 and 5.0
- BEA JRockit 1.4.2, 5.0 and 6
On systems with Intel 64:
- Sun J2SE 5.0 and 6
- BEA JRockit 5.0 and 6
2. Call graph limitation
Back to your problem - "DataFlowEngine internally uses a java VM.
When running this code under callgraph the DataFlowEngine refuses to work correctly, it starts report an error to stdout `Faild to find VM - aborting`". I think that you have to add the option "-agentlib:java=cg" to the DataFlowEngine, which internally calls the JVM.
Thanks for the answer. However, I cannot pass the said parameter to the DataFlowEngine, since the launch chain I activate is the following:
Script-A --> launches Process-A
Process-A --> launches Process-B
Process-B --> launches DataFlowEngine
I persisted in investigating this issue since I really want it to work, and I have discovered the following interesting behaviour.
The error message 'Failed to load VM - aborting' is returned by the function JNI_CreateJavaVM() from libjvm.so while it tries to launch the Java VM inside the native code of the DataFlowEngine.
Looking at a correct activation of the DataFlowEngine (without VTune), I have discovered that two different versions of this library are loaded concurrently into the DataFlowEngine process:
One from ..../jre15/bin/classic/libjvm.so and the other from ..../jre15/bin/j9vm/libjvm.so
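One way to confirm which copies of a library a running process has mapped is to scan the text of /proc/<pid>/maps. The following is only a minimal sketch of that check: the function name loadedCopies is hypothetical, and it assumes the usual maps layout in which the pathname is the last field on the line and begins with '/'.

```cpp
#include <set>
#include <sstream>
#include <string>

// Return the distinct file paths found in /proc/<pid>/maps-style text whose
// final path component equals libName (for example "libjvm.so").
// Hypothetical helper for illustration only.
std::set<std::string> loadedCopies(const std::string& mapsText,
                                   const std::string& libName)
{
    std::set<std::string> paths;
    std::istringstream in(mapsText);
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: the pathname, when present, starts at the first '/'.
        std::string::size_type start = line.find('/');
        if (start == std::string::npos) continue;   // anonymous mapping, [stack], ...
        std::string path = line.substr(start);
        std::string::size_type slash = path.rfind('/');
        if (path.substr(slash + 1) == libName) paths.insert(path);
    }
    return paths;
}
```

To use it, read the whole of /proc/<pid>/maps for the DataFlowEngine process into a string and pass it in together with "libjvm.so"; a result with two entries would match the observation above.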
Looking at the VTune cache (and the report during callgraph) I can see that the only version which is used when the program runs under VTune is the one in ..../jre15/bin/classic/libjvm.so. I could not look into the DataFlowEngine when it is under VTune because it stops before I get the chance to investigate it.
I have tried to add the ..../jre15/bin/j9vm to the LD_LIBRARY_PATH but it did not help.
Also looking at the VTune cache I understand that it would not help since the name libjvm.so is used in the cache so only one version can co-exist
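To illustrate why a cache keyed only by file name cannot hold both copies, here is a small sketch. It is purely an assumption about the behaviour inferred above, not VTune's actual data structure; the names baseName and addToCache and the /opt/example prefix are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

// Final path component, e.g. "libjvm.so" for ".../classic/libjvm.so".
std::string baseName(const std::string& path)
{
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Hypothetical cache index: file name -> original path.
// Returns false when an entry with the same file name already exists,
// which is the collision suspected for the two libjvm.so copies.
bool addToCache(std::map<std::string, std::string>& cache,
                const std::string& originalPath)
{
    return cache.insert(
        std::make_pair(baseName(originalPath), originalPath)).second;
}
```

With such an index, adding /opt/example/jre15/bin/classic/libjvm.so succeeds, while adding /opt/example/jre15/bin/j9vm/libjvm.so afterwards is rejected because the key libjvm.so is already taken.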
To find a way around I even tried to force DataFlowEngine to use only a single copy (by eliminating one of the copies), but this broke DataFlowEngine and it responded in similar way to VTune by reporting 'Failed to load VM - aborting'.
I wanted to see if I could instruct VTune to skip instrumenting this libjvm.so, but I did not find any way of doing it. (I could not even disable the functions inside libjvm.so, since the GUI allowed only 'minimal' and the function button was grayed out.)
I have even tried to manually remove libjvm.so from the VTune cache (while it was instrumenting my program), but VTune insisted on recreating it.
I guess the solution would be either to (somehow) avoid the instrumentation of this libjvm.so and have VTune allow the DataFlowEngine to use those two original libraries, or to have VTune support multiple instances of the same library.
Thank you for the detailed description.
I don't think that VTune Analyzer can directly support your working mode. Based on your current call sequence, there is no opportunity for the tool to pass (or for you to manually pass) a Java option such as "-agentlib:javaperf=cg" to the JVM. Thus, the hooked functions in a supported JVM can't return performance data to VTune Analyzer.
In my view, the only way of using the tool is to do a unit test for the Java code: build a tester (Java script) to call your Java classes.
You are welcome to visit https://premier.intel.com to submit a new feature request to support running Java from a C/C++ application. (The product currently supports only Java Application, Java Applet, and Java Script.)
Well, I probably forgot to mention the most important fact. The code I want to analyze is not Java code. Java is only a side effect of the environment, i.e. the DataFlowEngine probably also supports Java modules.
In our case the DataFlowEngine runs our native code, which is a shared library written in C++, so the
DataFlowEngine --> loads our shared library.
So we do not care about Java at all; it is just in the way. The DataFlowEngine is part of the Broker infrastructure. So if I could only have VTune skip the instrumentation of libjvm.so, it would continue and hopefully load my shared library, which is the real entity I would like to analyze.
I spent more time on investigating the issue ... and ....
VTune has a bug in its implementation of the Linux library function dladdr(). The problem seems minor; however, it causes the entire WebSphere to fail.
dladdr() is a function that is given the address of an arbitrary function and returns the file path of the library that implements it.
The regular dladdr() returns the correct path however the Vtune version occasionally returns an extra / slash between the file name and the folder.
This happens if the path in the LD_LIBRARY_PATH to that library happens to contain a slash / at the end.
Important! Please note that the problem occurs only if dladdr() is called from within a library, the address passed to it belongs to an internal function (one not used by the executable), and the library is not in the same folder as the executable but is found via the LD_LIBRARY_PATH environment variable.
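For illustration, this is how an extra slash typically appears when a directory that already ends in '/' is joined with a file name. It is a sketch of the general failure mode only; whether VTune builds the path this way is an assumption, and joinNaive and joinCareful are hypothetical names.

```cpp
#include <string>

// Naive join: always inserts a separator, so a directory that already
// ends in '/' produces "dir//file".
std::string joinNaive(const std::string& dir, const std::string& file)
{
    return dir + "/" + file;
}

// Careful join: adds the separator only when the directory lacks one.
std::string joinCareful(const std::string& dir, const std::string& file)
{
    if (!dir.empty() && dir[dir.size() - 1] == '/') return dir + file;
    return dir + "/" + file;
}
```

Both forms name the same file as far as the file system is concerned, but a program that compares the returned string against an expected path would see them as different.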
To demonstrate the problem I have attached example code.
TheApp.cpp has the following code:
extern "C" void libExternalFunction();
int main(int argc, const char* argv[])
{ libExternalFunction(); return 0; }
------ libExternalFunction() is implemented in libTheLib.so
TheLib.cpp is the code for libTheLib.so:
#include <dlfcn.h>   // dladdr(), Dl_info; link with -ldl
#include <stdio.h>
extern "C" void libInternalFunction() {}
extern "C" void libExternalFunction() {
    Dl_info info;
    if (dladdr((const void*)libInternalFunction, &info) != 0)   // non-zero means success
        printf("dladdr() returned %s\n", info.dli_fname);
}
------ libInternalFunction() is an internal function whose implementing library we want to know
------ libExternalFunction() looks up the name of the library that implements libInternalFunction() and prints it
------ The application TheApp is located in ./vtune.problem
------ The library libTheLib.so is located in ./vtune.problem/subdir
For the application to run properly we have to add ./vtune.problem/subdir/ to the LD_LIBRARY_PATH. Important: please note the slash at the end of subdir/; this is what causes the problem.
Use . ./envvars.sh to set the LD_LIBRARY_PATH.
When running TheApp from the command prompt, the path is correctly reported:
dladdr() returned /../vtune.problem/subdir/libTheLib.so
When running it from VTune, it wrongly reports:
dladdr() returned /../vtune.problem/subdir//libTheLib.so
Note the extra slash / between subdir and libTheLib.so.
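A possible workaround for the demonstration case would be to strip trailing slashes from every LD_LIBRARY_PATH entry before the program is launched. The sketch below shows only the string handling; the function name stripTrailingSlashes is hypothetical, and whether this actually avoids the behaviour under VTune is untested.

```cpp
#include <string>

// Remove trailing '/' characters from each colon-separated entry of an
// LD_LIBRARY_PATH-style list. A lone "/" (the root directory) is kept.
// Hypothetical helper for illustration only.
std::string stripTrailingSlashes(const std::string& pathList)
{
    const std::string::size_type npos = std::string::npos;
    std::string result;
    std::string::size_type begin = 0;
    while (true) {
        std::string::size_type end = pathList.find(':', begin);
        std::string entry =
            pathList.substr(begin, end == npos ? npos : end - begin);
        while (entry.size() > 1 && entry[entry.size() - 1] == '/')
            entry.erase(entry.size() - 1);
        result += entry;
        if (end == npos) break;     // last entry processed
        result += ':';
        begin = end + 1;
    }
    return result;
}
```

The cleaned value has to be exported before the process starts (for example from the launching script or wrapper), because the dynamic loader reads LD_LIBRARY_PATH at program start-up.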
I have noticed that VTune creates instrumented code in its cache, so the libTheLib.so library actually runs from the VTune cache. However, to report dladdr() correctly, VTune intercepts dladdr() and fixes the result to point to the correct (original) path. This is done by the RT_catch_dladdr() function, which is located in libRTEnvSupport.so. It first obtains the real path using the original dladdr(), and then, using the file name only, it looks up the original path, which is stored inside the libTheLib.so instance placed in the cache. I could see that the libTheLib.so in the VTune cache has the wrong path in it, so I have concluded that the problem occurs during the instrumentation phase.
I have checked whether the LD_LIBRARY_PATH I'm using contains the extra slash, but it does not. The slash is probably created somewhere along the launch chain I have already described, and I could not find the actual location where it happens. Anyhow, it is a bug, since it changes the behavior of the program when running under VTune.
Attached are the files with recreation of the problem
Is it possible to fix this one so I can continue with the evaluation?