VTune affects behavior of API function AllocConsole?
[Platform Description]: OS - Windows 2000 Server (SP3), IDE - Visual C++ 6.0, Profiler - VTune 6.0
[Program Description]: This program performs speech recognition tasks. The main window is a Document-View window in which the speech waveform is displayed. The constructor of the document object creates a console object, which is used for text output. The console object is implemented via Mark Nelson's ConStream class.
[Problem Description]: Under the Visual C++ debug environment, the program runs properly. When it is launched by VTune, an assertion failure dialog pops up (this assertion failure only occurs when the call graph is enabled in VTune; if I collect only the sampling and counter monitor data, the program runs properly under VTune), with the following information:
Debug Assertion Failed!
Program: ... ... est_word_boundary.exe
File: fdopen.c
Line: 53
Expression: (unsigned)filedes < (unsigned)_nhandle
For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.
(Press Retry to debug the application)
I searched for "fdopen" in my project and found that only ConStream::Open() calls a function named "_fdopen". This function takes 2 parameters, int filedes and REG2 const _TSCHAR *mode, where filedes is the handle of the open file and mode is the file mode to use ("r", "w", "a", etc.).
In ConStream::Open(), "filedes" gets its value from the return value of _open_osfhandle. When the program is launched by VTune this value is -1, while under the Visual C++ debugger it is 4. This is because the first parameter of _open_osfhandle gets its value from m_hConsole, which is the return value of the API function GetStdHandle(). When running under VTune, m_hConsole is 0. _open_osfhandle checks its first parameter and returns -1 when it is 0. When this value is converted to unsigned in _fdopen, the assertion "(unsigned)filedes < (unsigned)_nhandle" fails, because _nhandle is 32 here.
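The failing comparison can be reproduced outside the CRT. This is only a minimal sketch: passes_fdopen_check is a hypothetical helper name (it is not a CRT function), and its nhandle parameter stands in for the CRT's internal _nhandle, which was 32 in the runs described above.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper that mirrors the debug check quoted in the dialog:
//   (unsigned)filedes < (unsigned)_nhandle
// 'nhandle' is a stand-in for the CRT's internal _nhandle.
bool passes_fdopen_check(int filedes, int nhandle) {
    assert(nhandle >= 0);
    return static_cast<unsigned>(filedes) < static_cast<unsigned>(nhandle);
}

// Converting -1 to unsigned wraps around to the largest unsigned value,
// so it can never be below a small handle count such as 32.
unsigned as_unsigned(int filedes) {
    return static_cast<unsigned>(filedes);
}
```

With nhandle set to 32, a descriptor of 4 passes the check and a descriptor of -1 does not. Checking the result of _open_osfhandle for -1 before calling _fdopen would probably turn the assertion into an ordinary error path, although it would not explain why GetStdHandle() returns 0 under VTune.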
The key lines of code in ConStream::Open() are as follows:
    AllocConsole();
    m_hConsole = GetStdHandle( STD_OUTPUT_HANDLE );
    int handle = _open_osfhandle( (long) m_hConsole, _O_TEXT );
I added some debug code to ConStream::Open() and found:
Running under the Visual C++ debug environment:
    AllocConsole return value: 1
    after AllocConsole returns, GetLastError() indicates: The operation completed successfully
    m_hConsole: 7
    handle: 4
Running under VTune:
    AllocConsole return value: 1
    after AllocConsole returns, GetLastError() indicates: The operation completed successfully
    m_hConsole: 0
    handle: -1
By inserting an assertion into ConStream::Open(), _ASSERTE(1 == 2), I managed to force the program to break when running under VTune, so that I could debug it and compare the result with a session in the Visual C++ debug environment. I found that GetStdHandle copies its return value from a certain memory address, and that this value (7 under VC, 0 under VTune) is written to that address during AllocConsole(). At this point my preliminary conclusion is that the running environment seems to affect the behavior of AllocConsole.
After all, the most important problem facing me is: how can I make the program run properly under VTune if I want to obtain the call graph data?
Hi Leech, I reproduced your problem. Assuming you are using VTune 6.0, please try the following workaround:
1. Create a new Call Graph activity using the CG Wizard. In Step 2, specify "No application to launch" and "Modify default configuration bla-bla-bla". In Step 3, specify your EXE as the "Module of interest".
2. Open the Configure CallGraph dialog.
3. Open Advanced.
4. Check the "Allow in-place instrumentation" box. Press OK twice.
5. Find your EXE in the grid, scroll the grid to the right, and find the "Instrumented Module Name and Location" cell for the EXE.
6. Double-click the cell and enter the full path to your EXE. This configures VTune CallGraph to replace your original EXE with the instrumented one. The original EXE will be renamed to .on_the_place_backup.
7. Run. VTune will instrument your application and ask you to run it manually.
8. Run your application outside of VTune. When the application starts, VTune will write "Data collection started" in the output window.
9. After running the test case, close your application.
10. After closing your application, press the "Stop" button on VTune's toolbar.
That's all. Please reply if this allows you to continue.
I am sure you'll get a more precise answer from kdmitry, but until then ... 1) You can ignore this warning message. 2) The fact that you get the error message is a good sign! You have started a very advanced mode of running the call graph, and you are on your way to getting results! Please try to locate the file "MFC42D_C__WINNT_system32.DLL" on your disk. When you find it, add its directory to the PATH environment variable. Most likely you'll find the file in the "cache directory". You can see the path to the cache directory in the advanced configuration for the call graph collector. (You have already been there, when you checked "Allow in-place instrumentation".)
This sounds interesting and exciting. Just before I checked here to see your reply, I had also come to the idea that I should probably add the cache directory storing the instrumented DLLs to the PATH environment variable. I right-clicked "My Computer", chose "Properties", and opened the "Advanced" tab of the dialog. Then I clicked the "Environment Variables" button. The dialog that pops up has two categories of environment variable settings: one for the current user, the other system-wide. At first I only added the cache directory to the user PATH variable and left the system PATH variable unchanged. I ran the program and got the "unable to find dll" error again. Then I saw your post addressing this problem. Gee, the feeling is hard to describe. :) Then I thought it over and noticed that I hadn't changed the system-wide PATH variable. I went back to change it, and now I can enjoy the call graph result! It's really an exciting experience. Thank you! I guess I've learnt a lot these days.
I don't think that this is risky. Your cache directory holds the "instrumented" binaries/libraries. Their names are changed as well (you can see this). Naturally, none of the regular executables are linked with these "renamed" binaries. That's why I think it is not risky.
But personally, I do not recommend that you do this. In my opinion it is preferable to add the cache directory path to the PATH environment variable only inside the command console from which you start your pre-instrumented executable.
Now I can collect data successfully in the project created by the Call Graph wizard. But I cannot find clockticks data in the columns to the right of the source pane. The VTune 6.0 tutorial says such information can be displayed in the source pane. When I tried to create a project that collects both Call Graph and Sampling information, it could only perform one of the tasks; that is, only Call Graph or only Sampling data could be collected. How can I get clockticks data for Call Graph?
"Drag & drop"! That is, open a view (such as the source view) from one activity result (let's say sampling), then drag and drop another activity result or sub-activity result node (let's say call graph) on top of this view.
Hi Leech, First of all, I'm glad that I was able to help you (even though I completely forgot about the cache directory :( ).
Do you really want me to describe the bug? It was a wrong condition in an if() statement. I cannot apply the same fix on your machine for 2 reasons: 1. I fixed it in much, much newer VTune sources than the ones you have (those that will probably come out in the next release after 7.0). 2. I'm on the VTune developer team, so I cannot contact you directly, sorry. I asked our support person to look at this topic and help you.
Now about the workaround itself. Our bug was in the code that starts the instrumented application. So I asked you to configure VTune in such a way that it never starts the app: "No application to launch".
But using only "No application to launch" would not help, because in that case VTune replaces the original EXE with a special wrapping EXE that sets up PATH, does some other required tasks, and starts up the instrumented application. And here is the same problem once more! We try to write good code, so we shared the same procedure in both scenarios! The solution was to ask you to configure VTune for in-place EXE instrumentation. This is a special mode in the Call Graph profiler that should be used when the wrapping EXE cannot be (for example, for some reason COM EXE servers don't work if we start them through the wrapping EXE, and some GUI testing tools malfunction in the same case). When in-place EXE instrumentation is used in addition to "No application to launch", VTune does not use the wrapping EXE, but inserts the required code directly into your application's instrumented EXE. The drawback of this solution: the user has to set up the required environment manually (as you did with PATH).
Yes, by dragging and dropping I can see clockticks data in the source view now. But I have another problem. I don't know if I have configured the sampling activity correctly, because some functions have nothing displayed in their "clockticks" columns in the source view. But I'm sure these functions are called in the program, and I can step into them in the Visual C++ debugger. Besides, some parts of other functions also have their clockticks fields in the source view left blank. And I'm sure these parts of the code (and the corresponding disassembly) are executed.
My questions are: 1. What is the precise meaning of the term "clocktick"? 2. Is my problem of blank "clockticks" fields caused by misconfiguration of the sampling activity? If so, how should I configure it properly?
Thank you very much. Your explanation was of great help to me. Could you give me a little more help? I don't know why some lines of code have blank "clockticks" fields in the source view. I describe the problem in the post immediately above this one (the one replying to dbricker).
I agree with dbricker that it would be better to open a new thread for Sampling. But in any case, I'll answer you here.
Sampling is a VERY complicated profiler. It is impossible to describe all its abilities in a short post, so I'll describe only the very basics.
In order to be able to read sampling results, you need to understand how sampling works. First of all, sampling is a statistical profiler, which means that its results are meaningful only for the most frequently executed code. Each Intel processor has at least one special register, an event counter. This counter is configurable: you can tell the hardware which specific hardware event you want to count and the maximum count value. When the processor has counted the maximum number of events, it generates a hardware interrupt, which is handled by the VTune driver. The VTune driver records the active process, the active thread, and the next (for Pentium; current for Itanium) instruction to be executed. All this data is written into the VTune database, post-processed, and presented to you later.
You need to understand 2 basic issues: 1. Counters are handled by hardware globally, for all processes and even the OS itself. 2. Interrupts are generated only after the specified number of events has occurred. This means that all of these events are attributed to the one single instruction that generated only the last event.
Now let's try to understand what this means for the clockticks event. The clockticks event counts time. Most of these events will be associated with the most frequently executed parts of your code. If some instruction caught the maximum number of clockticks, this means that either this instruction itself or other instructions around it are very heavy or are executed many times. So optimizing this single part of the code will give you the maximal possible performance boost (of course, modifying the algorithm may often give you a much higher boost, but we are speaking now about local optimizations only).
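The attribution rule described above can be imitated with a toy simulation, which also suggests why code that really did execute can end up with a blank clockticks field. This is only a sketch under stated assumptions: simulate_sampling is a hypothetical function, the per-instruction tick costs are invented, and real hardware counters and the VTune driver are far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of event-based sampling (not VTune's implementation).
// ticks_per_instr[i] is an invented cost of executing instruction i once.
// The whole instruction sequence is executed 'iterations' times. Every time
// the running counter reaches 'sample_after', one sample is charged to the
// instruction that produced the last tick, and the counter starts over.
std::vector<int> simulate_sampling(const std::vector<int>& ticks_per_instr,
                                   int iterations, int sample_after) {
    assert(sample_after > 0);
    std::vector<int> samples(ticks_per_instr.size(), 0);
    long long counter = 0;
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < ticks_per_instr.size(); ++i) {
            counter += ticks_per_instr[i];
            while (counter >= sample_after) {
                ++samples[i];            // the whole batch lands on instruction i
                counter -= sample_after; // keep the remainder for the next batch
            }
        }
    }
    return samples;
}
```

With tick costs {1, 97, 1}, 50 passes and one sample every 100 ticks, the three instructions receive 1, 48 and 0 samples. The third instruction ran 50 times, yet it would show nothing in a clockticks column. In this model a blank field means "too cheap or too rare to be caught", not "never executed".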
What is the most fundamental problem of sampling? Sampling gives you only flat results. Example: let's assume you have put "printf(....)" in a for() loop that is executed 10000000 times. This is your "main". Sampling will point to some code inside the printf() function itself or even inside the disk driver. Does this mean that the problem is in the "C" library or the OS? NO, the problem is in your code: if you move printf() out of the loop, the problem will disappear. This is where Call Graph comes in. With much more overhead and less accuracy in the time values themselves, you get much more data: you get the graph of control flow between functions, with call counts.
By the way, you can drag-and-drop data from both Call Graph and Sampling activity results into the Source View window. Each drop will add one or more data columns. You can even drag-and-drop Call Graph results into Sampling tables and vice versa; this will allow you to view all relevant data in the same table.
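The difference between a flat profile and call counts can be sketched with a small hand-rolled recorder. Everything here is hypothetical: the Profile type, the profile_print_loop function and the tick costs are invented for illustration, and this is not how VTune instruments a binary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy recorder: self time per function (the flat view) plus
// caller -> callee call counts (the call graph view).
struct Profile {
    std::map<std::string, long> self_ticks;
    std::map<std::pair<std::string, std::string>, long> call_counts;
    std::vector<std::string> stack;

    void enter(const std::string& fn) {
        if (!stack.empty()) {
            ++call_counts[std::make_pair(stack.back(), fn)];
        }
        stack.push_back(fn);
    }
    void work(long ticks) {
        assert(!stack.empty());
        self_ticks[stack.back()] += ticks;  // charged to the running function only
    }
    void leave() {
        assert(!stack.empty());
        stack.pop_back();
    }
};

// Models a "main" that calls an expensive printf inside a loop.
// The costs (1 tick of loop overhead, 100 ticks per printf) are invented.
Profile profile_print_loop(int iterations) {
    Profile p;
    p.enter("main");
    for (int i = 0; i < iterations; ++i) {
        p.work(1);
        p.enter("printf");
        p.work(100);
        p.leave();
    }
    p.leave();
    return p;
}
```

For 1000 iterations the flat view charges 100000 ticks to printf and only 1000 to main, which is what sampling would report. The call count of 1000 on the main -> printf edge is the extra piece of information that points back at the loop in your own code.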
Thanks for your comprehensive explanation! It greatly improved my understanding of the sampling scheme. I also read the help menu, but found it hard to understand. Maybe it's because of my poor English reading ability. :(
I also found this thread too long for others to grasp the key idea. I will try to focus on one topic in one thread in the future.