I will have to ask you some questions so that I can better understand the runtime behavior of your application.
For example, how long does the application run? Of that time, how much is actually spent executing code in dll A? If dll A is simply loading dll B and calling into it, you may never see any samples in dll A. Sampling gives you a statistical representation of where the processor is spending its time.
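To illustrate why a thin caller can vanish from a sampling profile, here is a minimal Python sketch. It is not how the VTune analyzer works internally; it only models each sample as landing in a module with probability equal to that module's share of CPU time. The time shares and the sample count are made-up numbers for illustration.

```python
import random

def simulate_samples(time_share, num_samples, seed=0):
    """Attribute each simulated sample to a module, weighted by its share of CPU time."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same counts
    modules = list(time_share)
    weights = [time_share[m] for m in modules]
    counts = {m: 0 for m in modules}
    for hit in rng.choices(modules, weights=weights, k=num_samples):
        counts[hit] += 1
    return counts

# Hypothetical split (an assumption): dll A is mostly a thin caller,
# dll B does the heavy work.
shares = {"dll A": 0.001, "dll B": 0.899, "other": 0.100}
counts = simulate_samples(shares, num_samples=1000)
for name, n in counts.items():
    print(f"{name}: {n} samples")
```

With a share that small, dll A will typically get only a handful of samples, possibly none, even though it is called constantly.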
Are you configuring the VTune analyzer to "launch" your application? I assume you are collecting Clockticks and Instructions Retired?
If you want to get an accurate picture of the calling relationships, you can use call graph (available in the Wizards list).
dll A is the real heart of the application. In fact, the exe file really has only one job: parse the application arguments, put them into a structure that is easily interpreted, and then load and initialize dll A, passing the structure to it. From there dll A takes over as the actual application that is running.
dll B is only one of many dlls that dll A might call during an application run. However, because of the nature of the test I am giving it, dll A must call various functions in dll B many thousands of times during the test run. In fact, the fact that the modules VTune identifies as dll B and dll C list the particular functions they do (all of which actually reside in dll C) tells me that the code in dll B is being exercised.
The test lasts for 800 seconds (13.3 minutes). Yes, I have VTune starting the application, and I have VTune delaying the beginning of sampling for 300 seconds (5 minutes) because of the long ramp-up the test needs before it settles into the processing-intensive steady state, where I expect dll A to be calling dll B regularly and often.
I have VTune collecting clockticks, instructions retired, and the clockticks per instructions retired ratio (CPI). I am also sampling several system-wide parameters such as threads, processor time, processor queue length, context switches, and several others. (I tried to go over to the test machine and get the full list of sample items for you. The machine is being used for other performance testing at the moment and they have me locked out. If you need that list, I can get it later today.)
Does any of this information help?
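For reference, CPI is simply clockticks divided by instructions retired over the same interval. A small Python sketch, using made-up event counts rather than anything measured in this test:

```python
def cpi(clockticks, instructions_retired):
    """Cycles per instruction: clockticks divided by instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clockticks / instructions_retired

# Hypothetical counts for one module (assumed values, not real data).
print(cpi(3_000_000, 2_000_000))  # 1.5
```

A module with no samples has no event counts, so it has no meaningful CPI either.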
Sorry for maybe appearing dense, but if dll A is only calling into dll B, then you may still not see any samples in dll A. I mean, dll A must be doing something CPU-intensive to show up in the results.
The other thought is, are you starting from the Process view? Press the Process button at the top of the Sampling view and then select your process from the list of processes running on the system. Next, press the Modules button (or Threads if you want to limit which threads you look at) and see if dll A shows up.
Finally, (sorry if you already said this) are you building dll A with full debug information? It should still be the "Release" build, but you should include debug info.
dll A is calling dll B a lot. It is also calling several other dlls, and it is doing a lot of its own processing as well.
I do not have a good idea of the proportion of time spent in each of these.
That's one of the reasons for running VTune.
Try opening the DLL in the Static Module Viewer:
1. Select Open Static Module Viewer from the File menu.
2. With the Functions with source information check box checked, review the module and function list to ensure that the expected routines are present. If modules or functions are missing, make sure you have built the binary with debug information.