Robert_M__Münch
Beginner
40 Views

Restricting data collection to a DLL is slower than whole-app sampling...

My context:

  • Self-written DLL with functions that should be sampled
  • The parent application that loads the DLL is not of interest

Approach: Use ITT API to specify what to sample

Solution:

  1. #include <ittnotify.h>
  2. Use __itt_resume() and __itt_pause() to bracket code sections of interest
  3. No domain, handle, etc. setup; keep it dead simple and expect that function names are shown later
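
A minimal sketch of the bracketing described in steps 1 and 2. The macro names `COLLECT_RESUME`/`COLLECT_PAUSE`, the `USE_ITTNOTIFY` build switch, and `long_running_computation` are hypothetical and not from the original code. The no-op fallback only exists so the sketch builds without the ITT headers; it is not how the DLL in question is actually built.

```c
#ifdef USE_ITTNOTIFY              /* hypothetical switch: define when linking against ittnotify */
#include <ittnotify.h>
#define COLLECT_RESUME() __itt_resume()
#define COLLECT_PAUSE()  __itt_pause()
#else                             /* no-op fallbacks so the sketch builds without the ITT headers */
#define COLLECT_RESUME() ((void)0)
#define COLLECT_PAUSE()  ((void)0)
#endif

/* Hypothetical stand-in for the long-running computation inside the DLL. */
static double long_running_computation(const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Entry point called by the parent application.
 * Collection is resumed on entry and paused again on exit, so with a
 * "Start paused" run only this region should be sampled. */
double my_dll_function(const double *x, int n)
{
    COLLECT_RESUME();
    double result = long_running_computation(x, n);
    COLLECT_PAUSE();
    return result;
}
```

Keeping the resume/pause pair at the outermost entry point, rather than inside the loops, means it runs once per call of the function.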

Problem: The guarded function now seems to run endlessly, much slower than when sampling the whole app.

Any idea how this can happen? I would expect restricted sampling to be much faster, and to see only the sampled code section in the viewer later.

 

4 Replies
TimP
Black Belt
40 Views

I suppose the __itt call might suppress optimization, particularly if that optimization depends on IPO. In that case those calls might need to be moved further out in the function nest.

 

 

Robert_M__Münch
Beginner
40 Views

Tim P. wrote:

I suppose the __itt call might suppress optimization, particularly if that optimization depends on IPO. In that case those calls might need to be moved further out in the function nest.

The __itt* calls are outside any loops etc. The calls guard a function that has a very long run time, which I want to measure & optimize.

BTW: When adding any of the "domain" or "handle" calls, the program crashes. Hence I only use the "resume" and "pause" calls.

Vitaly_S_Intel
Employee
40 Views

Can you provide a source snippet of your ITT API usage? You can leave in only the ITT calls. Are you only enabling/disabling collection via the ITT API, or do you have your app instrumented with the ITT API (e.g. tasks, frames, etc.)?

Also, how many times is the itt_resume/itt_pause pair called?

I also assume you're using the "Start Paused" button to start profiling.

Robert_M__Münch
Beginner
40 Views

This is the overall code structure. I commented out the domain stuff as it crashes the program.


// #include <ittnotify.h>
#include "abc.h"

/*__itt_domain* domain_qrdll;
__itt_string_handle* handle_qrdll;

void itt_initialize() {
	// Create a domain that is visible globally
	__itt_domain* domain_qrdll = __itt_domain_create("qrdll");
	// Create string handles which associates with the "qrdll" task.
	__itt_string_handle* handle_qrdll = __itt_string_handle_create("qrdll");
}
*/

void my_dll_function(
	/* long running computation */
	const int n_ind, 
	const real_t x[/* n_ind */]
) {
//	__itt_task_begin(domain_qrdll, __itt_null, __itt_null, handle_qrdll);
//	__itt_resume();

	for(j = 0; j < n_ind; j++)
		df = (real_t) 0.0;
	for(i = 0; i < n_obs; i++) {
		ri = - (real_t) dep;
		for(j = 0; j < n_ind; j++)
			df += ind[i * n_ind + j];
	}

//	__itt_pause();
//	__itt_task_end(domain_qrdll);
}

I only try to enable/disable collection; nothing fancy is required. The function can be called a couple of hundred times. And yes, I start the application in the "Start paused" state.
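
One detail in the commented-out setup may be worth checking. Inside `itt_initialize()` the two pointers are declared again with their types, which creates new local variables and leaves the file-scope `domain_qrdll` and `handle_qrdll` unset. Whether this is what crashes the program is an assumption, not something confirmed in this thread, but a null domain passed on to `__itt_task_begin` would be a plausible cause. The sketch below shows only the C scoping effect. `fake_domain` and `fake_domain_create` are hypothetical stand-ins, not ITT API names.

```c
#include <stddef.h>

/* Hypothetical stand-in for __itt_domain; not the real ITT type. */
typedef struct { const char *name; } fake_domain;

/* Hypothetical stand-in for __itt_domain_create(). */
static fake_domain *fake_domain_create(const char *name)
{
    static fake_domain d;
    d.name = name;
    return &d;
}

/* File-scope pointer, as in the snippet above. */
fake_domain *domain_global = NULL;

/* Same pattern as the commented-out itt_initialize(): repeating the
 * type declares a new local that shadows the global, so the global
 * pointer is never assigned. */
void init_shadowing(void)
{
    fake_domain *domain_global = fake_domain_create("qrdll");
    (void)domain_global;   /* local goes out of scope on return */
}

/* Plain assignment without the type stores into the file-scope pointer. */
void init_assigning(void)
{
    domain_global = fake_domain_create("qrdll");
}
```

If this applies to the real code, dropping the type in the two assignments inside `itt_initialize()`, and making sure that function is called before the first task call, would be the first thing to try.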

 
