Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Profiling I/O bound applications with Vtune

Endre_S_
Beginner
784 Views

Hi,

I am currently using the basic analysis tools in the Vtune profiler. I see that CPU bound profiling is well supported and also lock and spin waiting time is measured. How about I/O operations and syscall waiting times (i.e. wall clock times)? Are they measured in Vtune at all? If yes, which view supposed to show them? Are they only available in the advanced analysis?

Thanks

0 Kudos
1 Solution
Peter_W_Intel
Employee
784 Views

Thank you Mr.Anderson, provide test case:-)

File I/O and waitpid() can be captured by basic hotspots as well as locksandwaits. You can see your results, read()/write() already were in your bottom-up report. However, waitpid() takes extreme short CPU time at a time, you can see screen shot (advanced-hotspots) from Mr.Anderson - "Wait Rate" is High. Most likely the developer will put waitpid() in loop...

Actually your signalprocmask() took 7ms, its parent function (your interest of) didn't spend CPU time - but display 7ms as Total time.

View solution in original post

0 Kudos
11 Replies
Peter_W_Intel
Employee
784 Views

There are two options: 1) you can use advanced-hotspots analysis which provides Spin Time (measure thread's wait time due to sync object) & Overhead Time (extra time spent on thread's behaviors); 2) you can use locksandwaits analysis to measure all sync-objects wait time & IO wait time.

In your case, you need to use locksandwaits analysis to meet your IO waits measuring request.

0 Kudos
Endre_S_
Beginner
784 Views

Thanks for the hint! I was trying to use the locks and waits view, but the results seemed suspicous. There is one function that executes several seconds according to wallclock time (based on the logfile the application produces), however in the locks and waits view, it only shows 0.1seconds for wait time in total.

Do I interpret it correctly and the time in the column "Wait time: total by thread concurrency" contains times from syscalls like select, sleep, read/write as well? According to the help popup, it should, but then the value is weirdly low.

At this point of course I cannot rule out that maybe I make a mistake, so I just would like to understand if my assumption on the values in vtune is correct.

Thanks

 

0 Kudos
Alexandr_K_Intel
Employee
784 Views

Hi,

  That is correct: Locks and Waits analysis traces various IO calls (such as fwrite, write, fread etc.)  and Wait time column is shown basing on this data. But the set of functions is limited and if you use a function that we do not trace, then its time will not be added to Wait time. What is the function that you think should produce significant Wait time in your result?

0 Kudos
Endre_S_
Beginner
784 Views

ALEXANDR K. (Intel) wrote:

Hi,

  That is correct: Locks and Waits analysis traces various IO calls (such as fwrite, write, fread etc.)  and Wait time column is shown basing on this data. But the set of functions is limited and if you use a function that we do not trace, then its time will not be added to Wait time. What is the function that you think should produce significant Wait time in your result?

Hi,

ah, excluding some system calls from tracing may explain a lot.

The prime suspect is launching a new process (fork) and waiting for it with waitpid(). (I had the "follow child processes" option disabled when I ran the analysis, but it seemed reasonable that the fork and waitpid() would be measured anyway in the parent process).

Other suspects are sleep variants on linux (sleep, nanosleep, etc) and network I/O.

Is there perhaps some detailed information available about the traced system calls and the ones that are skipped? Are they configurable what to include/exclude?

Thanks

 

0 Kudos
Peter_W_Intel
Employee
784 Views

@ Endre S

Is it possible that you change "Call Stack Mode:" to  "User/system function" in bottom of bottom-up view to compare two results? If you can sent me results in private message, I would like to study these.

0 Kudos
Endre_S_
Beginner
784 Views

Hi,

I sent a PM with the details.

Thanks

0 Kudos
Peter_W_Intel
Employee
784 Views

@ Endre S

Thank you to provide results to me.

Basic hotspots uses 10ms as sample interval, function of your interest ( signal process) only spent 10ms, and its parent function spent 70ms, that was why this function was missed at second result.

My opinion is to call more times (e.g 100 times) of this function (or trapped ), so this function will not be missed in VTune's report.

Advanced-hotspots uses 1ms as sample interval by default, it can capture samples on this function or its child - if they spent time 10ms. (Add option "-knob collection-detail=stack-sampling" to collect call stack info)

0 Kudos
Endre_S_
Beginner
784 Views

I understand that the 10ms time resolution may miss functions that are running close to the resolution time period, however I still do not understand why that method was reported being waiting for 70ms only, while according to wall-clock time it is close to 2 seconds.

I am still curious what was missed. For me it looks like that waitpid() is one that is excluded, but perhaps there are others too. Is there any information available on which system calls are not measured by Vtune in the "locks and waits" view, or is it a hard question to answer (platform, etc dependent perhaps)?

Thanks

0 Kudos
David_A_Intel1
Employee
784 Views

Hi Endre S.,

Have you tried Advanced Hotspots + stacks and context switches?  I profiled an app that uses waitpid() and see wait time attributed to the waitpid() routine.

waitpid.PNG

0 Kudos
Peter_W_Intel
Employee
785 Views

Thank you Mr.Anderson, provide test case:-)

File I/O and waitpid() can be captured by basic hotspots as well as locksandwaits. You can see your results, read()/write() already were in your bottom-up report. However, waitpid() takes extreme short CPU time at a time, you can see screen shot (advanced-hotspots) from Mr.Anderson - "Wait Rate" is High. Most likely the developer will put waitpid() in loop...

Actually your signalprocmask() took 7ms, its parent function (your interest of) didn't spend CPU time - but display 7ms as Total time.

0 Kudos
Endre_S_
Beginner
784 Views

That sounds very promising, unfortunately at the moment I could only try with the basic analysis (due to root access limitation upon installation), so I was not able to have a look at those reports. But your demo looks good and I'm happy that I got confirmation on that File I/O and waitpid also measured. Thanks a lot for your efforts helping me!

0 Kudos
Reply