Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Help finding slow code

dajum
I've attached the results from a test run of my program. I think this code should run in about 1-2 seconds, yet it takes about 15 seconds, and I'm trying to see why. The profiling seems to suggest that all the time is spent outside of my code. Is that correct? How do I find what is making it so slow?
Thanks,
Dave
Peter_W_Intel
Thanks for your report.

I see that you ran on an 8-core CPU with 8 OpenMP* threads. It seems that 1 s to 5 s was spent starting up threads, and 5.5 s to 15 s was the real work running the 8 OpenMP* threads.

Each OpenMP* thread used 20%-60% CPU time, so total CPU usage was 160%-480% from 1.5 s to 15 s, whereas we would expect close to 800% CPU usage while all 8 OpenMP* threads are running.
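
The utilization arithmetic above can be checked with a few lines of Python. `aggregate_cpu_usage` is a hypothetical helper, and it assumes per-thread usage is expressed as a percentage of one core, as in the reply; the ratio to the ideal is a rough measure of how much of the available CPU the threads actually used.

```python
def aggregate_cpu_usage(num_threads, per_thread_low, per_thread_high):
    """Return (low, high, ideal) aggregate CPU usage in percent.

    Assumes each per-thread figure is a percentage of a single core.
    """
    low = num_threads * per_thread_low
    high = num_threads * per_thread_high
    ideal = num_threads * 100  # every thread busy 100% of the time
    return low, high, ideal


low, high, ideal = aggregate_cpu_usage(8, 20, 60)
print(f"aggregate: {low}%-{high}% of an ideal {ideal}%")
print(f"efficiency: {low / ideal:.0%}-{high / ideal:.0%}")
```

With 8 threads at 20%-60% each, this gives 160%-480% out of 800%, i.e. the threads were busy only 20%-60% of the time.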

You could reduce the wait time in the OpenMP region, or increase the workload per thread to reduce the wait count; either should improve performance.
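
One way to "increase workload to reduce wait count" is to hand each thread larger chunks of work, so there are fewer tasks to distribute and wait on. The sketch below shows the idea with Python's `concurrent.futures`; `process_item` and `run_chunked` are hypothetical names, and because of Python's GIL it illustrates task granularity rather than real CPU speedup. In OpenMP the corresponding knob is the chunk size of the `schedule` clause, e.g. `schedule(dynamic, 100)`.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(x):
    # Hypothetical stand-in for the per-item work inside the parallel region.
    return x * x


def process_chunk(chunk):
    # One task handles a whole chunk, so far fewer tasks are scheduled
    # (and waited on) than with one task per item.
    return [process_item(x) for x in chunk]


def run_chunked(items, num_workers=8, chunk_size=100):
    """Process items in chunks; return (results, number_of_tasks)."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    results = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in the same order as the chunks.
        for part in pool.map(process_chunk, chunks):
            results.extend(part)
    return results, len(chunks)


results, n_tasks = run_chunked(list(range(1000)), chunk_size=100)
print(n_tasks, "tasks instead of", len(results))
```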

[Attached screenshot: OMP.png]

I'm not sure why you expect it to run in about 1-2 seconds.

Regards, Peter
dajum
I have one routine that opens a number of files (6600 files in 23 directories) during the course of the run. The old method was to write all the data into one file, which took about 1.1 seconds to run. I split it into many files and tried to parallelize the process to speed it up, but now it takes 15 seconds. So I'd like to use the profiler to see what is taking so long. But the default results do not help me in any way; the listing of where the time is spent doesn't seem right.
I ran the same run through GlowCode and now have a lot of information about where the time is spent. It is amazing what a little detail can tell you. The time from 0 to 5 seconds is mostly spent deleting files and has nothing to do with starting up threads. There isn't any real work being done by all the threads, but it seems impossible to tell from VTune what they were doing or why. So I also spent some time reading the posts on how VTune works, and now I understand why. Thanks for your help.
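
When a profiler's attribution is unclear, a cheap cross-check is to wrap each phase of the run in a wall-clock timer and see which phase dominates. The sketch below is a minimal Python version of that idea, not what GlowCode or VTune does; `timed` and `demo` are hypothetical names, and the phases and file names (`out_<i>.dat` in a temporary directory) are made up for illustration.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Add the wall-clock seconds spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def demo(num_files=200):
    """Time a hypothetical 'write files' phase and a 'delete files' phase."""
    timings = {}
    with tempfile.TemporaryDirectory() as root:
        paths = [os.path.join(root, f"out_{i}.dat") for i in range(num_files)]
        with timed("write files", timings):
            for p in paths:
                with open(p, "w") as f:
                    f.write("data\n")
        with timed("delete files", timings):
            for p in paths:
                os.remove(p)
    return timings


for label, secs in demo().items():
    print(f"{label}: {secs:.4f} s")
```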
Dave
Peter_W_Intel

"The old method was to write all the data into one file. That takes about 1.1 seconds to run. I split it into many files and tried to parallelize the process to speed it up. But now it takes 15 seconds." Thank you for explaining this.

I suspect more of the time was spent in disk write I/O. You can use the Locks and Waits analysis to collect performance data and find the Wait Time.
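
To make "Wait Time" concrete: it is time a thread spends blocked on a synchronization object instead of doing work. The sketch below is a rough, hand-rolled analogue in Python, not how VTune measures it: a `threading.Lock` stands in (as an assumption) for serialized access to the disk, and each thread records how long it waited to acquire it. `measure_lock_wait` and `hold_seconds` are hypothetical names.

```python
import threading
import time


def measure_lock_wait(num_threads=8, hold_seconds=0.01):
    """Each thread takes a shared lock once, holds it for hold_seconds
    (simulating a serialized disk write), and records how long it waited."""
    lock = threading.Lock()
    waits = [0.0] * num_threads

    def worker(idx):
        t0 = time.perf_counter()
        with lock:
            waits[idx] = time.perf_counter() - t0  # time blocked before acquiring
            time.sleep(hold_seconds)               # simulated I/O under the lock

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits


waits = measure_lock_wait()
print(f"total wait: {sum(waits):.3f} s across {len(waits)} threads")
```

The more threads contend for the same serialized resource, the more of their time shows up as waiting rather than working.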

Frequent disk writes from parallel threads usually don't bring a performance gain, since only one thread can use disk I/O at a time. You may keep the data in memory instead and dump it to files in a final stage. (Sorry, I don't know your algorithm in depth.)
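
The "keep data in memory, write files at the end" suggestion can be sketched in two phases: worker threads only build each file's contents in memory, then a single thread writes everything to disk. This is a minimal Python sketch under the assumption that all output fits in memory; `compute_file_contents`, `run`, and the `.txt` naming are hypothetical.

```python
import io
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def compute_file_contents(name):
    # Hypothetical per-file work: build the contents in memory, no disk access.
    buf = io.StringIO()
    for i in range(3):
        buf.write(f"{name} record {i}\n")
    return name, buf.getvalue()


def run(names, out_dir, num_workers=8):
    # Phase 1: compute in parallel; worker threads never touch the disk.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(compute_file_contents, names))
    # Phase 2: dump everything to disk from a single thread at the end.
    for name, text in results:
        with open(os.path.join(out_dir, name + ".txt"), "w") as f:
            f.write(text)
    return len(results)


with tempfile.TemporaryDirectory() as d:
    n = run([f"file{i}" for i in range(20)], d)
    print(n, "files written")
```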

Regards, Peter
