I've attached the results from a test run of my program. I think this code should run in about 1-2 seconds, yet it takes about 15, and I'm trying to see why. The profiling seems to suggest all the time is spent outside of my code. Is that correct? How do I find what is making it so slow?
Thanks,
Dave
3 Replies
Thanks for your report.
I see that you ran on an 8-core CPU with 8 OpenMP* threads. It looks as though 1s-5s was spent starting up threads, and 5.5s-15s was the real work of running the 8 OpenMP* threads.
Each OpenMP* thread used 20%-60% CPU time, so total CPU usage was 160%-480% (8 x 20% to 8 x 60%) from 1.5s to 15s. With 8 OpenMP* threads fully busy we would expect about 800%.
You may be able to improve performance by reducing the wait time in the OpenMP region, or by increasing the workload per thread to reduce the wait count.
I don't know why you expect it to run in about 1-2 seconds.
Regards, Peter
I have one routine that opens a number of files (6600 files in 23 directories) during the course of the run. The old method was to write all the data into one file, which takes about 1.1 seconds to run. I split the output into many files and tried to parallelize the process to speed it up, but now it takes 15 seconds. So I'd like to use the profiler to see what is taking so long. The default results do not help me in any way, though; the listing of where the time is spent doesn't seem right.
I ran the same run through GLOWCODE and now have a lot of information about where the time is spent. It is amazing what a little detail can tell you. Time 0-5 is mostly spent deleting files and has nothing to do with starting up threads. There isn't any real work being done by all the threads, but it seemed impossible to tell from VTUNE what it was doing or why. So I also spent some time reading the posts on how VTUNE works and now understand why. Thanks for your help.
Dave
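One way to cross-check a profiler timeline like the one discussed above is coarse wall-clock timing around each phase of the run. The following is a minimal C++ sketch under that assumption, not code from the program in this thread; `PhaseTimer` and the phase names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock seconds per named phase.
class PhaseTimer {
public:
    // Runs `work` and adds its wall-clock duration to the total for `phase`.
    void run(const std::string& phase, const std::function<void()>& work) {
        assert(work && "work must be a callable");
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        totals_[phase] += std::chrono::duration<double>(stop - start).count();
    }

    // Returns the accumulated seconds for `phase`, or 0.0 if it never ran.
    double seconds(const std::string& phase) const {
        const auto it = totals_.find(phase);
        return it == totals_.end() ? 0.0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
};
```

Wrapping, say, the file cleanup in `timer.run("delete old files", ...)` and the parallel section in `timer.run("write output", ...)` would show which of the two dominates the total run time, independently of how the profiler attributes samples.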
"The old method was to write all the data into one file. That takes about 1.1 seconds to run. I split it into many files and tried to parallelize the process to speed it up. But now it takes 15 seconds." - thank you for telling me this.
I suspect more time was spent in disk write I/O. You can use the Locks and Waits analysis to collect performance data and find the Wait Time.
Frequent disk writes from parallel threads usually do not improve performance, because the threads contend for the same disk and the I/O ends up largely serialized. You could keep the data in memory instead and dump it to files in a final stage. (Sorry, I don't know your algorithm in depth.)
Regards, Peter
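The suggestion above (compute in memory, write in a final stage) can be sketched roughly as follows. This is an illustration only, assuming each output item can be built independently; `make_record`, `build_records`, and `dump_records` are hypothetical names, and the real program's data layout is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the real per-item computation.
std::string make_record(std::size_t i) {
    return "record " + std::to_string(i) + "\n";
}

// Phase 1: build every record in parallel, each into its own slot in memory.
// No thread touches the disk and no two threads share a slot, so no locking
// is needed. The pragma is ignored (serial loop) if OpenMP is not enabled.
std::vector<std::string> build_records(std::size_t count) {
    std::vector<std::string> records(count);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < static_cast<long long>(count); ++i) {
        const std::size_t slot = static_cast<std::size_t>(i);
        records[slot] = make_record(slot);
    }
    return records;
}

// Phase 2: a single thread writes everything sequentially.
// Returns the number of bytes handed to the stream.
std::size_t dump_records(const std::vector<std::string>& records,
                         std::ostream& out) {
    assert(out.good() && "output stream must be usable");
    std::size_t bytes = 0;
    for (const std::string& r : records) {
        out.write(r.data(), static_cast<std::streamsize>(r.size()));
        bytes += r.size();
    }
    return bytes;
}
```

In a real run `dump_records` would be given a `std::ofstream` (one per output file, opened one after another) rather than an in-memory stream. Whether this beats the original single-file version depends on how much of the 15 seconds is actually I/O wait, which the Locks and Waits data should indicate.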