execution speed and CPU usage

jswaugh · ‎12-11-2006

Here is a pair of perhaps naive puzzles from a project that, among other things, manipulates large arrays -- say, dimension (1000,1000). Comments would be welcome.

1) The release version, optimized for speed, runs no faster, give or take a couple of percent, than the debug version.

2) Task Manager shows that the main program uses 25% of CPU (Xeon) time, and the remaining 75% is used by the system idle process. Paging seems minor.

jimdempseyatthecove · ‎12-11-2006

How many cores does your system have? If 4 then 25% = 100% of 1 core.
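The arithmetic behind that remark can be sketched in C. Task Manager's total is averaged over all logical CPUs, so one fully busy thread on N logical CPUs shows roughly 100/N percent. The function name `single_thread_share` is hypothetical and used only for illustration.

```c
/* Percentage of the total CPU figure that one fully busy thread
   accounts for when the total is averaged over n_logical_cpus.
   Hypothetical helper, for illustration only. */
double single_thread_share(int n_logical_cpus)
{
    if (n_logical_cpus <= 0) {
        return 0.0;  /* no meaningful answer for a non-positive count */
    }
    return 100.0 / n_logical_cpus;
}
```

With 4 logical CPUs, `single_thread_share(4)` evaluates to 25.0, which matches the load reported in the question.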

If you have only 1 core then something in the app is causing it to block. What does the disk read/write information indicate? What does the NIC read/write indicate? Is your app performing a lot of thread-safe system function calls (e.g. DRAND)?

If your app is waiting 75% of the time for something, then run it in the debugger and perform Break All and Continue 10 to 20 times, checking each time where the program is waiting.

Jim Dempsey

jswaugh · ‎12-12-2006

There is one (recent) Xeon chip -- unknown number of cores. Hyperthreading is enabled. Task Manager shows 4 CPUs. Total CPU load always shows ~25%, divided among these processors. Break All always stops in one or another of 3 routines, all involved with diagonalizing or otherwise handling large arrays. The remainder of the time is spent in the system idle process. The only disk I/O is buffered, and Break All never stops there. Does this help?

TimP · ‎12-12-2006

Putting aside minor quibbles about the distinction between Xeon and Pentium-D, if you are running a single thread on Pentium-D with hyperthreading enabled, it looks as if everything you report is normal. You might get a small performance improvement by tinkering with the task manager affinity check boxes so as to restrict execution to a single core, or a larger improvement by compiling with threading enabled, setting 2 threads, and affinitizing the 2 threads to separate cores or disabling HyperThreading.

jimdempseyatthecove · ‎12-12-2006

So you have a dual core Xeon with HT. Your application is single threaded, so it uses 100% of 1 of the 4 logical CPUs, which is 25% of the total. You might consider using:

OpenMP and processing your arrays yourself
A multi-threaded library (MKL)
Auto-parallelization
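
As a rough illustration of the first option, here is a minimal OpenMP sketch in C (OpenMP provides equivalent directives for Fortran). It is not taken from the poster's application: the routines `scale_matrix` and `sum_matrix` are hypothetical stand-ins for whatever array handling the real code does, and the size `N` is assumed from the (1000,1000) dimension mentioned in the question.

```c
#define N 1000  /* assumed from the (1000,1000) arrays in the question */

/* Scale every element of an N x N matrix in place.
   When compiled with OpenMP enabled (e.g. -fopenmp with GCC), the rows
   are divided among threads; otherwise the pragma is ignored and the
   loop runs serially. Hypothetical routine, for illustration only. */
void scale_matrix(double a[N][N], double factor)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i][j] *= factor;
        }
    }
}

/* Sum all elements. The reduction clause gives each thread a private
   partial sum and combines the partial sums when the loop ends. */
double sum_matrix(double a[N][N])
{
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            total += a[i][j];
        }
    }
    return total;
}
```

Loops like these parallelize easily because each row is independent. Routines with dependencies between iterations, such as a diagonalization, usually need more care, and that is where a threaded library may be the simpler route.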

Without seeing your app it would be hard to advise you of the route to take.

Jim Dempsey