Intel® Fortran Compiler

Why is CPU use only 50%?

keefer
Beginner
Hi,
I am running a very CPU-intensive Fortran program, but the performance monitor shows it using only 50% of CPU time, with the rest going to the Idle process. I have set the process priority to High and nothing changes. I have a 3.2 GHz P4 with HT enabled running XP Pro SP1. I get the same result with both a CVF 6.6c compiled version and an IVF 8.0 version built with /Qparallel and linked against the multithreaded static run-time library. How do I persuade my program to use all of the CPU when no other higher-priority processes are running?
Regards,
Keith
7 Replies
TimP
Honored Contributor III
CVF doesn't come with any parallelization facility.
For ifort -Qparallel, what did -Qpar_report3 tell you about the reasons for not parallelizing? Auto-parallelization is not as effective as parallelization with OpenMP directives, even when you don't have to adjust your source code to take advantage of the latter.
Even if you manage to parallelize and get your meter up to 100%, you may be a long way from getting useful results for your effort. A 15% improvement in throughput would be excellent. "Very CPU intensive" often has a personal definition, but your meter at 50% does indicate that you are not waiting on disk. You could be using 100% of the FPU while your meter still says 50%.
Steven_L_Intel1
Employee
The thought that occurred to me was that with HT, a single-threaded process might show as only 50% utilization.
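The arithmetic behind that reading can be sketched in a few lines. This is only an illustration, assuming the monitor averages utilization over all logical CPUs and that each busy thread fully occupies one logical CPU; the function name `meter_percent` is made up for this example and is not part of any tool.

```python
def meter_percent(logical_cpus, busy_threads=1):
    """Overall CPU meter reading, assuming the monitor averages over
    all logical CPUs and each busy thread saturates exactly one of them."""
    if logical_cpus < 1:
        raise ValueError("need at least one logical CPU")
    # A thread cannot occupy more logical CPUs than exist.
    busy = min(busy_threads, logical_cpus)
    return 100.0 * busy / logical_cpus


# One busy thread on a P4 with HT (2 logical CPUs) reads as half the meter.
print(meter_percent(2))
```

Under the same assumption, a single busy thread on a machine exposing 8 logical CPUs would read as 12.5%, even though that thread may be running flat out.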
keefer
Beginner
Tim,
Thanks. You provided some hints that helped me interpret the performance data. I'd like to run them past you quickly to see if I'm on the right track. First, the CPU usage is the percentage across both processing units(?). When I look at them separately, both usage curves go up and down, but seem to be out of phase with each other. "CPU intensive" here means a lot of table look-ups and comparisons, mostly integer, so I never really expected much of a performance gain with HT because I have an unfortunate mix (or lack thereof) of instructions. I do get a big kick in usage when another, unrelated process kicks in. My program is a Monte Carlo simulation, so it makes a lot of random accesses to large (8 MB) lists. I expect that I am thrashing the cache. Would VTune tell me how much time is spent on memory latency? Would an Extreme Edition P4 help with cache thrashing?
Thanks again.
Regards,
Keith
TimP
Honored Contributor III
Keith,
If your program is single-threaded, it doesn't make much difference how the scheduler decides which logical CPU to use. You probably start over with L1 cache misses each time the scheduler chooses to move your thread, but you don't lose anything in L2 cache. If you had 2 physical CPUs, it could make a big difference how the scheduler treats you.
VTune ought to show whether you are spending time on cache misses. Even if you did a VTune analysis, it would be difficult to guess how a larger cache would affect your performance, unless you can confirm that you don't have enough cache misses for it to make a difference.
The problem with irregular access to memory is that hardware prefetch doesn't kick in, and you can't do much with software prefetch, if you don't have advance information on where you are going next. This would be a good case for using HT with 2 threads, if you have independent table lookups for the 2 threads. However, if you think it is dependent on cache size, you must watch out for the case where 2 threads want twice the cache of 1.
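As a rough sketch of what "independent table lookups for the 2 threads" could look like, the samples can be split between two workers that share a read-only table and each use their own random stream. Everything here is hypothetical (the names `make_table`, `lookup_sum`, `run_two_workers`, the seeds, and the table contents), and it is written in Python only to show the structure: CPython threads will not actually run these loops in parallel, whereas in Fortran the same split would normally be expressed with OpenMP directives.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def make_table(size, seed=0):
    """Build a hypothetical read-only lookup table of small integers."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(size)]


def lookup_sum(table, n_samples, seed):
    """Sum n_samples randomly chosen entries, using a private RNG so the
    worker shares no mutable state with any other worker."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        total += table[rng.randrange(len(table))]
    return total


def run_two_workers(table, n_samples, seeds=(1, 2)):
    """Split the samples between two workers and combine their partial sums."""
    half = n_samples // 2
    counts = (half, n_samples - half)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(lookup_sum, table, count, seed)
                   for count, seed in zip(counts, seeds)]
        return sum(f.result() for f in futures)


table = make_table(1024)
print(run_two_workers(table, 1001))
```

The point of the design is that the only shared data is read-only, so the two workers never wait on each other; the cache-size caveat above still applies, since both workers pull from the same large table.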
TimP
Honored Contributor III
In accordance with what Steve said, this is what you would expect if one of your computers is running with HyperThreading enabled and the other with it disabled, if you are running a single-threaded application. You should have a BIOS setup option to enable or disable HyperThreading. The OS can't really tell how many more resources would be available if you were using the 2nd logical CPU. Normally, it would be far less than is implied by your performance monitor.
jasonc12345
Beginner

I should have a BIOS setup option to enable or disable hyperthreading, certainly.
But in fact I do not, because the people writing the BIOS options and the other upstream hardware providers all believed the things your company told them about hyperthreading, and thought that it would intelligently route jobs, tell when the usage need was single-threaded, and actually use a whole core on the one thread that needs it. But your system doesn't, and the claim that hyperthreading to two virtual cores is a strict improvement over having the physical cores working separately is stuff and nonsense. Not to put too fine a point on it, most of the non-response responses in this thread to the perfectly sensible user's question are denial and lying about the problem.

If you have a fundamentally sequential and CPU-intensive job, then one physical core running at full cycles is going to smoke a busted virtual half-core with the other part idle, like a cheap cigar. And this is approximately everything I actually care about CPU power for. All the simple operating system tasks are not a problem as soon as I have 2 physical cores. When I need the CPU to crank is when I need sequential single-thread processing at full clock rate.

So what I'd appreciate, besides some honesty about the trade-off and where Intel came down on it with its hyperthreading decisions, is a solution that actually solves the problem for folks like the original poster and myself, who are doing heavy math and the like, and actually need max single-thread resources on demand. BIOS updates to disable hyperthreading, for example. And no, telling me that I "should" already have them when I don't, because your company told everyone that nobody would ever want to, is not an answer or a solution. Nor is a lecture pretending to tell me that my batch jobs that take a weekend to run, with the CPU usage pegged at a ridiculous 12.5%, don't do so, or wouldn't go faster using 25%.
SergeyKostrov
Valued Contributor II
...My program is a Monte Carlo simulation, so it makes a lot of random accesses to large (8Mbyte) lists...

Hi,

I'd like to understand how you access these large 8 MB lists. I could assume two generic cases:

- The program loads all your lists into memory and then doesn't do any I/O operations until the calculations are completed (CPU utilization could be greater than 50% because there are no I/O operations).

- The program doesn't load all your lists into memory and uses I/O operations to load a list as soon as it is needed (CPU utilization could be even lower than 50% because the CPU doesn't do anything until the I/O operation is completed).

Also, how many 8MB lists do you have?

Best regards,
Sergey