Intel® Fortran Compiler

What does CPU_TIME() return on multi-threaded applications?

Tony_Garratt

Suppose I am running a Fortran application that calls multi-threaded MKL routines, because I want to fully exploit my multi-core processor at the math level. What does CPU_TIME return? Is it the time spent by the CPU core running the application itself, or the sum of that time and the time used by MKL on all the other cores? If the latter, then the CPU_TIME result could be greater than the wall clock time.

On a similar topic, if I used CPU_TIME in a multi-threaded Fortran application (e.g. one using OpenMP), what would CPU_TIME return on the base thread and what would it return on the other threads?

TimP
cpu_time is expected to produce the total CPU time of all threads of a process, so with threaded MKL or OpenMP it can indeed exceed the wall clock time, and it should not depend on which thread makes the call. Needless to say, we don't often see this used as a measure of effective parallelism, although it may be better than certain proposals I've seen. On Windows 9x, cpu_time would normally give you elapsed time.
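As a rough illustration of the point about total CPU time, here is a minimal sketch that times the same loop with cpu_time and system_clock. The module and routine names (cpu_vs_wall, time_region) are made up for this example, and it assumes a compiler whose cpu_time sums over all threads of the process, as described above. The !$omp lines only take effect when OpenMP is enabled at compile time (for example -qopenmp with Intel Fortran or -fopenmp with gfortran); otherwise the loop runs serially and the two times should be close.

```fortran
module cpu_vs_wall
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Hypothetical helper: times one reduction loop with both
  ! CPU_TIME (processor time) and SYSTEM_CLOCK (elapsed time).
  subroutine time_region(n, cpu_s, wall_s, total)
    integer, intent(in) :: n
    real(real64), intent(out) :: cpu_s, wall_s, total
    real(real64) :: c0, c1
    integer(int64) :: t0, t1, rate
    integer :: i

    call system_clock(t0, rate)   ! elapsed-time counter and its tick rate
    call cpu_time(c0)             ! processor time used so far

    total = 0.0_real64
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + sin(real(i, real64))**2
    end do
    !$omp end parallel do

    call cpu_time(c1)
    call system_clock(t1)

    cpu_s  = c1 - c0
    wall_s = real(t1 - t0, real64) / real(rate, real64)
  end subroutine time_region
end module cpu_vs_wall
```

Called from a main program as call time_region(50000000, cpu_s, wall_s, total) with several threads active, cpu_s would be expected to come out larger than wall_s, roughly in proportion to the number of busy threads.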
For evaluation of OpenMP applications, the standard OpenMP function omp_get_wtime() is better suited for comparing elapsed time (the w stands for "wall.") It would usually read the calendar clock on your platform, thus it wouldn't matter which thread got the time, but the overhead might be significant.
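A minimal sketch of timing a parallel region with omp_get_wtime might look like the following. The names wtime_demo and wall_time_region are hypothetical, and the code has to be built with OpenMP enabled (for example -qopenmp or -fopenmp) so that the omp_lib module is available.

```fortran
module wtime_demo
  use omp_lib, only: omp_get_wtime
  implicit none
contains
  ! Hypothetical helper: returns the elapsed wall clock time, in seconds,
  ! of a parallel reduction, plus the result of the reduction itself.
  subroutine wall_time_region(n, elapsed, total)
    integer, intent(in) :: n
    double precision, intent(out) :: elapsed, total
    double precision :: t_start
    integer :: i

    t_start = omp_get_wtime()     ! read before the threads are started

    total = 0.0d0
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + 1.0d0 / dble(i)
    end do
    !$omp end parallel do

    ! read again after the implicit barrier at the end of the region
    elapsed = omp_get_wtime() - t_start
  end subroutine wall_time_region
end module wtime_demo
```

Both readings are taken by the same thread outside the parallel region, so the question of comparing clocks between threads does not come up here.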
If you used the MSVC/ICL intrinsic __rdtsc(), or possibly Fortran system_clock(), in an effort to get better resolution, you might be concerned about comparing values obtained by different threads. In practice, the cores of a single CPU share the same counter. The counters of different CPUs, at best, are synchronized as well as possible by the power-on BIOS code, and stay synchronized only to the extent that they share the same bus clock, as CPUs on a single motherboard normally do. In most such cases, they stay synchronized well enough that it's not worthwhile to incur the overhead of something such as a critical region (which you may need anyway, if you are comparing times from different threads).
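On the resolution question, the tick length of system_clock can be queried directly. With several compilers it depends on the integer kind of the arguments, so a 64-bit counter often gives finer ticks than a 32-bit one, but that is implementation dependent and the sketch below only reports what the runtime claims. The names clock_resolution and tick_lengths are made up for illustration.

```fortran
module clock_resolution
  use, intrinsic :: iso_fortran_env, only: int32, int64, real64
  implicit none
contains
  ! Hypothetical helper: reports seconds per tick of SYSTEM_CLOCK when it
  ! is queried with 32-bit and with 64-bit integer arguments.
  ! A value of zero means the runtime reported no usable clock.
  subroutine tick_lengths(tick32, tick64)
    real(real64), intent(out) :: tick32, tick64
    integer(int32) :: rate32
    integer(int64) :: rate64

    call system_clock(count_rate=rate32)
    call system_clock(count_rate=rate64)

    tick32 = 0.0_real64
    tick64 = 0.0_real64
    if (rate32 > 0) tick32 = 1.0_real64 / real(rate32, real64)
    if (rate64 > 0) tick64 = 1.0_real64 / real(rate64, real64)
  end subroutine tick_lengths
end module clock_resolution
```

Printing tick32 and tick64 once at startup gives a quick idea of whether system_clock is fine-grained enough for the intervals being measured.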
Windows also offers the QueryPerformanceCounter APIs, which should take care of the cross-thread consistency for you, with the expected overhead cost.