Suppose I am running a Fortran application that uses some of the multi-threaded MKL routines because I want to fully exploit my multi-core processor at the math level. What does CPU_TIME return? The time spent by the CPU core used by the application itself, or the sum of that time and the time used by MKL on all the other cores? If the latter, then the CPU_TIME result would be greater than the wall clock time.
On a similar topic, if I used CPU_TIME in a multi-threaded Fortran application (e.g. one using OpenMP), what would CPU_TIME return on the base thread and what would it return on the other threads?
1 Reply
cpu_time is expected to produce the total CPU time of all threads of a process. Needless to say, we don't often see this used as a measure of effective parallelism, although it may be better than certain proposals I've seen. On Windows 9x, cpu_time would normally give you elapsed time.
For evaluation of OpenMP applications, the standard OpenMP function omp_get_wtime() is better suited for comparing elapsed time (the w stands for "wall.") It would usually read the calendar clock on your platform, thus it wouldn't matter which thread got the time, but the overhead might be significant.
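To see the difference between the two timers, something along these lines could be tried. It is only a sketch: the module name `timing_demo`, the helper `measure`, and the loop body are made up for illustration, and it assumes the compiler is invoked with OpenMP enabled (for example `gfortran -fopenmp`, or `/Qopenmp` with Intel Fortran on Windows).

```fortran
! Sketch: compare cpu_time (CPU time of the process) with
! omp_get_wtime (wall clock) around an OpenMP parallel loop.
module timing_demo
  use omp_lib, only: omp_get_wtime
  implicit none
contains
  ! Hypothetical helper: runs a parallel reduction over 1..n and
  ! returns the cpu_time delta, the wall-clock delta, and the sum.
  subroutine measure(n, cpu_seconds, wall_seconds, total)
    integer, intent(in) :: n
    double precision, intent(out) :: cpu_seconds, wall_seconds, total
    double precision :: cpu0, cpu1, wall0, wall1
    integer :: i

    total = 0.0d0
    call cpu_time(cpu0)
    wall0 = omp_get_wtime()
    !$omp parallel do reduction(+:total)
    do i = 1, n
      total = total + sin(dble(i))**2
    end do
    !$omp end parallel do
    wall1 = omp_get_wtime()
    call cpu_time(cpu1)

    cpu_seconds = cpu1 - cpu0
    wall_seconds = wall1 - wall0
  end subroutine measure
end module timing_demo

program cpu_vs_wall
  use timing_demo, only: measure
  use omp_lib, only: omp_get_max_threads
  implicit none
  double precision :: cpu_seconds, wall_seconds, total

  call measure(50000000, cpu_seconds, wall_seconds, total)
  print '(a,i0)',     'threads available:       ', omp_get_max_threads()
  print '(a,f10.3)',  'cpu_time delta (s):      ', cpu_seconds
  print '(a,f10.3)',  'omp_get_wtime delta (s): ', wall_seconds
  print '(a,es12.5)', 'checksum:                ', total
end program cpu_vs_wall
```

If cpu_time behaves as described above, then on a multi-core machine with several threads active the cpu_time delta should come out larger than the omp_get_wtime delta, roughly in proportion to the number of busy threads. The exact ratio depends on the compiler's runtime library and the platform.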
If you used the MSVC/ICL intrinsic __rdtsc(), or possibly Fortran system_clock(), in an effort to get better resolution, you might be concerned about comparing values obtained by different threads. In practice, the cores of a single CPU share the same counter, but, at best, the counters of different CPUs would be synchronized as well as possible in power-on BIOS code, and would stay synchronized only to the extent that they share the same bus clock, as CPUs on a single motherboard normally do. In most such cases, they do stay synchronized well enough that it's not worthwhile to incur the overhead of something such as a critical region (which you may need anyway, if you are comparing times from different threads).
Windows also offers QueryPerformanceCounter APIs, which should do the critical region stuff for you, with the expected overhead cost.
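For the system_clock() route mentioned above, a minimal sketch might look like the following. The module `clock_util` and the function `seconds_between` are hypothetical names. It assumes the compiler supports 64-bit integer arguments to system_clock, which with many compilers gives a finer tick than default integers; the actual resolution is whatever `count_rate` reports on your platform.

```fortran
! Sketch: wall-clock timing with system_clock and 64-bit counters.
module clock_util
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Hypothetical helper: converts a tick difference to seconds.
  ! The subtraction is done in integers first to keep precision.
  pure function seconds_between(t0, t1, rate) result(dt)
    integer(int64), intent(in) :: t0, t1, rate
    double precision :: dt
    dt = dble(t1 - t0) / dble(rate)
  end function seconds_between
end module clock_util

program resolution_demo
  use, intrinsic :: iso_fortran_env, only: int64
  use clock_util, only: seconds_between
  implicit none
  integer(int64) :: t0, t1, rate
  double precision :: s
  integer :: i

  call system_clock(t0, rate)
  if (rate <= 0) stop 'no system clock available'

  ! Arbitrary work to time.
  s = 0.0d0
  do i = 1, 10000000
    s = s + sqrt(dble(i))
  end do
  call system_clock(t1)

  print '(a,i0)',     'ticks per second: ', rate
  print '(a,f12.6)',  'elapsed (s):      ', seconds_between(t0, t1, rate)
  print '(a,es12.5)', 'checksum:         ', s
end program resolution_demo
```

The sketch ignores counter wraparound (`count_max`), which is unlikely to matter for short intervals with 64-bit counters.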