Consider the following very simple OpenMP program:
#define n 5000
double a
int _tmain (int argc, char *argv[])
{
    omp_set_num_threads (1);
    double start = omp_get_wtime ();
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        c = 0;
        for (int k = 0; k < n; k++)
        {
            c += a
        }
    }
    printf ("time = %lf (%d)\n", omp_get_wtime () - start, omp_get_num_threads());
}
On my Intel Core 2 Quad it runs in 0.17 s with 1 thread, 0.18 s with 2 threads, and 0.18 s with 4 threads.
So, no speedup.
Intel Parallel Amplifier shows poor utilization, but doesn't show the reason. How can I find the reason with the Intel tools integrated in MS Visual Studio?
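A side note on the measurement itself: `omp_get_num_threads()` returns 1 whenever it is called outside a parallel region, so the `printf` in the listing reports 1 no matter how many threads were requested. A minimal sketch of reading the team size from inside a parallel region is given here; `team_size` is a hypothetical helper name, not part of the original program, and the `_OPENMP` guard is only an assumption added so the snippet still compiles when OpenMP is switched off.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not in the original post): returns the number of
// threads that actually execute a parallel region.
int team_size()
{
    int threads = 1;  // value used when the code is built without OpenMP
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Only one thread of the team writes the shared variable; the
        // implicit barrier at the end of "single" makes the value visible.
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}
```

Printing `team_size()` instead of `omp_get_num_threads()` after the timed loop would show whether the requested thread count was really in effect.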
4 Replies
Please check whether you are building the Debug or the Release version of your program.
I selected Release Win32 and it solved the problem! Many thanks!
Now (for N=1000) it runs 4 times faster with 4 threads. However, the code is incorrect from a semantic point of view (the indexes should not be shared).
Why does it speed up so poorly with the Debug target selected?
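The sharing problem mentioned above can be avoided by keeping the accumulator private to each iteration. What follows is only a minimal sketch under stated assumptions: the array declaration in the original post was lost, so the flat `std::vector` filled with 1.0, the helper name `sum_rows`, and the `reduction` clause that collects the row sums are all illustrative choices, not the original author's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums all elements of an n x n matrix row by row.
// Assumption: the matrix is filled with 1.0 (the original data is unknown).
double sum_rows(int n)
{
    std::vector<double> a(static_cast<std::size_t>(n) * n, 1.0);
    double total = 0.0;

    // The loop index i is private automatically; total is combined safely
    // across threads by the reduction clause.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++)
    {
        double c = 0.0;  // declared inside the loop body, so each thread has its own copy
        for (int k = 0; k < n; k++)
        {
            c += a[static_cast<std::size_t>(i) * n + k];
        }
        total += c;
    }
    return total;
}
```

Declaring `c` (and the loop counters) inside the parallel loop is the simplest way to make them private; a `private(c)` clause on the pragma would be an alternative if the variable has to stay declared outside. With the assumed fill of 1.0, `sum_rows(n)` adds up n*n ones, which gives a quick sanity check that no updates were lost.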
For the inner loop, auto-vectorization (enabled with /arch:SSE2|SSE3|SSE4 or /QxSSE2...) will help performance a lot,
but it is disabled in "Debug". You can use /Qvec-report[1|2|3|4|5] to show the details.
For OpenMP, use /Qopenmp-report[1|2].
>>icl /O2 /Qopenmp /Ob2 /Qvec-report3 /Qopenmp-report:2 u.cpp
Intel C++ Compiler for applications running on IA-32, Version 12.0.0.024 Beta Build 20100415
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
u.cpp
C:\temp\u.cpp(12): (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
C:\temp\u.cpp(13): (col. 3) remark: loop was not vectorized: not inner loop.
C:\temp\u.cpp(16): (col. 7) remark: LOOP WAS VECTORIZED.
Microsoft Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
You can add /Qvec-report3 /Qopenmp-report2 to the project property C/C++->Advanced->Additional option.
Jennifer
