The Intel compiler (for Windows, Professional v. 11.1.035) doesn't parallelize this loop:
for (long i=2;i
while it parallelizes a more complicated loop:
for (long i=2;i
where k is a number that I enter from the keyboard.
f is a simple but slow-to-compute function without side effects.
Why does the compiler parallelize the complex loop but not the simple one?
As always, it would be helpful to have a complete compilable test case to make sure we're all on the same page. In particular I'd be surprised if either loop would be parallelized because there is a cross-iteration dependence between x[i] and x[i-2]:
[cpp]>type bug.c
int f(long i)
{
    return i;
}

foo(long N, long k, double *x)
{
    long i;

    for (i=2;i<N;i++) {
        x[i]=x[i-2]+f(i);
    }
}

>icl -c -Qparallel bug.c -Qpar-report3
Intel C++ Compiler Professional for applications running on Intel 64, Version 11.1 Build 20091012 Package ID: w_cproc_p_11.1.051
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.

bug.c
procedure: f
procedure: f
procedure: foo
procedure: foo
bug.c(10): (col. 5) remark: loop was not parallelized: existence of parallel dependence.
bug.c(11): (col. 9) remark: parallel dependence: assumed FLOW dependence between x line 11 and x line 11.
>[/cpp]
I'd be curious to see a compilable test case where this loop was parallelized. If I change the '2' to a 'k' then it also fails to parallelize because of a number of possible dependences (could be flow or anti, depending on the sign of 'k').
Dale
for (i=1;i
These loops are independent, and the compiler runs them in parallel!
For the case of arbitrary k:
for (i=0;i
we get k independent loops, one for each residue class of i modulo k: i%k=0, i%k=1, i%k=2, i%k=3, ..., i%k=k-1.
The question is: why does the Intel compiler handle the complex loop with arbitrary k, but not the simple one with k=2?
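To make the "k independent loops" argument concrete, here is a minimal sketch of that decomposition for the recurrence x[i]=x[i-k]+f(i). The function names and the body of f are my own placeholders, not the code from this thread, and the sketch assumes k > 0. As Dale points out, a compiler that only sees a run-time k cannot assume its sign, which is one reason it may refuse to do this rewrite automatically.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the thread's slow, side-effect-free f().
double f(long i) { return 0.5 * static_cast<double>(i); }

// Reference version: the recurrence as one sequential loop.
void recurrence_serial(std::vector<double>& x, long k) {
    assert(k > 0);  // assumption: positive dependence distance
    const long n = static_cast<long>(x.size());
    for (long i = k; i < n; ++i)
        x[i] = x[i - k] + f(i);
}

// Same recurrence, split into k chains by residue r = i % k.
// Chain r only touches x[r], x[r+k], x[r+2k], ..., so different
// chains never read or write each other's elements.
void recurrence_by_chains(std::vector<double>& x, long k) {
    assert(k > 0);  // assumption: positive dependence distance
    const long n = static_cast<long>(x.size());
    for (long r = 0; r < k; ++r)                // chains are mutually independent
        for (long i = r + k; i < n; i += k)     // order matters inside one chain
            x[i] = x[i - k] + f(i);
}
```

Both functions produce the same array when they start from the same input. Only the outer loop over r is a candidate for parallel execution, so at most k threads can be kept busy this way (two for k=2).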
using namespace std;
#include
#include
const long n=30000, m=2000;
double f(long i)
{
double s=0;
for (long j=0;j
return cos(s);
}
int main()
{
clock_t start, finish;
double duration;
double s,x
long N; long k;
cout<<"Enter number k ";
cin>>k;
for (long i=0;i
start = clock();
for (long i=5;i
finish = clock();
s=0;
for(long i=0;i
duration = (double)(finish - start) / CLOCKS_PER_SEC;
printf( "%2.10f seconds\nSum: %2.10f\n", duration,s);
char ccc;
std::cin>>ccc;
return 0;
}
What options did you use to compile this file? When I build it with "-parallel", while several loops are parallelized (including the one in f()), the loop in question (with either x-2 or x-k) is not parallelized. Perhaps the actual question is why does it go faster when you change "x-2" to "x-k"? Or do you have reason to believe (e.g. par-report, examination of asm file) that it actually parallelized the loop in question?
Dale
Compiler options:
/c /O2 /Og /Ot /Qip /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MT /GS /arch:SSE /fp:fast /FAs /Fa"Release/" /Fo"Release/" /W3 /nologo /Wp64 /Zi /Qopenmp /Qparallel
I examined the asm file. It is quite complex. To understand it better, I decided to replace
x[i]=x[i-k]+f(i);
with
x[i]=log10(abs(x[i-k]+f(i)));
The result was the same: the above example is fast. But when I put x[i-2] instead of x[i-k], the program becomes twice as slow on my dual-core processor.
The asm files are seq_loop.asm for the slow program with x[i-2] and parallel_loop.asm for the one with x[i-k]. From each asm file I took only the loop of interest:
for (long i=5;i
The compiler inlined the code of function f into this loop, so the asm files also contain the loop:
for (long j=0;j
As you can see from the asm code, in the program with x[i-2] neither the first nor the second loop is parallel.
But in the program with x[i-k], the compiler runs the loop
for (long j=0;j
in parallel (it is surrounded by "call ___kmpc_serialized_parallel" and "call ___kmpc_end_serialized_parallel").
This surprised me. I thought it ran the loop
for (long i=5;i
in parallel. I've checked and found that the Intel compiler can't parallelize recurrent loops :-(
The new question: why does the compiler not parallelize the loop
for (long j=0;j
when I put x[i-2] instead of x[i-k]?
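Since f has no side effects, one way to avoid depending on what the auto-parallelizer decides is to split the work by hand: compute all the f(i) values first (those iterations are independent) and then run the cheap recurrence sequentially. The sketch below is only an illustration of that idea, not what the compiler generates. The body of f is a placeholder I made up (the thread's own f is truncated in this copy), the function names are hypothetical, and k > 0 is assumed. The OpenMP pragma takes effect only when OpenMP is enabled (for example with /Qopenmp); otherwise it is ignored and the code runs serially with the same result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the thread's slow, side-effect-free f().
double f(long i) {
    double s = 0.0;
    for (long j = 0; j < 2000; ++j)
        s += std::sin(static_cast<double>(i + j));
    return std::cos(s);
}

// Original shape: expensive call and recurrence fused in one loop.
void recurrence_fused(std::vector<double>& x, long k) {
    assert(k > 0);  // assumption: positive dependence distance
    const long n = static_cast<long>(x.size());
    for (long i = k; i < n; ++i)
        x[i] = x[i - k] + f(i);
}

// Split shape: phase 1 has no cross-iteration dependence,
// phase 2 keeps the dependence but does almost no work per iteration.
void recurrence_split(std::vector<double>& x, long k) {
    assert(k > 0);  // assumption: positive dependence distance
    const long n = static_cast<long>(x.size());
    std::vector<double> fv(x.size());

    // Phase 1: every iteration writes its own fv[i], so it can run in parallel.
    #pragma omp parallel for
    for (long i = k; i < n; ++i)
        fv[i] = f(i);

    // Phase 2: the recurrence itself, executed in order.
    for (long i = k; i < n; ++i)
        x[i] = x[i - k] + fv[i];
}
```

The cost is one temporary array of the same length as x. Whether this beats the compiler's own choice on a given machine is something to measure, not assume.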