Dear all,
I have developed a program and unfortunately it has a speedup problem. My program is very large, so I wrote a small sample that is similar to it; fortunately, this simple program has the same problem as my real program.
I would appreciate hearing about your experiences and any help you can give.
Thanks,
I am using VS2010 and Intel Fortran XE 2011.
Program:
PROGRAM speedup_sample
IMPLICIT NONE
TYPE var
  REAL(8),POINTER :: A, B, C
END TYPE var
REAL(8),POINTER :: A(:), B(:), C(:)
TYPE(var),POINTER :: vars(:)
TYPE(var),POINTER :: varOMP
REAL*8 t1,t2 ,ai,bi,ci,di,ei,fi
INTEGER(4) c1,c2
INTEGER N, CHUNKSIZE, I, id, f , l, Iteration
PARAMETER (N=200)
PARAMETER (CHUNKSIZE=10)
Allocate (A(N), B(N), C(N),vars(N))
! initializations: vars(I) points at element I of A, B and C
DO I = 1, N
  A(I) = I * 1.0
  B(I) = A(I)
  vars(I)%A => A(I)
  vars(I)%B => B(I)
  vars(I)%C => C(I)
  vars(I)%A = 0.51   ! assigns through the pointer, i.e. overwrites A(I)
  vars(I)%B = 0.45   ! overwrites B(I)
ENDDO
CALL SYSTEM_CLOCK(c1)
Do Iteration=1,1000000
!z$OMP PARALLEL PRIVATE(I,varOMP,ai,bi,ci,ei ,di,fi )
!z$OMP DO SCHEDULE(STATIC,CHUNKSIZE)
  DO I = 1, N
    varOMP => vars(I)
    ai = varOMP%A
    bi = varOMP%B
    di = ai*2.2 + bi * 2.0 + ai*2.1
    ei = ai*2.1 + bi * 2.3 + ai*2.15
    fi = di * ( ai + bi )*2.1 + ei * ( di + bi )*2.1
    ci = bi*2.1 + ai*2.0 + di*2.0 + ei*2.0 + fi*2.0
    varOMP%C = ci   ! writes C(I) through the pointer
  ENDDO
!z$OMP END DO
!z$OMP END PARALLEL
ENDDO
CALL SYSTEM_CLOCK(c2)
WRITE(*,*) c2-c1
STOP
END PROGRAM speedup_sample
Sorry, for my last compile I changed !$OMP to !z$OMP to test the program without OpenMP; please remove the z from !z$OMP in the sample.
To see an advantage from threading you will need an outer parallel loop with a count such as 1000, with an inner vectorizable loop of at least the size shown. SYSTEM_CLOCK works better with 64-bit arguments.
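One possible reading of this advice, as a sketch under assumptions rather than a drop-in fix: drop the pointer-to-scalar components in favor of plain contiguous arrays, add a second dimension M for about 1000 independent data sets, parallelize the outer loop over those sets with a single PARALLEL DO, and keep the inner loop over N elements simple enough for the compiler to vectorize. The module name kernel_mod, the subroutine compute_all and the (N, M) layout are hypothetical; whether your real program has such an outer loop depends on its structure. The !$omp lines only take effect when OpenMP is enabled (for example /Qopenmp with Intel Fortran on Windows, -fopenmp with gfortran).

```fortran
! Hypothetical sketch: parallel outer loop over M independent data sets,
! unit-stride inner loop over N elements that the compiler can vectorize.
module kernel_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine compute_all(a, b, c)
    ! a, b, c are assumed to have the same shape (N, M)
    real(dp), intent(in)  :: a(:,:), b(:,:)
    real(dp), intent(out) :: c(:,:)
    integer :: i, j
    real(dp) :: ai, bi, di, ei, fi

    ! Iteration j writes only column j of c. With SCHEDULE(STATIC) each thread
    ! gets one contiguous block of columns, so only cache lines at block edges
    ! can be written by two threads.
    !$omp parallel do private(i, ai, bi, di, ei, fi) schedule(static)
    do j = 1, size(a, 2)
      ! Inner loop: contiguous, no pointer indirection
      do i = 1, size(a, 1)
        ai = a(i, j)
        bi = b(i, j)
        di = ai*2.2_dp + bi*2.0_dp + ai*2.1_dp
        ei = ai*2.1_dp + bi*2.3_dp + ai*2.15_dp
        fi = di*(ai + bi)*2.1_dp + ei*(di + bi)*2.1_dp
        c(i, j) = bi*2.1_dp + ai*2.0_dp + di*2.0_dp + ei*2.0_dp + fi*2.0_dp
      end do
    end do
    !$omp end parallel do
  end subroutine compute_all
end module kernel_mod
```

A driver would allocate a(N,M), b(N,M) and c(N,M) once, call compute_all inside the timing loop, and read SYSTEM_CLOCK into INTEGER(8) counters. Note that the sketch uses double-precision literals (_dp), so its results differ slightly from the sample, which mixes in single-precision constants.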
Threading can't compensate for inefficient data access. If multiple threads write to the same cache line (false sharing), performance will be poor.
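To make the false-sharing point concrete for SCHEDULE(STATIC,CHUNKSIZE) with CHUNKSIZE=10 and REAL(8) data, here is a small model under stated assumptions: 64-byte cache lines (8 REAL(8) values per line), an array that starts on a line boundary, and chunks handed out round-robin to the threads. The function count_shared_lines is hypothetical and only models ownership; it does not measure anything on real hardware.

```fortran
! Hypothetical model: how many cache lines are written by more than one thread
! when n elements are split into chunks of size chunk, assigned round-robin to
! nthreads. Assumes the array starts on a cache-line boundary.
module false_sharing_mod
  implicit none
contains
  integer function count_shared_lines(n, chunk, nthreads, elems_per_line) result(shared)
    integer, intent(in) :: n, chunk, nthreads, elems_per_line
    integer :: line, first, last, i, first_owner
    logical :: mixed

    shared = 0
    do line = 0, (n - 1) / elems_per_line
      first = line * elems_per_line               ! 0-based first element of line
      last  = min(first + elems_per_line, n) - 1  ! 0-based last element of line
      first_owner = mod(first / chunk, nthreads)  ! thread owning the first element
      mixed = .false.
      do i = first + 1, last
        if (mod(i / chunk, nthreads) /= first_owner) mixed = .true.
      end do
      if (mixed) shared = shared + 1
    end do
  end function count_shared_lines
end module false_sharing_mod
```

Under this model, count_shared_lines(200, 10, 2, 8) gives 15: 15 of the 19 chunk boundaries fall in the middle of a cache line, so two threads write that line. A chunk size that is a multiple of 8 (such as 8 or 16) gives 0.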