- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've set the compiler option Parallelization to "/Qparallel", but it takes no effects. I've tried this code. It's a DLL called by a .NET application:
[fortran]subroutine LRZERLEGUNG(AMatrix, DimensionN, AbsolutB, ResultX) !DEC$ ATTRIBUTES DLLEXPORT::LRZERLEGUNG implicit none ! Variables real :: T1, T2 integer, intent(in) :: DimensionN double precision, dimension(DimensionN,DimensionN) :: AMatrix double precision, dimension(DimensionN) :: AbsolutB double precision, intent(out), dimension(DimensionN) :: ResultX double precision, dimension(DimensionN)::y integer :: i,j,k ! Body of lrZerlegung CALL CPU_TIME(T1) ! Berechnung von L (Ax=LRx=b) !dir$ parallel do i = 1,DimensionN-1 do j = i+1,DimensionN ! Bestimme die i-te Spalte von L AMatrix(j,i) = AMatrix(j,i)/AMatrix(i,i) do k = i+1,DimensionN ! Datiere die j-te Zeile auf AMatrix(j,k) = AMatrix(j,k) - AMatrix(j,i)*AMatrix(i,k) end do end do end do ! Vorwaertseinsetzen !dir$ parallel !dir$ loop count min(4) do j = 1,DimensionN y(j) = AbsolutB(j) do k = 1,j-1 y(j) = y(j) - AMatrix(j,k)*y(k) end do end do ! Rueckwrtseinsetzen !dir$ parallel !dir$ loop count min(4) do j = DimensionN,1,-1 ResultX(j) = y(j) do k = j+1,DimensionN ResultX(j) = ResultX(j) - AMatrix(j,k)*ResultX(k) end do ResultX(j) = ResultX(j)/AMatrix(j,j) end do CALL CPU_TIME(T2) write( *, * ) T2-T1 end subroutine LRZERLEGUNG [/fortran]
Best regards, Marc
Link Copied
- « Previous
-
- 1
- 2
- Next »
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The use of REAL*8 is neither a deleted or obsolete feature in the 1990, 1995 or 2003 Fortran standard. As such it is standard Fortran.
GAP is a highly advertised new feature of the "12.0" xe 2011 compilers. The loop count directive is an idea resurrected from compiler versions prior to 11.0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[bash]program matrix implicit none real(8),allocatable:: a(:,:), b(:,:),c(:,:) integer:: N = 200, i real:: t0, t1 allocate( a(N,N), b(N,N), c(N,N) ) call random_seed() call random_number(a) call random_number(b) call cpu_time(t0) !dir$ parallel allways do i = 1, 500 c = matmul(a,b) end do call cpu_time(t1) write(*,'(a, f10.3)') ' Time for matmul = ', (t1 - t0) / 5 pause end program matrix[/bash]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
-Qopt-matmul
Did you set that option?
If you're trying to find out how efficiently the threads are used, you will want to measure elapsed time as well as the total CPU time of all threads added up.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To get the correct comparison I used the openmp timer. CPU_Time does not reflect correct values.
Most likely my test below (which is little bit different in that I use random numbers in each loop) does not have large enough matrix sizes but I don't see any significant performance improvement when using /Qopt-matmul.
Abhi
---
[fortran] Program Test_Qmatmul Implicit None Real(8), Allocatable:: a(:,:), b(:,:), c(:,:) Integer :: N = 200, M = 500 Integer :: i Real(8) :: ts, te, OMP_GET_WTIME Allocate( a(N,N), b(N,N), c(N,N) ) Call random_seed() ts = OMP_GET_WTIME() !Call CPU_Time(ts) do i = 1, M Call random_number(a) Call random_number(b) c = matmul(a,b) end do te = OMP_GET_WTIME() !Call CPU_Time(te) Write(*,'(A, F0.3)') 'Time for matmul = ', (te - ts) End Program Test_Qmatmul[/fortran]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
From the 2003 Standard document linked in this forum, "Section 1.6.3 FORTRAN 77 compatibility" implies that REAL*8 is in the standard. I have converted many programs between compilers, including between 8-byte and non 8-byte architectures. Much public domain numerical software was developed on Control Data in 70's and 80's. With a program of unknown origin, it can be very difficult to know the required precision. KIND did not solve the problem, as often the KIND module listing is not included with the code listing. Have a look through some of the code examples in this forum.
John
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Posts on this forum often refer to legacy code containing many extensions as "f77" perhaps because the f77 standard didn't require a way of diagnosing extensions, as f90 and later standards do.
I've been around long enough to have used f77 compilers which didn't support the REAL*8 extension, including those for Honeywell 36-bit and Harris 24-bit platforms, as well as CDC compilers which certainly didn't accept REAL*8. You might argue that when we old-timers argue about such widely accepted extensions, we're pining in vain for the return of some of the old architectures.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Unfortunately I can't find my hard copy of the Fortran 77 standard.
On reflection, recalling the influence of CDC and IBM on that standard, it probably wasn't.
Was there never a reference to the REAL*8 syntax as an extension?
My apologies for the assumption of REAL*8 being included in the standard.
As a long time Fortran user, I will maintain *byte is the most concise and informative definition of required numerical precision for the numerical calculation being performed.
I recall the frustration when converting programs to run on Prime/Vax or PC, of not knowing the required precision for the calculations.
I have always been disappointed with the KIND structure and the allusion of dialing up a precision in SELECTED_REAL_KIND. What do you do when someone selects SELECTED_REAL_KIND (7,38) or SELECTED_REAL_KIND (3,3) which is the example in an oldLahey manual ?
My computer science background has been more based on numerical methods and providing the ability to use SELECTED_REAL_KIND (3,3) is not a significant improvement in the use of Fortran. What would you think the original programmer was hoping to achieve ?
If I see REAL*10 or REAL*16 in new code, at least I know I have a problem, while REAL*8 says the conversion is easy. The statement USE PRECISION is still an unknown. (Actually the use of REAL or REAL*4 was always the biggest worry)
John
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The problem with using the byte size is that this tells you nothing about the precision or range of the datatype. You were obviously never a VAX user, where there were two different representations of REAL*8, one of them with the same range as REAL*4, or VMS on Alpha where there were THREE REAL*8 representations (and two REAL*4), each with different precision and range. This is why the SELECTED_REAL_KIND intrinsic is so useful as it frees you from such issues and lets you pick the needed precision and range and lets the compiler decide which type best fits.
SELECTED_REAL_KIND(7,38) would get you REAL*4 on VAX. SELECTED_REAL_KIND(3,3) is easily satisfied by single precision - remember that these are minimums. The compiler must pick the smallest decimal precision kind that meets the criteria for both range and precision. If there is more than one, the smallest kind value is chosen. If the implememtation doesn't have a kind that meets the requirements, you'll get -1 for a kind which will almost certainly result in a compile-time error.
Consider also traditional Cray systems, where single precision is 60 (!) bits. There are also 16-bit real types used in some graphics processors.
Of course there are times when you do need to know the byte size, such as C interoperability, which is why there are constants for C_FLOAT, etc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve,
Thanks for your comments. I do now recall the longer real*8 precision. Was there a F90 VAX compiler that allowed the different real*4 and real*8 formats to be used in the same program? Our Vax went in the late 80's.
While the Lahey documented precision of REAL*4 is about 7-8 digits and 10^38 exponent, SELECTED_REAL_KIND(7,38) will get you REAL*8 on ifort and most win-32 compilers, which illistrates my problem: What did the original programmer require for his calculation and what did he get, ie what accuracy did the calculation require.
The importance of byte size has not gone away, especially when using direct access files, where it is still possible to legally mix integer, real and character, not to mention the illegal mixing via subroutine arguments that some of us still find convenient.
Never wrote code for a Cray, however CDC, IBMand ICL were a challenge.I think ICL had48 bit reals. 48 bits gave a new level of uncertainty to REAL for code from the UK.
REAL*8 is a lot more certain than these past offerings. Now, listing code without the defining module takes me back to those annoying times.
KIND isn't likely to be changed. I do use it for portability when selecting higher precision than real*8
John
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've never heard the CDC 60-bit format called "traditional Cray," although Seymour did have a hand in that.The Cray branded systems started out with a 64-bit format (but not with IEEE precision).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
F-float (VAX single precision) had a range from .29E-38 to 1.7E38 and 23 fraction bits, which qualifies for (7,38). IEEE single precision has effectively one less power of 2 of range due to the encodings for denormals, etc. and the range is more asymmetric than for VAX F-float (1.2E-38 to 3.4E38).
I don't think Lahey ever supported VAX floating, so their example is a bit strange.
There is also accomodation in Fortran 2008 for decimal floating point with the addition of an optional RADIX argument, so you might have more than one kind of a given byte size.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[bash]program matrix !use omp_lib Implicit None Real(8), Allocatable:: a(:,:), b(:,:), c(:,:) Integer :: N = 200, M = 500 Integer :: i Real(8) :: ts, te, OMP_GET_WTIME Allocate( a(N,N), b(N,N), c(N,N) ) Call random_seed() ts = OMP_GET_WTIME() !Call CPU_Time(ts) !$OMP PARALLEL DO do i = 1, M Call random_number(a) Call random_number(b) c = matmul(a,b) end do te = OMP_GET_WTIME() !Call CPU_Time(te) Write(*,'(A, F0.3)') 'Time for matmul = ', (te - ts) pause end program matrix[/bash]
But /Qparallel has no effects! I thought /Qparallel would parallelize the Code without using any explicit directives?!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If it were not for a requirement to trace a history in random_number, the compiler might start out by the shortcut of replacing doi=1,M by doi= M,M when you don't set /Qopenmp. I don't think the compiler would perform the analysis to decide that your call to random_seed would eliminate a requirement to follow a full random_number sequence.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When I build the code with matmul and the version 12 compiler, it tells me it auto-parallelized the matmul:
c:\Projects\t.f90(16): (col. 13) remark: LOOP WAS AUTO-PARALLELIZED.
c:\Projects\t.f90(17): (col. 13) remark: LOOP WAS AUTO-PARALLELIZED.
c:\Projects\t.f90(18): (col. 12) remark: LOOP WAS AUTO-PARALLELIZED.
c:\Projects\t.f90(18): (col. 12) remark: PERMUTED LOOP WAS AUTO-PARALLELIZED.
If I add /Qguide I see this:
c:\Projects\t.f90(16): remark #30525: (PAR) Insert a "!dir$ loop count min(64)"
statement right before the loop at line 16 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 64 iterations.
c:\Projects\t.f90(17): remark #30525: (PAR) Insert a "!dir$ loop count min(64)"
statement right before the loop at line 17 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 64 iterations.
c:\Projects\t.f90(18): remark #30525: (PAR) Insert a "!dir$ loop count min(64)"
statement right before the loop at line 18 to parallelize the loop. [VERIFY] Make sure that the loop has a minimum of 64 iterations.
16 and 17 are the calls to RANDOM_NUMBER, so I would ignore that. I tried adding a directive before the MATMUL and it didn't change the output, though as I note the compiler thinks it did parallelize it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- « Previous
-
- 1
- 2
- Next »