- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In kernel32.f90 there is an interface for the C function CreateThread to FORTAN.
It is supposed to be doing the argument-passing convention adaptation to FORTRAN.
It is supposed to take 6 arguments
The first has ro be a derived type "security attributes" It won't accept "0" there as the instructions suggest it should
The second is an integer stack size. It does seem to accept zero there, as the instructions say it should.
The third is called "thread_func" which in all examples is the name of a function (or in some examples, the name of a subroutine - withoiut a RETURN statement!)
The fourth is an integer*4 argument which will be passed to "thread_func"
The 5th is an integer argumment to receive some flags
The 6th is an integer thread_ID
Here is my simple test program
cDEC$ FIXEDFORMLINESIZE:132
USE DFLIB
USE DFWIN
type(T_SECURITY_ATTRIBUTES)Security
INTEGER THREADID,FLAGS
Security.nlength=0
IARG=7
IHANDLE= CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
WRITE(*,*)IARG
STOP
END
INTEGER FUNCTION SUMMATION(IARG)
SUMMATION=IARG
IARG=IARG**2
END
If it works it should print out 7**2 = 49
It complies but won't link: Here is the linker error message:
Compiling Fortran...
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable SUMMATION is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------------------------------------^
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable FLAGS is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------^
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable THREADID is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------^
LASTTIME.OBJ - 0 error(s), 3 warning(s)
What am I doing wrong?
We need a new CREATETHREAD routine for FORTRAN with no kludgy stuff around it!
Just : CALL CREATETHREAD(SUBNAM, ARGUMENT LIST FOR SUBNAM.... AS MANY AS IT NEEDS)
Thanks
Paul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Paul,
Use of CreateThread can be disadvantages for two reasons: a) it is non-portable (Windows only), b) is specific two threads regardless of hardware available on the system. A better practice is to use OpenMP. The following program is an updated version of your program, or should I say in the spirit of your original code:
! Paul_Dent.f90
module MySort
! no data
contains
! contained procedures
! Iswap - Swap two integers
subroutine Iswap(I1, I2)
implicit none
! arguments
integer :: I1, I2
! locals
integer :: Itemp
Itemp = I1
I1 = I2
I2 = Itemp
end subroutine Iswap
!**************************************************************
! MERGE - Merges two adjacent sections of indicies
! V = array of keys
! L = array of indicies
! N = size of arrays
! I1 = starting index of section 1
! I2 = starting index of section 2
!
! section 1 = L(I1:I2-1)
! section 2 = L(I2:MIN(I2+I2-I1-1,N))
! Notes: I2-I1 is the size of section 1
! the size of section 2 is MIN(I2-I1, N-I2+1) ! don't run off end of array
!***************************************************************
RECURSIVE SUBROUTINE MERGE(V,L,N,I1,I2)
implicit none
! arguments
integer, intent(in) :: N,I1,I2
real :: V(N)
integer :: L(N)
! local variables
integer :: II1, II2,I1begin,I1end,I2end,M1,M1end
! *** Note, when N is very large, Ltemp should be allocatable
! *** for this sample program Ltemp can fit on stack
integer :: Ltemp((I2-I1)+MIN(I2-I1,N-I2+1))
! save limits
II1=I1
II2 = I2
I1begin = I1
I1end = I2-1 ! adjacent sections
I2end = MIN(I2+I2-I1-1,N) ! MIN(base of section 2 + size of section 1 minus 1, N)
M1end = size(Ltemp)
! merge sections of L into Ltemp
do M1 = 1, M1end
if(V(L(II1)) <= V(L(II2))) then
Ltemp(M1) = L(II1)
II1 = II1 + 1
if(II1 > I1end) then
if(M1 < M1end) Ltemp(M1+1:M1end) = L(II2:I2end)
exit ! exit DO M1 loop
endif
else
Ltemp(M1) = L(II2)
II2 = II2 + 1
if(II2 > I2end) then
if(M1 < M1end) Ltemp(M1+1:M1end) = L(II1:I1end)
exit ! exit DO M1 loop
endif
endif
end do
L(I1begin:I2end) = Ltemp ! move Ltemp back into section of L
END SUBROUTINE MERGE
!*****************************************************
! SORTserial - Produce a sorted list of indexes into an array of keys
! This subroutine will not create additional parallelization.
! However, it can be called from either a serial application, or
! or from a parallel region operating on a subsection of an array
! or on different arrays.
!*****************************************************
RECURSIVE SUBROUTINE SORTserial(V,L,N,Ibegin,Iend)
implicit none
integer, intent(in) :: N
REAL :: V(N)
INTEGER :: L(N)
integer, intent(in) :: Ibegin,Iend
! local variables
integer :: BatchSize ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
if(N==1) return
BatchSize = Iend-Ibegin+1
select case(BatchSize)
case(1)
return ! nothing left to sort
case(2)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
return
case(3)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
return
end select
! all other cases
BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize)
call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTserial
!*****************************************************
! SORTparallelHandSplit - Produce a sorted list of indexes into an array of keys
! This subroutine uses OpenMP parallelization in a recursive manner.
! Initial slicing is performed by hand.
!*****************************************************
RECURSIVE SUBROUTINE SORTparallelHandSplit(V,L,N,Ibegin,Iend)
use omp_lib
implicit none
integer, intent(in) :: N
REAL :: V(N)
INTEGER :: L(N)
integer, intent(in) :: Ibegin,Iend
! local variables
integer :: BatchSize ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
integer :: nThreads, iThread, myIbegin, myIend
if(N==1) return
BatchSize = Iend-Ibegin+1
select case(BatchSize)
case(1)
return ! nothing left to sort
case(2)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
return
case(3)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
return
end select
if(BatchSize == N) then
! first call
!$omp parallel shared(nThreads) private(iThread, myIbegin, myIend, BatchSize)
nThreads = omp_get_num_threads() ! all threads overstrike nThreads
iThread = omp_get_thread_num()
BatchSize = (N + nThreads - 1) / nThreads
myIbegin = Ibegin + (BatchSize * iThread)
myIend = myIbegin + BatchSize - 1
call SORTserial(V, L, N, myIbegin, myIend)
!$omp barrier
! now merge batches
do while(BatchSize < N)
! filter to even thread numbers
if(iand(iThread, 1) == 0) then
if(myIbegin + BatchSize < N) then
call MERGE(V, L, N, myIbegin, myIbegin + BatchSize)
endif
end if
BatchSize = BatchSize * 2
myIbegin = Ibegin + (BatchSize * iThread)
!$omp barrier
end do
!$omp end parallel
return
endif
! Here in parallel region with something to split
! Note, no further parallelization available
BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize-1)
call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTparallelHandSplit
end module MySort
program Paul_Dent
use MySort
use IFPORT
use omp_lib
implicit none
INTEGER, PARAMETER :: N = 10*1000*1000
REAL, ALLOCATABLE :: V(:) ! N keys
INTEGER, ALLOCATABLE :: L(:) ! N indexes
REAL :: X
INTEGER :: I
double precision :: T1, T2
ALLOCATE(V(N), L(N))
X=RAND(123) ! reseed and discard
X=RAND(-123) ! reseed and discard
DO I=1,N
V(I)=RAND(0) ! key is random number
END DO
! Set index array to initial unsorted order
DO I=1,N
L(I) = I
END DO
T1 = omp_get_wtime()
CALL SORTserial(V,L,N,1,N) ! THIS IS THE 2-THREAD VERSION.
T2 = omp_get_wtime()
WRITE(*,*)"EXECUTION TIME UNTHREADED=",T2-T1," SECONDS"
! verify
DO I=1,N-1
if(V(L(I)) > V(L(I+1))) then
WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
exit
endif
END DO
! Set index array to initial unsorted order
DO I=1,N
L(I) = I
END DO
T1 = omp_get_wtime()
CALL SORTparallelHandSplit(V,L,N,1,N) ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION
T2 = omp_get_wtime()
WRITE(*,*)"EXECUTION TIME SORTparallelHandSplit=",T2-T1," SECONDS"
! verify
DO I=1,N-1
if(V(L(I)) > V(L(I+1))) then
WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
exit
endif
END DO
end program Paul_Dent
Notes:
In order to get meaningful timing results I increased the number of elements to 10 million.
I also made these arrays allocatable (large arrays should go on heap).
The parallel code will use as many hardware threads as are available to the program (even 1 thread).
Something for you to do is to read about generic interfaces and then add different key types (real(8), character, etc...).
Jim Dempsey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I should probably have a RETURN statement at the end of my "thred_func" routine SUMMATION. That changed nothing.
I then dcealred SUMMATION to be EXTERNAL.
This got rid of all except one linker error:
Compiling Fortran...
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(9) : Error: This actual argument must not be the name of a procedure. [SUMMATION]
IHANDLE= CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
------------------------------------------^
Error executing df.exe.
LASTTIME.OBJ - 1 error(s), 0 warning(s)
What? The name of a "procedure" (translation FUNCTION or SUBROUTINE ??) is exactly what the argument "Thread_Func" is supposed to be.
Pleas ecan we have a FORTRAN CreateThread and not a cobbled-up C-version
It should be an assembler routine compiled to .obj that is callable from FoRTRAN and in turn calls the Windows API.
Ohho! Maybe I could write that!!
Paul
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
One thing that is going wrong in your code is that summation is not declared to be a function, at least not inside the main program. I found an old thread, https://community.intel.com/t5/Intel-Moderncode-for-Parallel/CreateThread-in-Fortran/td-p/769509, that sheds some light on the matter. But not enough ;). I am trying to get the program to work, based on this and other information ...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is this a 32 bit build? THREADID should be integer(LPDWORD) so if it is a 64 bit build the code is wrong. The Intel interface in kernel 32 is wrong BTW as it gives integer(DWORD) which is always 32bit
I would also change the 0 to 0_SIZE_T similarly IARG it type LPVOID which chnages with 32/64 bittiness
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm not using the latest compiler so my version of kernel32 may not be the version in question. ThreadId is DWORD (lpThreadId is a pointer). The interface in kernel32 specifies ThreadId being passed by reference so is passing a pointer - and is correct.
The 'thread-func' referenced by the OP is lpStartAddress which is a pointer to the code to be run by the thread and, in the original example, should be passed as Loc(SUMMATION)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FWIW I've been using the following since Intel Visual Fortran Version 8.0. Note this predates the BIND(C) and other interoperation features. Start with the following, then add in the newer features.
! handle = CreateThread (security, stack, thread_func, argument, flags, thread_id)
integer(INT_PTR_KIND()) :: ControlPanelHandle = 0
integer(INT_PTR_KIND()) :: ControlPanelSecurity = 0 ! inherit
integer(INT_PTR_KIND()) :: ControlPanelStack = 0 ! inherit
integer(INT_PTR_KIND()) :: ControlPanelArg = 0
integer(4) :: ControlPanelFlags = 0
integer(INT_PTR_KIND()) :: ControlPanelThread_ID = 0
INTERFACE
integer(4) FUNCTION ControlPanel(arg)
!DEC$ ATTRIBUTES STDCALL, ALIAS:"_controlpanel" :: ControlPanel
integer(4),POINTER :: arg
END FUNCTION
END INTERFACE
! Create seperate thread to run the Modal Dialog control pannel
ControlPanelHandle = CreateThread(&
& NULL, ControlPanelStack, loc(ControlPanel), loc(ControlPanelArg), ControlPanelFlags, loc(ControlPanelThread_ID))
...
integer(4) FUNCTION ControlPanel(arg)
!DEC$ ATTRIBUTES STDCALL, ALIAS:"_controlpanel" :: ControlPanel
...
end FUNCTION ControlPanel
As stated earlier, the above can be tidied up. I suggest making one change at a time.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
HANDLE CreateThread(
[in, optional] LPSECURITY_ATTRIBUTES lpThreadAttributes,
[in] SIZE_T dwStackSize,
[in] LPTHREAD_START_ROUTINE lpStartAddress,
[in, optional] __drv_aliasesMem LPVOID lpParameter,
[in] DWORD dwCreationFlags,
[out, optional] LPDWORD lpThreadId
);
Strictly speaking the API is expected ThreadID to be a memory address of a dword passed by VALUE. BTW are you build for 64 or 32 bit?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@andrew_4619 wrote:HANDLE CreateThread( [in, optional] LPSECURITY_ATTRIBUTES lpThreadAttributes, [in] SIZE_T dwStackSize, [in] LPTHREAD_START_ROUTINE lpStartAddress, [in, optional] __drv_aliasesMem LPVOID lpParameter, [in] DWORD dwCreationFlags, [out, optional] LPDWORD lpThreadId );
Strictly speaking the API is expected ThreadID to be a memory address of a dword passed by VALUE. BTW are you build for 64 or 32 bit?
I would say instead that it wants either a dword passed by reference (it is an output argument) or a null pointer. In Paul's initial post, it's a default integer (it would have been better to use INTEGER(DWORD)) passed by reference. Here is the KERNEL32 module's declaration:
FUNCTION CreateThread( &
lpThreadAttributes, &
dwStackSize, &
lpStartAddress, &
lpParameter, &
dwCreationFlags, &
lpThreadId)
import
integer(HANDLE) :: CreateThread ! HANDLE
!DEC$ ATTRIBUTES DEFAULT, STDCALL, DECORATE, ALIAS:'CreateThread' :: CreateThread
!DEC$ ATTRIBUTES REFERENCE, ALLOW_NULL :: lpThreadAttributes
TYPE (T_SECURITY_ATTRIBUTES) lpThreadAttributes ! LPSECURITY_ATTRIBUTES lpThreadAttributes
integer(DWORD) dwStackSize ! DWORD dwStackSize
integer(LPVOID) lpStartAddress ! LPTHREAD_START_ROUTINE lpStartAddress
integer(LPVOID) lpParameter ! LPVOID lpParameter
integer(DWORD) dwCreationFlags ! DWORD dwCreationFlags
integer(DWORD) lpThreadId ! LPDWORD lpThreadId
!DEC$ ATTRIBUTES REFERENCE, IGNORE_LOC, ALLOW_NULL :: lpThreadId
END FUNCTION
Note the directive on line 18. This allows you to pass a DWORD integer by reference (as he did), LOC(DWORD integer) as older code might have done, or NULL(). I'm probably the one who updated this declaration as I did a LOT of work on this module before I retired.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
SECURITY_ATTRIBUTES is passed by reference - what you want there is NULL(). This will turn into a zero by value - that's what the ALLOW_NULL attribute on the argument is for.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I was wondering the other day about this, can I do this with a subroutine?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>can I do this with a subroutine?
No. The THREAD_START_ROUTINE is a function taking a pointer to argument(s) and returns a DWORD result.
Maybe you should consider OpenMP. Sketch:
use omp_lib
... ! in serial (master) code section
!$OMP PARALLEL ! optional add private and/or shared clauses
!$OMP MASTER ! only master thread runs following section
... ! optional setup
!$OMP TASK ! optional add private and/or shared clauses
CALL FOOBAR(A, B, C)
!$OMP END TASK ! task continues to run after this
... ! optional concurrent code
!$OMP END MASTER
! Note, this section runs in parallel, task may be running
! no code follows until END PARALLEL
! unless you want to...
!$OMP END PARALLEL ! waits for task completion
...
See: TASK (intel.com) for additional information.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@jimdempseyatthecove , thanks.
I bought the text book, The openmp common core, it is useful and I am slowly working through it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for all the hints. I am now in business.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How do you post code? I don't see the button
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Paul_Dent wrote:
How do you post code? I don't see the button
Click on the ... next to the quote button. A second row of controls will appear. You want this:
Click that, select Fortran as the language, paste in the box.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here is a threaded sort-merge I got work in case it helps somebody else
The objective was to take an existing recursive sort merge routine and without changing it, make two
threads execute one recursion each in parallel, so halving the run time (which it does)
The main puzzle was that CreateThread only allows passing one argument to the thread function
I got round that using a wrapper routine to place the addresses of the caller's arguments (only arrays need this) in a named common block and passing the name of the common bock as the single allowed argument to another routine that unpacks them to put in the original routine's argument list To pack an array address, you use %LOC(A) to get the address of array A into IaddressA then you put %VAL(IaddressA) into the call in place of A,. At first I thought the compiler would complain that %VAL was not he same type as the original argument, but I figure what the % does is to tell the compiler to ignore type mismatch, because it worked for real, complex, strings etc
Cheers, Paul
cDEC$ FIXEDFORMLINESIZE:132
REAL V(30000)
INTEGER L(30000)
X=RAND(123)
X=RAND(-123)
N=30000
DO 2 I=1,N
V(I)=RAND(0)
2 CONTINUE
TIM=TIMEF()
CALL SORTMERGE(V,L,1,N,ISTART) ! THIS IS THE 2-THREAD VERSION.
TIM=TIMEF()
WRITE(*,*)"EXECUTION TIME FOR 2-THREADS=",TIM," SECONDS"
C PRINT OUT A FEW SORTED VALUES
NEXT=ISTART
DO 1 I=1,5
WRITE(*,*)V(NEXT)
NEXT=L(NEXT)
1 CONTINUE
TIM=TIMEF()
CALL SORT(V,L,1,N,ISTART) ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION
TIM=TIMEF()
WRITE(*,*)"EXECUTION TIME UNTHREADED=",TIM," SECONDS"
NEXT=ISTART
DO 3 I=1,5
WRITE(*,*)V(NEXT)
NEXT=L(NEXT)
3 CONTINUE
STOP
END
C*************************************************
SUBROUTINE SORTMERGE(V,L,IFIRST,N,ISTART)
USE DFLIB
USE DFWIN
type(T_SECURITY_ATTRIBUTES)Security
INTEGER THREADID1,THREADID2
EXTERNAL SORTWRAPPER
REAL V(*)
INTEGER L(*)
COMMON /SORTARGS/IADDRESSV,IADDRESSL,I1ST,ISTART1,ISTART2,NN
IARG1=-1
IARG2=-2
IADDRESSV=%LOC(V)
IADDRESSL=%LOC(L)
NN=N ! BECAUSE DUMMY ARGUMENT CAN'T BE IN COMMON
I1ST=IFIRST ! BECAUSE DUMMY ARGUMENT CAN'T BE IN COMMON
Security.nlength=0
IHANDLE1= CreateThread (Security,0,%LOC(SORTWRAPPER),%LOC(IARG1),0,%LOC(THREADID1))
IHANDLE2= CreateThread (Security,0,%LOC(SORTWRAPPER),%LOC(IARG2),0,%LOC(THREADID2))
C THE THREADS WILL SIGNAL WHEN THEY'RE DONE BY SETTING IARG1 and IARG2 TO +1
99 IF((IARG1.LT.0).OR.(IARG2.LT.0))GO TO 99
CALL MERGE(V,L,ISTART1,ISTART2,ISTART)
RETURN
END
C**************************************************************
C THIS ROUTINE FILLS THE LINKED LIST L(1:N-1) WITH THE INDICES
C OF THE NEXT LARGEST VALUE IN SEQUENCE, ISTART BEING THE FIRST
C AND LARGEST VALUE. L(N)=0 IS A STOP INDICATOR
C**************************************************************
RECURSIVE SUBROUTINE SORTWRAPPER(IARG)
COMMON /SORTARGS/IADDRESSV,IADDRESSL,IFIRST,ISTART1,ISTART2,N
IF(IARG.EQ.-1)CALL SORT(%VAL(IADDRESSV),%VAL(IADDRESSL),IFIRST,RSHIFT(N,1),ISTART1)
IF(IARG.EQ.-2)CALL SORT(%VAL(IADDRESSV),%VAL(IADDRESSL),IFIRST+RSHIFT(N,1),N-RSHIFT(N,1),ISTART2)
IARG=1
RETURN
END
C*****************************************************
C SET IFIRST TO 1 ON ENTRY
C RETURNS A LINKED LIST L OF SORTED V VALUES, HIGHEST TO LOWEST
C K=L(ISTART) IS THE INDEX OF THE LARGEST
C THEN L(K) IS THE NEXT LARGEST AND SO ON
C*****************************************************
RECURSIVE SUBROUTINE SORT(V,L,IFIRST,N,ISTART)
REAL V(*)
INTEGER L(*)
IF(N.EQ.1)THEN
ISTART=IFIRST
L(ISTART)=0
ELSE
CALL SORT(V,L,IFIRST,RSHIFT(N,1),ISTART1)
CALL SORT(V,L,IFIRST+RSHIFT(N,1),N-RSHIFT(N,1),ISTART2)
CALL MERGE(V,L,ISTART1,ISTART2,ISTART)
ENDIF
RETURN
END
C**************************************************************
C THIS ROUTINE MERGES TWO LINKED LISTS
C THE FIRST STARTS AT ISTART1 AND THE SECOND STARTS AT ISTART2
C THE OUTPUT IS THE START OF THE MERGED LIST ISTART AND THE LAST
C INDEX IN A LINKED LIST CHAIN IS 0 AS A STOP INDICATOR
C***************************************************************
RECURSIVE SUBROUTINE MERGE(V,L,NEXT1,NEXT2,ISTART)
DIMENSION V(*),L(*)
C WRITE(*,*)"IN MERGE"
IFLAG=1
D=V(NEXT1)-V(NEXT2)
IF(D.GT.0)CALL UPDINTS(L,ISTART,IFLAG,NEXT1,NEXT2,IFLAG)
IF(D.LE.0)CALL UPDINTS(L,ISTART,IFLAG,NEXT2,NEXT1,IFLAG)
IF(IFLAG.LT.0)RETURN
I=ISTART
99 D=V(NEXT1)-V(NEXT2)
IF(D.GT.0)CALL UPDINTS(L,L(I),I,NEXT1,NEXT2,IFLAG)
IF(D.LE.0)CALL UPDINTS(L,L(I),I,NEXT2,NEXT1,IFLAG)
IF(IFLAG.GE.0)GO TO 99
RETURN
END
C***********************************************
RECURSIVE SUBROUTINE UPDINTS(L,LI,I,NEXT1,NEXT2,IFLAG)
INTEGER L(*)
LI=NEXT1
IF(L(NEXT1).NE.0)THEN
NEXT1=L(NEXT1)
I=L(I)
ELSE
L(NEXT1)=NEXT2
IFLAG=-1
ENDIF
RETURN
END
C******************
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Paul,
Use of CreateThread can be disadvantages for two reasons: a) it is non-portable (Windows only), b) is specific two threads regardless of hardware available on the system. A better practice is to use OpenMP. The following program is an updated version of your program, or should I say in the spirit of your original code:
! Paul_Dent.f90
module MySort
! no data
contains
! contained procedures
! Iswap - Swap two integers
subroutine Iswap(I1, I2)
implicit none
! arguments
integer :: I1, I2
! locals
integer :: Itemp
Itemp = I1
I1 = I2
I2 = Itemp
end subroutine Iswap
!**************************************************************
! MERGE - Merges two adjacent sections of indicies
! V = array of keys
! L = array of indicies
! N = size of arrays
! I1 = starting index of section 1
! I2 = starting index of section 2
!
! section 1 = L(I1:I2-1)
! section 2 = L(I2:MIN(I2+I2-I1-1,N))
! Notes: I2-I1 is the size of section 1
! the size of section 2 is MIN(I2-I1, N-I2+1) ! don't run off end of array
!***************************************************************
RECURSIVE SUBROUTINE MERGE(V,L,N,I1,I2)
implicit none
! arguments
integer, intent(in) :: N,I1,I2
real :: V(N)
integer :: L(N)
! local variables
integer :: II1, II2,I1begin,I1end,I2end,M1,M1end
! *** Note, when N is very large, Ltemp should be allocatable
! *** for this sample program Ltemp can fit on stack
integer :: Ltemp((I2-I1)+MIN(I2-I1,N-I2+1))
! save limits
II1=I1
II2 = I2
I1begin = I1
I1end = I2-1 ! adjacent sections
I2end = MIN(I2+I2-I1-1,N) ! MIN(base of section 2 + size of section 1 minus 1, N)
M1end = size(Ltemp)
! merge sections of L into Ltemp
do M1 = 1, M1end
if(V(L(II1)) <= V(L(II2))) then
Ltemp(M1) = L(II1)
II1 = II1 + 1
if(II1 > I1end) then
if(M1 < M1end) Ltemp(M1+1:M1end) = L(II2:I2end)
exit ! exit DO M1 loop
endif
else
Ltemp(M1) = L(II2)
II2 = II2 + 1
if(II2 > I2end) then
if(M1 < M1end) Ltemp(M1+1:M1end) = L(II1:I1end)
exit ! exit DO M1 loop
endif
endif
end do
L(I1begin:I2end) = Ltemp ! move Ltemp back into section of L
END SUBROUTINE MERGE
!*****************************************************
! SORTserial - Produce a sorted list of indexes into an array of keys
! This subroutine will not create additional parallelization.
! However, it can be called from either a serial application, or
! or from a parallel region operating on a subsection of an array
! or on different arrays.
!*****************************************************
RECURSIVE SUBROUTINE SORTserial(V,L,N,Ibegin,Iend)
implicit none
integer, intent(in) :: N
REAL :: V(N)
INTEGER :: L(N)
integer, intent(in) :: Ibegin,Iend
! local variables
integer :: BatchSize ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
if(N==1) return
BatchSize = Iend-Ibegin+1
select case(BatchSize)
case(1)
return ! nothing left to sort
case(2)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
return
case(3)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
return
end select
! all other cases
BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize)
call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTserial
!*****************************************************
! SORTparallelHandSplit - Produce a sorted list of indexes into an array of keys
! This subroutine uses OpenMP parallelization in a recursive manner.
! Initial slicing is performed by hand.
!*****************************************************
RECURSIVE SUBROUTINE SORTparallelHandSplit(V,L,N,Ibegin,Iend)
use omp_lib
implicit none
integer, intent(in) :: N
REAL :: V(N)
INTEGER :: L(N)
integer, intent(in) :: Ibegin,Iend
! local variables
integer :: BatchSize ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
integer :: nThreads, iThread, myIbegin, myIend
if(N==1) return
BatchSize = Iend-Ibegin+1
select case(BatchSize)
case(1)
return ! nothing left to sort
case(2)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
return
case(3)
if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
return
end select
if(BatchSize == N) then
! first call
!$omp parallel shared(nThreads) private(iThread, myIbegin, myIend, BatchSize)
nThreads = omp_get_num_threads() ! all threads overstrike nThreads
iThread = omp_get_thread_num()
BatchSize = (N + nThreads - 1) / nThreads
myIbegin = Ibegin + (BatchSize * iThread)
myIend = myIbegin + BatchSize - 1
call SORTserial(V, L, N, myIbegin, myIend)
!$omp barrier
! now merge batches
do while(BatchSize < N)
! filter to even thread numbers
if(iand(iThread, 1) == 0) then
if(myIbegin + BatchSize < N) then
call MERGE(V, L, N, myIbegin, myIbegin + BatchSize)
endif
end if
BatchSize = BatchSize * 2
myIbegin = Ibegin + (BatchSize * iThread)
!$omp barrier
end do
!$omp end parallel
return
endif
! Here in parallel region with something to split
! Note, no further parallelization available
BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize-1)
call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTparallelHandSplit
end module MySort
program Paul_Dent
use MySort
use IFPORT
use omp_lib
implicit none
INTEGER, PARAMETER :: N = 10*1000*1000
REAL, ALLOCATABLE :: V(:) ! N keys
INTEGER, ALLOCATABLE :: L(:) ! N indexes
REAL :: X
INTEGER :: I
double precision :: T1, T2
ALLOCATE(V(N), L(N))
X=RAND(123) ! reseed and discard
X=RAND(-123) ! reseed and discard
DO I=1,N
V(I)=RAND(0) ! key is random number
END DO
! Set index array to initial unsorted order
DO I=1,N
L(I) = I
END DO
T1 = omp_get_wtime()
CALL SORTserial(V,L,N,1,N) ! THIS IS THE 2-THREAD VERSION.
T2 = omp_get_wtime()
WRITE(*,*)"EXECUTION TIME UNTHREADED=",T2-T1," SECONDS"
! verify
DO I=1,N-1
if(V(L(I)) > V(L(I+1))) then
WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
exit
endif
END DO
! Set index array to initial unsorted order
DO I=1,N
L(I) = I
END DO
T1 = omp_get_wtime()
CALL SORTparallelHandSplit(V,L,N,1,N) ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION
T2 = omp_get_wtime()
WRITE(*,*)"EXECUTION TIME SORTparallelHandSplit=",T2-T1," SECONDS"
! verify
DO I=1,N-1
if(V(L(I)) > V(L(I+1))) then
WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
exit
endif
END DO
end program Paul_Dent
Notes:
In order to get meaningful timing results I increased the number of elements to 10 million.
I also made these arrays allocatable (large arrays should go on heap).
The parallel code will use as many hardware threads as are available to the program (even 1 thread).
Something for you to do is to read about generic interfaces and then add different key types (real(8), character, etc...).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Note, the merge subroutine can be improved upon.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks. I followed what you said and it worked and I was able to post some code (just a test) below
cDEC$ FIXEDFORMLINESIZE:132
C****************************************************************************
C TESTS A WAY TO WRITE MATH ROUTINES IN WHICH THREADING IS INVISBLE TO THE CALLER
C****************************************************************************
REAL A(16384),B(16384)
X=RAND(123)
X=RAND(-123)
N=16384
A(1:N)=RAND(0)
B(1:N)=RAND(0)
C measure time to peform 400 DOT PRODUCTS of 16384 VALUES
TIM=TIMEF()
DO 1 I=1,400
X=DOT(A,B,N)
1 CONTINUE
TIM=TIMEF()
WRITE(*,*)TIM
c compare with the time to do it without threads
TIM=TIMEF()
DO 2 I=1,400
X=RECURSIVEDOT(A,B,N)
2 CONTINUE
TIM=TIMEF()
WRITE(*,*)TIM
PAUSE
STOP
END
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page