Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

CreateThread

Paul_Dent
New Contributor I
3,308 Views

In kernel32.f90 there is an interface for the C function CreateThread to FORTAN.

It is supposed to be doing the argument-passing convention adaptation to FORTRAN.

It is supposed to take 6 arguments

The first has ro be a derived type "security attributes"  It won't accept "0" there as the instructions suggest it should

The second is an integer stack size.  It does seem to accept zero there, as the instructions say it should.

The third is called "thread_func"  which in all examples is the name of a function (or in some examples, the name of a subroutine - withoiut a RETURN statement!)

The fourth is an integer*4 argument which will be passed to "thread_func"

The 5th is an integer argumment to receive some flags

The 6th is an integer thread_ID

 

Here is my simple test program

cDEC$ FIXEDFORMLINESIZE:132
USE DFLIB
USE DFWIN
type(T_SECURITY_ATTRIBUTES)Security
INTEGER THREADID,FLAGS
Security.nlength=0
IARG=7
IHANDLE= CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
WRITE(*,*)IARG
STOP
END
INTEGER FUNCTION SUMMATION(IARG)
SUMMATION=IARG
IARG=IARG**2
END

 

If it works it should print out 7**2  = 49

It complies but won't link: Here is the linker error message:

Compiling Fortran...
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable SUMMATION is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------------------------------------^
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable FLAGS is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------^
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(8) : Warning: Variable THREADID is used before its value has been defined
CALL CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
--------^

LASTTIME.OBJ - 0 error(s), 3 warning(s)

 

What am I doing wrong?

 

We need a new CREATETHREAD routine for FORTRAN with no kludgy stuff around it!

 

Just :  CALL CREATETHREAD(SUBNAM, ARGUMENT LIST FOR SUBNAM.... AS MANY AS IT NEEDS)

 

Thanks

 

Paul

 

0 Kudos
1 Solution
jimdempseyatthecove
Honored Contributor III
3,122 Views

Paul,

Use of CreateThread can be disadvantages for two reasons: a) it is non-portable (Windows only), b) is specific two threads regardless of hardware available on the system.  A better practice is to use OpenMP. The following program is an updated version of your program, or should I say in the spirit of your original code:

!  Paul_Dent.f90 
module MySort
! no data

contains
! contained procedures
! Iswap - Swap two integers
subroutine Iswap(I1, I2)
    implicit none
    ! arguments
    integer :: I1, I2
    ! locals
    integer :: Itemp
    
    Itemp = I1
    I1 = I2
    I2 = Itemp
end subroutine Iswap

!**************************************************************
! MERGE - Merges two adjacent sections of indicies
! V = array of keys
! L = array of indicies
! N = size of arrays
! I1 = starting index of section 1
! I2 = starting index of section 2
!
! section 1 = L(I1:I2-1)
! section 2 = L(I2:MIN(I2+I2-I1-1,N))
! Notes: I2-I1 is the size of section 1
! the size of section 2 is MIN(I2-I1, N-I2+1) ! don't run off end of array
!***************************************************************

RECURSIVE SUBROUTINE MERGE(V,L,N,I1,I2)
    implicit none
    ! arguments
    integer, intent(in) :: N,I1,I2
    real :: V(N)
    integer :: L(N)
    ! local variables
    integer :: II1, II2,I1begin,I1end,I2end,M1,M1end
    ! *** Note, when N is very large, Ltemp should be allocatable
    ! *** for this sample program Ltemp can fit on stack
    integer :: Ltemp((I2-I1)+MIN(I2-I1,N-I2+1))
    
    ! save limits
    II1=I1
    II2 = I2
    I1begin = I1
    I1end = I2-1                ! adjacent sections
    I2end = MIN(I2+I2-I1-1,N)   ! MIN(base of section 2 + size of section 1 minus 1, N)
    M1end = size(Ltemp)
    ! merge sections of L into Ltemp
    do M1 = 1, M1end
        if(V(L(II1)) <= V(L(II2))) then
            Ltemp(M1) = L(II1)
            II1 = II1 + 1
            if(II1 > I1end) then
                if(M1 < M1end) Ltemp(M1+1:M1end) = L(II2:I2end)
                exit ! exit DO M1 loop
            endif
        else
            Ltemp(M1) = L(II2)
            II2 = II2 + 1
            if(II2 > I2end) then
                if(M1 < M1end) Ltemp(M1+1:M1end) = L(II1:I1end)
                exit ! exit DO M1 loop
            endif
        endif
    end do
    L(I1begin:I2end) = Ltemp ! move Ltemp back into section of L
END SUBROUTINE MERGE
    
!*****************************************************
! SORTserial - Produce a sorted list of indexes into an array of keys
! This subroutine will not create additional parallelization.
! However, it can be called from either a serial application, or
! or from a parallel region operating on a subsection of an array
! or on different arrays.
!*****************************************************

RECURSIVE SUBROUTINE SORTserial(V,L,N,Ibegin,Iend)
    implicit none
    integer, intent(in) :: N
    REAL :: V(N)
    INTEGER :: L(N)
    integer, intent(in) :: Ibegin,Iend
    ! local variables
    integer :: BatchSize    ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
    
    if(N==1) return
    BatchSize = Iend-Ibegin+1
    select case(BatchSize)
    case(1)
        return  ! nothing left to sort
    case(2)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        return
    case(3)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
        return
    end select
    ! all other cases
    BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
    call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize)
    call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
    call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTserial
    
!*****************************************************
! SORTparallelHandSplit - Produce a sorted list of indexes into an array of keys
! This subroutine uses OpenMP parallelization in a recursive manner.
! Initial slicing is performed by hand.
!*****************************************************

RECURSIVE SUBROUTINE SORTparallelHandSplit(V,L,N,Ibegin,Iend)
    use omp_lib
    implicit none
    integer, intent(in) :: N
    REAL :: V(N)
    INTEGER :: L(N)
    integer, intent(in) :: Ibegin,Iend
    ! local variables
    integer :: BatchSize    ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
    integer :: nThreads, iThread, myIbegin, myIend
    
    if(N==1) return
    
    BatchSize = Iend-Ibegin+1
    select case(BatchSize)
    case(1)
        return  ! nothing left to sort
    case(2)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        return
    case(3)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
        return
    end select
    if(BatchSize == N) then
        ! first call
        !$omp parallel shared(nThreads) private(iThread, myIbegin, myIend, BatchSize)
        nThreads = omp_get_num_threads()    ! all threads overstrike nThreads
        iThread = omp_get_thread_num()
        BatchSize = (N + nThreads - 1) / nThreads
        myIbegin = Ibegin + (BatchSize * iThread)
        myIend = myIbegin + BatchSize - 1
        call SORTserial(V, L, N, myIbegin, myIend)
        !$omp barrier
        ! now merge batches 
        do while(BatchSize < N)
            ! filter to even thread numbers
            if(iand(iThread, 1) == 0) then
                if(myIbegin + BatchSize < N) then
                    call MERGE(V, L, N, myIbegin, myIbegin + BatchSize)
                endif
            end if
            BatchSize = BatchSize * 2
            myIbegin = Ibegin + (BatchSize * iThread)
            !$omp barrier
        end do
        !$omp end parallel
        return
    endif
    
    ! Here in parallel region with something to split
    ! Note, no further parallelization available
    BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
    call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize-1)
    call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
    call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTparallelHandSplit
end module MySort

program Paul_Dent
    use MySort
    use IFPORT
    use omp_lib
    implicit none
    INTEGER, PARAMETER :: N = 10*1000*1000
    REAL, ALLOCATABLE :: V(:)    ! N keys
    INTEGER, ALLOCATABLE :: L(:) ! N indexes
    REAL :: X
    INTEGER :: I
    double precision :: T1, T2
    
    ALLOCATE(V(N), L(N))

    X=RAND(123)     ! reseed and discard
    X=RAND(-123)    ! reseed and discard
    
    DO I=1,N
        V(I)=RAND(0)    ! key is random number
    END DO
    
    ! Set index array to initial unsorted order
    DO I=1,N
        L(I) = I
    END DO
    
    T1 = omp_get_wtime()
    CALL SORTserial(V,L,N,1,N)                                                         ! THIS IS THE 2-THREAD VERSION.
    T2 = omp_get_wtime()

    WRITE(*,*)"EXECUTION TIME UNTHREADED=",T2-T1," SECONDS"
    ! verify
    DO I=1,N-1
        if(V(L(I)) > V(L(I+1))) then
            WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
            exit
        endif
    END DO

    
    ! Set index array to initial unsorted order
    DO I=1,N
        L(I) = I
    END DO
    
    T1 = omp_get_wtime()
    CALL SORTparallelHandSplit(V,L,N,1,N)                                                                     ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION
    T2 = omp_get_wtime()

    WRITE(*,*)"EXECUTION TIME SORTparallelHandSplit=",T2-T1," SECONDS"

    ! verify
    DO I=1,N-1
        if(V(L(I)) > V(L(I+1))) then
            WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
            exit
        endif
    END DO

end program Paul_Dent

Notes:

In order to get meaningful timing results I increased the number of elements to 10 million.

I also made these arrays allocatable (large arrays should go on heap).

The parallel code will use as many hardware threads as are available to the program (even 1 thread).

 

Something for you to do is to read about generic interfaces and then add different key types (real(8), character, etc...).

 

Jim Dempsey

View solution in original post

0 Kudos
18 Replies
Paul_Dent
New Contributor I
3,302 Views

I should probably have a RETURN statement at the end of my "thred_func" routine SUMMATION. That changed nothing.

 

I then dcealred SUMMATION to be EXTERNAL.

This got rid of all except one linker error:

Compiling Fortran...
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR
C:\Program Files\Microsoft Visual Studio\MyProjects\test\LASTTIME\LASTTIME.FOR(9) : Error: This actual argument must not be the name of a procedure. [SUMMATION]
IHANDLE= CreateThread (Security,0,SUMMATION,IARG,FLAGS,THREADID)
------------------------------------------^
Error executing df.exe.

LASTTIME.OBJ - 1 error(s), 0 warning(s)

 

What?  The name of a "procedure" (translation FUNCTION or SUBROUTINE ??)  is exactly what the argument "Thread_Func" is supposed to be.

 

Pleas ecan we have a FORTRAN CreateThread and not a cobbled-up C-version

It should be an assembler routine compiled to .obj that is callable from FoRTRAN and in turn calls the Windows API.

 

Ohho! Maybe I could write that!!

 

Paul

 

 

0 Kudos
Arjen_Markus
Honored Contributor I
3,279 Views

One thing that is going wrong in your code is that summation is not declared to be a function, at least not inside the main program. I found an old thread, https://community.intel.com/t5/Intel-Moderncode-for-Parallel/CreateThread-in-Fortran/td-p/769509, that sheds some light on the matter. But not enough ;). I am trying to get the program to work, based on this and other information ...

0 Kudos
andrew_4619
Honored Contributor III
3,275 Views

Is this a 32 bit build? THREADID should be integer(LPDWORD) so if it is a 64 bit build the code is wrong.  The Intel interface in kernel 32 is wrong BTW as it gives integer(DWORD) which is always 32bit

 

I would also change the 0 to 0_SIZE_T similarly IARG it type LPVOID which chnages with 32/64 bittiness 

0 Kudos
mfinnis
New Contributor II
3,266 Views

I'm not using the latest compiler so my version of kernel32 may not be the version in question. ThreadId is DWORD (lpThreadId is a pointer). The interface in kernel32 specifies ThreadId being passed by reference so is passing a pointer - and is correct.

The 'thread-func' referenced by the OP is lpStartAddress which is a pointer to the code to be run by the thread and, in the original example, should be passed as Loc(SUMMATION)

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,259 Views

FWIW I've been using the following since Intel Visual Fortran Version 8.0. Note this predates the BIND(C) and other interoperation features. Start with the following, then add in the newer features.

! handle = CreateThread (security, stack, thread_func, argument, flags, thread_id) 
integer(INT_PTR_KIND()) :: ControlPanelHandle = 0
integer(INT_PTR_KIND()) :: ControlPanelSecurity = 0   ! inherit
integer(INT_PTR_KIND()) :: ControlPanelStack = 0      ! inherit
integer(INT_PTR_KIND()) :: ControlPanelArg = 0
integer(4) :: ControlPanelFlags = 0
integer(INT_PTR_KIND()) :: ControlPanelThread_ID = 0

INTERFACE
    integer(4) FUNCTION ControlPanel(arg)
        !DEC$ ATTRIBUTES STDCALL, ALIAS:"_controlpanel" :: ControlPanel
        integer(4),POINTER :: arg
    END FUNCTION
END INTERFACE

! Create seperate thread to run the Modal Dialog control pannel
ControlPanelHandle = CreateThread(&
            & NULL, ControlPanelStack, loc(ControlPanel), loc(ControlPanelArg), ControlPanelFlags, loc(ControlPanelThread_ID))
...

integer(4) FUNCTION ControlPanel(arg)
!DEC$ ATTRIBUTES STDCALL, ALIAS:"_controlpanel" :: ControlPanel
...
end FUNCTION ControlPanel

As stated earlier, the above can be tidied up. I suggest making one change at a time.

Jim Dempsey

 

0 Kudos
andrew_4619
Honored Contributor III
3,244 Views
HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId
);

Strictly speaking the API is expected ThreadID to be a memory address of a dword passed by VALUE.   BTW are you build for 64 or 32 bit?

 

0 Kudos
Steve_Lionel
Honored Contributor III
3,214 Views

@andrew_4619 wrote:
HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId
);

Strictly speaking the API is expected ThreadID to be a memory address of a dword passed by VALUE.   BTW are you build for 64 or 32 bit?

 


I would say instead that it wants either a dword passed by reference (it is an output argument) or a null pointer. In Paul's initial post, it's a default integer (it would have been better to use INTEGER(DWORD)) passed by reference. Here is the KERNEL32 module's declaration:

FUNCTION CreateThread( &
        lpThreadAttributes, &
        dwStackSize, &
        lpStartAddress, &
        lpParameter, &
        dwCreationFlags, &
        lpThreadId)
import
  integer(HANDLE) :: CreateThread ! HANDLE
    !DEC$ ATTRIBUTES DEFAULT, STDCALL, DECORATE, ALIAS:'CreateThread' :: CreateThread
!DEC$ ATTRIBUTES REFERENCE, ALLOW_NULL :: lpThreadAttributes
  TYPE (T_SECURITY_ATTRIBUTES) lpThreadAttributes ! LPSECURITY_ATTRIBUTES lpThreadAttributes
  integer(DWORD) dwStackSize ! DWORD dwStackSize
  integer(LPVOID) lpStartAddress ! LPTHREAD_START_ROUTINE lpStartAddress
  integer(LPVOID) lpParameter ! LPVOID lpParameter
  integer(DWORD) dwCreationFlags ! DWORD dwCreationFlags
  integer(DWORD) lpThreadId ! LPDWORD lpThreadId
  !DEC$ ATTRIBUTES REFERENCE, IGNORE_LOC, ALLOW_NULL :: lpThreadId
 END FUNCTION

Note the directive on line 18. This allows you to pass a DWORD integer by reference (as he did), LOC(DWORD integer) as older code might have done, or NULL(). I'm probably the one who updated this declaration as I did a LOT of work on this module before I retired.

0 Kudos
Steve_Lionel
Honored Contributor III
3,248 Views

SECURITY_ATTRIBUTES is passed by reference - what you want there is NULL(). This will turn into a zero by value - that's what the ALLOW_NULL attribute on the argument is for.

0 Kudos
JohnNichols
Valued Contributor III
3,235 Views

I was wondering the other day about this, can I do this with a subroutine? 

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,231 Views

>>can I do this with a subroutine? 

No. The THREAD_START_ROUTINE is a function taking a pointer to argument(s) and returns a DWORD result.

Maybe you should consider OpenMP. Sketch:

 

use omp_lib
... ! in serial (master) code section
!$OMP PARALLEL ! optional add private and/or shared clauses
!$OMP MASTER ! only master thread runs following section
... ! optional setup
!$OMP TASK  ! optional add private and/or shared clauses
CALL FOOBAR(A, B, C)
!$OMP END TASK ! task continues to run after this
... ! optional concurrent code
!$OMP END MASTER
! Note, this section runs in parallel, task may be running
! no code follows until END PARALLEL
! unless you want to...
!$OMP END PARALLEL ! waits for task completion
...

 

See: TASK (intel.com) for additional information.

 

Jim Dempsey

 

0 Kudos
JohnNichols
Valued Contributor III
3,219 Views

@jimdempseyatthecove , thanks.  

 

I bought the text book, The openmp common core, it is useful and I am slowly working through it.  

 

0 Kudos
Paul_Dent
New Contributor I
3,168 Views

Thanks for all the hints. I am now in business.

 

 

0 Kudos
Paul_Dent
New Contributor I
3,160 Views

How do you post code? I don't see the button

0 Kudos
Steve_Lionel
Honored Contributor III
3,147 Views

@Paul_Dent wrote:

How do you post code? I don't see the button


Click on the ... next to the quote button. A second row of controls will appear. You want this:

Steve_Lionel_0-1643415221117.png

Click that, select Fortran as the language, paste in the box.

0 Kudos
Paul_Dent
New Contributor I
3,152 Views

Here is a threaded sort-merge I got work in case it helps somebody else

The objective was to take an existing recursive sort merge routine and without changing it, make two 

threads execute one recursion each in parallel, so halving the run time (which it does)

The main puzzle was that CreateThread only allows passing one argument to the thread function

I got round that using a wrapper routine to place the addresses of the caller's arguments (only arrays need this)  in a named common block and passing the name of the common bock as the single allowed argument to another routine that unpacks them to put in the original routine's argument list To pack an array address, you use %LOC(A) to get the address of array A  into IaddressA then you put %VAL(IaddressA) into the call in place of A,. At first I thought the compiler would complain that %VAL was not he same type as the original argument, but I figure what the % does is to tell the compiler to ignore type mismatch, because it worked for real, complex, strings  etc

Cheers, Paul

 

 

cDEC$ FIXEDFORMLINESIZE:132

REAL V(30000)

INTEGER L(30000)

X=RAND(123)

X=RAND(-123)

N=30000

DO 2 I=1,N

V(I)=RAND(0)

2 CONTINUE

TIM=TIMEF()

CALL SORTMERGE(V,L,1,N,ISTART)                                                         ! THIS IS THE 2-THREAD VERSION.

TIM=TIMEF()

WRITE(*,*)"EXECUTION TIME FOR 2-THREADS=",TIM," SECONDS"

C PRINT OUT A FEW SORTED VALUES

NEXT=ISTART

DO 1 I=1,5

WRITE(*,*)V(NEXT)

NEXT=L(NEXT)

1 CONTINUE

TIM=TIMEF()

CALL SORT(V,L,1,N,ISTART)                                                                     ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION

TIM=TIMEF()

WRITE(*,*)"EXECUTION TIME UNTHREADED=",TIM," SECONDS"

NEXT=ISTART

DO 3 I=1,5

WRITE(*,*)V(NEXT)

NEXT=L(NEXT)

3 CONTINUE

STOP

END

C*************************************************

SUBROUTINE SORTMERGE(V,L,IFIRST,N,ISTART)

USE DFLIB

USE DFWIN

type(T_SECURITY_ATTRIBUTES)Security

INTEGER THREADID1,THREADID2

EXTERNAL SORTWRAPPER

REAL V(*)

INTEGER L(*)

COMMON /SORTARGS/IADDRESSV,IADDRESSL,I1ST,ISTART1,ISTART2,NN

IARG1=-1

IARG2=-2

IADDRESSV=%LOC(V)

IADDRESSL=%LOC(L)

NN=N ! BECAUSE DUMMY ARGUMENT CAN'T BE IN COMMON

I1ST=IFIRST ! BECAUSE DUMMY ARGUMENT CAN'T BE IN COMMON

Security.nlength=0

IHANDLE1= CreateThread (Security,0,%LOC(SORTWRAPPER),%LOC(IARG1),0,%LOC(THREADID1))

IHANDLE2= CreateThread (Security,0,%LOC(SORTWRAPPER),%LOC(IARG2),0,%LOC(THREADID2))

C THE THREADS WILL SIGNAL WHEN THEY'RE DONE BY SETTING IARG1 and IARG2 TO +1

99 IF((IARG1.LT.0).OR.(IARG2.LT.0))GO TO 99

CALL MERGE(V,L,ISTART1,ISTART2,ISTART)

RETURN

END

C**************************************************************

C THIS ROUTINE FILLS THE LINKED LIST L(1:N-1) WITH THE INDICES

C OF THE NEXT LARGEST VALUE IN SEQUENCE, ISTART BEING THE FIRST

C AND LARGEST VALUE. L(N)=0 IS A STOP INDICATOR

C**************************************************************

RECURSIVE SUBROUTINE SORTWRAPPER(IARG)

COMMON /SORTARGS/IADDRESSV,IADDRESSL,IFIRST,ISTART1,ISTART2,N

IF(IARG.EQ.-1)CALL SORT(%VAL(IADDRESSV),%VAL(IADDRESSL),IFIRST,RSHIFT(N,1),ISTART1)

IF(IARG.EQ.-2)CALL SORT(%VAL(IADDRESSV),%VAL(IADDRESSL),IFIRST+RSHIFT(N,1),N-RSHIFT(N,1),ISTART2)

IARG=1

RETURN

END

C*****************************************************

C SET IFIRST TO 1 ON ENTRY

C RETURNS A LINKED LIST L OF SORTED V VALUES, HIGHEST TO LOWEST

C K=L(ISTART) IS THE INDEX OF THE LARGEST

C THEN L(K) IS THE NEXT LARGEST AND SO ON

C*****************************************************

RECURSIVE SUBROUTINE SORT(V,L,IFIRST,N,ISTART)

REAL V(*)

INTEGER L(*)

IF(N.EQ.1)THEN

ISTART=IFIRST

L(ISTART)=0

ELSE

CALL SORT(V,L,IFIRST,RSHIFT(N,1),ISTART1)

CALL SORT(V,L,IFIRST+RSHIFT(N,1),N-RSHIFT(N,1),ISTART2)

CALL MERGE(V,L,ISTART1,ISTART2,ISTART)

ENDIF

RETURN

END

C**************************************************************

C THIS ROUTINE MERGES TWO LINKED LISTS

C THE FIRST STARTS AT ISTART1 AND THE SECOND STARTS AT ISTART2

C THE OUTPUT IS THE START OF THE MERGED LIST ISTART AND THE LAST

C INDEX IN A LINKED LIST CHAIN IS 0 AS A STOP INDICATOR

C***************************************************************

RECURSIVE SUBROUTINE MERGE(V,L,NEXT1,NEXT2,ISTART)

DIMENSION V(*),L(*)

C WRITE(*,*)"IN MERGE"

IFLAG=1

D=V(NEXT1)-V(NEXT2)

IF(D.GT.0)CALL UPDINTS(L,ISTART,IFLAG,NEXT1,NEXT2,IFLAG)

IF(D.LE.0)CALL UPDINTS(L,ISTART,IFLAG,NEXT2,NEXT1,IFLAG)

IF(IFLAG.LT.0)RETURN

I=ISTART

99 D=V(NEXT1)-V(NEXT2)

IF(D.GT.0)CALL UPDINTS(L,L(I),I,NEXT1,NEXT2,IFLAG)

IF(D.LE.0)CALL UPDINTS(L,L(I),I,NEXT2,NEXT1,IFLAG)

IF(IFLAG.GE.0)GO TO 99

RETURN

END

C***********************************************

RECURSIVE SUBROUTINE UPDINTS(L,LI,I,NEXT1,NEXT2,IFLAG)

INTEGER L(*)

LI=NEXT1

IF(L(NEXT1).NE.0)THEN

NEXT1=L(NEXT1)

I=L(I)

ELSE

L(NEXT1)=NEXT2

IFLAG=-1

ENDIF

RETURN

END

C******************

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,123 Views

Paul,

Use of CreateThread can be disadvantages for two reasons: a) it is non-portable (Windows only), b) is specific two threads regardless of hardware available on the system.  A better practice is to use OpenMP. The following program is an updated version of your program, or should I say in the spirit of your original code:

!  Paul_Dent.f90 
module MySort
! no data

contains
! contained procedures
! Iswap - Swap two integers
subroutine Iswap(I1, I2)
    implicit none
    ! arguments
    integer :: I1, I2
    ! locals
    integer :: Itemp
    
    Itemp = I1
    I1 = I2
    I2 = Itemp
end subroutine Iswap

!**************************************************************
! MERGE - Merges two adjacent sections of indicies
! V = array of keys
! L = array of indicies
! N = size of arrays
! I1 = starting index of section 1
! I2 = starting index of section 2
!
! section 1 = L(I1:I2-1)
! section 2 = L(I2:MIN(I2+I2-I1-1,N))
! Notes: I2-I1 is the size of section 1
! the size of section 2 is MIN(I2-I1, N-I2+1) ! don't run off end of array
!***************************************************************

RECURSIVE SUBROUTINE MERGE(V,L,N,I1,I2)
    implicit none
    ! arguments
    integer, intent(in) :: N,I1,I2
    real :: V(N)
    integer :: L(N)
    ! local variables
    integer :: II1, II2,I1begin,I1end,I2end,M1,M1end
    ! *** Note, when N is very large, Ltemp should be allocatable
    ! *** for this sample program Ltemp can fit on stack
    integer :: Ltemp((I2-I1)+MIN(I2-I1,N-I2+1))
    
    ! save limits
    II1=I1
    II2 = I2
    I1begin = I1
    I1end = I2-1                ! adjacent sections
    I2end = MIN(I2+I2-I1-1,N)   ! MIN(base of section 2 + size of section 1 minus 1, N)
    M1end = size(Ltemp)
    ! merge sections of L into Ltemp
    do M1 = 1, M1end
        if(V(L(II1)) <= V(L(II2))) then
            Ltemp(M1) = L(II1)
            II1 = II1 + 1
            if(II1 > I1end) then
                if(M1 < M1end) Ltemp(M1+1:M1end) = L(II2:I2end)
                exit ! exit DO M1 loop
            endif
        else
            Ltemp(M1) = L(II2)
            II2 = II2 + 1
            if(II2 > I2end) then
                if(M1 < M1end) Ltemp(M1+1:M1end) = L(II1:I1end)
                exit ! exit DO M1 loop
            endif
        endif
    end do
    L(I1begin:I2end) = Ltemp ! move Ltemp back into section of L
END SUBROUTINE MERGE
    
!*****************************************************
! SORTserial - Produce a sorted list of indexes into an array of keys
! This subroutine will not create additional parallelization.
! However, it can be called from either a serial application, or
! or from a parallel region operating on a subsection of an array
! or on different arrays.
!*****************************************************

RECURSIVE SUBROUTINE SORTserial(V,L,N,Ibegin,Iend)
    implicit none
    integer, intent(in) :: N
    REAL :: V(N)
    INTEGER :: L(N)
    integer, intent(in) :: Ibegin,Iend
    ! local variables
    integer :: BatchSize    ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
    
    if(N==1) return
    BatchSize = Iend-Ibegin+1
    select case(BatchSize)
    case(1)
        return  ! nothing left to sort
    case(2)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        return
    case(3)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
        return
    end select
    ! all other cases
    BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
    call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize)
    call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
    call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTserial
    
!*****************************************************
! SORTparallelHandSplit - Produce a sorted list of indexes into an array of keys
! This subroutine uses OpenMP parallelization in a recursive manner.
! Initial slicing is performed by hand.
!*****************************************************

RECURSIVE SUBROUTINE SORTparallelHandSplit(V,L,N,Ibegin,Iend)
    use omp_lib
    implicit none
    integer, intent(in) :: N
    REAL :: V(N)
    INTEGER :: L(N)
    integer, intent(in) :: Ibegin,Iend
    ! local variables
    integer :: BatchSize    ! *** do not initialize here as with C/C++ (it makes it SAVE/static)
    integer :: nThreads, iThread, myIbegin, myIend
    
    if(N==1) return
    
    BatchSize = Iend-Ibegin+1
    select case(BatchSize)
    case(1)
        return  ! nothing left to sort
    case(2)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        return
    case(3)
        if(V(L(Ibegin)) > V(L(Iend))) call Iswap(L(Ibegin), L(Iend))
        if(V(L(Ibegin)) > V(L(IBegin+1))) call Iswap(L(Ibegin), L(IBegin+1))
        return
    end select
    if(BatchSize == N) then
        ! first call
        !$omp parallel shared(nThreads) private(iThread, myIbegin, myIend, BatchSize)
        nThreads = omp_get_num_threads()    ! all threads overstrike nThreads
        iThread = omp_get_thread_num()
        BatchSize = (N + nThreads - 1) / nThreads
        myIbegin = Ibegin + (BatchSize * iThread)
        myIend = myIbegin + BatchSize - 1
        call SORTserial(V, L, N, myIbegin, myIend)
        !$omp barrier
        ! now merge batches 
        do while(BatchSize < N)
            ! filter to even thread numbers
            if(iand(iThread, 1) == 0) then
                if(myIbegin + BatchSize < N) then
                    call MERGE(V, L, N, myIbegin, myIbegin + BatchSize)
                endif
            end if
            BatchSize = BatchSize * 2
            myIbegin = Ibegin + (BatchSize * iThread)
            !$omp barrier
        end do
        !$omp end parallel
        return
    endif
    
    ! Here in parallel region with something to split
    ! Note, no further parallelization available
    BatchSize = (BatchSize + 1) / 2 ! split the batch (round up)
    call SORTserial(V, L, N, Ibegin, Ibegin+BatchSize-1)
    call SORTserial(V, L, N, Ibegin+BatchSize, Iend)
    call MERGE(V, L, N, Ibegin, Ibegin+BatchSize)
END SUBROUTINE SORTparallelHandSplit
end module MySort

program Paul_Dent
    use MySort
    use IFPORT
    use omp_lib
    implicit none
    INTEGER, PARAMETER :: N = 10*1000*1000
    REAL, ALLOCATABLE :: V(:)    ! N keys
    INTEGER, ALLOCATABLE :: L(:) ! N indexes
    REAL :: X
    INTEGER :: I
    double precision :: T1, T2
    
    ALLOCATE(V(N), L(N))

    X=RAND(123)     ! reseed and discard
    X=RAND(-123)    ! reseed and discard
    
    DO I=1,N
        V(I)=RAND(0)    ! key is random number
    END DO
    
    ! Set index array to initial unsorted order
    DO I=1,N
        L(I) = I
    END DO
    
    T1 = omp_get_wtime()
    CALL SORTserial(V,L,N,1,N)                                                         ! THIS IS THE 2-THREAD VERSION.
    T2 = omp_get_wtime()

    WRITE(*,*)"EXECUTION TIME UNTHREADED=",T2-T1," SECONDS"
    ! verify
    DO I=1,N-1
        if(V(L(I)) > V(L(I+1))) then
            WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
            exit
        endif
    END DO

    
    ! Set index array to initial unsorted order
    DO I=1,N
        L(I) = I
    END DO
    
    T1 = omp_get_wtime()
    CALL SORTparallelHandSplit(V,L,N,1,N)                                                                     ! WHICH RUNS TWO INSTANCES OF THIS N0N-THREADED VERSION
    T2 = omp_get_wtime()

    WRITE(*,*)"EXECUTION TIME SORTparallelHandSplit=",T2-T1," SECONDS"

    ! verify
    DO I=1,N-1
        if(V(L(I)) > V(L(I+1))) then
            WRITE(*,*) 'Error ', I, V(L(I)), V(L(I+1))
            exit
        endif
    END DO

end program Paul_Dent

Notes:

In order to get meaningful timing results I increased the number of elements to 10 million.

I also made these arrays allocatable (large arrays should go on heap).

The parallel code will use as many hardware threads as are available to the program (even 1 thread).

 

Something for you to do is to read about generic interfaces and then add different key types (real(8), character, etc...).

 

Jim Dempsey

0 Kudos
jimdempseyatthecove
Honored Contributor III
3,117 Views

Note, the merge subroutine can be improved upon.

Jim Dempsey

0 Kudos
Paul_Dent
New Contributor I
3,045 Views

Thanks. I followed what you said and it worked and I was able to post some code (just a test) below

cDEC$ FIXEDFORMLINESIZE:132
C****************************************************************************
C TESTS A WAY TO WRITE MATH ROUTINES IN WHICH THREADING IS INVISBLE TO THE CALLER
C****************************************************************************
	REAL A(16384),B(16384)
	X=RAND(123)
	X=RAND(-123)
	N=16384
	A(1:N)=RAND(0)
	B(1:N)=RAND(0)
C measure time to peform 400 DOT PRODUCTS of 16384 VALUES
	TIM=TIMEF()
	DO 1 I=1,400
	X=DOT(A,B,N)             
    1 CONTINUE
	TIM=TIMEF()
	WRITE(*,*)TIM
c compare with the time to do it without threads
	TIM=TIMEF()
	DO 2 I=1,400
	X=RECURSIVEDOT(A,B,N)
    2 CONTINUE
	TIM=TIMEF()
	WRITE(*,*)TIM
	PAUSE
	STOP
	END

 

0 Kudos
Reply