Hi everyone,
I am working on a Monte Carlo code written in Fortran 77, with the goal of parallelizing it using OpenMP. I am now in the testing phase of development, but I am running into problems with the overhead cost of the code. For example, when I analyze it with VTune Amplifier XE I obtain the following summary:
Elapsed Time: 43.352s
Total Thread Count: 5
Overhead Time: 16.560s
Spin Time: 0.847s
CPU Time: 157.369s
Paused Time: 0s
Well, the tool flags the overhead time as too high. What is worse, I have tested this code against gfortran and these effects are much less pronounced with the latter. This is unfortunate, because without parallelization the code compiled with ifort is much faster than the gfortran build, but as I increase the number of OpenMP threads (keeping the load per thread constant) the overhead costs make the ifort version slower than the gfortran one.
What I have found is that the threads get "stalled" in a very disorderly fashion; you can see this in the image below:
The code has several subroutines that control the whole Monte Carlo simulation process (for example, random number generation, electron and photon transport, geometry description, etc.). These subroutines communicate with each other through COMMON blocks, so I have had to mark some of them as private using the THREADPRIVATE directive where needed. The idea is to keep the original structure of the code as much as possible: this is a widely used code, and the goal is to offer an easy transition to OpenMP parallelization without changing the core of the program.
I have also created a small test code that runs only the random number generator and uses it to estimate the value of PI. It shows the same problem as the original code. In this test I found that the function _kmp_get_global_thread_id_reg accounts for a large part of the overhead time:
I would really appreciate any tips on how to tackle this problem. I have tried to find information about it without success. Thanks for your help!
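For reference, the PI test is essentially the pattern sketched below (a simplified illustration only, not the actual test code: the intrinsic random_number stands in for the threadprivate RANMAR generator, and all names are made up):

      program pi_test
      implicit none
      integer*4, parameter :: nsamples = 10000000
      integer*4 i, hits
      real*8 pi_est
      real*8 xy(2)
      hits = 0
C     each thread counts hits inside the unit circle; the intrinsic
C     random_number stands in for the threadprivate RANMAR generator
C$OMP PARALLEL DO PRIVATE(xy) REDUCTION(+:hits)
      do i = 1, nsamples
         call random_number(xy)
         if (xy(1)**2 + xy(2)**2 .le. 1.d0) hits = hits + 1
      end do
C$OMP END PARALLEL DO
      pi_est = 4.d0*dble(hits)/dble(nsamples)
      print *, 'estimated PI =', pi_est
      end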
Well, thanks for the tips... I have been studying the code and I found the following situation:
Looking at the Amplifier analysis I found that most of the overhead comes from two subroutines, UPHI and MSCAT, with the random number generator subroutine RANMAR_GET a (very) distant third. The concurrency result is the following:
Inside UPHI there were some WRITE statements to a log file, so I removed them. After that I obtained the following:
So it seems that, in the case of UPHI, that was the problem. Unfortunately, for MSCAT and RANMAR_GET I was not able to find a solution. However, using a macro one can disable the MSCAT subroutine (its body is basically replaced with a RETURN), so I did that and obtained the following:
Well, I have not checked the other subroutines in detail, but all of them call RANMAR_GET. I really suspect that this subroutine is the source of the program's overhead, but when I look at the code I cannot find an obvious reason for it. The RANMAR_GET code is:
      subroutine ranmar_get
      implicit none
      common/randomm/ rng_array(128), urndm(97), crndm, cdrndm, cmrndm,
     *i4opt, ixx, jxx, fool_optimizer, twom24, rng_seed
C$OMP0THREADPRIVATE(/randomm/)
      integer*4 urndm, crndm, cdrndm, cmrndm, i4opt, ixx, jxx, fool_opti
     *mizer,rng_seed,rng_array
      real*4 twom24
      integer*4 i,iopt
      DO 2591 i=1,128
         iopt = urndm(ixx) - urndm(jxx)
         IF((iopt .LT. 0)) iopt = iopt + 16777216
         urndm(ixx) = iopt
         ixx = ixx - 1
         jxx = jxx - 1
         IF ((ixx .EQ. 0)) THEN
            ixx = 97
         ELSE IF(( jxx .EQ. 0 )) THEN
            jxx = 97
         END IF
         crndm = crndm - cdrndm
         IF((crndm .LT. 0)) crndm = crndm + cmrndm
         iopt = iopt - crndm
         IF((iopt .LT. 0)) iopt = iopt + 16777216
         rng_array(i) = iopt
 2591 CONTINUE
 2592 CONTINUE
      rng_seed = 1
      return
      end
What could cause the overhead? The IF statements? Or is it something related to poor cache locality? Is there a way to detect that using the Intel tools (like Amplifier)? Unfortunately, Amplifier does not point to a specific piece of code, just to the call to RANMAR_GET or to the subroutine declaration... Thanks for your help!
Yes, Amplifier will show you the code - even down to the instruction. Double-click on the subroutine to drill down.
Will ixx ever equal jxx? (the code above requires the answer to be no)
Do ixx and jxx always have the same circular reference offset?
Consider replacing (in both places)
iopt = urndm(ixx) - urndm(jxx)
IF((iopt .LT. 0)) iopt = iopt + 16777216

with

iopt = IAND(urndm(ixx) - urndm(jxx), 16777215)  ! note end digit is 5, making a mask

or

iopt = IAND(urndm(ixx) - urndm(jxx), 'FFFFFF'Z)
Also, if cmrndm is a power of 2, replace the IF test and add with an IAND as above.
Depending on how smart the compiler optimization is, it may be more efficient to use
ixx = ixx - 1
jxx = jxx - 1
IF(( ixx .EQ. 0 )) ixx = 97
IF(( jxx .EQ. 0 )) jxx = 97
The reason is that recent instruction sets have conditional move instructions (thus eliminating branch instructions).
If all of the above hints apply, then the DO 2591 loop will have no branches.
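Putting those hints together, the DO 2591 loop of the ranmar_get shown above would look roughly like this (a sketch only; it assumes ixx never equals jxx and that cmrndm is a power of two):

      DO 2591 i=1,128
         iopt = IAND(urndm(ixx) - urndm(jxx), 16777215)
         urndm(ixx) = iopt
         ixx = ixx - 1
         jxx = jxx - 1
         IF (ixx .EQ. 0) ixx = 97
         IF (jxx .EQ. 0) jxx = 97
C        the mask only works when cmrndm is a power of two
         crndm = IAND(crndm - cdrndm, cmrndm - 1)
         iopt = IAND(iopt - crndm, 16777215)
         rng_array(i) = iopt
 2591 CONTINUE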
Could the performance problem be related to the use of C$OMP0THREADPRIVATE(/randomm/)?
You could try something like the code below and manage the private variables in a different way, say as arrays passed in the call, indexed by the thread number:
      subroutine test_ranmar
      integer*4, parameter :: nt = 7  ! max thread number
      integer*4 rng_array(128,0:nt), urndm(97,0:nt), crndm(0:nt),
     *          cdrndm, cmrndm,
     *          ixx(0:nt), jxx(0:nt), rng_seed(0:nt), it
c
      it = 0
C$    it = omp_get_thread_num ()
      call ranmar_get (rng_array(:,it), urndm(:,it), crndm(it),
     *                 cdrndm, cmrndm,
     *                 ixx(it), jxx(it), rng_seed(it) )
      end

      subroutine ranmar_get (rng_array, urndm, crndm, cdrndm, cmrndm,
     *                       ixx, jxx, rng_seed)
      implicit none
      integer*4 rng_array(128), urndm(97), crndm, cdrndm, cmrndm,
     *          ixx, jxx, rng_seed
      integer*4 i,iopt
C
      DO 2591 i=1,128
         iopt = urndm(ixx) - urndm(jxx)
         IF (iopt < 0) iopt = iopt + 16777216
         urndm(ixx) = iopt
         ixx = ixx - 1
         IF (ixx == 0) ixx = 97
         jxx = jxx - 1
         IF (jxx == 0) jxx = 97
         crndm = crndm - cdrndm
         IF (crndm < 0) crndm = crndm + cmrndm
         iopt = iopt - crndm
         IF (iopt < 0) iopt = iopt + 16777216
         rng_array(i) = iopt
 2591 CONTINUE
      rng_seed = 1
      return
      end
@Steve,
My problem is that when I double-click on the subroutine (e.g. RANMAR_GET), Amplifier opens a tab with the source code, but it only highlights the entry point of the subroutine (i.e. the SUBROUTINE RANMAR_GET statement). If I double-click that line, the source is opened in VS, so I still have no clue which specific line (or lines) of code is causing the problem.
@Jim
I modified my code according to your suggestions and I obtained some improvement. The code now looks as follows:
      subroutine ranmar_get
      implicit none
      common/randomm/ rng_array(128), urndm(97), crndm, cdrndm, cmrndm,
     *i4opt, ixx, jxx, fool_optimizer, twom24, rng_seed
C$OMP0THREADPRIVATE(/randomm/)
      integer*4 urndm, crndm, cdrndm, cmrndm, i4opt, ixx, jxx, fool_opti
     *mizer,rng_seed,rng_array
      real*4 twom24
      integer*4 i,iopt
C ED: RANMAR_GET modification following Jim's suggestions
      IF((rng_seed .EQ. 999999)) call init_ranmar
      DO 2591 i=1,128
C        iopt = urndm(ixx) - urndm(jxx)
C        IF((iopt .LT. 0))iopt = iopt + 16777216
         iopt = IAND(urndm(ixx) - urndm(jxx), 'FFFFFF'Z)
CCC
         urndm(ixx) = iopt
         ixx = ixx - 1
         jxx = jxx - 1
         IF ((ixx .EQ. 0)) ixx = 97
         IF(( jxx .EQ. 0 )) jxx = 97
CCC
C        crndm = crndm - cdrndm
C        IF((crndm .LT. 0))crndm = crndm + cmrndm
         crndm = IAND(crndm - cdrndm, cmrndm-1)
CCC
C        iopt = iopt - crndm
C        IF((iopt .LT. 0))iopt = iopt + 16777216
         iopt = IAND(iopt - crndm, 'FFFFFF'Z)
CCC
         rng_array(i) = iopt
 2591 CONTINUE
 2592 CONTINUE
      rng_seed = 1
      return
      end
Looking at the code I realized that the first IF statement is never true (rng_seed .EQ. 999999), so I decided to remove it. Unfortunately the result is disastrous:
So, what happened? Now UPHI again shows problems, and SSCAT (a scattering-related subroutine) has appeared... but at least RANMAR_GET disappeared xD
Well, I have an idea: how do GO TO statements affect OpenMP? For example, the UPHI subroutine is full of them; here is the code:
      SUBROUTINE UPHI(IENTRY,LVL)
! Copyright National Research Council of Canada, 2000.
! All rights reserved.
      implicit none
      COMMON/EPCONT/EDEP,TSTEP,TUSTEP,USTEP,TVSTEP,VSTEP, RHOF,EOLD,ENEW
     *,EKE,ELKE,GLE,E_RANGE, x_final,y_final,z_final, u_final,v_final,w_
     *final, IDISC,IROLD,IRNEW,IAUSFL(31)
C$OMP0THREADPRIVATE(/EPCONT/)
      DOUBLE PRECISION EDEP
      real*8 TSTEP, TUSTEP, USTEP, VSTEP, TVSTEP, RHOF, EOLD, ENE
     *W, EKE, ELKE, GLE, E_RANGE, x_final,y_final,z_final, u_final,
     *v_final,w_final
      integer*4 IDISC, IROLD, IRNEW, IAUSFL
      COMMON/STACK/ E(50),X(50),Y(50),Z(50),U(50),V(50),W(50),DNEAR(50),
     *WT(50),IQ(50),IR(50),LATCH(50), LATCHI,NP,NPold
C$OMP0THREADPRIVATE(/STACK/)
      DOUBLE PRECISION E
      real*8 X,Y,Z, U,V,W, DNEAR, WT
      integer*4 IQ, IR, LATCH, LATCHI, NP, NPold
      COMMON/UPHIIN/SINC0,SINC1,SIN0(1002),SIN1(1002)
      real*8 SINC0,SINC1,SIN0,SIN1
      COMMON/UPHIOT/THETA,SINTHE,COSTHE,SINPHI, COSPHI,PI,TWOPI,PI5D2
C$OMP0THREADPRIVATE(/UPHIOT/)
      real*8 THETA, SINTHE, COSTHE, SINPHI, COSPHI, PI,TWOPI,PI5D2
      common/randomm/ rng_array(128), urndm(97), crndm, cdrndm, cmrndm,
     *i4opt, ixx, jxx, fool_optimizer, twom24, rng_seed
C$OMP0THREADPRIVATE(/randomm/)
      integer*4 urndm, crndm, cdrndm, cmrndm, i4opt, ixx, jxx, fool_opti
     *mizer,rng_seed,rng_array
      real*4 twom24
      common /egs_io/ file_extensions(20), file_units(20), user_code, i
     *nput_file, output_file, pegs_file, hen_house, egs_home, work_d
     *ir, host_name, n_parallel, i_parallel, first_parallel, n_max_p
     *arallel, n_chunk, n_files, i_input, i_log, i_incoh, i_nist_dat
     *a, i_mscat, i_photo_cs, i_photo_relax, xsec_out, is_batch
      character input_file*256, output_file*256, pegs_file*256, file_ext
     *ensions*10, hen_house*128, egs_home*128, work_dir*128, user_code*6
     *4, host_name*64
      integer*4 n_parallel, i_parallel, first_parallel,n_max_parallel, n
     *_chunk, file_units, n_files,i_input,i_log,i_incoh, i_nist_data,i_m
     *scat,i_photo_cs,i_photo_relax, xsec_out
      logical is_batch
      integer IENTRY,LVL
      real*8 CTHET, RNNO38, PHI, CPHI, A,B,C, SINPS2, SINPSI, US,
     *VS, SINDEL,COSDEL
      integer*4 IARG, LPHI,LTHETA,LCTHET,LCPHI
      real*8 xphi,xphi2,yphi,yphi2,rhophi2
      save CTHET,PHI,CPHI,A,B,C,SINPS2,SINPSI,US,VS,SINDEL,COSDEL
C$OMP0THREADPRIVATE(CTHET,PHI,CPHI,A,B,C,SINPS2)
C$OMP0THREADPRIVATE(SINPSI,US,VS,SINDEL,COSDEL)
      IARG=21
      IF ((IAUSFL(IARG+1).NE.0)) THEN
         CALL AUSGAB(IARG)
      END IF
      GO TO (6740,6750,6760),IENTRY
      GO TO 6770
 6740 CONTINUE
      SINTHE=sin(THETA)
      CTHET=PI5D2-THETA
      COSTHE=sin(CTHET)
 6750 CONTINUE
 6781 CONTINUE
      IF((rng_seed .GT. 128)) call ranmar_get
      xphi = rng_array(rng_seed)*twom24
      rng_seed = rng_seed + 1
      xphi = 2*xphi - 1
      xphi2 = xphi*xphi
      IF((rng_seed .GT. 128)) call ranmar_get
      yphi = rng_array(rng_seed)*twom24
      rng_seed = rng_seed + 1
      yphi2 = yphi*yphi
      rhophi2 = xphi2 + yphi2
      IF(rhophi2.LE.1) GO TO 6782
      GO TO 6781
 6782 CONTINUE
      rhophi2 = 1/rhophi2
      cosphi = (xphi2 - yphi2)*rhophi2
      sinphi = 2*xphi*yphi*rhophi2
 6760 GO TO (6790,6800,6810),LVL
      GO TO 6770
 6790 A=U(NP)
      B=V(NP)
      C=W(NP)
      GO TO 6820
 6810 A=U(NP-1)
      B=V(NP-1)
      C=W(NP-1)
 6800 X(NP)=X(NP-1)
      Y(NP)=Y(NP-1)
      Z(NP)=Z(NP-1)
      IR(NP)=IR(NP-1)
      WT(NP)=WT(NP-1)
      DNEAR(NP)=DNEAR(NP-1)
      LATCH(NP)=LATCH(NP-1)
 6820 SINPS2=A*A+B*B
      IF ((SINPS2.LT.1.0E-20)) THEN
         U(NP)=SINTHE*COSPHI
         V(NP)=SINTHE*SINPHI
         W(NP)=C*COSTHE
      ELSE
         SINPSI=SQRT(SINPS2)
         US=SINTHE*COSPHI
         VS=SINTHE*SINPHI
         SINDEL=B/SINPSI
         COSDEL=A/SINPSI
         U(NP)=C*COSDEL*US-SINDEL*VS+A*COSTHE
         V(NP)=C*SINDEL*US+COSDEL*VS+B*COSTHE
         W(NP)=-SINPSI*US+C*COSTHE
      END IF
      IARG=22
      IF ((IAUSFL(IARG+1).NE.0)) THEN
         CALL AUSGAB(IARG)
      END IF
      RETURN
 6770 END
MSCAT and SSCAT also have some GO TO statements. Maybe this is the source of the problem? As I stated before, Amplifier does not show any specific line as the source of the overhead; it always shows the subroutine declaration, nothing more.
@John
Jim pointed out above that THREADPRIVATE should have a low overhead cost, and in some testing I found the same, so I do not think that is the problem.
Thanks all for your help!!
GO TO is no problem for OpenMP as long as it doesn't jump into or out of OpenMP constructs. Saved variables and lack of a RECURSIVE procedure declaration are big red flags.
Egregious spaghetti code will at least require you to use the assembly view.
>> Looking at the code I realized that the first IF statement is never true (rng_seed .EQ. 999999), so I decided to remove it. Unfortunately the result is disastrous
rng_seed is thread private... and should have been initialized to 999999 at program start.
Either that, or call init_ranmar in the first parallel region:
PROGRAM foo
...
! first parallel region
!$OMP PARALLEL
call init_ranmar() ! or rng_seed = 999999
!$OMP END PARALLEL
Jim Dempsey
Please use:
module mod_ThreadPrivate
  COMMON/EPCONT/EDEP,TSTEP,TUSTEP,USTEP,TVSTEP,VSTEP, RHOF,EOLD,ENEW &
   &,EKE,ELKE,GLE,E_RANGE, x_final,y_final,z_final, u_final,v_final, &
   &w_final, IDISC,IROLD,IRNEW,IAUSFL(31)
!$OMP THREADPRIVATE(/EPCONT/)
end module mod_ThreadPrivate

...

SUBROUTINE UPHI(IENTRY,LVL)
! Copyright..
! All ...
use mod_ThreadPrivate
implicit none
This avoids the issue of keeping all declarations of COMMON/EPCONT/ the same
Jim Dempsey
@Jim,
The initialization of the RANMAR RNG is done outside ranmar_get, at the beginning of the parallel region (the piece of code is in post #20 above); for that reason the IF statement should never be entered.
I did not quite get the module idea: do I put the COMMON declaration inside the module and then declare just the needed variables outside?
@Tim
All the problematic subroutines have SAVE variables... but I suppose I would need a deeper study of the code to be able to modify them. Well, that is the result of more than 30 years of history of this MC code... hehehe. By the way, what do you mean by "lack of a RECURSIVE procedure declaration"?
Thanks for your help!
In the sample above, you would remove the COMMON/EPCONT/... from your subroutine UPHI and every other place that has COMMON/EPCONT/...
Instead, you would have a USE mod_ThreadPrivate statement (or whatever you want to name the module).
By default, all the variables contained in the use'd modules are available to the program unit with the USE statement.
As you have now, one subroutine could have:
COMMON/EPCONT/EDEP,TSTEP,TUSTEP,USTEP,TVSTEP,VSTEP, RHOF,EOLD,ENEW
...
While a different one could have:
COMMON/EPCONT/EDEP,TSTEP,TUSTEP,USTEP,TVSTEP,VSTEP, NewSTEP, RHOF,EOLD,ENEW
...
This makes code maintenance an issue.
The module construct eliminates the maintenance issue.
*** You would have an issue doing this if different routines correctly used different variable names and/or types in the same named common.
Often in old programs you may see
COMMON/TEMPORARIES/ X,Y,Z
in one place and
COMMON/TEMPORARIES/ I,R,K
in a different place, all working well even different variable types in the same storage.
Jim Dempsey
Declaring the procedure RECURSIVE removes the dependence on compile options for avoiding implicitly SAVEd local variables, which do not act as private.
Tim Prince wrote:
Declaring the procedure RECURSIVE removes the dependence on compile options for avoiding implicitly SAVEd local variables, which do not act as private.
Hi Tim, would it be possible for you to show an example of that? I really do not get your idea and I have not been able to find additional info, only about the RECURSIVE keyword on subroutine or function declarations, nothing about variables... Thanks for your help!
EnDoemer.
subroutine foo
real :: position(3) ! local scoped array containing the position X,Y,Z
real :: temp ! local scoped scalar
In the above subroutine, without the RECURSIVE attribute, and without compiler options that equivalently provide a recursive-like attribute, the array position is implicitly SAVE whereas the scalar temp is on the stack. That means there is one copy of position shared by all threads calling the subroutine. If this subroutine were called concurrently by multiple threads without one of the attributes or options that require the array to be on the stack, the program would be in error.
The cure is one of:
recursive subroutine foo
(add Intel specific)
real, automatic :: position(3)
ifort /recursive /c foo.f90
ifort /auto /c foo.f90
ifort /Qopenmp /c foo.f90
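Putting the first cure together with the example, a thread-safe version might look like the sketch below (the dummy arguments and body are made up purely for illustration):

      recursive subroutine foo(x, y, z, r)
      implicit none
      real x, y, z, r
C     with the RECURSIVE attribute the locals below are on the stack,
C     so each thread calling foo gets its own private copies
      real :: position(3)   ! local array (now automatic, not SAVEd)
      real :: temp          ! local scalar
      position(1) = x
      position(2) = y
      position(3) = z
      temp = position(1)**2 + position(2)**2 + position(3)**2
      r = sqrt(temp)
      return
      end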
Note: if you use or build a library, either static or dynamic, and it was not built with /auto, /Qopenmp, or /recursive, or with RECURSIVE on the subroutine and function statements, then you must not link that library into a multi-threaded program, as its code is not thread-safe. This is true even if there are no multi-threaded statements within the library.
Jim Dempsey
I also agree with the concerns Jim expressed about COMMON. I'm having difficulty commenting from a tablet browser.
Steve Lionel mentioned on a different thread that in the next Fortran standard (2015?) all functions and subroutines are implicitly recursive. That means locally scoped arrays default to automatic, and thus you may need to use SAVE explicitly when you require the save attribute (as opposed to currently not knowing whether the arrays are SAVE or AUTOMATIC).
Jim Dempsey
Tim,
>> Saved variables and lack of a RECURSIVE procedure declaration are big red flags.
They are not big red flags, they simply do not work.
I don't understand why EnDoemer persists with COMMON variables inside the !$OMP region if he wants any of them to be private. Either they should be shared, or their final interaction should be defined. I would expect they would need some form of !$OMP REDUCTION(operation : variable) for this to work effectively.
I would recommend reviewing the code structure and removing the PRIVATE use of COMMON variables from within the !$OMP region.
Jim, I thought that for many versions the Fortran standard has implied that local variables are not static, but automatic or dynamic. Again, if a private variable is given the SAVE attribute, its interaction and update between threads should be explicitly defined. Their values on entry, and what values are adopted on exit from the parallel region, should all be defined. After all, why are they private unless they take different values between threads inside the parallel region? This interaction should not be implicitly managed.
John
@John
It is not that I "insist" on using COMMON variables; the original code was structured that way, and my idea, as a first step, is to keep the code as close to the original as possible. This platform (Electron Gamma Shower - EGS) has been used for several decades, and the idea is not to make a radical change to it, at least as a first approach. It is clear to me that sooner or later I should start making major modifications to further improve the performance of the code.
The main problem with eliminating the thread-private COMMON blocks is that the structure of the platform uses them to communicate data between the different subroutines (i.e. almost no subroutines take arguments). So it would require a lot of modifications to start handling the private variables as arguments to the different subroutines. Eventually I would like to do that (for example, to be able to use reduction clauses for the scoring variables; at the moment I have to use a CRITICAL section to add up the results of each thread), but it will take some time.
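For what it is worth, the kind of pattern I have in mind for the scoring is sketched below (the variable names here are hypothetical, not the actual EGS scoring variables):

      program score_demo
      implicit none
      integer*4 ihist, nhist
      double precision edep_total, history_deposit
      nhist = 1000000
      edep_total = 0.d0
C     REDUCTION gives each thread a private partial sum and combines
C     them at the end, replacing the explicit CRITICAL section
C$OMP PARALLEL DO PRIVATE(history_deposit) REDUCTION(+:edep_total)
      do ihist = 1, nhist
C        stand-in for the energy deposited by one simulated history
         history_deposit = 1.d0/dble(nhist)
         edep_total = edep_total + history_deposit
      end do
C$OMP END PARALLEL DO
      print *, 'total scored deposit =', edep_total
      end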
Thanks for your comments.
John,
Up until the next Fortran spec, scalars default to the stack; other locals default to SAVE/static.
From the IVF documentation:
By default, the compiler allocates local scalar variables on the stack. Other non-allocatable variables of non-recursive subprograms are allocated in static storage by default. This default can be changed through compiler options. Appropriate use of the SAVE attribute may be required if your program assumes that local variables retain their definition across subprogram calls.
To quote Dirty Harry: Do you feel lucky, ...
Jim
OpenMP has support for threadprivate COMMON blocks. It's not trivial to figure out, and certainly beyond the advice possible with the limited view given here.
Tim Prince wrote:
OpenMP has support for threadprivate COMMON blocks. It's not trivial to figure out, and certainly beyond the advice possible with the limited view given here.
Well, I use the C$OMP0THREADPRIVATE directive for the COMMON blocks that are private to each thread; is that what you mean?
C$OMP0THREADPRIVATE looks to me like a way of ignoring important code design issues for OpenMP. In complex code, it is not easy to reproduce the validation process that went into the original single-threaded code. Importantly, which thread will define the exit values of the COMMON variables?
There are many areas where OpenMP should be used carefully. I have yet to understand how to apply OpenMP to Monte Carlo simulation approaches: although there may be thread-safe random number generators, will the different threads behave independently, or will there be some correlation between threads using an in-sequence pseudo-random number generator?
These areas affect the usefulness of results obtained from OpenMP, especially when applied to complex calculations that cannot be easily tested. Having now understood how to apply OpenMP to a skyline equation solver, my next challenge will be to see how to apply the process to an iterative eigensolver, where these issues may need to be better understood.
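For what it is worth, a common starting point is simply to give each thread's private generator a distinct seed, e.g. offset by the thread number, although that alone does not guarantee statistically independent streams. A rough sketch (init_my_rng is a hypothetical per-thread initializer, commented out here):

      program seed_demo
      use omp_lib
      implicit none
      integer*4 base_seed, my_seed
      base_seed = 12345
C     each thread derives its own seed, so the per-thread sequences
C     at least start from different points
C$OMP PARALLEL PRIVATE(my_seed)
      my_seed = base_seed + omp_get_thread_num()
C     call init_my_rng(my_seed)   ! hypothetical per-thread initializer
      print *, 'thread', omp_get_thread_num(), 'uses seed', my_seed
C$OMP END PARALLEL
      end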
John
