Pete
We did have a problem for a while where RANDOM_NUMBER for single precision could actually return 1.0, which it is not supposed to. That was fixed many years ago.
I just wrote a test that repeatedly got groups of 50 random numbers and averaged them. If the values started to converge anywhere, you'd see the average converge. It doesn't. After 1000 groups, I saw:
0.4832707
0.5720206
0.4919030
0.4036785
0.4750714
0.4750229
0.5306970
0.5269253
0.4471116
0.5424646
0.5157450
0.5117723
This was with CVF 6.6C.
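A minimal Fortran sketch of the averaging check described above, assuming a compiler with Fortran 2003 support. The module and procedure names (`avg_check`, `group_means`) are hypothetical, and this is not the test program that produced the numbers above:

```fortran
! Hypothetical sketch of the averaging check: draw ngroups groups of
! gsize numbers from RANDOM_NUMBER and record the mean of each group.
module avg_check
  implicit none
contains
  subroutine group_means(ngroups, gsize, means)
    integer, intent(in) :: ngroups, gsize
    real, intent(out) :: means(ngroups)
    real, allocatable :: batch(:)
    integer :: i
    allocate(batch(gsize))
    do i = 1, ngroups
      call random_number(batch)        ! fills batch with values in [0,1)
      means(i) = sum(batch) / gsize    ! mean of this group
    end do
  end subroutine group_means
end module avg_check
```

Calling `group_means(1000, 50, means)` and printing `means` gives a column like the one above. If the generator got stuck on a single value, consecutive means would become identical instead of scattering around 0.5.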
Thanks. I should have been clearer that "Converges" was meant in a "feels like" sense rather than a strict mathematical sense. This is a monster message-passing parallel program. The code is not obviously broken. Different parallel subtasks lock in on the value at different times. One goes to it right away, most others after hundreds of thousands of calls to RANDOM_NUMBER. Once it hits the value printed out as 9.99999999534336714E-01, the value never changes.
Pete
From what I've seen of the routine code, I don't see how such a convergence is possible and I don't recall anyone reporting this issue to us before. I'd be very interested in seeing a reproducer.
If there isn't something in our usage that I was missing, I'll go back and see if we can run this down. We can add some code to pull the seed and see whether it's changing. If it's not, I'll try feeding the seed to a simple test program and see what it does. It's hard for me to imagine that the generator is flawed, though. I think it's more likely that the heap the generator is using gets stomped somehow.
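Pulling the seed can be done with the standard RANDOM_SEED intrinsic. A minimal sketch, assuming Fortran 2003 allocatable assignment; the module and function names (`seed_watch`, `get_seed`, `seed_changed`) are hypothetical:

```fortran
! Hypothetical helpers for watching the RANDOM_NUMBER seed.
module seed_watch
  implicit none
contains
  ! Return a copy of the current seed; its length is chosen by the runtime.
  function get_seed() result(seed)
    integer, allocatable :: seed(:)
    integer :: n
    call random_seed(size=n)
    allocate(seed(n))
    call random_seed(get=seed)
  end function get_seed

  ! .true. if the seed now differs from a previously saved snapshot.
  logical function seed_changed(before)
    integer, intent(in) :: before(:)
    integer, allocatable :: now(:)
    now = get_seed()
    seed_changed = any(now /= before)
  end function seed_changed
end module seed_watch
```

Take `s = get_seed()` just before a suspect call and test `seed_changed(s)` right after it. RANDOM_NUMBER itself advances the seed, so the comparison is only meaningful across code that makes no RANDOM_NUMBER calls.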
Stay tuned,
Pete
I note that someone we found via Google who had a similar problem fixed it by declaring his routine as RECURSIVE. All of these clues point to heap corruption as the likely cause. We'll keep working on it. Any additional thoughts are welcome.
An interesting fact is that the seed goes to zero in a totally unrelated call (MPI_INIT is one).
In the random number implementation of 6.6C, where does the seed get stored? Is it just a static variable or is it allocated? I suspect from our tests that calling RANDOM_SEED results in the seed going to a different virtual address. Is this true?
Pete
We also duplicated the "seed changes" behavior with a small test program. Both the test program and the real thing show the seed changing during a call to MPI_Init. MPICH is used by thousands, and it's written in C, so it's hard to imagine a way in which it could have a bug that so consistently trashes the random_number generator.
Based on the other thing we googled, we tested linking with different libraries. It turns out that the problem only occurs when we do the final link with /libs:dll /threads. Take either one away and the behavior is normal.
I'm certainly not ruling out an MPI_Init bug. Any thoughts on why turning off one of the library flags would change the random number generator? Is the random number generator in 8.1 the same as the one in CVF?
Pete
If you do have a test program that demonstrates this in Intel Fortran (9.0, preferably), please submit it to us at Intel Premier Support and we'll take a look.
This one continues to be a problem for us. As it turns out, it is easily reproduced with a test program using IVF.
The attached program and library are all you need. I tried to go to Premier but it kept giving me "page not found" at login.
Compile commands:
ifort /compile_only /threads /libs:dll mpi_test.F
ifort mpi_test.obj /link /out:"mpi_test.exe" mpich.lib
The random numbers are OK if we don't use /libs:dll, but that's not an option in the real program.
Any thoughts?
Pete
You are using list-directed output to display the values. This will round the value to some number of decimal digits. You should also add a:
write (6,'(Z8.8)') rand_num
after the list-directed write to see what the binary representation is. My guess is that it is not exactly 1.0.
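The 18-digit value printed earlier suggests `rand_num` may be double precision, in which case Z8.8 is too narrow to show the whole bit pattern and Z16.16 is needed. A minimal sketch (hypothetical module `hexdump`) that copies the value into an integer of the same size with TRANSFER, so it does not rely on Z editing of a REAL item:

```fortran
! Hypothetical helpers that return the raw bit pattern of a real as hex.
module hexdump
  use, intrinsic :: iso_fortran_env, only: int32, int64, real32, real64
  implicit none
contains
  ! Single precision: 32 bits -> 8 hex digits.
  function hex32(x) result(s)
    real(real32), intent(in) :: x
    character(len=8) :: s
    write (s, '(Z8.8)') transfer(x, 0_int32)
  end function hex32

  ! Double precision: 64 bits -> 16 hex digits.
  function hex64(x) result(s)
    real(real64), intent(in) :: x
    character(len=16) :: s
    write (s, '(Z16.16)') transfer(x, 0_int64)
  end function hex64
end module hexdump
```

`write (6,*) hex64(rand_num)` prints 3FF0000000000000 only if the value is exactly 1.0d0; the largest double below 1.0 prints as 3FEFFFFFFFFFFFFF. For single precision, 1.0 is 3F800000.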
Pete,
One potential problem area with multiple threads is assumptions carried over from prior programming practice (this one caught me, with my 35+ years of experience).
If your program contains, say, a subroutine or function that declares a local array, e.g.
subroutine foo
real :: mine(3)
Then older programmers, or younger C++ programmers, may tend to assume that the array mine(3) is on the call stack. This is not necessarily so. In FORTRAN it is an implementation choice whether mine is on the stack or in a static storage location private to the subroutine. If it is in static storage, then your multiple threads will all be using the same location while your code may assume separate locations.
The fix for this is to declare the variable as local stack storage:
real, automatic :: mine(3)
Declaring the routine reentrant may or may not fix the problem, depending on other issues.
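AUTOMATIC is a DEC/Intel extension rather than standard Fortran. A minimal standard-Fortran sketch of the same distinction, with hypothetical names (`storage_demo`, `calls_static`, `calls_fresh`): a SAVE'd local lives in one static location, while a local in a RECURSIVE procedure gets new storage on each call. No threads are involved here; the sketch only shows which declaration is shared.

```fortran
! Hypothetical demo of static vs. per-call local storage.
module storage_demo
  implicit none
contains
  ! SAVE places n in static storage: one copy shared by every call and,
  ! in a threaded program, by every thread.
  integer function calls_static()
    integer, save :: n = 0
    n = n + 1
    calls_static = n
  end function calls_static

  ! In a RECURSIVE procedure a local with no SAVE gets fresh storage per
  ! call. Note: writing "integer :: n = 0" would imply SAVE, so the value
  ! is assigned in an executable statement instead.
  recursive integer function calls_fresh()
    integer :: n
    n = 0
    n = n + 1
    calls_fresh = n
  end function calls_fresh
end module storage_demo
```

Calling `calls_static()` twice returns 1 and then 2, because both calls share one counter; `calls_fresh()` returns 1 every time.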
Jim Dempsey
For my amusement, I tried adding a call to RANDOM_SEED (with no arguments) at the beginning. This sets the seed to something based on the system clock (behavior not specified by the standard, but widely used). As soon as the MPI_INIT call was done, the seed reverted to what it would have been without the added call to RANDOM_SEED.
Assuming you call MPI_INIT only once in the program, it's unclear to me why this is really a problem.
I'll ask the developers if they understand the behavior.
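If MPI_INIT is called only once, one way to sidestep the reset, without explaining it, is to set the seed explicitly after MPI_INIT returns. A minimal sketch with a hypothetical `put_seed` helper; the seed fill is arbitrary and is not a recommended seeding scheme:

```fortran
! Hypothetical helper: install a deterministic seed of whatever length
! RANDOM_SEED expects on this compiler.
module reseed
  implicit none
contains
  subroutine put_seed(base)
    integer, intent(in) :: base
    integer, allocatable :: seed(:)
    integer :: n, i
    call random_seed(size=n)
    allocate(seed(n))
    seed = [(base + 37 * i, i = 1, n)]   ! arbitrary, nonzero fill
    call random_seed(put=seed)
  end subroutine put_seed
end module reseed
```

In the MPI program the order would be `call MPI_INIT(ierr)` followed by `call put_seed(...)`, before the first RANDOM_NUMBER call. Passing a different base on each process (for example one derived from its rank) would give the processes different seeds.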
Gotta go play with this some more...