Intel® Fortran Compiler

CVF RANDOM_NUMBER converges to 1

bradlepc
Beginner
We're seeing strange behavior with CVF 6.6C in which the RANDOM_NUMBER intrinsic eventually converges to a number very close to 1. The problem appears to go away if RANDOM_SEED is called at the beginning of the program. However, I find nothing in the Fortran 90 standard that says this is required, so I suspect the "fix" may just be fortuitous. Google turns up a few other mentions of this behavior, but no answers. Comments?

Pete
Steven_L_Intel1
Employee
Converges? You mean that you keep getting results that are very close to 1? Got an example that shows that? I've done a lot of testing of RANDOM_NUMBER and never saw this behavior.

We did have a problem for a while where RANDOM_NUMBER for single precision could actually return 1.0, which it is not supposed to. That was fixed many years ago.
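A quick way to check for that old failure mode is simply to hammer the generator and flag anything outside [0,1). A minimal sketch (hypothetical program name, assuming default single precision):

program check_range
  implicit none
  real :: r
  integer :: i
  do i = 1, 1000000
     call random_number(r)
     ! The standard requires 0.0 <= r < 1.0; flag anything outside that range
     if (r >= 1.0 .or. r < 0.0) print *, 'out of range at call ', i, ': ', r
  end do
end program check_range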

I just wrote a test that repeatedly got groups of 50 random numbers and averaged them. If the values started to converge anywhere, you'd see the average converge. It doesn't. After 1000 groups, I saw:

0.4832707
0.5720206
0.4919030
0.4036785
0.4750714
0.4750229
0.5306970
0.5269253
0.4471116
0.5424646
0.5157450
0.5117723

This was with CVF 6.6C.
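A minimal sketch of that kind of test (a reconstruction for illustration, not necessarily the exact code used):

program avg_test
  implicit none
  real :: r(50)
  integer :: i
  do i = 1, 1000
     call random_number(r)                              ! fill one group of 50 values
     if (mod(i, 100) == 0) print *, sum(r) / size(r)    ! print an occasional group average
  end do
end program avg_test

If the generator were drifting toward 1, the printed averages would drift toward 1 as well.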
bradlepc
Beginner
Steve,

Thanks. I should have been clearer that "converges" was meant in a "feels like" sense rather than a strict mathematical sense. This is a monster message-passing parallel program, and the code is not obviously broken. Different parallel subtasks lock in on the value at different times: one goes to it right away, most others only after hundreds of thousands of calls to RANDOM_NUMBER. Once it hits the value printed as 9.99999999534336714E-01, the value never changes.

Pete
Steven_L_Intel1
Employee
Parallel, eh? You're calling RANDOM_NUMBER from different threads? Have you specified the threaded libraries?

From what I've seen of the routine code, I don't see how such a convergence is possible and I don't recall anyone reporting this issue to us before. I'd be very interested in seeing a reproducer.
bradlepc
Beginner
Message passing parallel (MPI). Threaded libraries are specified, but the program is not explicitly threaded.

If there isn't something in the usage that I'm missing, I'll go back and see if we can run this down. We can add some code to pull the seed and see if it's changing (for instance, along the lines of the sketch below). If it's not, I'll try feeding the seed to a simple test program and see what it does. It's hard for me to imagine that the generator itself is flawed, though. I think it's more likely that the heap the generator is using gets stomped somehow.
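For example, a minimal sketch of querying the seed with the standard RANDOM_SEED interface (hypothetical program name):

program show_seed
  implicit none
  integer :: n
  integer, allocatable :: seed(:)
  call random_seed(size=n)      ! number of integers that make up the generator's seed
  allocate (seed(n))
  call random_seed(get=seed)    ! read back the current seed values
  print *, 'seed = ', seed
end program show_seed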

Stay tuned,

Pete
bradlepc
Beginner
It turns out that just before the random number generator locks up, RANDOM_SEED(GET) shows that the seed values magically become zero. It's not quite as simple as that, though, since explicitly setting the seed to zeros does not reproduce the random number that we see.

I note that the person I found via Google who had a similar problem fixed it by declaring his routine RECURSIVE. All of these clues suggest heap corruption as the likely cause. We'll keep working it. Any additional thoughts welcome.
Steven_L_Intel1
Employee
Do you get the same problem if you build against the DLL form of the run-time libraries? It's probably not heap corruption, but certainly there is data corruption going on.
bradlepc
Beginner
We're building with /threads and /libs:dll, so I believe we're getting the threaded DLL version of the libraries.

An interesting fact is that the seed goes to zero during a totally unrelated call (MPI_INIT, for one).

In the random number implementation of 6.6C, where does the seed get stored? Is it just a static variable or is it allocated? I suspect from our tests that calling RANDOM_SEED results in the seed going to a different virtual address. Is this true?

Pete
bradlepc
Beginner
I meant to say "corruption in the heap".

Pete
Steven_L_Intel1
Employee
From my reading of the sources, it is in "thread-local storage". This would be static for the non-threaded library and in some reserved part of the thread stack for the threaded library.
bradlepc
Beginner
This one continues to be a head-scratcher. It turns out 8.1 gives identical behavior to 6.6C, which is a pretty strong strike against a random piece of memory being stepped on.

We also duplicated the "seed changes" behavior with a small test program. Both the test program and the real thing show the seed changing during a call to MPI_Init. MPICH is used by thousands, and it's written in C, so it's hard to imagine a way in which it could have a bug that so consistently trashes the random_number generator.

Based on the other thing we googled, we tested linking with different libraries. It turns out that the problem only occurs when we do the final link with /libs:dll /threads. Take either one away and the behavior is normal.

I'm certainly not ruling out an MPI_Init bug. Any thoughts on why turning off one of the library flags would change the random number generator? Is the random number generator in 8.1 the same as the one in CVF?

Pete
Steven_L_Intel1
Employee
Yes, the random number generator is the same. When you're using threads, the seed is kept in thread-local storage, so it's possible that something is overwriting it in the MPI application.

If you do have a test program that demonstrates this in Intel Fortran (9.0, preferably), please submit it to us at Intel Premier Support and we'll take a look.
bradlepc
Beginner
Steve,

This one continues to be a problem for us. As it turns out, it is easily duplicated with a test program, using IVF.

The attached program and library are all you need. I tried to go to Premier but it kept giving me "page not found" at login.

Compile commands:

ifort /compile_only /threads /libs:dll mpi_test.F

ifort mpi_test.obj /link /out:"mpi_test.exe" mpich.lib


The random numbers are OK if we don't use /libs:dll, but that's not an option in the real program.
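In outline (a rough free-form sketch with an invented program name, not the literal contents of the attached mpi_test.F), the test does something like:

program seed_vs_mpi
  implicit none
  include 'mpif.h'
  integer :: ierr, n, i
  integer, allocatable :: seed(:)
  real :: r

  call random_seed(size=n)
  allocate (seed(n))

  call random_seed(get=seed)
  print *, 'seed before MPI_INIT: ', seed

  call mpi_init(ierr)                  ! the call during which the seed appears to change

  call random_seed(get=seed)
  print *, 'seed after MPI_INIT:  ', seed

  do i = 1, 5
     call random_number(r)             ! values generated after MPI_INIT
     print *, r
  end do

  call mpi_finalize(ierr)
end program seed_vs_mpi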

Any thoughts?

Pete
Steven_L_Intel1
Employee
Well, I can't get the program to run on my system - it just hangs at the call to MPI_INIT. But...

You are using list-directed output to display the values. This will round the value to some number of decimal digits. You should also add something like:

write (6,'(Z8.8)') rand_num

after the list-directed write to see what the binary representation is. My guess is that it is not exactly 1.0.
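Depending on compiler settings, a Z edit descriptor applied to a REAL may be flagged as nonstandard; a sketch that routes the bits through a default integer instead (assuming 32-bit REAL and INTEGER) is:

program show_bits
  implicit none
  real :: rand_num
  call random_number(rand_num)
  print *, rand_num                           ! list-directed: rounded decimal form
  write (6,'(Z8.8)') transfer(rand_num, 0)    ! the exact 32-bit pattern, in hex
end program show_bits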
jimdempseyatthecove
Honored Contributor III

Pete,

One potential area of trouble with multiple threads is assumptions carried over from prior programming practice (it has caught even me, with my 35+ years of experience).

If your program contains, say, a subroutine or function that declares a local array, e.g.

subroutine foo

real :: mine(3)

then old-school Fortran programmers, or younger C++ programmers, may assume the array "mine(3)" is on the call stack. This is not necessarily so. In Fortran it is an implementation issue whether mine is on the stack or in static storage private to the subroutine. If it is in static storage, then your multiple threads will all be using the same location while your code assumes separate copies.

The fix for this is to declare the variable as local stack storage:

real, automatic :: mine(3)

Declaring the routine RECURSIVE (reentrant) may or may not fix the problem, depending on other issues.
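A minimal sketch of that declaration in context (hypothetical routine name; AUTOMATIC is a CVF/Intel Fortran extension, not standard Fortran):

subroutine fill_value(x)
  implicit none
  real, intent(out) :: x
  ! Without AUTOMATIC (or RECURSIVE on the routine), "mine" may be given static
  ! storage, which every thread would then share.
  real, automatic :: mine(3)    ! force per-call (stack) storage
  call random_number(mine)
  x = sum(mine)
end subroutine fill_value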

Jim Dempsey

bradlepc
Beginner
I recognize it's not exactly 1. The problem is that it's the same value over and over. Interesting that it hangs for you; I haven't seen that. This program is a nuisance to share because it really wants to run in parallel under MPICH.

Pete
bradlepc
Beginner
The program is compiled with /threads, but it is not explicitly threaded. We build everything for the multithreaded DLL to keep all of our many codes consistent. Only one thread of execution (as far as I know) is updating the random seed.

Pete
Steven_L_Intel1
Employee
I was able to run this on a different system. What I see is that the call to MPI_INIT causes the seed values to revert to the default. Why this should be, I don't know. (I don't get any values even close to 1.0, though.)

For my amusement I tried adding a call to RANDOM_SEED (with no arguments) at the beginning. This sets the seed to something based on the system clock (behavior not specified by the standard, but widely implemented). As soon as the MPI_INIT call was done, the seed reverted back to what it would have been without the added call to RANDOM_SEED.

Assuming you call MPI_INIT only once in the program, it's unclear to me why this is really a problem.

I'll ask the developers if they understand the behavior.
Steven_L_Intel1
Employee
I can reproduce some odd behavior on another system. If I take your program and build it, as is, I see that the call to MPI_INIT resets the random seed back to its initial value. If I add a commented-out line to the source and rebuild, that effect of MPI_INIT goes away. If I remove all my changes, the effect stays away. If I restore your copy of the source file, it comes back. No, I haven't been drinking....

Gotta go play with this some more...
bradlepc
Beginner
Welcome to our pain! Sounds like good progress.

Pete
Steven_L_Intel1
Employee
By the way, none of the numbers I get are even close to 1.0.