Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

CVF RANDOM_NUMBER converges to 1

bradlepc
Beginner
We're seeing strange behavior with CVF 6.6C in which RANDOM_NUMBER eventually converges to a value very close to 1. The problem appears to go away if RANDOM_SEED is called at the beginning of the program; however, I find nothing in the Fortran 90 standard that requires this, so I think the "fix" may just be fortuitous. Google turns up a few other mentions of this behavior, but no answers. Comments?
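For reference, the workaround looks roughly like this (a minimal sketch, not our actual code):

      program demo
      implicit none
      real :: r
      call random_seed()       ! reseeding once at startup makes the problem go away
      call random_number(r)    ! subsequent draws then behave normally
      print *, r
      end program demo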

Pete
24 Replies
Steven_L_Intel1
Employee
Converges? You mean that you keep getting results that are very close to 1? Got an example that shows that? I've done a lot of testing of RANDOM_NUMBER and never saw this behavior.

We did have a problem for a while where RANDOM_NUMBER for single precision could actually return 1.0, which it is not supposed to. That was fixed many years ago.

I just wrote a test that repeatedly got groups of 50 random numbers and averaged them. If the values started to converge anywhere, you'd see the average converge. It doesn't. After 1000 groups, I saw:

0.4832707
0.5720206
0.4919030
0.4036785
0.4750714
0.4750229
0.5306970
0.5269253
0.4471116
0.5424646
0.5157450
0.5117723

This was with CVF 6.6C.
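The test was essentially of this shape (a sketch rather than the exact program I ran):

      program avg_test
      implicit none
      integer, parameter :: group = 50
      integer :: i, j
      real :: r, total
      do i = 1, 1000                 ! 1000 groups
         total = 0.0
         do j = 1, group             ! 50 random numbers per group
            call random_number(r)
            total = total + r
         end do
         print *, total / group      ! averages hover near 0.5 rather than creeping toward 1
      end do
      end program avg_test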
bradlepc
Beginner
Steve,

Thanks. I should have been clearer that "converges" was meant in a "feels like" sense rather than a strict mathematical sense. This is a monster message-passing parallel program, and the code is not obviously broken. Different parallel subtasks lock in on the value at different times: one goes to it right away, most of the others only after hundreds of thousands of calls to RANDOM_NUMBER. Once it hits the value printed out as 9.99999999534336714E-01, the value never changes.

Pete
Steven_L_Intel1
Employee
Parallel, eh? You're calling RANDOM_NUMBER from different threads? Have you specified the threaded libraries?

From what I've seen of the routine code, I don't see how such a convergence is possible and I don't recall anyone reporting this issue to us before. I'd be very interested in seeing a reproducer.
bradlepc
Beginner
Message passing parallel (MPI). Threaded libraries are specified, but the program is not explicitly threaded.

If there isn't something in the usage that I was missing, I'll go back and see if we can run this down. We can add some code to pull the seed and see if it's changing. If it's not, I'll try feeding the seed to a simple test program and see what it does. It's hard for me to imagine that the generator is flawed, though; I think it's more likely that the heap the generator is using gets stomped somehow.
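Something along these lines would pull the seed at any point of interest (a sketch; print_seed is a made-up helper name):

      subroutine print_seed(tag)
      implicit none
      character(*), intent(in) :: tag
      integer :: n
      integer, allocatable :: seed(:)
      call random_seed(size=n)       ! number of integers in the seed
      allocate(seed(n))
      call random_seed(get=seed)     ! read the current seed values
      print *, tag, ' seed =', seed
      end subroutine print_seed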

Stay tuned,

Pete
bradlepc
Beginner
It turns out that just before RANDOM_NUMBER locks up, RANDOM_SEED with the GET argument shows that the seed values magically become zero. It's not quite as simple as that, though, since setting the seeds to 0 does not reproduce the random number that we see.
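That check was roughly the following (a sketch of what is described above):

      program zero_seed_check
      implicit none
      integer :: n
      integer, allocatable :: seed(:)
      real :: r
      call random_seed(size=n)
      allocate(seed(n))
      seed = 0
      call random_seed(put=seed)     ! force the seed to all zeros
      call random_number(r)
      print *, r                     ! does not come out as the 9.99999999534...E-01 value
      end program zero_seed_check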

I note that the person I googled who had a similar problem fixed it by declaring his routine as RECURSIVE. All of these clues suggest heap corruption as the likely cause. We'll keep working it. Any additional thoughts welcome.
Steven_L_Intel1
Employee
Do you get the same problem if you build against the DLL form of the run-time libraries? It's probably not heap corruption, but certainly there is data corruption going on.
bradlepc
Beginner
We're building with /threads and /libs:dll, so I believe we're getting the threaded DLL version of the libraries.

An interesting fact is that the seed goes to zero in a totally unrelated call (MPI_INIT is one).

In the random number implementation of 6.6C, where does the seed get stored? Is it just a static variable or is it allocated? I suspect from our tests that calling RANDOM_SEED results in the seed going to a different virtual address. Is this true?

Pete
bradlepc
Beginner
I meant to say "corruption in the heap".

Pete
Steven_L_Intel1
Employee
From my reading of the sources, it is in "thread-local storage". This would be static for the non-threaded library and in some reserved part of the thread stack for the threaded library.
bradlepc
Beginner
This one continues to be a head-scratcher. It turns out 8.1 gives identical behavior to 6.6C, which is a pretty strong strike against a random piece of memory being stepped on.

We also duplicated the "seed changes" behavior with a small test program. Both the test program and the real thing show the seed changing during a call to MPI_Init. MPICH is used by thousands, and it's written in C, so it's hard to imagine a way in which it could have a bug that so consistently trashes the random_number generator.
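The small test is essentially of this shape (a hypothetical reconstruction, not the exact attached file; it assumes MPICH's mpif.h is on the include path):

      program seed_vs_mpi
      implicit none
      include 'mpif.h'
      integer :: ierr, n
      integer, allocatable :: before(:), after(:)
      call random_seed(size=n)
      allocate(before(n), after(n))
      call random_seed(get=before)
      call MPI_Init(ierr)               ! the seed should not be touched by this call
      call random_seed(get=after)
      print *, 'seed before MPI_Init:', before
      print *, 'seed after  MPI_Init:', after
      call MPI_Finalize(ierr)
      end program seed_vs_mpi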

Based on the other report we found via Google, we tested linking with different libraries. It turns out that the problem only occurs when we do the final link with /libs:dll /threads. Take either one away and the behavior is normal.

I'm certainly not ruling out an MPI_Init bug. Any thoughts on why turning off one of the library flags would change the random number generator? Is the random number generator in 8.1 the same as the one in CVF?

Pete
Steven_L_Intel1
Employee
Yes, the random number generator is the same. When you're using threads, the seed is kept in thread-local storage so it's possible that something is overwriting this in the MPI application.

If you do have a test program that demonstrates this in Intel Fortran (9.0, preferably), please submit it to us at Intel Premier Support and we'll take a look.
bradlepc
Beginner
Steve,

This one continues to be a problem for us. As it turns out, it is easily duplicated with a test program, using IVF.

The attached program and library are all you need. I tried to go to Premier but it kept giving me "page not found" at login.

Compile commands:

ifort /compile_only /threads /libs:dll mpi_test.F

ifort mpi_test.obj /link /out:"mpi_test.exe" mpich.lib


The random numbers are OK if we don't use /libs:dll, but that's not an option in the real program.

Any thoughts?

Pete
Steven_L_Intel1
Employee
Well, I can't get the program to run on my system - it just hangs at the call to MPI_INIT. But...

You are using list-directed output to display the values. This will round the value to some number of decimal digits. You should also add a:

write (6,'(Z8.8)') rand_num

after the list-directed write to see what the binary representation is. My guess is that it is not exactly 1.0.
jimdempseyatthecove
Honored Contributor III

Pete,

One potential source of problems with multiple threads is assumptions carried over from prior programming experience (this has caught even me, with my 35+ years of experience).

If your program contains, say, a subroutine or function that declares a local array, e.g.

subroutine foo

real :: mine(3)

Then old-school programmers, or younger C++ programmers, may tend to assume the array "mine(3)" is on the call stack. This is not necessarily so. In Fortran it is an implementation issue whether mine is on the stack or in a private static storage location for the subroutine. If it is in static storage, then your multiple threads will all be using the same location when your code may assume separate locations.

The fix for this is to declare the variable to be local stack storage

real, automatic :: mine(3)

Declaring the routine reentrant may or may not fix the problem, depending on other issues.
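For illustration, a self-contained version of that fix might look like this (hypothetical routine; AUTOMATIC is a CVF/IVF extension, not standard Fortran):

      subroutine foo(x)
      implicit none
      real, intent(in) :: x
      ! without AUTOMATIC the compiler may place this array in static storage,
      ! which every thread calling foo would then share
      real, automatic :: mine(3)
      mine = x                  ! each thread fills its own stack copy
      print *, mine
      end subroutine foo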

Jim Dempsey

bradlepc
Beginner
I recognize it's not exactly 1. The problem is that it's the same over and over. Interesting that it hangs for you. I haven't seen that. This program is a nuisance to share because it really wants to run in parallel under MPICH.

Pete
bradlepc
Beginner
The program is compiled with /threads, but it is not explicitly threaded. We consistently compile for multithreaded DLL to keep all of our many codes consistent. Only one thread of execution (as far as I know) is updating the random seed.

Pete
Steven_L_Intel1
Employee
I was able to run this on a different system. What I see is that the call to MPI_INIT causes the seed values to revert to the default. Why this should be, I don't know. (I don't get any values even close to 1.0, though.)

For my amusement I tried adding a call to RANDOM_SEED (with no arguments) at the beginning. This sets the seed to something based on the system clock (behavior not specified by standard, but widely used.) As soon as the MPI_INIT call was done, it reverted back to what it would have been without the added call to RANDOM_SEED.

Assuming you call MPI_INIT only once in the program, it's unclear to me why this is really a problem.

I'll ask the developers if they understand the behavior.
Steven_L_Intel1
Employee
I can reproduce some odd behavior on another system. If I take your program and build it, as is, I see that the call to MPI_INIT resets the random seed back to its initial value. If I add a commented-out line to the source and rebuild, that effect of MPI_INIT goes away. If I remove all my changes, the effect stays away. If I restore your copy of the source file, it comes back. No, I haven't been drinking....

Gotta go play with this some more...
bradlepc
Beginner
Welcome to our pain! Sounds like good progress.

Pete
Steven_L_Intel1
Employee
By the way, none of the numbers I get are even close to 1.0.