Solved: Re: Question

rafadix08 · ‎12-09-2009

I compiled my code with Intel Fortran compiler 11.1.048 on my own machine.
The code ran fine.

When I compiled and ran the same code on a UNIX cluster with Intel Fortran 11.1, something really weird happened.

First, the program was crashing at some point... Debugging it I found this out:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

This prints a bunch of zeros on the screen, as it should.

However if I do this:

XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

I get nonzero stuff printed.

More confusingly, if I code:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

Then I get zeros everywhere.

How is this happening?

Let me just repeat that none of this happens on my own machine with Intel Fortran 11.1.048. It only happens when I move to the UNIX cluster with Intel Fortran 11.1.

Thanks,
Rafael

Martyn_C_Intel · ‎12-17-2009

Rafael,
The problem was related to a specific optimization involving both the square of a math function and the rerolling of statements involving consecutive array elements to reconstitute a loop. The loop index was getting incremented twice, which led to the pattern noted above where the second element, XR(15), contained the result that should have been in the third element, XR(16). This will be fixed in a future compiler update.
There are, therefore, additional ways in which you could work around this, without reducing the optimization level. You could rewrite the four assignment statements as a loop; then, the compiler would not need to recreate a loop. Or, as you already noted, you could separate the calculation of the logarithms from the calculation of the squares. The former is probably the most elegant: using array notation,
XR(14:17) = log(rsk(1:4))**2
but you'd need to check that you don't have a similar construct anywhere else in your code.
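
A minimal sketch of the three equivalent forms mentioned above (the four separate statements, an explicit loop, and array notation), assuming double-precision arrays. The program name, the variable names other than XR and rsk, the rsk values and the tolerance are made up for illustration and are not taken from Rafael's code. With a correctly working compiler the three forms should agree to within rounding.

```fortran
program workaround_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: rsk(4)
  real(dp) :: xr_stmt(81), xr_loop(81), xr_arr(81)
  integer :: j

  ! Hypothetical inputs, chosen only for illustration.
  rsk = [1.5_dp, 2.0_dp, 0.9_dp, 15.0_dp]
  xr_stmt = 0.0_dp
  xr_loop = 0.0_dp
  xr_arr  = 0.0_dp

  ! Form 1: four separate statements, as in the original code.
  xr_stmt(14) = log(rsk(1))**2
  xr_stmt(15) = log(rsk(2))**2
  xr_stmt(16) = log(rsk(3))**2
  xr_stmt(17) = log(rsk(4))**2

  ! Form 2: an explicit loop, so the compiler has no loop to reconstruct.
  do j = 1, 4
     xr_loop(13 + j) = log(rsk(j))**2
  end do

  ! Form 3: array notation.
  xr_arr(14:17) = log(rsk(1:4))**2

  print *, 'max |statements - loop|  =', maxval(abs(xr_stmt - xr_loop))
  print *, 'max |statements - array| =', maxval(abs(xr_stmt - xr_arr))

  ! Assumed tolerance: allows last-bit differences between scalar and vector log.
  if (maxval(abs(xr_stmt - xr_loop)) > 1.0e-12_dp) error stop 'loop form disagrees'
  if (maxval(abs(xr_stmt - xr_arr))  > 1.0e-12_dp) error stop 'array form disagrees'
end program workaround_sketch
```

Building this once with and once without optimization gives a quick way to see whether a given compiler version treats the three forms consistently.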

In reply to your last question, an optimizing compiler is a very large and complex piece of software. Bugs are rare, but they do happen. The Intel compiler is run through a very extensive test suite, so any problems are usually only for a very specific set of circumstances, for example, involving the interplay between different optimizations, as here. When a problem is found, a corresponding test is added to the test suite, to ensure that similar problems don't recur in the future.
So whilst you shouldn't expect problems with the rest of your code, provided you check for recurrences of the exact same construct, it is good practice to compare results compiled with optimization against results when compiling without optimization, just as you would check results for a problem with a known solution when testing your own code. It becomes even more important to test and compare to a validated set of results once you begin writing parallel code.

Martyn

DavidWhite · ‎12-09-2009

What do you mean by "non-zero"? Are these large values, or just round-off? Some formats may store the intermediate result, giving zero. Are the compiler defaults the same for both environments? The use of stack vs variable storage may be different.

David

rafadix08 · ‎12-09-2009

Quoting - David White

What do you mean by "non-zero"? Are these large values, or just round-off? Some formats may store the intermediate result, giving zero. Are the compiler defaults the same for both environments? The use of stack vs variable storage may be different.

David

Thank you for the reply, David.

These are not round-off values, these are big values. The right answer should be zero.
The compiler defaults are not exactly the same, but that should not be the issue since I am using the heap-arrays option in both.

Here is another test:

This produces right results if I print XR(14:17)

a = log(rsk(1))
XR(14) = a**2
a = log(rsk(2))
XR(15) = a**2
a = log(rsk(3))
XR(16) = a**2
a = log(rsk(4))
XR(17) = a**2

This produces wrong results:

a = log(rsk(1))**2
XR(14) = a
a = log(rsk(2))**2
XR(15) = a
a = log(rsk(3))**2
XR(16) = a
a = log(rsk(4))**2
XR(17) = a

XR(14) stores the right number, but XR(15) to XR(17) store wrong numbers.

Just for background, my original code had:

XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

and these assignments were producing wrong results.

I tried so many things... Really don't know what's going on.

DavidWhite · ‎12-09-2009

Rafael,

As has been repeated many times on the forum in recent weeks when strange results occur: have you checked array bounds, etc.? Is there a possibility of trampling over memory giving these results?

David

Paul_Curtis · ‎12-09-2009

Quoting - rafadix08

...
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

and these assignments were producing wrong results.

Try cleaning up your code and optimizing it (appropriate declarations assumed); exponentiating to the power 2 is never as efficient or accurate or fast as simple multiplication:

DO j = 1, 4
   lrsk = LOG(rsk(j))
   xr(13+j) = lrsk * lrsk
END DO

TimP · ‎12-09-2009

Any satisfactory compiler will perform full optimization of **2 (**2. would be debatable). The C analogue is debatable as well, but we're talking about Fortran consensus going back at least 3 decades. Even the f2c translator can deal with it.
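
A small sketch for checking this on a given compiler: it computes the square both ways and prints the difference. The program and variable names and the value of x are illustrative only, and the read from a string is just one assumed way to keep the compiler from folding everything at compile time. The difference is expected to be zero when the compiler expands the integer power into a multiplication, but that is worth confirming rather than assuming.

```fortran
program square_forms
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  character(len=32) :: text
  real(dp) :: x, by_power, by_product

  ! Read the operand at run time so the comparison is not done by the compiler.
  text = '2.0'
  read (text, *) x
  x = log(x)

  by_power   = x**2    ! integer exponent, as in the original code
  by_product = x*x     ! explicit multiplication, as suggested above

  print *, 'x**2       =', by_power
  print *, 'x*x        =', by_product
  print *, 'difference =', by_power - by_product
end program square_forms
```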

rafadix08 · ‎12-09-2009

Yes, I know that arrays out of bounds can produce strange results. But I did check the bounds of my arrays... Too many times! I also compiled with the Qdiag-enable option to see if the compiler detected something, but no.

I did try doing log*log, but the same problem persists.

At some other portion of my code I have assignments of exactly the same type and these work fine.

rafadix08 · ‎12-09-2009

Quoting - rafadix08

Yes, I know that arrays out of bounds can produce strange results. But I did check the bounds of my arrays... Too many times! I also compiled with the Qdiag-enable option to see if the compiler detected something, but no.

I did try doing log*log, but the same problem persists.

At some other portion of my code I have assignments of exactly the same type and these work fine.

Let me just add that my code worked on my Windows machine.
I am having trouble executing it on a UNIX cluster.
If the options are not the same, they are very similar...

Here is my compiling line on the UNIX machine:
ifort trsapp.f bigden.f newuoa.f update.f biglag.f newuob.f Global_Data.f90 minim.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Loss_Function_MOD.f90 calfun.f Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -openmp -lpthread -heap-arrays 0

Here is my Build using Microsoft Visual Studio:
Deleting intermediate files and output files for project 'Estimation_4sectors', configuration 'Release|Win32'.
Compiling with Intel Visual Fortran 11.1.048 [IA-32]...
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4newuoa.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4bigden.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4trsapp.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4biglag.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4newuob.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4update.f"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Global_Data.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4LinReg_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4minim.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Parallel_Emax_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Loss_Function_MOD.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4Main.f90"
ifort /nologo /heap-arrays0 /Qopenmp /module:"Release" /object:"Release" /libs:static /threads /c /extfor:f /Qvc9 /Qlocation,link,"c:Program FilesMicrosoft Visual Studio 9.0VCbin" "C:Documents and SettingsRafael Dix CarneiroMy DocumentsThesisFortran Codes4Sectors_Educ4calfun.f"
Linking...
Link /OUT:"ReleaseEstimation_4sectors.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe.intermediate.manifest" /SUBSYSTEM:CONSOLE /STACK:100000000 /IMPLIB:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.lib" "Releasenewuoa.obj" "Releasebigden.obj" "Releasetrsapp.obj" "Releasebiglag.obj" "Releasenewuob.obj" "Releaseupdate.obj" "ReleaseGlobal_Data.obj" "ReleaseLinReg_MOD.obj" "Releaseminim.obj" "ReleaseParallel_Emax_MOD.obj" "ReleaseLoss_Function_MOD.obj" "ReleaseMain.obj" "Releasecalfun.obj"
Link: executing 'link'

Embedding manifest...
mt.exe /nologo /outputresource:"C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe;#1" /manifest "C:Documents and SettingsRafael Dix CarneiroMy DocumentsVisual Studio 2008ProjectsEstimation_4sectorsEstimation_4sectorsReleaseEstimation_4sectors.exe.intermediate.manifest"

Estimation_4sectors - 0 error(s), 0 warning(s)

abhimodak · ‎12-09-2009

Just a silly check: Are rsk and XR defined to be of the same precision? Although the generic LOG function resolves to the correct version based on the kind of its argument, you may want to test the results using REAL(4). Also, maybe try DLOG as well.

Abhi

rafadix08 · ‎12-09-2009

Quoting - abhimodak

Just a silly check: Are rsk and XR defined to be of the same precision? Although the generic LOG function resolves to the correct version based on the kind of its argument, you may want to test the results using REAL(4). Also, maybe try DLOG as well.

Abhi

Yes, XR and rsk are both double precision.
I also tried dlog, but same problem...

rafadix08 · ‎12-10-2009

I would greatly appreciate a reply from the Intel team.

I have tried to clean up the code as much as I can in order to isolate the problem but it's still there.

Here is a short description of the problem:

I call a function that has the following assignments in its body:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

It turns out that XR(14:17) are not being recorded in the right way. I have the following lines that tell me that:

print*, 'log(rsk(1))**2 =', log(rsk(1))**2, 'XR(14) =', XR(14)
print*, 'log(rsk(2))**2 =', log(rsk(2))**2, 'XR(15) =', XR(15)
print*, 'log(rsk(3))**2 =', log(rsk(3))**2, 'XR(16) =', XR(16)
print*, 'log(rsk(4))**2 =', log(rsk(4))**2, 'XR(17) =',XR(17)

This should produce two columns with exactly the same numbers.

Instead, here is what I get:

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

XR(14) is right, but XR(15) is recording log(rsk(3))**2 instead of log(rsk(2))**2, and XR(16:17) are recording values I cannot identify.

Here is a short history of what I have done in order to solve the problem:

I compiled this code on my own Windows machine and the code is working perfectly. The above problem does not show up in my Windows machine.

The above problem shows up only when I compiled the exact same code in a UNIX cluster (Intel 11.1).

I am aware that out-of-bounds array accesses are the first suspects for this type of problem and have thoroughly checked for that. I have compiled the code with -diag-enable sc and with -check all
The only message I receive is:
forrtl: warning (402): fort: (1): In call to EMAX_HAT, an array temporary was created for argument #1, but from what I know this warning is harmless.

I took away many parts of the code in order to focus on only the portion of code that is causing the problem.

Please let me know what type of settings I could try in order to find out what is going on.

Many thanks,
Rafael

Steven_L_Intel1 · ‎12-10-2009

If you would provide a small (if possible) but complete program that demonstrates the problem, I'd be glad to take a look. I don't think speculating based on code excerpts would be worthwhile.

Also, I am a bit confused when you say "UNIX machine", as Intel Fortran doesn't support any "UNIX" systems. We do support Linux, which is of course related to UNIX, but usually people don't call Linux "UNIX". Which Intel compiler version are you using on this UNIX system?

I would suggest "-warn interface" as a useful addition to your compiles. The symptom is that of argument type mismatches.
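
A sketch of what a small, complete test case for the reported pattern could look like. It is not the program Rafael attached, and there is no claim that it triggers the problem. The module, subroutine and program names, the rsk values and the tolerance are all hypothetical. The routine is placed in a module so that the compiler sees an explicit interface and can check argument types at compile time.

```fortran
module repro_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for the function body described in the thread.
  subroutine fill_xr(rsk, xr)
    real(dp), intent(in)  :: rsk(4)
    real(dp), intent(out) :: xr(81)
    xr = 0.0_dp
    xr(14) = log(rsk(1))**2
    xr(15) = log(rsk(2))**2
    xr(16) = log(rsk(3))**2
    xr(17) = log(rsk(4))**2
  end subroutine fill_xr
end module repro_mod

program repro
  use repro_mod
  implicit none
  real(dp) :: rsk(4), xr(81), a, expected
  integer :: j

  ! Illustrative inputs only.
  rsk = [1.5_dp, 2.0_dp, 0.9_dp, 15.0_dp]
  call fill_xr(rsk, xr)

  do j = 1, 4
     ! Reference value computed with the logarithm and the square separated,
     ! the form reported to give correct results.
     a = log(rsk(j))
     expected = a*a
     print '(a,i0,a,es24.16,a,es24.16)', 'j=', j, '  expected=', expected, &
           '  xr=', xr(13 + j)
     if (abs(xr(13 + j) - expected) > 1.0e-12_dp) error stop 'mismatch in XR'
  end do
end program repro
```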

rafadix08 · ‎12-10-2009

Quoting - Steve Lionel (Intel)

If you would provide a small (if possible) but complete program that demonstrates the problem, I'd be glad to take a look. I don't think speculating based on code excerpts would be worthwhile.

Also, I am a bit confused when you say "UNIX machine", as Intel Fortran doesn't support any "UNIX" systems. We do support Linux, which is of course related to UNIX, but usually people don't call Linux "UNIX". Which Intel compiler version are you using on this UNIX system?

I would suggest "-warn interface" as a useful addition to your compiles. The symptom is that of argument type mismatches.

Hi Steve,

Thanks for offering, I am attaching my program... It is going to print a lot of output, but what I need is for the two columns to show the same results, that is, (log(rsk(1:4)))**2 and XR(14:17) should be the same.

Here is the operating system / Intel Fortran details: PU_IAS Linux 5 and I have Intel 11.1 on that machine.

I am compiling with the following commands:

ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -openmp -lpthread -heap-arrays

Many thanks for your help.

Rafael

Steven_L_Intel1 · ‎12-10-2009

I first tried this on Windows. I had to change the reference to mkl_lapack.f90 as that file is not provided by MKL that comes with Intel Fortran 11.1. Are you using a different MKL?

The program prints out those four values many, many, many times. Earlier you wrote that they should be zero, but they're not when I run it on Windows. What should I be looking for?

rafadix08 · ‎12-10-2009

Quoting - Steve Lionel (Intel)

I first tried this on Windows. I had to change the reference to mkl_lapack.f90 as that file is not provided by MKL that comes with Intel Fortran 11.1. Are you using a different MKL?

The program prints out those four values many, many, many times. Earlier you wrote that they should be zero, but they're not when I run it on Windows. What should I be looking for?

Hi Steve,

Thanks for trying that out.

When I run it on Windows, with my compiler 11.1 everything works fine, so that's the puzzle.

In function Emax_hat in module Emax_MOD (Parallel_Emax_MOD.f90) I have several assignments:
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2

However, these assignments are not working properly.

Something like this is printed repeatedly on the screen:

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

And the first column should be equal to the second one.

Running the exact same code as the one you have in Linux with Intel 11.1 I have different results (see above)...

But I want to emphasize that none of this happens when compiling and running on Windows.

Many thanks again,
Rafael

rafadix08 · ‎12-10-2009

Complementing my previous reply:

When I compile on Windows I use:
include 'lapack.f90'
instead of include 'mkl_lapack.f90'

In the Linux cluster, in order to use MKL I load a module: intel-mkl/10.1/015/64

Hope that helps,
Rafael

rafadix08 · ‎12-10-2009

Ooops... Sorry...
Copied and pasted and forgot to edit...

The assignments that are not working are:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

And not

XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2
XR(14) = log(rsk(1))**2

jimdempseyatthecove · ‎12-11-2009

Rafael,

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

FWIW your XR(15) appears to contain the log(rsk(3))**2 value, not the log(rsk(2))**2 value.
This could potentially be:

1) an SSE versioning error between what your processor supports and what your code requires
2) Potentially a cache coherency issue related to parallel programming (you have not stated whether this code is executing in a parallel region, with other threads potentially writing to the same/nearby XR(i) location)

Jim Dempsey

rafadix08 · ‎12-11-2009

Quoting - jimdempseyatthecove

Rafael,

log(rsk(1))**2 = 3.728137320489804E-003 XR(14) = 3.728137320489804E-003
log(rsk(2))**2 = 0.596035301219827 XR(15) = 1.289162843445539E-002
log(rsk(3))**2 = 1.289162843445539E-002 XR(16) = 0.617984690539646
log(rsk(4))**2 = 7.44257262752527 XR(17) = 0.862984463126197

FWIW your XR(15) appears to contain the log(rsk(3))**2 value, not the log(rsk(2))**2 value.
This could potentially be:

1) an SSE versioning error between what your processor supports and what your code requires
2) Potentially a cache coherency issue related to parallel programming (you have not stated whether this code is executing in a parallel region, with other threads potentially writing to the same/nearby XR(i) location)

Jim Dempsey

Thank you for your message, Jim.

Yes, I did notice the switch you are mentioning.

About your points:

1) Could you please be more specific? How can I check that? I have successfully compiled and run another version of this code (minor modifications) on exactly the same Linux system.

2) The problem persists if I compile the code serially. But this shouldn't make a difference since XR is a local variable in a function that is not parallelized.

XR is actually a vector of size 81 and this problem of wrong assignments occurs only for the entries XR(14:17).

I am really puzzled and don't know what to do. Just a reminder that this exact same code compiled successfully and ran well on my Windows machine. So I can only think that there is a compiler option that I should set, a compiler bug, or some other incompatibility.

jimdempseyatthecove · ‎12-13-2009

>>2) The problem persists if I compile the code serially. But this shouldn't make a difference since XR is a local variable in a function that is not parallelized.

Although this function may not be parallelized, it may be called from a routine that is parallelized. If so, it needs to be thread-safe.

Mark your subroutine as RECURSIVE or use the INTEL specific ", AUTOMATIC" when declaring the array XR.

subroutine foo
real(8) :: XR(81) ! this is a SAVE array (or more precisely NON-guaranteed local array)

recursive subroutine foo
real(8) :: XR(81) ! this is a local array

subroutine foo
real(8), automatic:: XR(81) ! this is a local array (but automatic is Intel specific)

Try addressing 2) first (the above addresses 2).

For 1), pick an older computer architecture such as Pentium 4, then migrate to newer architectures.

Jim Dempsey
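
A self-contained sketch of the RECURSIVE variant described above, with made-up names (local_array_sketch, sum_log_squares) and illustrative rsk values. Going by the description in this post, marking the procedure RECURSIVE is meant to give each call its own copy of the local array XR, which matters when the procedure is called from several threads at once. Whether a given compiler places XR on the stack is an assumption to verify against its documentation.

```fortran
module local_array_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! RECURSIVE is used here so that, as described in the post above,
  ! XR is intended to be per-call storage rather than shared static storage.
  recursive function sum_log_squares(rsk) result(total)
    real(dp), intent(in) :: rsk(4)
    real(dp) :: total
    real(dp) :: xr(81)   ! local work array, same size as in the thread

    xr = 0.0_dp
    xr(14:17) = log(rsk)**2
    total = sum(xr(14:17))
  end function sum_log_squares
end module local_array_sketch

program demo
  use local_array_sketch
  implicit none
  real(dp) :: rsk(4)

  ! Illustrative inputs only.
  rsk = [1.5_dp, 2.0_dp, 0.9_dp, 15.0_dp]
  print *, 'sum of log(rsk)**2 =', sum_log_squares(rsk)
end program demo
```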

rafadix08 · ‎12-13-2009

Jim,
Many thanks for your responses, I really appreciate it.

I tried a couple of the things you suggested, but none worked.

First, the automatic array declaration and then the recursive array declaration. The problem persists.

In the end I deleted all the parallelization code I had and made it a purely serial code. I also compiled it without the openmp option. Tried the automatic and recursive (one at a time) declarations but they didn't work once again.

I have not tried your solution to 1), but when I run the code on my machine with Intel Core 2 Duo processors it works.
When I run it on a Linux system equipped with 8 Intel Xeon CPU E5345 processors it doesn't work, and the point where I see the code going wrong is exactly at this XR array assignment.