Re: Strange COS results

groupw_bench · ‎11-06-2008

I've been chasing down a problem which so far has been reported only on one machine. The first solid evidence of something wrong is in the calculation of a double precision real COS value. On four machines I have, the calculation is correct. On the problem machine (which I don't have), the function returns incorrect values. Worse yet, it returns different values each time. I've had the user do a RAM test using the Microsoft test program, and it passes all tests. I'm of course suspicious of the problem machine, except for one thing -- an older version of my program which was compiled with the Compaq Fortran compiler runs fine. A test program which only calculates some cosine values functions perfectly on the problem machine when compiled with the Compaq compiler; it always produces incorrect values on the problem machine when compiled with the Intel compiler. (Both compilations do fine on my machines.)

Here's the test program:

Program CosExp

IMPLICIT NONE

REAL (KIND = 8) :: d, c
INTEGER (KIND = 4) :: i

open (UNIT=1, FILE='CosTest1c.txt', STATUS='replace')
write(1,*) 'Value, Cos'

! Argument taken from the failing run of the full program
d = 1.558548156358005D-2
c = cos(d)
write(1,*) d, c
! Arguments 0.01, 0.02, ..., 0.20
do i = 1, 20
   d = .01D0 * i
   c = cos(d)
   write(1,*) d, c
end do
! Arguments 0.1, 0.2, ..., 2.0
do i = 1, 20
   d = .1D0 * i
   c = cos(d)
   write(1,*) d, c
end do

Close(1)
end

The first value to evaluate was chosen to be the same as a value in the program the user was having trouble with and where I first spotted the error. The test program was compiled with IVF v. 10.1.025 and CVF v. 6.6C. Default debug settings were used, which include no optimization. Here are the first few results when run on my machines -- both compilations produced identical results. The problem machine also produces identical results to these from the Compaq compiled program. These results are, as far as I can tell, correct.

Value, Cos
1.558548156358005E-002 0.999878548840693
1.000000000000000E-002 0.999950000416665
2.000000000000000E-002 0.999800006666578
3.000000000000000E-002 0.999550033748988
4.000000000000000E-002 0.999200106660978
5.000000000000000E-002 0.998750260394966
6.000000000000000E-002 0.998200539935204
7.000000000000001E-002 0.997551000253280
8.000000000000000E-002 0.996801706302619
9.000000000000000E-002 0.995952733011994
0.100000000000000 0.995004165278026
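
One way to cross-check the values above without relying on the runtime library's COS is to sum the Taylor series directly. This is only a minimal sketch: the program name, the kind parameter dp, and the choice of 10 terms are assumptions for illustration, and it is only suitable for small arguments like the ones tested here.

```fortran
program cos_series_check
  implicit none
  ! dp is a hypothetical name; it requests at least 15 decimal digits
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(kind=dp) :: x, term, total
  integer :: k

  x = 1.558548156358005e-2_dp
  term = 1.0_dp
  total = 1.0_dp
  ! cos(x) = 1 - x**2/2! + x**4/4! - ...
  ! Each term is the previous one times -x**2 / ((2k-1)*(2k)).
  do k = 1, 10
     term = -term * x * x / real((2*k - 1) * (2*k), dp)
     total = total + term
  end do

  print *, 'series cos(x)    =', total
  print *, 'intrinsic cos(x) =', cos(x)
end program cos_series_check
```

If the two printed values disagree beyond the last digit or two on a given machine, the intrinsic (or whatever it calls underneath) is the suspect rather than the arithmetic in general.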

Here are the same calculations from the Intel compiled program run on the problem machine two different times:

Value, Cos
1.558548156358005E-002 0.999994306976907
1.000000000000000E-002 0.999999999999998
2.000000000000000E-002 0.999999999964238
3.000000000000000E-002 0.999999999678159
4.000000000000000E-002 0.999850019998933
5.000000000000000E-002 0.997590080547875
6.000000000000000E-002 0.997471971872478
7.000000000000001E-002 0.997154117664448
8.000000000000000E-002 0.996636549708941
9.000000000000000E-002 0.995919319762322
0.100000000000000 0.995002499546987

Value, Cos
1.558548156358005E-002 -1.221295362646600E+150
1.000000000000000E-002 0.999999999999998
2.000000000000000E-002 0.999999990845032
3.000000000000000E-002 0.999999999979885
4.000000000000000E-002 0.999999853535155
5.000000000000000E-002 0.997590080547875
6.000000000000000E-002 0.997471971872478
7.000000000000001E-002 0.997154117664448
8.000000000000000E-002 0.996636549708941
9.000000000000000E-002 0.995919319762322
0.100000000000000 0.995002499546987

Notice that E+150 result on the second run!

I tried explicitly calling DCOS instead of COS but the problem machine still produced bad results from the Intel compiled program.

The problem machine is a Dell Optiplex GX-260 with 2.8 GHz Pentium 4 processor, running Windows 2000 SP4. My test machines are a Pentium 4 running XP Pro, Pentium 3 running 2000 SP4, and Pentium 4 and Pentium 2 laptops running XP Pro. The Intel compiled program results are fine on all my machines. The user reports no problems with other applications, and no problems with an earlier version of my program compiled with the Compaq compiler. So there's something about the combination of the problem machine and the Intel compiler that's causing the errors. Any idea what it might be?

jimdempseyatthecove · ‎11-06-2008

The interesting thing is the initial value is incorrect

d = 1.558548156358005D-2
c = cos(d)
write(1,*) d,c

Can you place a breakpoint on the cos line and then, at the break, open the disassembly window and copy and paste its contents into a forum message?

Jim Dempsey

groupw_bench · ‎11-06-2008

Quoting - jimdempseyatthecove

The interesting thing is the initial value is incorrect

d = 1.558548156358005D-2
c = cos(d)
write(1,*) d,c

Can you place a breakpoint on the cos line and then, at the break, open the disassembly window and copy and paste its contents into a forum message?

Jim Dempsey

Do you mean that the first calculated cosine is incorrect, or is something wrong with the first argument? All or nearly all the calculated cosines are incorrect on the problem machine with the Intel compiled test program.

I'll be glad to do as you suggest, if you'll tell me how to open the disassembly window. I can't find it in any of the menus, and the only reference in the manual implies that it sometimes opens by itself. It doesn't open by itself when the program stops at the breakpoint.

Steven_L_Intel1 · ‎11-06-2008

What happens if you compile with /QxW ?

jimdempseyatthecove · ‎11-07-2008

Quoting - groupw_bench

Do you mean that the first calculated cosine is incorrect, or is something wrong with the first argument? All or nearly all the calculated cosines are incorrect on the problem machine with the Intel compiled test program.

I'll be glad to do as you suggest, if you'll tell me how to open the disassembly window. I can't find it in any of the menus, and the only reference in the manual implies that it sometimes opens by itself. It doesn't open by itself when the program stops at the breakpoint.

The first printout of the cos, prior to your loop, varied greatly between systems. Investigate the first error first.

When debugging with Visual Studio IDE Debugger, when you reach the break point, click on

Debug | Windows | Disassembly

Then try to select and copy the contents of the code window, from the c = cos(d) statement through to the next source statement.

I do not use the Intel Debugger, but it will have a similar feature.

Jim Dempsey

groupw_bench · ‎11-07-2008

Quoting - Steve Lionel (Intel)

What happens if you compile with /QxW ?

No apparent change. It still produces varying and incorrect values on the problem machine, and correct results on my machine.

The user with the problem machine ran the test program multiple times and got the following results for COS(1.558548156358005D-2):

0.999999644186057
0.999994306976907
1.00000000000000
0.999999911046514
1.00000000000000
0.999994306976907
1.00000000000000
0.999994306976907
0.999999999999979
1.00000000000000
0.999999644186057
0.999994306976907
0.999999644186057
0.999999644186057
-7.342756263218562E+260
1.00000000000000
1.00000000000000
-1.221295362646600E+150
1.00000000000000
0.999994306976907
1.00000000000000
0.999994306976907
0.999994306976907
0.999994306976907
0.999999644186057
1.00000000000000
0.999999644186057
0.999994306976907
1.00000000000000
1.00000000000000
0.999994306976907
0.999994306976907
0.999994306976907
1.00000000000000
0.999999644186057
0.999999644186057
1.00000000000000
0.999994306976907
0.999994306976907
0.999999998610102
1.00000000000000
1.00000000000000

These were with the program not having the /QxW option, but using that option produced similar results.

groupw_bench · ‎11-07-2008

Quoting - jimdempseyatthecove

The first printout of the cos, prior to your loop, varied greatly between systems. Investigate the first error first.

When debugging with Visual Studio IDE Debugger, when you reach the break point, click on

Debug | Windows | Disassembly

Then try to select and copy the contents of the code window, from the c = cos(d) statement through to the next source statement.

I do not use the Intel Debugger, but it will have a similar feature.

Jim Dempsey

Sorry, I thought you meant a menu in the compiler GUI. The only choices in its Debug/Windows submenu are

Immediate
Locals
Breakpoints
Output
Autos
Call Stack
Threads
Watch
Array Visualizer (greyed out)
Modules
Script Explorer
Processes

I see now that the Intel Debugger is a separate program with its own manual. My Windows Control Panel shows that I have the Intel Debugger v. 10.1 installed, and I located it and its manual. From the description, it appears to be a console application without any GUI. Here's the basic information on starting the Intel Debugger, from its manual:

...................

Before you start the debugger, make sure that you have correctly set the size information for your terminal; otherwise, the debugger's command line editing support may act unpredictably. For example, if your terminal is 25x80, you may need to set the following:

% set LINES=25

% set COLS=80

There are four basic alternatives for running the debugger on a process (see examples below):

Have the debugger create the process using the command prompt line to identify the executable to run
Have the debugger create the process using the debugger commands to identify the executable to run
Have the debugger attach to a running process using the command prompt line to identify the process and the executable file that process is running
Have the debugger attach to a running process using the debugger commands to identify the process and the executable file that process is running

....................

I don't see any reference to any menus -- it seems like an entirely command line driven console application. What am I missing?

Steven_L_Intel1 · ‎11-07-2008

Here's another thing to try. Boot the problem system into "Safe Mode" (press F8 during the initial Windows boot screen) and run the program. Are the results better? I have seen drivers and even antivirus software corrupt data during execution. This test should eliminate most of the background software which may be affecting the run.

groupw_bench · ‎11-07-2008

Quoting - Steve Lionel (Intel)

Here's another thing to try. Boot the problem system into "Safe Mode" (press F8 during the initial Windows boot screen) and run the program. Are the results better? I have seen drivers and even antivirus software corrupt data during execution. This test should eliminate most of the background software which may be affecting the run.

Thanks, I had him do this long ago with the program he was encountering trouble with. No help.

g_f_thomas · ‎11-08-2008

At first look it's as if double on one machine is single on the other.

Try on the fly determination of the real and integer KINDs on both machines instead of hardwiring them to 8 or 4.
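
A minimal sketch of what that could look like, assuming the intent is to request kinds by precision and range rather than by the literal numbers 8 and 4. The program name and the parameter names dp and i4 are made up for illustration.

```fortran
program kind_check
  implicit none
  ! Ask for at least 15 decimal digits and a decimal exponent range of 307
  ! (IEEE double on typical hardware) instead of hardwiring KIND = 8.
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! Ask for integers holding at least 9 decimal digits instead of KIND = 4.
  integer, parameter :: i4 = selected_int_kind(9)
  real(kind=dp) :: d, c
  integer(kind=i4) :: i

  d = 1.558548156358005e-2_dp
  c = cos(d)

  ! Report what the compiler actually selected.
  print *, 'real kind    =', dp, ' precision =', precision(d), ' range =', range(d)
  print *, 'integer kind =', i4, ' range =', range(i)
  print *, 'cos(d)       =', c
end program kind_check
```

Because the kinds are fixed at compile time, running this on both machines mainly confirms that the executable really is using double precision; the printed kind values should be identical everywhere for the same build.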

Gerry

groupw_bench · ‎11-08-2008

Quoting - g.f.thomas

At first look it's as if double on one machine is single on the other.

Try on the fly determination of the real and integer KINDs on both machines instead of hardwiring them to 8 or 4.

Gerry

Sorry, I don't understand what you're suggesting. Could you furnish some sample code?

And are you saying that -7.342756263218562E+260 and -1.221295362646600E+150 are correct single precision results for COS(1.558548156358005D-2)?

TimP · ‎11-08-2008

Quoting - groupw_bench

And are you saying that -7.342756263218562E+260 and -1.221295362646600E+150 are correct single precision results for COS(1.558548156358005D-2)?

There's nothing correct about attempts to use single and double precision values in the same memory slot. It produces wildly wrong results on any recent architecture, with one exception as recent as CVF: it was possible to use single or double precision function results indiscriminately, and get at least single precision accuracy. However, that explanation fits the source code presented here only if compiled with a pre-f77 compiler, which doesn't exist for the platforms in question.

g_f_thomas · ‎11-08-2008

Quoting - groupw_bench

Sorry, I don't understand what you're suggesting. Could you furnish some sample code?

And are you saying that -7.342756263218562E+260 and -1.221295362646600E+150 are correct single precision results for COS(1.558548156358005D-2)?

cos(1.558548156358005D-2) evaluates to the same correct result using CVF and IVF.

Submit a support request to Premier Support along with the errant code and let them advise you on how to fix it.

Gerry

FYI, neither of the over-the-top values you report is a legitimate single precision number.
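
The range limits can be printed directly with the HUGE intrinsic. A minimal sketch, assuming default REAL is IEEE single precision and DOUBLE PRECISION is IEEE double (the program name is made up for illustration):

```fortran
program real_range
  implicit none
  ! HUGE returns the largest finite value of the argument's kind.
  print *, 'largest single precision value:', huge(1.0)
  print *, 'largest double precision value:', huge(1.0d0)
end program real_range
```

Under that assumption the single precision limit is about 3.4E+38, so magnitudes of order E+150 or E+260 can only have come from a double precision variable.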

groupw_bench · ‎11-09-2008

Thanks for all the suggestions. I've submitted it to support, and I'll post the solution here when I get it.

groupw_bench · ‎11-11-2008

Premier Support has responded. Windows 2000 isn't a supported operating system for deployment of IVF v. 10 programs, and they can't duplicate it on their supported systems, so there's nothing they will do.

The program works fine on my Windows 2000 system and those of, I'd estimate, at least 100 of my customers. (A fair number are also running it under Windows 98.) But this should serve as a cautionary note to anybody thinking of deploying IVF applications on anything other than XP or Vista.

groupw_bench · ‎11-20-2008

Update: Steve Lionel expressed an interest in tracking the problem down, but because of the inability to duplicate it there (or here) it would require remote access to the user's machine. Unfortunately it's a corporate machine and outside access is prohibited. So what I'm going to do is put a trap in the next revision of my program, so if a bad result comes from a test COS calculation, the user will get a message to notify me. If the problem really is occurring on more than just this one machine, I'll know about it before long and I'll pass it along to Steve.
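
A rough sketch of such a trap, assuming a single test value is enough. The program and variable names and the tolerance are hypothetical, and the expected value is the one printed by the correct runs earlier in this thread.

```fortran
program cos_trap
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  ! Argument and expected result taken from the correct output above.
  real(kind=dp), parameter :: arg = 1.558548156358005e-2_dp
  real(kind=dp), parameter :: expected = 0.999878548840693_dp
  ! Assumed tolerance: loose enough for the 15-digit printed reference,
  ! far tighter than the errors seen on the problem machine.
  real(kind=dp), parameter :: tol = 1.0e-12_dp
  real(kind=dp) :: c

  c = cos(arg)

  ! Written as .not. (... <= tol) so that a NaN result also trips the trap.
  if (.not. (abs(c - expected) <= tol)) then
     print *, 'COS self-test failed: got ', c, ' expected ', expected
     print *, 'Please report this message to the program author.'
  else
     print *, 'COS self-test passed.'
  end if
end program cos_trap
```

Since the bad results varied from run to run and were occasionally close to 1, repeating the test in a loop or checking several arguments would make a miss less likely.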

lklawrie · ‎11-22-2008

Windows 2000 isn't supported for V10 IVF compilers. Yikes. This does not seem good. Steve? Is this true? I don't think I want to post this to my users......

Linda

TimP · ‎11-22-2008

Quoting - linda@lawrie.com

Windows 2000 isn't supported for V10 IVF compilers. Yikes. This does not seem good. Steve? Is this true? I don't think I want to post this to my users......

You might have noticed that Microsoft suspended all support for Windows 2000 over 2 years ago. This alone means it can't get full support. It never supported Hyper-Threading correctly, which indicates how long ago it stopped adding support for new hardware features. It has lost a lot of value now that it's not allowed on corporate intranets due to the absence of vulnerability fixes. Even medical offices have discontinued its use. It's unfortunate when good things have to be dropped, but that point was passed a while ago.

As far as Intel tools go, the only one I noticed which broke on Windows 2000 was VTune.

Steven_L_Intel1 · ‎11-22-2008

It is true that the compiler is not supported on Windows 2000. Your applications may run on Windows 2000 but we don't test that. We generally follow Microsoft's lead in this regard.

lklawrie · ‎11-23-2008

Like I said, I won't tell any of my users, some of whom are still using Windows 98 or ME. Our apps are mostly console apps and, generally speaking, we've had reasonable response to using them on any "windows" platform. We can't control which platform they might be using, only that our install, etc. work there.

I have had some weird problems with COS routines, but it was not using IVF or CVF or even g95. And it was on Linux.

Linda

Steven_L_Intel1 · ‎11-23-2008

Your apps are likely fine. It's installing and using the compiler that is not supported.