Catastrophic debugger bug

dondilworth
New Contributor II
This is painful.

I have a mixed-language program and ran the debugger. One of my Fortran subroutines has several entry points. I wanted to check the value of an input argument to be sure it was the same as the calling program's value. I held the mouse over the variable name, and nothing happened. (Usually, it shows the numeric value.) So I dragged that name into the Watch pane.

The system froze. Nothing responded to anything. Ctrl+Alt+Del did nothing. I tried it lots of times, and then got a blue screen.

Rebooted, and my desktop icons were in all the wrong places. My network did not work. Hours on the phone with three tech support guys, who tried everything. The restore points were corrupted.

They said I had to wipe my disk and reinstall everything. Ugh. Anyone who's used a computer knows bad news when they hear it. It's not nice to scream and weep in public -- but this was definitely the time.

I now have some of my stuff reinstalled, but the network still does not work. I'm not an expert in that, and it will take me several days of dumb trial and error before it works again. Fortunately, I had a backup copy of all my source files, so it could have been worse.

Be advised: if you use the debugger for a Fortran program, you have entered a minefield. Does the programming team know about this?

Steven_L_Intel1
Employee
I don't think this is "a debugger bug". While I have seen the debugger misbehave, often when a Windows process called ctfmon.exe is running, you have much bigger problems, of which the debugger (and you) are just innocent victims. There is certainly nothing in the debugger that could cause the amount of system damage you report.

From the symptoms you describe, I would suspect disk corruption, bad memory or perhaps even malware.

dondilworth
New Contributor II
Well, this is about the kind of response I was expecting: "It's not our fault". How do you know? Denying a problem is never the first step in solving it.

I have a brand-new system, very little software installed yet, little on-line time, and a good antivirus program installed (Zone Alarm). I go to the debugger, do a simple task, and it crashes. Sure, it's not your fault.

Have you tried the steps I described? If not, your reply sounds like wishful thinking.

Please do your job and give me a responsible answer.

mecej4
Honored Contributor III
Infant mortality of electronic components is not unheard of. In fact, I had a mouse failure a few weeks ago which, before I recognized it as such, was very perplexing and made me do things I should not have.

The "steps I described" are too generic to do anything useful. I have for years used the VS debuggers (VS2005, 2008 and now 2010) to debug Fortran, C (and even assembler) with and without debug-symbols in the EXE. The debugger itself has never failed. Other components, such as the project converter and features such as "create a project from existing code" have not always worked, but never have I seen the debugger cause crashes all by itself.

It is possible to have a bug in one's own code so nasty that the debugger cannot catch it in time. In such cases, a test case with full source code and instructions for building the EXE and reproducing the bug is nearly indispensable.

dondilworth
New Contributor II
Here's a test that is not generic at all.

1. Create a project with C++ as the main language.
2. Add some Fortran subroutines.
3. Add a subroutine with several entry points, each with different arguments.
4. Call one of those entry points.
5. Look at the value of the incoming data in the debugger. (The variable is passed by reference.)
6. See if the debugger can display the value by holding the mouse over it.
7. If not, try dragging the name into the watch pane.
8. Oh, yes, back up your system first.

I did this sequence (which occurs in my code), twice. The first time, the drag made the system hang, but the task manager was able to stop the VS application. The second time, same place, the drag made the world implode. Your observation "...has never failed" is true but a distraction. It never failed for me with VS6 and CVF either. I'm talking about the time it did fail, not all the times it did not.

If a given tool fails twice doing the same thing, it's a sign that there's something wrong with the tool.

If you do the above tests and everything works, then we have to look elsewhere for the culprit. I would be pleased to learn that the debugger is not at fault, since I use it every day and I can't afford to waste about three days reinstalling everything, looking up license codes, remembering all my settings and preferences, and then waiting for several hours while all of the reinstalled software updates itself. (About a GB of updates, over my DSL connection. Painful.)

But if the culprit is elsewhere, how can I sleep at night knowing what might explode tomorrow? In such a serious situation, it seems to me that the proper course for those denying the problem is first to thoroughly test the debugger. If it passes, then of course you can deny.

mecej4
Honored Contributor III
I am just another user who, in that capacity, shared his experience with you. I do not write C++ code, but there are others who regularly post here and use C++.

If the caller can be in C rather than C++ and still cause this problem, I'd like to try out a test case.

It would really help to have source code for the pathological case, even if you are forced to reconstruct it from memory (in order to avoid another system crash and HD restore at your end) rather than capture the source code as it was just before the crash.

bmchenry
New Contributor II
I have used C++ and Fortran code intermixed for years without any of the problems you describe. Generally I have a Fortran project and a C++ project and include both in one solution.
Debugging is no problem in that scenario (and C++ sometimes has more tools available!)

The only unusual thing you mention is that you 'drag' a variable to the Watch window?
Since C++ has more and different tools, perhaps the drag carries something unexpected into the Fortran debug window, which wreaks havoc on the system?
I normally do not 'drag' things between applications. I simply copy the variable name and paste it into the Watch window (I expect that is what you usually do too?). (Or I make sure I save things before the drag in case things go horribly wrong!)
I have had too many lockups with the mother of all bugfests, Microsoft Word, where after I drag something (a picture from the web? a picture from another application?) into Word it locks up the system and requires a hard reboot (unplug the computer and battery and restart from there!)

I think it comes down to interoperability between processes: when things get dragged across applications, the system can lock up while it tries to resolve the 'paste' of the 'something' that was inadvertently included in the 'drag'.
I recommend you select the text to be sure you have what you want, copy it, and then paste it.
Be wary of dragging things.

And another thing... (please don't take offense at this, but every time I hear of someone reinstalling the system I think of possibly lazy tech support!) It seems to be the typical response of some tech support departments these days (and Intel, I am not saying you ever do this; you have never done it to me!)

What a waste of time!

How long was it between starting the reinstall and getting back to the point you were at when things went wrong? Days? Weeks?
Did you start up in safe mode and/or boot from disk/CD/USB to check the disk for issues? Did you remove the disk, attach it to another computer over USB, and scan it there for issues or defects?

Did you try Windows repair?

You may have done all these things. I have heard of many other 'non tech' folks with computers who have a minor issue and then have to go through a 'reinstall', for example when the 'Geeks' at certain support places who don't know what they are doing say it needs one! (My sister and another friend had this situation; I went and rescued the computer from the moronic 'Geeks', scanned the disk from another computer, removed some issues and voila! It restarted without any problem!)

Hardware issues should NEVER require a reinstall until and unless you are CERTAIN it is the only way to resolve the issue.
And software issues should be able to be resolved with Windows Repair or other utilities.

bmchenry
New Contributor II
Sorry to be so chatty

One other item: I would suggest you rid your code of ENTRY points. Could they be the source of your issues?

I am surprised they haven't been declared obsolescent, like alternate returns.

A simple way to rid your code of them is to either

1) Add an argument to the calling sequence so that all calls go through the main entry point, which then routes to the code of the former entry points.

OR

2) Put all local storage for the subroutine in a module and USE it in the separate subroutines you create out of each ENTRY point.

I have worked with a lot of older code which had many many ENTRY points and got rid of them all.
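
As a rough illustration of option 2, here is a minimal sketch of a routine with two extra ENTRY points split into separate module procedures. The module name, the routine names and the shared counter are all hypothetical, made up for this example; real legacy code will have its own local storage to move to module level.

```fortran
! Sketch only: legacy_calc, add_ab, sub_pq, square_u and call_count
! are hypothetical names used for illustration.
module legacy_calc
   implicit none
   private
   public :: add_ab, sub_pq, square_u, call_count

   ! State that the old ENTRY-based routine kept in saved local
   ! variables now lives at module level, shared by all routines.
   integer, save :: call_count = 0

contains

   ! Formerly the main entry of the subroutine.
   subroutine add_ab(a, b, c)
      integer, intent(in)  :: a, b
      integer, intent(out) :: c
      call_count = call_count + 1
      c = a + b
   end subroutine add_ab

   ! Formerly an ENTRY with its own argument list.
   subroutine sub_pq(p, q, r)
      integer, intent(in)  :: p, q
      integer, intent(out) :: r
      call_count = call_count + 1
      r = p - q
   end subroutine sub_pq

   ! Formerly a second ENTRY with a shorter argument list.
   subroutine square_u(u, v)
      integer, intent(in)  :: u
      integer, intent(out) :: v
      call_count = call_count + 1
      v = u*u
   end subroutine square_u

end module legacy_calc

program use_module_version
   use legacy_calc
   implicit none
   integer :: c, r, v

   call add_ab(3, 2, c)
   call sub_pq(5, 7, r)
   call square_u(11, v)
   print *, 'C = ', c
   print *, 'R = ', r
   print *, 'V = ', v
   print *, 'calls = ', call_count

   ! Self-check so that a wrong refactoring fails loudly.
   if (c /= 5 .or. r /= (-2) .or. v /= 121 .or. call_count /= 3) error stop 'unexpected result'
end program use_module_version
```

Each former entry point becomes an ordinary subroutine with its own dummy arguments, so the debugger has one argument list per routine to describe. The callers only need a USE statement in place of the old external calls.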

mecej4
Honored Contributor III
The ENTRY feature is, indeed, a problem (perhaps not as destructive as described). Here is a short example, all Fortran, that shows the defect in the debugger.

Using IFort 12.1.3 with /Zi /MD and running under the VS2010-SP1 debugger, I can see the values of arguments a and b by hovering the mouse cursor over the variables immediately after entering the main entry of the subroutine. If those variable names are inserted into the Watch window, the values are shown correctly. As soon as the variable c acquires a value, its value is also displayed correctly. However, when the secondary ENTRYs are called, the debugger appears not to have the necessary and/or correct debug symbols. In fact, after entering EntryB the debugger shows the value of the first argument, p, not as the value of p but as the value of the now inactive argument a of the prior call, and it does not show the value of r after this argument acquires a value. Only after return to the caller can you see that the result is correct.

The same behavior occurs with IFort 11.1.70. I believe that this is not a consequence of a bug in the debugger itself, but is caused by the failure of the respective Fortran compilers to emit the proper debug symbols.

Curiously, when I used the IFort 7.0 compiler and ran under the VS2010SP1 debugger, everything worked correctly. This observation reinforces the conjecture of the preceding paragraph.

This program, compiled with CVF 6.6 and run under the CVF/VS6 debugger, does not display the problems just described.
[fortran]
program UseEntry
   integer :: a,b,c,p,q,r,u,v

   ! Call the main entry of the subroutine
   a=3
   b=2
   call CRASH_N_BURN(a, b, c)
   print *,'C = ',C

   ! Call the secondary entries, each with its own argument list
   p=5
   q=7
   call ENTRYB(p, q, r)
   print *,'R = ',R
   u=11
   call ENTRYC(u,v)
   print *, 'V = ',V
end program UseEntry

subroutine CRASH_N_BURN(a,b,c)
   implicit none
   integer :: a,b,c,p,q,r,u,v
   c=a+b
   return
entry ENTRYB(p,q,r)
   r=p-q
   return
entry ENTRYC(u,v)
   v=u*u
   return
end subroutine crash_n_burn
[/fortran]

dondilworth
New Contributor II
I agree completely with the assessment that support guys can save their time by simply recommending a fresh install. I had one say exactly that when I had already done the install the day before! "Do it again" he said. Idiot. They are certainly not saving my time.

Before I started over, I did an sfc /scannow. It found no problems. But I'm not as expert as the CS people, so I followed orders. I simply did not know any other way to proceed. And since the restore files were all corrupted, it looked like the damage was pretty widespread -- and not reinstalling could be asking for even more trouble.

It took the better part of three days to get to where I could test code again. Now, I'm not claiming to be totally innocent. Suppose I called my entry point with a double-precision real variable, but the called program expects a REAL? It's not easy for the compiler engineers to think of every possible screwup like this, so maybe they missed one. But finding coding errors is precisely what the debugger is for, so one would assume they did their job. Also, the structure of the obj and other files depends on which compiler options are selected. There are way too many of those for a dummy like me to make sense of without serious research, and it is possible that some combinations work just fine. I sympathize with the debugger programmers, who have to test all possible screwups with all possible combinations. I'm actually not mad at them, since I can imagine being in their shoes.
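
For the double-precision versus REAL case, one way to catch the mismatch before the program ever runs is to give the caller an explicit interface, for example by placing the routine in a module. The sketch below assumes a hypothetical routine scale_value; it is not taken from the code discussed in this thread.

```fortran
! Sketch only: checked_routines and scale_value are hypothetical names.
module checked_routines
   implicit none
contains
   subroutine scale_value(x, y)
      real, intent(in)  :: x     ! default (single-precision) REAL
      real, intent(out) :: y
      y = 2.0*x
   end subroutine scale_value
end module checked_routines

program check_kinds
   use checked_routines
   implicit none
   real             :: x4, y4
   double precision :: x8

   x4 = 1.5
   x8 = 1.5d0

   ! Kinds match the dummy arguments: accepted.
   call scale_value(x4, y4)
   print *, 'y = ', y4

   ! Because the module gives the compiler an explicit interface,
   ! the call below is rejected at compile time if uncommented:
   ! x8 is double precision but the dummy argument x is default REAL.
   ! call scale_value(x8, y4)

   ! An explicit conversion makes the intent visible and compiles.
   call scale_value(real(x8), y4)
   print *, 'y = ', y4

   if (abs(y4 - 3.0) > 1.0e-6) error stop 'unexpected result'
end program check_kinds
```

With an external subroutine and no interface, as is common in legacy code, the same mismatch may compile silently and only show up as screwy answers at run time.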

I have removed the ENTRY points from the subroutine that crashed. I have a huge legacy code that worked with CVF, and there are tons of other ones elsewhere. It is not practical to remove them all.

I'm glad another poster found a debugger bug. I posted this thread mainly to convince the compiler programmers that their job was not finished yet. Evidently I'm not the only one who thinks so.

SergeyKostrov
Valued Contributor II
Quoting dondilworth
...I have a brand-new system, very little software installed yet, little on-line time, and a good antivirus program installed (Zone Alarm). I go to the debugger, do a simple task, and it crashes...

The first thing I would try when recovering a "broken" computer system is to uninstall the antivirus software. Did you try to reproduce the problem on a system that doesn't have Zone Alarm installed?

Best regards,
Sergey

Steven_L_Intel1
Employee
We know that the debugger does not properly show dummy arguments for ENTRY points. However, that is not to say that using the debugger with ENTRY causes Windows to become corrupted. I will try to reproduce any kind of misbehavior with dragging and dropping.

Bernard
Valued Contributor I
I do not think that it was the debugger's fault. Your app was executing entirely in user-mode space, where a misbehaving application's thread is terminated when no exception handler can be found for it. But we cannot rule out the situation where one of the Native API functions in the call chain caused an exception while executing in kernel space, for example a call into the display driver when the debugger wanted to display some values from the debugged process.
Bear in mind that anti-virus software sometimes inserts hooks and patches function prologs inline to intercept Windows API and Native API calls. When this is done unwisely in kernel space, the anti-virus can bring down the system. I have witnessed such behaviour with Kaspersky AV. Did you save the BSOD crash dump? It could be very helpful for pinpointing the problem. You can open it with WinDbg and use the command "!analyze -v" to inspect the crash dump.

dondilworth
New Contributor II
This is a very cogent reply, thank you. When I reinstalled everything, of course I also lost everything, so I cannot analyze the crash dump.

I am reluctant to try to reproduce the problem, as you can imagine. I don't play Russian roulette either, and the two are closely related.

I did not know that an antivirus program could get in the way. Just now I have reinstalled Zone Alarm free version, and yesterday I wanted to remove it since I cannot get printer sharing to work now. But it won't uninstall! There should be a law against programs that install themselves in a way that makes it impossible to uninstall them. So I cannot test anything with and without that program running.

BTW: my program was actually not running when the crash occurred. It was halted by the debugger, and I was trying to see the value of a variable. So the debugger was in charge at that moment. My program did not crash. The answers were screwy, which is why I wanted to diagnose things. The crash occurred after dragging, as explained above. Does that narrow things down?

bmchenry
New Contributor II
An important suggestion/recommendation:
Since your troubles seem to indicate that you are not connected to a server and therefore do not have daily/weekly/monthly full backups of your system...
might I suggest...
1) Buy a USB external drive; they go for $120 for 1 TB.
2) Get some backup software; I use Acronis True Image, $50?
3) At a minimum, back up your computer once a week and before adding anything major.
The Windows backup/restore is a bear and, it appears, didn't work for you.
A full image backup lets you fully restore your computer if things hit the fan.

As far as Zone Alarm goes, I have used and still use ZoneAlarm without incident.
I do not use the freebie version.
Go to their forum and ask them how to uninstall the freebie version.
It shouldn't be a problem.
You can also simply turn the program off (no anti-virus, no program control, etc.)

brian

Bernard
Valued Contributor I
dondilworth,
A crash dump file, or even just the BSOD stop code, would have been very helpful in your situation. Without it, it is almost impossible to track down the problem.
As I stated in my earlier post, AV software by its design can sometimes bring down the system, because it implements kernel modules that patch various critical system structures such as the IDT and the SSDT, or installs filter drivers above function drivers to intercept the flow of IRPs. Even the mouse can be hooked, by SetWindowsHookEx() in user mode, by a filter driver, or even by an IDT handler for the mouse or keyboard. That means the mouse events generated by moving your mouse can be intercepted and tracked.
Regarding the uninstall problem, bear in mind that even here AV software can place so-called IAT hooks in msi.dll, which is responsible for installing and uninstalling apps.
Try to reproduce the problem, even on a virtual machine, because in order to understand what happened we need to see the debugger's kernel-mode stack.

JohnNichols
Valued Contributor III
In terms of backup, a pair of hard drives in RAID mode, used for data only, has saved me on several occasions when the OS took a nose dive.

JMN

bmchenry
New Contributor II
RAID mode is great, but does your 'data only' include ALL system data?
"Data only" (meaning your data and not all programs, settings, etc.) doesn't cure the issue: if Windows decides to implode, "data only" won't restore the system.
I recommend regular full and incremental backups of the entire system.
First for data integrity, and then so you can step back a day or so in the event a virus, antivirus software or a system issue brings down the drive/system.
Now, having said all this, my system will probably self-immolate (aka crash n burn) just to demonstrate that it isn't properly archived! Ahhhh... the joys of modern computers.

Some may also recommend backing up 'to the cloud', but that reminds me too much of the old time-share days, which I thought PCs got us away from?
Current wireless is much faster than 300/1200 baud, but of course the amount of programs, settings and data needing to be backed up is exponentially larger too!

Perhaps this has run afoul of the initial topic of this thread... then again, maybe not, since the original author could have avoided a lot of wasted time on reinstallation etc. with proper and regular backups!

deanserious
Beginner
I have to agree with you. I would actually go so far as to recommend to regularly (as in, once a year) format your HD. Not only does this preserve the performance of your computer, getting rid of useless software and/or files that might have piled up in the meantime (including malware and such), but you will also be forced to back up at regular intervals.
As for cloud vs. external HD, I wouldn't know. I think cloud is fairly reliable these days - when in doubt, I would probably use a mix of the two.

Steven_L_Intel1
Employee
I tried the exact set of tasks you outlined earlier. Nothing untoward happened, other than the dummy arguments for the entry not being visible in the debugger. As I mentioned earlier, we know about that. I dragged the name into the Watch pane. It told me the variable was undefined (same issue), but otherwise everything behaved fine. The debugger was still responsive and Windows was behaving normally.

I maintain that the debugger did not crash or corrupt your Windows system - at least not with the software Intel and Microsoft provide. The more likely explanation is that you had some existing corruption that needed a triggering event. Disk corruption is likely.

JohnNichols
Valued Contributor III
Backup:

I have two computers configured the same; if I lose one, I move to the other while I fix the first one. Expensive, but a life saver.

JMN
