Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Can't set breakpoint

Brian_Murphy
New Contributor II

I'm using Visual Studio.

I added a few lines of code to one of my Fortran files and rebuilt the entire solution (debug/win32), but Visual Studio won't let me set breakpoints on any of the new lines.  What am I missing?  I did a clean of the solution and then rebuilt, but this didn't help.

57 Replies
Steven_L_Intel1
Employee

What happens when you try to set the breakpoint?

mecej4
Honored Contributor III

Have you checked whether you have two copies of the source file, one that you are editing and the other a member of the project source file set?

Brian_Murphy
New Contributor II

I can set breakpoints on lines that were originally in the file, and those work as expected.  I have added a short block of three new lines.  Before I start running the code, I can set breakpoints on those lines, but when the code is running, those breakpoints disappear.  If I try to set a bp on one of those lines while the code is paused in the debugger, the bp actually gets set at the next original line.  I can't get a bp to stick on any of the new lines I added.

As far as I can tell, there is only one copy of this file in the project.  There are some other files that have multiple copies because this project has two implementations of BLAS in it.  But this breakpoint problem doesn't involve those files.  Somehow, I must have either screwed something up, or I'm missing a crucial step in the rebuild process.

mecej4
Honored Contributor III

One more item to check: make sure that optimization is turned off for the problem source file. If optimization is in effect, the correlation between source line number and instruction address is rather weak and the highlighting of breakpoints and current line can be confusing.
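For reference, a debug-friendly compile looks roughly like the following on the command line (a sketch; the source file name is a placeholder, and in the IDE these settings should live under Project Properties > Fortran > Optimization and > Debugging):

```shell
# No optimization, full debug info, traceback on runtime errors.
# "mysource.f" is a placeholder file name.
ifort /Od /debug:full /traceback mysource.f
```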

Steven_L_Intel1
Employee

Try doing Build > Clean. Delete the Debug folder, then rebuild. I wonder if the PDB file got corrupted somehow.

jimdempseyatthecove
Honored Contributor III

Or the build failed, or it produced output into a different folder than the one the debugger is using.  I (an experienced user) fell into this only yesterday: I was building the Debug version yet debugging the Release version.  Clean (as Steve suggests) often points you to this conundrum.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Also, re: mecej4,

With optimizations enabled, if the new code is never entered/called by the program, the optimization will likely remove it as "dead code".

If the code is inlined (which requires optimization), the line numbers may be difficult to preserve. You can mitigate this by inserting:

IF(RuntimeExpressionThatCannotBeDeterminedAtCompileTimeButKnownToBeFalse) CALL YourErrorRoutine() ! never called

With the above, the compiler cannot optimize out the statement, and the call will remain in the code (but never be executed). The line on which that statement occurs becomes breakable.
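A minimal self-contained sketch of this trick (all names are placeholders): the guard below depends on a runtime value the compiler cannot predict, so it cannot be folded away at compile time, yet the programmer knows it will never fire for sane input.

```fortran
program guard_demo
    implicit none
    integer :: n
    read (*,*) n                    ! runtime value; compiler cannot predict it
    ! False unless n happens to be -huge(n)-1, which the programmer knows
    ! will not occur; the compiler cannot prove that, so the statement
    ! survives optimization and its line remains breakable in the debugger.
    if (n < -huge(n)) error stop 'never reached'
    print *, 'n =', n
end program guard_demo
```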

Jim Dempsey

Brian_Murphy
New Contributor II

The lines I added were basically do nothing lines added specifically so I could set a breakpoint to pause execution if a variable were zero.

          if( np.eq.0 ) then
              np = np
          endif

I could not set a bp on the np=np line.  Optimization is turned off (/Od).

I changed this to

          if( np.eq.0 ) then
              write(6,*) np
          endif

And now I can set a breakpoint that works as it should.  So it seems the optimizer was still active, even though it was supposed to be turned off.  Or maybe the pdb file has become corrupt.  I put the np=np back in and deleted the pdb file.  A new pdb was generated on the next build, but I still could not break on that line.

Anyhow, with the write statement I can get it to break, so I'm back in business.  The code I'm debugging is the Arnoldi eigensolver ARPACK by Dan Sorenson at Rice University.  It's really good code, but I've encountered a case where it oversteps the bounds of one of its own arrays.  It dynamically changes the value of a variable which it also uses as an array size for the array that goes out of bounds.  I'm not too optimistic I can find the root cause and fix it, but I'm giving it a try.

IanH
Honored Contributor II

Sounds like there is a mismatch between the time stamp of the file that you are viewing and the time stamp of the file that the debug information considers relevant to the executable or DLL that you are debugging.

Make sure that you are actually debugging the EXE or DLL that you have just built.

mecej4
Honored Contributor III

Brian Murphy wrote:

The code I'm debugging is the Arnoldi eigensolver ARPACK by Dan Sorenson at Rice University.  This is really good code, but I've encountered a case where it's overstepping bounds of one of its own arrays.  It's dynamically changing the value of a variable which it also uses as an array size for the array that is going out of bounds.  I'm not too optimistic I can find the root cause and fix it, but I'm giving it a try.

If the bug originates in some mismatch in the expected argument list (type, size, order, IN/OUT status, etc.), that would be normal. If the errors that you mention are within the ARPACK code itself, and cannot be attributed to a compiler bug, that would be a more serious situation. ARPACK is mature code, and such things should not happen.

Please provide details on how to reproduce the error, and mention whether you are using the sources from Netlib, Rice or elsewhere.

Brian_Murphy
New Contributor II

The version I have seems to be

SID: 2.10 DATE OF SID: 08/23/02 RELEASE: 2

I think this came from http://www.caam.rice.edu/software/ARPACK/ where there is a download of a patch.

I poked around the Netlib site, but couldn't find a working link to source code.

Later this week, I will make available my calculation case that causes ARPACK to overstep array bounds.

Brian_Murphy
New Contributor II

I have been using ARPACK quite a bit since about the year 2000, and it performs extremely well.  Much better than anything else I have used.  Only on rare occasions does it have trouble.  This particular case is one example.

dnsimp.f is the ARPACK driver I have adapted for my application.  I use pardiso to factor the matrix, and then arpack's "LM" option to compute largest eigenvalues.  It calls dnaupd in a loop, which in turn calls dnaup2.  The documentation in dnaup2.f says that the input argument NP is adjusted dynamically, but this variable is also used to declare the array size of another argument V(:,:) used for eigenvectors.  This seems like asking for trouble.  dnaup2 calls dnaitr, and for this case eventually calls dcopy with a column index into the V() array that is greater than its declared upper bound.

In this particular instance, the problem size is n=90 (i.e. 90x90 matrix), the number of eigenvalues being computed is nev=6, and the number of vectors used for the calculation is ncv=14. 

With different inputs like nev=8 and ncv=20, the case runs fine. 

It also runs fine with nev=4 and ncv=10.

If anyone would like the matrix to run their own tests, let me know and I will save it to a file.  In the meantime I will re-check the calling arguments to look for potential problems.

jimdempseyatthecove
Honored Contributor III

While I haven't looked at the code, the symptoms sound familiar: the code is using the incorrect N.. argument (nev, ncv, nwhatever).  When the supplied number is larger than necessary, the error is benign (and produces correct results); when it is smaller, it may produce results that may be mistaken for correct.  Your dimensions happen to be ones that expose the problem.

I suggest compiling with gen/warn interfaces and running with array bounds checking enabled. See if something pops up.
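The ifort spellings of those checks look roughly like this (a sketch; the source file name is a placeholder, and in the IDE these options should be under Fortran > Diagnostics and Fortran > Run-time):

```shell
# Generate explicit interface blocks, warn on interface mismatches,
# and enable runtime array-bounds checking.  "mycode.f" is a placeholder.
ifort /gen-interfaces /warn:interfaces /check:bounds /Od mycode.f
```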

Note: many of these old library codes were developed on systems with relatively small memory capacity, with working storage in COMMON blocks whose array dimensions were "larger than anything that could possibly run on the small-memory system".  When ported to today's large-memory systems, the code is exposed as incorrect.

Jim Dempsey

mecej4
Honored Contributor III

Brian Murphy wrote:

The documentation in dnaup2.f says that the input argument NP is adjusted dynamically, but this variable is also used to declare the array size of another argument V(:,:) used for eigenvectors.  This seems like asking for trouble.  dnaup2 calls dnaitr, and for this case eventually calls dcopy with a column index into the V() array that is greater than its declared upper bound.

You are misreading the Arpack documentation -- there are no arrays with dimensions (:) or (:,:) in the Arpack sources. The code is entirely F77, and all arrays are allocated from a few big workspace arrays whose sizes are fixed at compile time. This hoary memory management scheme was standard for decades, and Arpack uses it in standard fashion.

I ran the Arpack DNSIMP example with checks turned on, and there were no array overruns.

If you think that array bounds are being overrun in your example, please provide your modified version of dnsimp.f (and any data files if you read the matrix from them).

Brian_Murphy
New Contributor II

In reply to #14.  That is indeed what is happening.  With array checking turned on, the code aborts and identifies the variable, the array index that is out of bounds, the relevant line of code, and corresponding call stack.  That's how I found out what is going on.

Brian_Murphy
New Contributor II

In reply to #15, I used (:,:) just to indicate a 2d array.  The code is otherwise F77 style.

I have written the arrays to files, and created a stripped down visual studio project that recreates the error at the same place after roughly 130 iterations in DNSIMP.  The attached zip contains the stripped down VStudio project, but without the ARPACK library.  On my system, ARPACK is in the form of four files (SRC.LIB, UTIL.LIB, BLAS.LIB and LAPACK.LIB).  These four were built from downloaded code from the Rice web site.

524743

Briefly, my rendition of DNSIMP is used to compute the largest eigenvalues of K^-1*M.  K and M are read from files in sparse form.  Pardiso factors K.  The A*v operator needed by Arpack computes (K^-1*M)*v.

If you want to try running the code, open the project, add what you already have for ARPACK, and see if it runs.  If it does run, does it crash at line 448 of dnaitr.f saying subscript #2 of array V has value 15, which is greater than the upper bound of 14?
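For readers unfamiliar with the driver, the shape of the dnaupd reverse-communication loop in dnsimp.f-style code is roughly the following (a sketch with abbreviated declarations; apply_op is a placeholder for the Pardiso-backed operator y = K^-1*M*x).  The key point for the bound error discussed here is that the ncv passed in must match the actual second dimension of v.

```fortran
c     Sketch of the ARPACK reverse-communication loop (declarations
c     of n, nev, ncv, ldv, tol, resid, v, iparam, ipntr, workd,
c     workl, lworkl, info are omitted for brevity).
      ido = 0
 10   continue
      call dnaupd ( ido, 'I', n, 'LM', nev, tol, resid, ncv,
     &              v, ldv, iparam, ipntr, workd, workl, lworkl, info )
      if (ido .eq. -1 .or. ido .eq. 1) then
c        y = (K^-1*M)*x : apply_op is a placeholder for the user routine
         call apply_op ( n, workd(ipntr(1)), workd(ipntr(2)) )
         go to 10
      end if
c     ...on exit, call dneupd to extract the converged eigenvalues...
```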

mecej4
Honored Contributor III

There are a few major errors in your program:

  • the arrays alfa_r, alfa_i, z_r, z_i are never declared in the main program, but are passed as (implicit) scalars to, and used as arrays in, MKL_EIGS_SPARSE_INPUT(). You could have trapped this error yourself had you used IMPLICIT NONE in the main program. Using unallocated memory in this fashion is very likely to cause program crashes. Perhaps this is an error only in the cut-down test code?
  • the argument ierr_temp to Pardiso is incorrectly declared as LOGICAL, instead of INTEGER.
  • the subscript bound error is largely self-inflicted. The array v is allocated as v(nx,ncv+2) after ncv is reduced to the next lower even integer (I don't understand the reason for this) near line 100. However, the argument passed to Arpack for the second dimension of v is ncv, which causes the spurious array-bound errors that you saw.

After correcting the above errors, I obtained the following results for alfa_r and alfa_i, and the program ran with no errors after being compiled with /check:bounds :

  1    -9.999950D+02   0.000000D+00
  2    -1.000005D+03   0.000000D+00
  3    -1.000000D+03  -1.101442D+04
  4    -1.000000D+03   1.101442D+04
  5    -1.000000D+03  -1.423725D+05
  6    -1.000000D+03   1.423725D+05

In conclusion, there were no bugs found in Arpack, but there were errors in the use of Arpack and Pardiso.
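The third error above can be shown in miniature (a sketch with placeholder names; the point is only that the allocation and the argument passed to Arpack must agree):

```fortran
      ALLOCATE ( d(nx,3), v(nx,ncv+2) )   ! v really has ncv+2 columns...
      ncv = ncv + 2                       ! ...so update ncv to match
c     Arpack now sees a consistent (ncv, v) pair, and /check:bounds
c     no longer fires inside dnaitr.f
```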

Brian_Murphy
New Contributor II

Thank you for the reply, mecej4, and for running my code.  If my pain is indeed self-inflicted, it wouldn't be the first time, or the last.

I tried putting in the corrections you listed, but the code still aborts with exactly the same error.  I must have done something different in my revisions from what you did.  I simply set ncv=14 and allocated the v array as v(nx,ncv).

When the case ran successfully for you, were the input values nev=6 and ncv=14 ?  If not, please run it with these specific values.  The ARPACK documentation recommends that ncv be at least two times nev, which it is.  But 6 and 14 could be a unique combination that triggers the bounds error.

In case you want it, here is the code with my revisions:  524757

mecej4
Honored Contributor III

Here is the modified version of pardiso.f that I ran. I did not change the problem sizes at all. I made some changes that were, strictly speaking, unnecessary. For example, I removed the Pardiso status array pt() from the argument lists of your subroutines and put it into a module.

I have not yet seen the modified files that you posted in #19. 

 

Brian_Murphy
New Contributor II

I tried replacing my pardiso.f with yours, but Visual Studio would not build it.  I think I need to change something in Visual Studio, but I don't know what.  Here is the build log.  524759

In your pardiso.f, there are these two lines:
      ALLOCATE ( d(nx,3), v(nx,ncv+2) ) !add +2  09/15/2016
      ncv=ncv+2 !CNXS

I think this makes ncv equal to 16 for the subsequent call to ARPACK.  If so, I expect the code will run ok.  Please change these to this:
      ncv = 14
      ALLOCATE ( d(nx,3), v(nx,ncv) )

Does it still run OK?
