Vectorization problem

dajum · 06-17-2008

Hi,

The following code crashes on an access violation:

      SUBROUTINE DOSUM3(PSN,PNG,SUMGL,SUMGLT,SUMGR,SUMGRT,
     .                  SUMGRL)
      INTEGER PSN,PNG,NPT,K
      REAL SUMGL,SUMGLT,SUMGR,SUMGRT,SUMGRL,GV,TNT,T1,TNT2

C     ... code omitted

      T1= T(PSN)
      DO 40 K = 1,NRAD(PSN)
         GV= G(PG(PNG))
         NPT= PT(PNG)
         SUMGR= SUMGR + GV
         IF(NPT .GT. PSN) THEN
            TNT= T(NPT)
         ELSEIF(DODP)THEN
            TNT= SNGL(DXTRA1(NPT))
         ELSE
            TNT= EXTRA1(NPT)
         END IF
         TNT2= TNT*TNT
         SUMGRT= SUMGRT + GV * (TNT2 * TNT2)
         SUMGRL= SUMGRL + GV * (TNT2
     .           + T1*T1) * (TNT + T1)
         PNG= PNG + 1
   40 CONTINUE

It crashes on the line TNT=T(NPT), but only if that loop gets vectorized. I can easily turn off vectorization, but I would like to understand why the vectorized code crashes. I have equivalent loops all over the program, so I don't want to just turn off this optimization globally.

Can anyone explain why this crashes and if there is something wrong with my code or the compiler?

Thanks,

Dave

Steven_L_Intel1 · 06-17-2008

Not enough information. This is the sort of issue that requires a buildable and runnable example. I'd suggest in this case that you submit an issue to Intel Premier Support and let us investigate.

dajum · 06-17-2008

Steve,

This is incredibly hard to whittle down to a sample that will bomb, and the data that feeds this particular model isn't something I can send out. I have tried to figure it out, but I can't put anything in the loop, such as WRITE statements, because then it doesn't get vectorized. Is there a strategy I can use to look at this myself?

Dave

dajum · 06-17-2008

Let me also point out that I suspect the problem is that the index into the arrays gets incremented inside the loop, so I'm not sure how it can do this in parallel. The NPT index comes from the PT array, which is indexed by PNG, which gets incremented inside the loop.
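If the loop-carried increment is the concern, one way to take it out of the picture is to compute each iteration's PT/PG position from K and advance PNG once after the loop. The sketch below is a workaround to experiment with, not a diagnosis: it assumes, hypothetically, that the COMMON arrays are reachable through a module (SUMDATA), renames the routine DOSUM3K, and reproduces only the NRAD loop shown above, not the omitted code. A correct compiler must visit the same PT/PG positions in both forms.

```fortran
      MODULE SUMDATA
!     Hypothetical stand-in for the COMMON blocks in the original code.
      IMPLICIT NONE
      REAL, ALLOCATABLE :: T(:), G(:), EXTRA1(:)
      DOUBLE PRECISION, ALLOCATABLE :: DXTRA1(:)
      INTEGER, ALLOCATABLE :: PG(:), PT(:), NRAD(:)
      LOGICAL :: DODP = .FALSE.
      END MODULE SUMDATA

      SUBROUTINE DOSUM3K(PSN, PNG, SUMGR, SUMGRT, SUMGRL)
!     Same sums as the posted loop, but the PT/PG position used by
!     iteration K is computed from K instead of being carried over
!     from the previous iteration through PNG = PNG + 1.
      USE SUMDATA
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: PSN
      INTEGER, INTENT(INOUT) :: PNG
      REAL, INTENT(INOUT) :: SUMGR, SUMGRT, SUMGRL
      INTEGER :: K, IP, NPT
      REAL :: GV, TNT, TNT2, T1
      T1 = T(PSN)
      DO K = 1, NRAD(PSN)
         IP = PNG + K - 1
         GV = G(PG(IP))
         NPT = PT(IP)
         SUMGR = SUMGR + GV
         IF (NPT .GT. PSN) THEN
            TNT = T(NPT)
         ELSE IF (DODP) THEN
            TNT = SNGL(DXTRA1(NPT))
         ELSE
            TNT = EXTRA1(NPT)
         END IF
         TNT2 = TNT*TNT
         SUMGRT = SUMGRT + GV*(TNT2*TNT2)
         SUMGRL = SUMGRL + GV*(TNT2 + T1*T1)*(TNT + T1)
      END DO
!     Advance the caller's index once, by the loop's trip count
!     (zero if NRAD(PSN) is not positive, as for the DO loop itself).
      PNG = PNG + MAX(0, NRAD(PSN))
      END SUBROUTINE DOSUM3K
```

Written this way, PNG is read but never written inside the loop, so every PT/PG position the loop visits is fixed before the loop starts.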

Dave

Steven_L_Intel1 · 06-17-2008

Perhaps you can do this. In the caller to this routine at the point where it would fail, write all the arguments out to a file. Then write a simple "driver" program that reads this data in and calls the routine. No exposure of the rest of the program nor the application's input data.
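That suggestion splits into two halves: a routine the full program calls just before the failing call to DOSUM3, and one the stand-alone driver calls to restore that state before calling DOSUM3 itself. This is a minimal sketch under stated assumptions: the module SNAPSHOT is a hypothetical stand-in for the COMMON blocks, only some of the arrays are shown (a real dump must include everything DOSUM3 reads, including the incoming SUM arguments, DODP, EXTRA1, and DXTRA1), the names DUMPSTATE and LOADSTATE are made up, and NEWUNIT needs a Fortran 2008 compiler (with an older one, use a fixed unit number instead).

```fortran
      MODULE SNAPSHOT
!     Hypothetical stand-in for the COMMON data that DOSUM3 reads.
      IMPLICIT NONE
      REAL, ALLOCATABLE :: T(:), G(:)
      INTEGER, ALLOCATABLE :: PG(:), PT(:), NRAD(:)
      CONTAINS

      SUBROUTINE DUMPSTATE(FNAME, PSN, PNG)
!     Call in the full program just before the call that goes wrong.
      CHARACTER(*), INTENT(IN) :: FNAME
      INTEGER, INTENT(IN) :: PSN, PNG
      INTEGER :: U
      OPEN(NEWUNIT=U,FILE=FNAME,FORM='UNFORMATTED',STATUS='REPLACE')
!     Scalars first, then array sizes, then array contents.
      WRITE(U) PSN, PNG
      WRITE(U) SIZE(T), SIZE(G), SIZE(PG), SIZE(PT), SIZE(NRAD)
      WRITE(U) T, G, PG, PT, NRAD
      CLOSE(U)
      END SUBROUTINE DUMPSTATE

      SUBROUTINE LOADSTATE(FNAME, PSN, PNG)
!     Call in the driver, then call DOSUM3 with PSN and PNG.
      CHARACTER(*), INTENT(IN) :: FNAME
      INTEGER, INTENT(OUT) :: PSN, PNG
      INTEGER :: U, NT, NG, NPG, NPT, NNR
      OPEN(NEWUNIT=U,FILE=FNAME,FORM='UNFORMATTED',STATUS='OLD')
      READ(U) PSN, PNG
      READ(U) NT, NG, NPG, NPT, NNR
!     Resize the arrays to match what was dumped, then read them.
      IF (ALLOCATED(T)) DEALLOCATE(T)
      IF (ALLOCATED(G)) DEALLOCATE(G)
      IF (ALLOCATED(PG)) DEALLOCATE(PG)
      IF (ALLOCATED(PT)) DEALLOCATE(PT)
      IF (ALLOCATED(NRAD)) DEALLOCATE(NRAD)
      ALLOCATE(T(NT), G(NG), PG(NPG), PT(NPT), NRAD(NNR))
      READ(U) T, G, PG, PT, NRAD
      CLOSE(U)
      END SUBROUTINE LOADSTATE

      END MODULE SNAPSHOT
```

Because the file holds only the numbers the routine consumes, it exposes neither the rest of the program nor the model's input data.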

jimdempseyatthecove · 06-17-2008

Dave,

Set up your test run so that it hits the memory access problem. Then open a Disassembly window (Debug, Windows, Disassembly), assuming your debug location is at the TNT = statement. You should be able to select and copy a portion (or an extended range) of the Disassembly window for us to see and examine. Also, if you can, grab a copy of the Registers window as well.

From looking at the source code, the loop does not look vectorizable. I suspect T or NPT is (are) invalid. Looking at the registers and the disassembly might shed light on what is happening.

And what does the debugger report for the address of T and the value of NPT? To get the address of T, enter it into a Memory window.

Jim Dempsey

g_f_thomas · 06-18-2008

An

IMPLICIT NONE

would do no harm. Also check for unused/uninitialized variables.

Where do G and DODP come from?

Gerry

dajum · 06-18-2008

I left out some code that I think would only obscure what is important: a few include statements with common blocks that define DODP, G, T, and all the other arrays. There aren't any uninitialized or undeclared variables.

I suspect I won't be able to use Steve's suggestion of running the routine on its own, but it may work, so I'll forge ahead. While writing out all the data needed, I found that the problem appears to happen during a call in which PNG gets incremented inside the routine 64 more times than the sum of NLIN(PSN) and NRAD(PSN), the loop ranges. Those are 5 and 116, so PNG should advance by 121, but the routine returns with PNG incremented by 185. It doesn't happen on every call, but it happens often enough that the index runs off the end of the arrays and the program crashes, even though all the values are consistent enough that this shouldn't happen. I can see the difference when I run my 32-bit version with the same input: both generate identical output until the point where the vectorized version adds the extra iterations.
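That observation suggests a cheap invariant to check on every call: if the only changes DOSUM3 makes to PNG are the increments in its NLIN and NRAD loops, it should return with PNG advanced by exactly NLIN(PSN) + NRAD(PSN), that is 5 + 116 = 121 in the failing case, not 185. A minimal sketch of such a check follows; PNGOK is a hypothetical name, and the assumption that PNG is advanced once per iteration of those two loops and nowhere else comes from the description above, not from the omitted code.

```fortran
      LOGICAL FUNCTION PNGOK(PNG0, PNG, NLINP, NRADP)
!     Hypothetical exit check for DOSUM3: PNG should have advanced
!     once per iteration of the NLIN and NRAD loops and no more.
!     MAX(0, ...) mirrors a DO loop with a non-positive trip count,
!     which executes zero times.
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: PNG0, PNG, NLINP, NRADP
      PNGOK = (PNG - PNG0) .EQ. (MAX(0, NLINP) + MAX(0, NRADP))
      END FUNCTION PNGOK
```

In DOSUM3 this would mean saving PNG0 = PNG on entry and, just before returning, testing PNGOK(PNG0, PNG, NLIN(PSN), NRAD(PSN)) and writing out PSN and PNG - PNG0 when it fails. Because the test sits after the loops rather than inside them, it is less likely to change how the loops themselves are compiled, though any added code can change the optimization.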

I have attached the assembly listing for the routine. Hopefully this helps.

dajum · 06-19-2008

I was able to boil this down to a small sample and submit it to Premier Support. It took quite a while, since it only seems to fail if the subroutine is compiled in its own source file; if I include it in the same file as the main program, it seems to run fine. Hopefully a workaround for this problem (besides not turning on optimization) can be found soon.

Dave

jimdempseyatthecove · 06-19-2008

Dave,

I looked at the ASM file and, as I suspected, the loop is not vectorized. But that does not mean the vectorization options left the generated code unchanged. What was not included with your ASM file was the register dump and an indication of where in the code the crash occurred. The ASM file does not include the run-time program counter information that a copy and paste of the Disassembly window would.

Jim Dempsey

jimdempseyatthecove · 06-19-2008

Dave,

You can insert compiler directives into the source code to disable vectorization:

cDEC$ NOVECTOR

In the case of your DO loop, vectorization will not help anyway.
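The directive applies to the DO loop that immediately follows it, so in the fixed-form DOSUM3 it would start in column 1 on the line just above DO 40. A minimal, self-contained sketch of the placement follows. GATHERSUM is a hypothetical routine whose indexed load A(IDX(K)) resembles the T(NPT) access; the !DEC$ spelling is used so the same source also compiles as free form, and since the line begins with !, a compiler that does not recognize the directive treats it as an ordinary comment.

```fortran
      SUBROUTINE GATHERSUM(N, IDX, A, S)
!     Hypothetical routine: sum A at the positions listed in IDX.
!     Only the loop immediately after the directive is affected;
!     other loops in the routine can still be vectorized.
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      INTEGER, INTENT(IN) :: IDX(N)
      REAL, INTENT(IN) :: A(*)
      REAL, INTENT(OUT) :: S
      INTEGER :: K
      S = 0.0
!DEC$ NOVECTOR
      DO K = 1, N
         S = S + A(IDX(K))
      END DO
      END SUBROUTINE GATHERSUM
```

The trade-off is the one discussed in this thread: the directive is local and precise, but it has to be added by hand to every loop it should cover.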

I have two loops in my code for which I also had to disable vectorization, although in my case the program does not crash; rather, the vectorized and non-vectorized code produce different results. I am unable to produce a simple test case for Premier Support to use. It took a long time to discover the bug, as the source code statements were without error.

Jim Dempsey

dajum · 06-19-2008

Jim,

Thanks for your efforts. My small sample doesn't crash; it just gives bad results. I found this because the program was crashing at another point, where the indexes go beyond the end of the arrays. But the indexes are wrong because the compiler doesn't seem to increment the index correctly. I would use your suggestion, except that this same construct appears in hundreds (probably more than a thousand) of places throughout the code, so analyzing each one to make sure the COMPILER hasn't screwed up really isn't acceptable. I can live with code that isn't fast, but wrong code isn't going to help.

Dave

dajum · 06-25-2008

Steve,

I submitted an issue (486586) to Premier Support last Thursday, but have yet to hear anything meaningful back. I made a sample that shows what I think is a compiler problem. I really need a workaround for my entire project. This bug only seems to surface sporadically in this routine, and since I have a similar construct in hundreds of places, I can't see adding a NOVECTOR directive to each one. Turning off optimization isn't a great idea either. Can you suggest some other solution that will work for my entire project? I'd really like to see this resolved in the compiler, but I'll take a decent workaround for now.

Thanks,

Dave

jimdempseyatthecove · 06-25-2008

Dave,

Bearing in mind that any time you add or remove code you run the risk of changing the optimizations, try adding diagnostic code to a broken routine:

      if(PSN .le. 0) goto 999
      if(PSN .gt. size(NRAD)) goto 999
      if(PSN .gt. size(T)) goto 999
      T1 = T(PSN)
      DO 40 K = 1,NRAD(PSN)
         if(PNG .le. 0) goto 999
         if(PNG .gt. size(PG)) goto 999
         if(PNG .gt. size(PT)) goto 999
         if(PG(PNG) .le. 0) goto 999
         if(PG(PNG) .gt. size(G)) goto 999
         GV = G(PG(PNG))
         NPT = PT(PNG)
         if(NPT .le. 0) goto 999
         SUMGR = SUMGR + GV
         IF(NPT .GT. PSN) THEN
            if(NPT .gt. size(T)) goto 999
            TNT = T(NPT)
         ELSEIF(DODP)THEN
            if(NPT .gt. size(DXTRA1)) goto 999
            TNT = SNGL(DXTRA1(NPT))
         ELSE
            if(NPT .gt. size(EXTRA1)) goto 999
            TNT = EXTRA1(NPT)
         END IF
         TNT2 = TNT*TNT
         SUMGRT = SUMGRT + GV * (TNT2 * TNT2)
         SUMGRL = SUMGRL + GV * (TNT2
     .            + T1*T1) * (TNT + T1)
         PNG = PNG + 1
   40 CONTINUE
      goto 41
  999 write(*,*) 'DebugThis'
   41 CONTINUE

Note: if you use "call DebugThis" in the loop instead of GOTO 999, the optimizer will assume the subroutine clobbers registers, which changes how the loop is optimized. Therefore, break out of the loop on an error condition and put a breakpoint on label 999.

You might discover that something you assume is initialized is not initialized.

Jim Dempsey

Steven_L_Intel1 · 06-26-2008

Dave,

It looks as if you have some progress on this in Premier Support. I noticed the issue is related to /Op. Let me suggest you try /fp:precise as a preferable alternative.

dajum · 06-26-2008

Steve,

I have a few questions after trying this.

1. I get a warning when I use it: "ifort: command line warning #10212: /fp:precise evaluates in source precision with Fortran." What does this mean? I couldn't find this warning in any documentation or with a Google search. It is incredibly annoying, since all 3000 sources emit the warning.

2. It also gives an error:

ifort /nologo /Od /Og /Qunroll:35 /Qparallel /include:"..include" /include:"..includefluint" /assume:nocc_omp /extend_source:132 /assume:byterecl /fpe:0 /Op /iface:cref /iface:mixed_str_len_arg /module:"x64Release/" /object:"x64Release/" /traceback /libs:static /threads /c /names:lowercase /fp:precise /Qvc8 /Qlocation,link,"C:Program Files (x86)Microsoft Visual Studio 8VCinx86_amd64" "C:V51Ivf64procesdp2sp.for"
ifort: command line warning #10212: /fp:precise evaluates in source precision with Fortran.
ifort: command line error: use of '/fp:' option along with a floating point precision option not supported

I can't see what is wrong. If I take out /fp:precise the error goes away, but that is the option I need, right?

3. In VS this doesn't appear as an option in any pull-down I looked at, yet /Op does, and VS says /Op is deprecated in 10.1. Is /fp:precise going to be supported via VS?

4. Can you explain what effect switching between these two options is going to have on execution?

5. Is there any hope that the compiler will be fixed in the near future so that this incorrect execution will not occur?

Thanks,

Dave

Steven_L_Intel1 · 06-26-2008

Use /fp:source instead. Remove /Op. Yes, /fp will be supported in VS in the next major release.

In the past, we had a mish-mash of various options to control the floating point model, some of which had bad effects on optimization. /Op especially. In version 10 (or maybe 9.1, I forget), we created /fp (-fp-model on Linux/Mac) to simplify the various options and make them more meaningful. We also worked to limit optimization impact when using these options.

Patrick, through Premier Support, will investigate the issue and work with the developers on that.

Please read the documentation for details on the /fp option.

dajum · 06-26-2008

I see the problem. VS isn't removing /Op from the compile command it issues to the compiler. The command-line page in VS doesn't show /Op, and the floating point page is set to "default Consistency", not the /Op choice, yet VS won't drop that flag. Is there some way to get VS to drop /Op? I tried restarting: no good. I deleted the .suo file, and that didn't work either. Is this a known problem?

Dave

TimP · 06-26-2008

/fltconsistency and /Op were synonyms; neither should be set when using /fp:

No doubt, the properties setting and documentation is confusing.

dajum · 06-26-2008

Just to be very clear: I cannot get the compiler to use only /fp:precise. It adds /Op even though /Op is NOT set on the floating point page in VS (or anywhere else that I know of). VS2005 SP1 is being used, and since the project started with /Op set, it will not remove it. It doesn't matter what the interface is set to; getting this option turned off appears to be impossible using the UI. I have set /fp:precise via the command-line additions, since that is what has been recommended. So I need help getting the compiler to stop using /Op, because I'm not setting it. This appears to be another bug I'm hitting.

Dave