rudaho
Beginner

How to debug numerical error with -O1 option?

Hi there~

I'm refactoring code in our existing product. The problematic code section does a lot of table lookups, which involve many floating-point operations. I ran into a very strange situation, and after rolling back all my modifications, the problem reduces to this: if I add a printf or a local variable to the function, the table lookup value is, surprisingly, different in some benchmarks. This is very strange, since the floating-point operations are exactly the same; only a new variable was added. Looking at the assembly code, there are many differences, but I can't tell whether any of them is the problem. I'm sorry that I can't post the source code here because it is company confidential. Can anyone give some advice or hints on how to resolve this issue based on this limited information? Thanks

I'm using icc 10.1.011 with the -O1 -fno-omit-frame-pointer options.

Best Regards

Yi-Ju
Om_S_Intel
Employee

If the code segment is not a hotspot, you can disable optimization for that segment only. You may use #pragma optimize("", off) or compile with the -O0 option.
TimP
Black Belt

Although turning off optimization may help confirm the nature of your problem, such behavior often comes about through programming errors such as uninitialized values (e.g. a forgotten static) or subscript over-runs. One would think you would want to make your program safer, rather than attempting to get past the error with compile options.
rudaho
Beginner

Hi~

In fact, this code section is a hotspot of the engine, so I can't compile it with -O0. Also, I have already used Intel Inspector and Purify to check for memory issues, and neither found any errors. So I don't think an uninitialized variable is the cause. Is there anything else that could cause the problem? Thanks...

Best Regards

Yi-Ju
jimdempseyatthecove
Black Belt

>>If I add a printf, or local variable in the function, ... surprisingly, the table look up value is different in some benchmark.

Can you insert diagnostic code that can detect the problem without causing the problem to go away?

Sometimes it is quite difficult to craft diagnostic code that does not affect the problem (or the performance). This is often called a "Heisenbug", a play on the Heisenberg uncertainty principle.

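int BugTrap1 = 12345;   /* nonzero initializers keep the compiler from */
int BugTrap2 = 12345;   /* constant-folding the division below         */
/* ... */
int main(int argc, char *argv[])
{
    BugTrap2 = 0;       /* arm the trap at run time */
    /* ... */
}

void YourFunction(/* ... */)
{
    /* ... */
    if (YourDetectionDetectsError)
        BugTrap1 /= BugTrap2; // Divide by 0 error
    /* ... */
}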

Compile the source file containing YourFunction with assembler output (e.g. with -S) and keep the assembly listing handy for when the error occurs. Note, you can also place a breakpoint on the line

BugTrap1 /= BugTrap2; // Divide by 0 error

assuming you can locate the corresponding instruction.

When the trap or breakpoint fires, the listing may help you determine which registers still hold residual information.

Information of particular interest:

Is the index into the lookup table corrupted (invalid)? Examine the register actually used for the index; do not examine what a caller may have passed in as the index.

If the index is valid, is the entry in the table corrupted? If the table entry is corrupted, and repeatably so, run your program with a breakpoint on entry into YourFunction(...). Examine the table entry; if it is correct, add a data breakpoint at the table location where the corruption occurs, remove the breakpoint at entry into YourFunction(...), and continue. However, if the table is already corrupted on entry into YourFunction(...), then something outside YourFunction(...) corrupted it. In that case, set the breakpoint earlier (e.g. in main(...)), examine the table, and if it is correct, add a data change breakpoint and continue. With luck you will find out what is corrupting the table.

Note that data change breakpoints (at least in VS 2005) are not reliably remembered from run to run. So you may need to delete all breakpoints, add back the function-entry breakpoint, run to the function, remove the function breakpoint, and then add the data change breakpoint.

If the index was invalid, then you may have a coding error (and/or a compiler bug). Assume first that it is a coding error. In table-dispatched code you may find that a coding error happens by chance to produce correct results with optimizations off, but incorrect results (perhaps intermittently) with optimizations on. Often this is a case of misusing signed and/or unsigned variables and/or variables of different bit widths (char, unsigned char, short, unsigned short, int, unsigned int, enum, etc.).

Jim Dempsey




Om_S_Intel
Employee

It would be helpful if you could share a test case.
Brandon_H_Intel
Employee

If it's conceivable that the problem might be with floating-point precision, I'd also recommend trying -fp-model precise or -fp-model strict and seeing if that helps.

I would also recommend submitting a Premier Support issue on this at https://premier.intel.com if you're not getting anywhere with the current suggestions.
rudaho
Beginner

Hi~

If the binary is compiled with -g, the problem goes away. So should I use -g -O1 to track down the problem? I'll test it and see what the difference is. By the way, the program doesn't crash; the results just differ in the last bits, on the order of 1e-16. Thanks...

Best Regards

Yi-Ju
timintel
Beginner

I think the 10.0 compiler used x87 code by default at -O0, while the standard choice for x86_64 is, of course, SSE2. The x87 code in effect promotes all float expressions to double evaluation, an effect you could retain with -fp-model double if you didn't care about performance. -fp-model source, like the other -fp-model options mentioned in this thread, follows the C standard, disabling only those optimizations that are expected to produce numerical differences.