Under the version 9 compilers (9.1.045 and 9.1.052) we never had any problem with the memNewStr macro. The 10.0.023 compiler, however, will generate incorrect optimized code for the macro in some cases.
Unfortunately, there's no simple 20-line program that shows the error. (My hunch is that the code has to be sufficiently complex to "fool" the version 10 optimizer.) I have attached a longer program, still short enough to understand, that does demonstrate the error. (memNew is defined in this code as above.) The attached tar file example.tar contains:
buildit bog.h bogus.h dum.c dum.h main.c
buildit is a short, dumb c-shell script that builds the executable with either the 9.1.045 or the 10.0.023 compiler, assuming the compilers are installed at /opt/intel/cc/9.1.045 and /opt/intel/cc/10.0.023. buildit takes two arguments, a version number (9 or 10) and an optimization level (g, O0, O1, O2, or O3), e.g.,
buildit 9 O3
The executable is named "bogus". Correct execution of bogus is:
The code executes correctly for version 9 at all optimization levels and version 10 for g, O0, and O1. For O2 and O3, however, the last four lines of output show garbled strings, and glibc detects corrupted memory when "free" is executed:
========
%coot 305: bogus
...
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
*** glibc detected *** bogus: double free or corruption (fasttop): 0x0804d048 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7dcdd65]
/lib/tls/i686/cmov/libc.so.6(cfree+0x90)[0xb7dd1800]
bogus(dumInitPps+0x20f)[0x80497a9]
...
=========
Any insights into what is happening and why would be appreciated.
Some answers to questions you may have:
1) What OS's are we running?
A: Linux. This error was detected in-house under Ubuntu 7.10 and Fedora 5.
2) Why such old compilers and OS's?
A: We're a commercial outfit, we're cautious about upgrading, and we have users running old OS's as well.
3) Why not use
#define memNewStr(S) strdup(S)
A: We want to be able to use our own memory manager.
4) Why not replace the macro memNewStr with a function memNewStr that does the same thing?
A: We could do that, and we probably will do that. Doing so does indeed fix the problem. We want to understand what's going on, however. In particular, we want to know if this is indicative of more general problems with the version 10 compiler.
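For reference, a minimal sketch of what such a function replacement might look like. The real memNew is not reproduced here, so allocBytes below is a hypothetical stand-in for our memory manager that simply calls malloc; the NULL handling is also an assumption rather than a description of the macro's behavior.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the project's memory manager.
 * Assumption: it behaves like malloc(nBytes). */
static void *allocBytes(size_t nBytes)
{
    return malloc(nBytes);
}

/* Function form of memNewStr: returns a newly allocated copy of s,
 * or NULL if s is NULL or the allocation fails. */
char *memNewStr(const char *s)
{
    size_t nBytes;
    char *copy;

    if (s == NULL)
        return NULL;

    nBytes = strlen(s) + 1;              /* include the terminating '\0' */
    copy = (char *)allocBytes(nBytes);
    if (copy != NULL)
        memcpy(copy, s, nBytes);         /* copies the terminator as well */
    return copy;
}
```

Because the argument is evaluated exactly once and the length is held in a local variable, a function like this gives the optimizer much less room to reorder things than a macro expanded inline at every call site.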
5) This sure smells like a memory overwriting problem. Have you run valgrind and/or used electric fence to check for memory overruns?
A: Ad nauseam. With the version 9 compiler at all optimization levels, and with the version 10 compiler at optimization levels g, O0, and O1, neither valgrind nor electric fence reports any errors. With version 10 at O2 or O3, electric fence detects a seg fault at the first call of the "after dumTrunFixVars" printf statement. Similarly, valgrind detects errors only for version 10, O2 or O3, and inside C library calls emanating from the same printf. These errors are most likely due to the corrupted memory returned by memNewStr.
6) Couldn't you have made a simpler example code?
A: I tried, but after a certain amount of simplification the error goes away.
7) Isn't that a clue as to where your problem is?
A: I'm 99.9% certain it's not. For example, in routine dumTrunFixVars, there are two calls to dumSetError that never execute. If these calls are commented out, the code runs correctly in all cases. Simplification in other ways makes the error go away as well.
8) That sounds like a memory overwriting problem.
A: See 5) above.
9) Did you try alternate variant definitions of memNewStr?
A: Yes. The following all qualify as "superstitious" examples, i.e., they shouldn't make any difference, and, indeed, they did not:
Good point, but I think the actual posted test case has the required parens. I do see the described behavior (or misbehavior) with the 10.0 compiler, but it works as desired with the current compiler, 11.1. If feasible, I would recommend updating to the latest version.
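For readers unfamiliar with the parenthesization issue, here is a small illustration. I am assuming "required parens" refers to the usual rule that a macro's expansion (and its parameters) should be parenthesized. The macros and function names below are hypothetical and are not the poster's memNewStr.

```c
#include <string.h>

/* Hypothetical macros for illustration only. */
#define STR_BYTES_BAD(s)  strlen(s) + 1      /* expansion not parenthesized */
#define STR_BYTES_OK(s)   (strlen(s) + 1)    /* whole expansion parenthesized */

/* Bytes needed for two copies of s, computed with the unparenthesized macro. */
size_t twoCopiesBad(const char *s)
{
    return 2 * STR_BYTES_BAD(s);   /* expands to 2 * strlen(s) + 1 */
}

/* Same computation with the parenthesized macro. */
size_t twoCopiesOk(const char *s)
{
    return 2 * STR_BYTES_OK(s);    /* expands to 2 * (strlen(s) + 1) */
}
```

For the string "abc", twoCopiesBad yields 7 while twoCopiesOk yields 8, so a buffer sized with the first form would be one byte short. That kind of mistake is a source-level bug rather than a compiler problem, which is why it is worth ruling out first.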