Intel® C++ Compiler

ICC 10.0.023 generates incorrect optimized code


The software my coworkers and I develop uses the following (well-known) macro to create new strings:

#define memNewStr(S) strcpy(memNew(char,strlen(S)+1),S)

where (for simplicity) memNew is defined as

#define memNew(T,N) (T*)malloc(N*sizeof(T))

(Our actual memory manager is more complicated.)

Under the version 9 compilers (9.1.045 and 9.1.052) we never had any problem with the memNewStr macro. The 10.0.023 compiler, however, will generate incorrect optimized code for the macro in some cases.

Unfortunately, there's not a simple 20 line program that shows the error. (My hunch is that the code has to be sufficiently complex to "fool" the version 10 optimizer.) I have attached a longer program, still short enough to understand, that does demonstrate the error. (memNew is defined in this code as above.) The attached tar file example.tar contains:


buildit is a short, dumb c-shell script that builds the executable for the 9.1.045 or the 10.0.023 compiler assuming the compilers are installed at /opt/intel/cc/9.1.045 and /opt/intel/cc/10.0.023. buildit takes two arguments, a version number (9 or 10) and an optimization level (g, O0, O1, O2, or O3), e.g.,

buildit 9 O3

The executable is named "bogus". Correct execution of bogus is:

%coot 308: bogus
inside dumTrunFixVars, varName0[iVar] = seed
inside dumTrunFixVars, varName0[iVar] = x_coordinate
inside dumTrunFixVars, varName0[iVar] = y_coordinate
inside dumTrunFixVars, varName0[iVar] = z_coordinate
inside dumTrunFixVars, varName0[iVar] = time
inside dumTrunFixVars, varName0[iVar] = visc_1
inside dumTrunFixVars, varName[iVar] = seed
inside dumTrunFixVars, varName[iVar] = coordinate
inside dumTrunFixVars, varName[iVar] = time
inside dumTrunFixVars, varName[iVar] = visc_1
after dumTrunFixVars, trunHd->ppsVarName[iVar] = seed
after dumTrunFixVars, trunHd->ppsVarName[iVar] = coordinate
after dumTrunFixVars, trunHd->ppsVarName[iVar] = time
after dumTrunFixVars, trunHd->ppsVarName[iVar] = visc_1

The code executes correctly for version 9 at all optimization levels and version 10 for g, O0, and O1. For O2 and O3, however, the last four lines of output show garbled strings, and glibc detects corrupted memory when "free" is executed:

%coot 305: bogus
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
after dumTrunFixVars, trunHd->ppsVarName[iVar] = ^P^P0
*** glibc detected *** bogus: double free or corruption (fasttop): 0x0804d048 ***
======= Backtrace: =========

Any insights into what is happening and why would be appreciated.

Some answers to questions you may have:

1) What OS's are we running?

A: Linux. This error was detected in-house under Ubuntu 7.10 and Fedora 5.

2) Why such old compilers and OS's?

A: We're a commercial outfit, we're cautious about upgrading, and we have users running old OS's as well.

3) Why not use

#define memNewStr(S) strdup(S)


A: We want to be able to use our own memory manager.

4) Why not replace the macro memNewStr with a function memNewStr that does the same thing?

A: We could do that, and we probably will do that. Doing so does indeed fix the problem. We want to understand what's going on, however. In particular, we want to know if this is indicative of more general problems with the version 10 compiler.
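
A minimal sketch of what the function form mentioned in 4) might look like, assuming plain malloc stands in for our real memory manager. The name memNewStrFn and the NULL check are illustrative only and are not taken from the attached code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function replacement for the memNewStr macro.
 * malloc stands in for the real memory manager (an assumption). */
char *memNewStrFn(const char *s)
{
    size_t n = strlen(s) + 1;          /* length including the terminating NUL */
    char *copy = (char *)malloc(n);    /* sizeof(char) is 1 by definition */
    if (copy == NULL)
        return NULL;                   /* caller must handle allocation failure */
    memcpy(copy, s, n);                /* copies the NUL as well */
    return copy;
}
```

Unlike the macro, the function evaluates its argument exactly once, so an argument with side effects is not evaluated twice.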

5) This sure smells like a memory overwriting problem. Have you run valgrind and/or used electric fence to check for memory overruns?

A: Ad nauseam. With the version 9 compiler at all optimization levels, and the version 10 compiler at optimization levels g, O0, and O1, valgrind and electric fence detect no errors. With version 10 at O2 or O3, electric fence detects a seg fault at the first call of the "after dumTrunFixVars" printf statement. Similarly, valgrind detects errors only for version 10, O2 or O3, and inside C library calls emanating from the same printf. These errors are most likely due to the corrupted memory returned by memNewStr.

6) Couldn't you have made a simpler example code?

A: I tried, but after a certain amount of simplification the error goes away.

7) Isn't that a clue as to where your problem is?

A: I'm 99.9% certain it's not. For example, in routine dumTrunFixVars, there are two calls to dumSetError that never execute. If these calls are commented out, the code runs correctly in all cases. Simplification in other ways makes the error go away as well.

8) That sounds like a memory overwriting problem.

A: See 5) above.

9) Did you try alternate variant definitions of memNewStr?

A: Yes. The following all qualify as "superstitious" examples, i.e., they shouldn't make any difference, and, indeed, they did not:

#define memNewStr(S) (strcpy((malloc(sizeof(char)*(strlen(S)+1))),S))
#define memNewStr(S) strcpy(((char *)malloc(sizeof(char)*(strlen(S)+1))),S)
#define memNewStr(S) (strcpy(((char *)malloc(sizeof(char)*(strlen(S)+1))),S))
#define memNewStr(S) strcpy((char *)malloc(sizeof(char)*(strlen(S)+1)),S)
#define memNewStr(S) (char *)strcpy((char *)malloc(sizeof(char)*(strlen(S)+1)),S)
#define memNewStr(S) (char *)(strcpy((char *)malloc(sizeof(char)*(strlen(S)+1)),S))
#define memNewStr(S) ((char *)(strcpy((char *)malloc(sizeof(char)*(strlen(S)+1)),S)))

Rick Pember

Black Belt
I think the problem relates to changes where sizeof(T) is unknown at the point of the optimization.

Try this:

template <typename T>
inline T* memNew(size_t N) {
   return (T*)malloc(N*sizeof(T)); // replace malloc with your allocator
}

#define memNewStr(S)        strcpy(memNew<char>(strlen(S)+1),S)

This will defer inline optimization until after sizeof(T) is known.

Also consider creating a template "memNewCopy" that provides a generic new and copy operation.

I haven't tested the above code. I've experienced similar problems where sizeof(X) is unknown at some places in optimization.

Jim Dempsey
Thanks. I'm going to have to dust off my C++ skills to try it -- even though we use icc, we're a strictly C shop.

What puzzles me though is why the version 9 compiler optimizes the code I attached correctly but the version 10 does not.
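
For a C-only code base, one rough analog of the inline-function suggestion might look like the sketch below. The helper name memNewBytes and its overflow check are hypothetical, malloc stands in for the real memory manager, and whether this changes the behavior of the 10.0 optimizer has not been checked here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count elements of elemSize bytes.
 * malloc stands in for the real memory manager (an assumption). */
static inline void *memNewBytes(size_t count, size_t elemSize)
{
    /* Refuse requests whose byte count would overflow size_t. */
    if (elemSize != 0 && count > SIZE_MAX / elemSize)
        return NULL;
    return malloc(count * elemSize);
}

/* Same interface as the original macro; sizeof(T) is evaluated at the call site. */
#define memNew(T, N) ((T *)memNewBytes((size_t)(N), sizeof(T)))
```

Existing call sites such as memNew(char,strlen(S)+1) keep working unchanged, since only the macro body moves into a function.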
Black Belt
This comment does not bear directly on the optimizer bug that you have brought up but, surely, part of the lore one finds in old C shops tells us that the macro definition

#define memNew(T,N) (T*)malloc(N*sizeof(T))

creates a bug waiting to be born. Please use

#define memNew(T,N) (T*)malloc((N)*sizeof(T))

instead. (Hint: what will happen with your original macro if it is invoked as, say, 'memNew(float,m+2)'?)
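
To make the hint concrete, here is a small sketch that computes only the byte count each macro form would hand to malloc. The macro and function names are made up for illustration, and malloc itself is left out so the arithmetic can be inspected.

```c
#include <stddef.h>

/* Byte counts the two macro forms pass to malloc, with malloc itself removed
 * so the arithmetic can be inspected. Both macro names are hypothetical. */
#define BYTES_UNPARENTHESIZED(T, N) (N * sizeof(T))
#define BYTES_PARENTHESIZED(T, N)   ((N) * sizeof(T))

/* Expands to (m+2 * sizeof(float)), i.e. m + 2*sizeof(float). */
size_t bytesUnparenthesized(size_t m)
{
    return BYTES_UNPARENTHESIZED(float, m+2);
}

/* Expands to ((m+2) * sizeof(float)), the intended size. */
size_t bytesParenthesized(size_t m)
{
    return BYTES_PARENTHESIZED(float, m+2);
}
```

On a platform where sizeof(float) is 4, m = 10 gives 18 bytes from the unparenthesized form instead of the intended 48, so the buffer is too small and later writes overrun it.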
Good point, but I think the actual posted test case has the required parens. I do see the described behavior (or misbehavior) with the 10.0 compiler, but it works as desired with the current compiler, 11.1. If feasible, I would recommend updating to the latest version.