mdellerus
Beginner

Binary Reproducibility in C/C++ Builds

I am looking at moving from a Linux/GCC environment to a Windows environment.
One of my requirements is binary reproducibility: a given set of C/C++ source code must compile to exactly the same binary image on multiple machines and at different times.
With Linux/GCC this is doable.
Apparently, it is not easily done with Microsoft's default Visual C++ compiler.
Does the Intel compiler have known support for, or known issues with, binary reproducibility?
Brandon_H_Intel
Employee

I would assume that on Linux* with gcc you are using -static or something similar? Or is linking against a static runtime library not necessarily a requirement in this case?
TimP
Black Belt

I'm not certain I understand the requirement you allude to. Do you mean a requirement to take the same code path in math libraries on various platforms?

The option -fimf-arch-consistency=true was first implemented for Linux in the 12.0.4 compilers. The corresponding Windows option, /Qimf-arch-consistency:true, was implemented in earlier Compiler XE versions. This option can slow down the SVML vector math library, but it should still be at least as fast as the gcc library. If you find the documentation of this option unsatisfactory, you might request attention to that issue.

Of course, you would choose a single code-path compile option, such as the default -msse2 (Windows /arch:SSE2), similar to the one you use with gcc, and you would likely set equivalent optimization and standards-compliance levels, such as /fp:source (Intel ICL /fp:source resembles Microsoft CL /fp:fast). If you set /Qxhost (similar to gcc -march=native), you are requesting code generation specific to the CPU of the machine you compile on.

mdellerus
Beginner

Quite simply, I always want the same .dll or .exe from the same inputs.

Under Linux/GCC, we need to strip symbols from the target and either avoid unnamed namespaces or specify -frandom-seed.

However, according to Microsoft, the VC compiler does not guarantee identical code generation from the same source. This makes binary reproducibility impossible with their compiler.

In some regulatory environments this is unacceptable: regulators expect to be able to generate exactly the same binary image from a given set of source code before they will approve the product. That is only reasonable, because if the binary cannot be reproduced exactly, one could assume that the source code provided is not the code used to build the product being approved for use. (Think medical, financial, safety, etc., and you can see why such a requirement exists.)

My question, therefore, is whether the Intel compilers, with whatever native options or simple tools (like the "strip" command on Linux), can be used to guarantee binary reproducibility.

Thanks for your assistance so far; I look forward to hearing from both of you again.
mecej4
Black Belt

I think that I understand your requirement.

If source code sets SA and SB are compiled into executable files (or DLLs) XA and XB, respectively, using exactly the same compiler toolchain but

(i) at different times

AND/OR

(ii) on a different development computer,

you require that if SA and SB match perfectly, except perhaps for syntactically insignificant differences such as comments and whitespace, then XA and XB should match byte for byte.
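That byte-for-byte criterion is easy to check mechanically. Here is a minimal C sketch, in the spirit of a binary FC.EXE comparison, that reports the offset of the first byte at which two build outputs differ. The names `first_difference` and `first_difference_paths` are hypothetical, not part of any Intel or Microsoft tool, and the sketch assumes files small enough for a `long` offset (on Windows, `long` is 32 bits).

```c
#include <assert.h>
#include <stdio.h>

/* Compare two open binary streams byte for byte, starting at their
   current positions. Returns -1 if they are identical (same length and
   same bytes); otherwise returns the offset of the first differing byte,
   or the length of the shorter stream if one is a prefix of the other. */
long first_difference(FILE *a, FILE *b)
{
    long offset = 0;
    assert(a != NULL && b != NULL);
    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);
        if (ca != cb)
            return offset;   /* differing byte, or one stream ended early */
        if (ca == EOF)
            return -1;       /* both streams ended together: identical */
        offset++;
    }
}

/* Convenience wrapper: open both files in binary mode and compare them.
   Returns -2 if either file cannot be opened. */
long first_difference_paths(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    long result = -2;
    if (a != NULL && b != NULL)
        result = first_difference(a, b);
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

A result of -1 means the two images match exactly. Any non-negative result is the first offset worth inspecting in a hex viewer.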

There is at least one obstacle, but it can be overcome. The PE32/PE32+ file format requires a 4-byte time stamp (the TimeDateStamp field of the COFF file header). Its exact position depends on the header offset stored at 0x3C in the file; in the files compared here it sits at byte offset 000000F0. If you use a binary file comparison utility that ignores this difference, or you use the Microsoft utility FC.EXE and disregard the part of its output that resembles this:
[bash]Comparing files parts3.exe and XX.EXE
000000F0: D0 EE
000000F1: 17 12
[/bash]
you will almost have achieved your goal. Alternatively, you could use a utility that overwrites those four bytes (000000F0 to 000000F3 here) with 00 or some other fixed dummy value.
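Such a utility might look like the following minimal C sketch. This is an illustration under stated assumptions, not a tested release tool. `pe_set_timestamp` is a hypothetical name. The sketch relies on the documented PE layout: the 32-bit little-endian value at offset 0x3C (e_lfanew) gives the position of the "PE\0\0" signature, which is followed by the COFF header fields Machine (2 bytes), NumberOfSections (2 bytes) and TimeDateStamp (4 bytes). Reading e_lfanew instead of hard-coding 0xF0 matters, because 0xF0 is only where the field lands when e_lfanew is 0xE8. The sketch does not touch other fields that may also differ between builds, such as the optional-header CheckSum or debug-directory data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Overwrite the COFF TimeDateStamp of a PE file (opened with "r+b")
   with a fixed value. Returns 0 on success, or -1 if the file does not
   look like a PE image or an I/O error occurs. */
int pe_set_timestamp(FILE *f, uint32_t value)
{
    unsigned char dos[0x40], coff[12], out[4];
    uint32_t pe_off;
    int i;

    assert(f != NULL);
    if (fseek(f, 0, SEEK_SET) != 0 || fread(dos, 1, sizeof dos, f) != sizeof dos)
        return -1;
    if (dos[0] != 'M' || dos[1] != 'Z')          /* DOS header magic */
        return -1;
    pe_off = read_le32(dos + 0x3C);               /* e_lfanew */
    if (pe_off > 0x7FFFFFF0u)                     /* keep (long) casts safe */
        return -1;
    /* Signature (4) + Machine (2) + NumberOfSections (2) + TimeDateStamp (4) */
    if (fseek(f, (long)pe_off, SEEK_SET) != 0 || fread(coff, 1, sizeof coff, f) != sizeof coff)
        return -1;
    if (memcmp(coff, "PE\0\0", 4) != 0)
        return -1;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)(value >> (8 * i)); /* little-endian */
    /* A seek is required between reading and writing the same stream. */
    if (fseek(f, (long)pe_off + 8, SEEK_SET) != 0 || fwrite(out, 1, sizeof out, f) != sizeof out)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

To use it, open each copy with `fopen(path, "r+b")` and call `pe_set_timestamp(f, 0)` on both before comparing them.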

There is, however, a big loophole in all this. If your EXE uses any DLLs, you will only have an illusion of reproducibility unless you also ensure that all the DLLs it uses match as well. Likewise, you have to ensure that there are no uninitialized variables, subscript overruns or other common errors whose effects are often unpredictable.
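One way to cover the DLLs is to record a digest for the EXE and each DLL on every build machine, then diff the resulting lists. The sketch below is hypothetical: `fnv1a64_stream` and `print_manifest` are illustrative names, and the caller must supply the list of files. It uses the 64-bit FNV-1a hash only because that hash fits in a few lines. FNV-1a is not cryptographic, so a real regulatory manifest would more likely record SHA-256. Naturally, no digest can catch the runtime problems (uninitialized variables, overruns) mentioned above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define FNV64_OFFSET_BASIS 0xcbf29ce484222325ULL
#define FNV64_PRIME        0x100000001b3ULL

/* 64-bit FNV-1a digest of everything readable from a stream.
   NOTE: non-cryptographic; used here only as a compact stand-in. */
uint64_t fnv1a64_stream(FILE *f)
{
    uint64_t h = FNV64_OFFSET_BASIS;
    int c;
    assert(f != NULL);
    while ((c = fgetc(f)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= FNV64_PRIME;            /* unsigned wraparound is well-defined */
    }
    return h;
}

/* Write one "digest  path" line per file to `out`, so manifests from two
   build machines can be compared with any text diff tool.
   Returns the number of files that could not be opened. */
int print_manifest(FILE *out, const char *const paths[], size_t count)
{
    int failures = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        FILE *f = fopen(paths[i], "rb");
        if (f == NULL) {
            fprintf(out, "%-16s  %s\n", "UNREADABLE", paths[i]);
            failures++;
            continue;
        }
        fprintf(out, "%016" PRIx64 "  %s\n", fnv1a64_stream(f), paths[i]);
        fclose(f);
    }
    return failures;
}
```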

Finally, if Microsoft changes the PE file specification some day, you will have to revise your builds to suit.

Binary reproducibility of the type discussed in this thread was first achieved (as far as I casually remember) in 1990 by a Bell Labs team who set out, in their own words, to achieve this impressive feat:

"As a debugging aid, we sought bit-level compatibility between objects compiled from the C produced by f2c and objects produced by our local f77 compiler. That is, on the VAX where we developed f2c, we sought to make it impossible to tell by running a Fortran program whether some of its modules had been compiled by f2c or all had been compiled by f77."