A few months ago, I compiled libvpx with icc. The output looks good, but on my first attempt the compiler almost froze the machine while building it. Test cases like `tests/dct32x32_test.cc` contain a lot of unrollable for loops, and the compiler apparently tried to unroll them all. The memory use of the icc process kept growing, so I had to kill the make run before it could lock up the machine. In the end, I gave up on the test programs and compiled just the library without them.
```cpp
void reference_32x32_dct_2d(const int16_t input[kNumCoeffs],
                            double output[kNumCoeffs]) {
  // First transform columns
  for (int i = 0; i < 32; ++i) {
    double temp_in[32], temp_out[32];
    for (int j = 0; j < 32; ++j) temp_in[j] = input[j * 32 + i];
    reference_32x32_dct_1d(temp_in, temp_out);
    for (int j = 0; j < 32; ++j) output[j * 32 + i] = temp_out[j];
  }
  // Then transform rows
  for (int i = 0; i < 32; ++i) {
    double temp_in[32], temp_out[32];
    for (int j = 0; j < 32; ++j) temp_in[j] = output[j + i * 32];
    reference_32x32_dct_1d(temp_in, temp_out);
    // Scale by some magic number
    for (int j = 0; j < 32; ++j) output[j + i * 32] = temp_out[j] / 4;
  }
}
```
I mean, is it really worth it to optimise this kind of code? I don't know whether this has been fixed by now, but the growth of that memory use scared me. I don't even want to think about the output binary it would have produced (or maybe the compiler was just doing branch prediction analysis, so it might not have affected the binary size at all).
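For anyone who hits the same wall: icc lets you cap or disable unrolling per loop, which might be a workaround for test files like this. A minimal sketch, assuming the Intel `#pragma unroll(n)` / `#pragma nounroll` pragmas; the loop body below is just a stand-in for illustration, not the libvpx code:

```cpp
// Sketch: capping icc's unroll factor on a hot nested loop.
// The pragma applies to the loop that immediately follows it.
void scale_rows(double *data, int rows, int cols) {
  for (int i = 0; i < rows; ++i) {
#pragma unroll(4)  // unroll at most 4 iterations; "#pragma nounroll" disables unrolling entirely
    for (int j = 0; j < cols; ++j) {
      data[i * cols + j] /= 4.0;
    }
  }
}
```

There is also the command-line side: as far as I know, `-unroll0` (Linux) / `/Qunroll:0` (Windows) turns the unroller off for the whole translation unit, which might be the quicker fix for a test target you only build once.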
- Tags:
- C/C++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization