Hi,
Consider the following C++ code:
```cpp
#include <malloc.h>
#include <cmath>
#include <complex>
int main(int argc, char **argv)
{
  int N = 4000000;
  double * _arr_4_0;
  _arr_4_0 = (double *) (malloc((sizeof(double) * (unsigned long) (5331.0))));
  for (int _i0 = 0; (_i0 <= 5330); _i0 = (_i0 + 1))
  {
    _arr_4_0[_i0] = std::sin(_i0);
  }
  double * _arr_7_7;
  _arr_7_7 = (double *) (malloc((sizeof(double) * (unsigned long) (((0.1 * (double) (N)) + -66.0)))));
  #pragma omp parallel for schedule(static)
  #pragma ivdep
  for (int _i0 = 0; (_i0 < ((N / 10) - 66)); _i0 = (_i0 + 1))
  {
    _arr_7_7[_i0] = std::sqrt(_i0);
  }
  std::complex<double> * _arr_6_8;
  _arr_6_8 = (std::complex<double> *) (malloc((sizeof(std::complex<double>) * (unsigned long) (((0.1 * (double) (N)) + -5396.0)))));
  for (int o1 = 0; (o1 < (((N + 110) / 320) - 168)); o1 = (o1 + 1))
  {
    int _ct167 = ((((32 * o1) + 31) < ((N / 10) - 5397))? ((32 * o1) + 31): ((N / 10) - 5397));
    for (int o2 = (32 * o1); (o2 <= _ct167); o2 = (o2 + 1))
    {
      _arr_6_8[o2] = (0.0 + 0.0j);
    }
  }
  #pragma omp parallel for schedule(static)
  for (int o1 = 0; (o1 < (((N + 110) / 320) - 168)); o1 = (o1 + 1))
  {
    for (int o2 = 0; (o2 <= 166); o2 = (o2 + 1))
    {
      int _ct168 = ((((32 * o1) + 31) < ((N / 10) - 5397))? ((32 * o1) + 31): ((N / 10) - 5397));
      #pragma unroll_and_jam (6)
      for (int o3 = (32 * o1); (o3 <= _ct168); o3 = (o3 + 1))
      {
        int _ct169 = ((5330 < ((32 * o2) + 31))? 5330: ((32 * o2) + 31));
        #pragma ivdep
        for (int o4 = (32 * o2); (o4 <= _ct169); o4 = (o4 + 1))
        {
          _arr_6_8[o3] = (_arr_6_8[o3] + (_arr_7_7[((5330 - o4) + o3)] * _arr_4_0[o4]));
        }
      }
    }
  }
  return 0;
}
```
I compiled this using the following command (file saved as test.cpp):
icpc -O3 -qopenmp -qopt-report=5 -qopt-report-file=stdout test.cpp > optrpt
However, the compiler prints the following remark on stderr:
test.cpp(38): (col. 7) remark: unroll_and_jam pragma will be ignored due to
The remark stops at "due to" and never states why the pragma is being ignored. Could you please help me diagnose this?
Running icpc -V gives:
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005 Copyright (C) 1985-2016 Intel Corporation. All rights reserved.
The same bug is also present in:
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.2.174 Build 20170213 Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
Any suggestions on how to debug this would be appreciated.
Thanks,
Abhinav
An update on the issue:
The remark is issued only when the unroll_and_jam pragma is applied to tiled loops. If lines 31-48 of test.cpp are replaced with the untiled code below, no remark is emitted and the outer loop is unroll-jammed.
```cpp
#pragma omp parallel for schedule(static)
#pragma unroll_and_jam (16)
for (int _i0 = 0; (_i0 < ((N / 10) - 5396)); _i0 = (_i0 + 1))
{
  #pragma ivdep
  for (int _i1 = 0; (_i1 <= 5330); _i1 = (_i1 + 1))
  {
    _arr_6_8[_i0] = (_arr_6_8[_i0] + (_arr_7_7[((5330 + _i0) - _i1)] * _arr_4_0[_i1]));
  }
}
```
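For context on what the pragma asks for: unroll-and-jam unrolls the outer loop and fuses (jams) the resulting copies of the inner loop, so several outer iterations share a single pass over the inner loop. The sketch below applies it by hand, with a factor of 4 instead of 16, to the untiled loop nest above. It is only an illustration of the technique, not a description of the code icpc actually generates; the names conv_reference and conv_unroll_jam4, and the parameter last (standing in for the hard-coded 5330), are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Reference: the untiled nest, with 5330 generalized to `last`.
// out[i] += sum_{k=0..last} a7[last + i - k] * a4[k]
void conv_reference(std::vector<cd>& out, const std::vector<double>& a7,
                    const std::vector<double>& a4, int last) {
  const int n_out = static_cast<int>(out.size());
  assert(static_cast<int>(a7.size()) >= n_out + last);  // max index: last + n_out - 1
  assert(static_cast<int>(a4.size()) > last);
  for (int i = 0; i < n_out; ++i)
    for (int k = 0; k <= last; ++k)
      out[i] += a7[(last + i) - k] * a4[k];
}

// Hand-written unroll-and-jam of the outer i loop by 4: four consecutive
// outer iterations share one pass of the k loop, so each a4[k] load is
// reused four times. A remainder loop handles n_out % 4 leftover rows.
void conv_unroll_jam4(std::vector<cd>& out, const std::vector<double>& a7,
                      const std::vector<double>& a4, int last) {
  const int n_out = static_cast<int>(out.size());
  assert(static_cast<int>(a7.size()) >= n_out + last);
  assert(static_cast<int>(a4.size()) > last);
  int i = 0;
  for (; i + 3 < n_out; i += 4) {
    cd s0 = out[i], s1 = out[i + 1], s2 = out[i + 2], s3 = out[i + 3];
    for (int k = 0; k <= last; ++k) {  // the four inner loops, jammed
      const double w = a4[k];
      const int base = (last + i) - k;
      s0 += a7[base] * w;
      s1 += a7[base + 1] * w;
      s2 += a7[base + 2] * w;
      s3 += a7[base + 3] * w;
    }
    out[i] = s0;
    out[i + 1] = s1;
    out[i + 2] = s2;
    out[i + 3] = s3;
  }
  for (; i < n_out; ++i)  // remainder iterations, not unrolled
    for (int k = 0; k <= last; ++k)
      out[i] += a7[(last + i) - k] * a4[k];
}
```

Keeping the four partial sums in locals (s0 to s3) is what exposes the reuse of a4[k] across outer iterations; that reuse is the benefit the unroll_and_jam pragma is meant to unlock.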
Hi Abhinav,
I will investigate and get back to you with an update shortly. It looks like a bug; I will also check your test case with the 18.0 Beta compiler.
Regards,
Igor
The problem is still present in the 18.0 compiler. We should definitely correct the remark message so that it states the reason. It looks like the loop was distributed into 2 chunks, and the innermost loop of chunk 1 was vectorized while chunk 2 was not. I will escalate this to the developers.
Thank you for reporting this problem.
Hi Igor,
Thanks for your response. In fact, as far as I can tell, the loop should be unroll-jammed (maybe that is why no reason is given for ignoring the pragma). You are right that the 2 loops in the original code (#2) have been tiled into chunks of 32 iterations each.
Thanks,
Abhinav
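One way to sanity-check the tiling observation is to replay only the loop bounds of the tiled nest from the original post (lines 31-48) and confirm that they visit every (i, k) pair of the untiled nest from the update exactly once, with i in [0, N/10 - 5396) and k in [0, 5330]. A minimal sketch, assuming a reduced N so that the visit table stays small; the function name tiled_nest_covers_untiled is hypothetical, and the check says nothing about why icpc ignores the pragma.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Replays the o1/o2/o3/o4 bounds of the tiled nest (tile size 32) and
// returns true if they cover the untiled (i, k) space exactly once.
bool tiled_nest_covers_untiled(int N) {
  const int n_out = N / 10 - 5396;  // untiled outer trip count
  const int taps = 5331;            // untiled inner trip count (k = 0..5330)
  assert(n_out > 0 && "N too small for this tiling");
  std::vector<std::uint8_t> visits(static_cast<std::size_t>(n_out) * taps, 0);

  for (int o1 = 0; o1 < ((N + 110) / 320) - 168; ++o1) {
    for (int o2 = 0; o2 <= 166; ++o2) {
      const int ct168 = std::min(32 * o1 + 31, N / 10 - 5397);  // _ct168
      for (int o3 = 32 * o1; o3 <= ct168; ++o3) {
        const int ct169 = std::min(5330, 32 * o2 + 31);          // _ct169
        for (int o4 = 32 * o2; o4 <= ct169; ++o4) {
          if (o3 < 0 || o3 >= n_out || o4 < 0 || o4 >= taps)
            return false;  // tile touches a point outside the untiled space
          std::uint8_t& v = visits[static_cast<std::size_t>(o3) * taps + o4];
          if (v != 0) return false;  // same (i, k) visited twice
          v = 1;
        }
      }
    }
  }
  // Every (i, k) pair of the untiled nest must have been visited.
  return std::all_of(visits.begin(), visits.end(),
                     [](std::uint8_t v) { return v == 1; });
}
```

With N = 4000000 the visit table would need roughly 2 GB, so the sketch is meant for smaller values such as N = 60000; the full problem size is better checked analytically from the same bound expressions.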