I have the following code, which I test on my AVX2 machine:
```cpp
#include <cmath>
#include <opencv2/core.hpp>
using cv::Mat;

bool interpolate(const Mat &im, float ofsx, float ofsy,
                 float a11, float a12, float a21, float a22, Mat &res)
{
    bool ret = false;
    // input size (-1 for the safe bilinear interpolation)
    const int width = im.cols-1;
    const int height = im.rows-1;
    // output size
    const int halfWidth  = res.cols >> 1;
    const int halfHeight = res.rows >> 1;
    float *out = res.ptr<float>(0);
    const float *imptr = im.ptr<float>(0);
    for (int j=-halfHeight; j<=halfHeight; ++j)
    {
        const float rx = ofsx + j * a12;
        const float ry = ofsy + j * a22;
        #pragma omp simd
        for(int i=-halfWidth; i<=halfWidth; ++i, out++)
        {
            float wx = rx + i * a11;
            float wy = ry + i * a21;
            const int x = (int) floor(wx);
            const int y = (int) floor(wy);
            if (x >= 0 && y >= 0 && x < width && y < height)
            {
                // compute weights
                wx -= x; wy -= y;
                int rowOffset = y*im.cols;
                int rowOffset1 = (y+1)*im.cols;
                // bilinear interpolation
                *out =
                    (1.0f - wy) * ((1.0f - wx) * imptr[rowOffset+x]  + wx * imptr[rowOffset+x+1]) +
                    (       wy) * ((1.0f - wx) * imptr[rowOffset1+x] + wx * imptr[rowOffset1+x+1]);
            } else {
                *out = 0;
                ret = true; // touching boundary of the input
            }
        }
    }
    return ret;
}
```
As suggested by Intel Advisor, I added `#pragma omp simd` to force vectorization, since the compiler (icpc 2017 update 3) assumed a nonexistent dependency. On my AVX2 machine this doesn't produce any error and actually improves performance.
However, on the AVX-512 machine (same compiler and version) this generates a segmentation fault. Why does this happen?
The compilation flags are the same, except that one build uses `-xCORE-AVX2` and the other `-xMIC-AVX512`. This is the complete set of compilation flags:
```
INTEL_OPT=-O3 -ipo -simd -xCORE-AVX2 -parallel -qopenmp -fargument-noalias -ansi-alias -no-prec-div -fp-model fast=2 -fma -align -finline-functions
INTEL_PROFILE=-g -qopt-report=5 -Bdynamic -shared-intel -debug inline-debug-info -qopenmp-link dynamic -parallel-source-info=2 -ldl
```
Some example `halfWidth` values are 20, 9 and 17, while `halfHeight` values include 48, 20, 9 and 43. If you mean the size of `im`, some example widths are 1368, 683 and 62, and heights 50, 1061 and 58. Note that these values do not correspond to each other; they are just numbers I saw when printing them.
This code does have a cross-iteration data dependency through the variable `ret`, known as a 'conditional scalar assignment'. So the code with a bare simd pragma is malformed, and an incorrect result shouldn't be considered a bug. I believe that with a higher optimization report level (-opt-report:5) the compiler should point to `ret` as a dependency.
However, this issue can easily be resolved by declaring `ret` as an OR-reduction: just change your pragma to `#pragma omp simd reduction(|:ret)`. I don't have the Intel Compiler at hand right now, but I believe it should produce well-optimized code for this kind of reduction (more efficient than a generic reduction).
I had never heard of `#pragma omp simd reduction(|:ret)`, thanks for that. However... damn, I was quite confident that your solution would fix this, but I still get the segmentation fault.
>> but I still get the segmentation fault
Before your loop, insert some assert code to make sure that out and imptr are properly constructed pointers (correct data address, data allocated, correct data size, correct data alignment).
Note that your outer for loop executes halfHeight*2+1 times and the inner for loop executes halfWidth*2+1 times. This may push out beyond the end of your allocated array. IOW, if out was allocated with (halfHeight*2)*(halfWidth*2) elements, that is fewer than the (halfHeight*2+1)*(halfWidth*2+1) elements the loops write. Since halfWidth = res.cols >> 1, this happens whenever res.cols (or res.rows) is even.
Jim Dempsey
Thanks for your answer. Note that the code above works fine with -xCORE-AVX2. I'm not sure I understood your comment correctly, but if I did, "correct address of data, data is allocated and correct size of data" should not be a problem: out and imptr point into OpenCV matrices, which are correctly allocated.
In addition, building OpenCV with:
-DENABLE_AVX=ON -DENABLE_AVX2=ON -DENABLE_FMA3=ON -DENABLE_SSE=ON -DENABLE_SSE2=ON -DENABLE_SSE3=ON -DENABLE_SSE41=ON -DENABLE_SSE42=ON -DENABLE_SSSE3=ON
should ensure that the data is at least 256-bit aligned. I couldn't find any way to build OpenCV with 512-bit alignment. Could this be the problem?