Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP collapsed for with non-const values

luca_l_
Beginner

I have this code:

    std::vector<Wrapper> localWrappers;
    std::vector<float> pixelDistancesNew;
    std::vector<float> curSigmas;
    //fill the 3 vectors
    #pragma omp parallel for collapse(2) schedule(dynamic, 1)
    for(int i=0; i<localWrappers.size(); i++)
      for (int r = par.border; r < (localWrappers[i].cur.rows - par.border); r++)
        for (int c = par.border; c < (localWrappers[i].cur.cols - par.border); c++) {
          const float val = localWrappers[i].cur.at<float>(r,c);
          if ( (val > positiveThreshold && (isMax(val, localWrappers[i].cur, r, c) && isMax(val, localWrappers[i].low, r, c) && isMax(val, localWrappers[i].high, r, c))) ||
            (val < negativeThreshold && (isMin(val, localWrappers[i].cur, r, c) && isMin(val, localWrappers[i].low, r, c) && isMin(val, localWrappers[i].high, r, c))) )
            // either positive -> local max. or negative -> local min.
            localizeKeypoint(r, c, curSigmas, pixelDistancesNew, localWrappers);
        }

And I get this error:

    error: parallel loops with collapse must be perfectly nested
          for(int i=0; i<localWrappers.size(); i++)
                  ^

    Error: the OpenMP "single" pragma must not be enclosed by the "for" pragma

Reading [this](http://dev-archive.ambermd.org/201509/0004.html), I think the error is caused by my use of `size()`. However, I don't know how to get a `const` value for it, or how else to solve this problem.

Can someone help me with this?

This is the Wrapper definition:

    struct Wrapper{

        Wrapper(const SIFTDescriptorParams &sp, const AffineShapeParams ap) :
            sift(sp),
            ap(ap),
            sp(sp),
            mask(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
            img(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
            fx(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1),
            fy(ap.smmWindowSize, ap.smmWindowSize, CV_32FC1)
        {
            computeGaussMask(mask);
            patch = cv::Mat(ap.patchSize, ap.patchSize, CV_32FC1);
            fx = cv::Scalar(0);
            fy = cv::Scalar(0);
            descriptors.reserve(500);
        }

        AffineShapeParams ap;
        SIFTDescriptorParams sp;

        cv::Mat1f descriptors;
        cv::Mat patch;
        std::vector<Keypoint> keys;

        cv::Mat high;
        cv::Mat prevBlur;
        cv::Mat blur;
        cv::Mat low;
        cv::Mat cur;
        SIFTDescriptor sift;

        cv::Mat mask, img, fx, fy;

        std::vector<unsigned char> workspace;

    };

 

3 Replies
Anoop_M_Intel
Employee

Could you please share with us a test case which includes the definition for "Wrapper" to make sure we can inspect the optimizations under the hood?

 

luca_l_
Beginner

Anoop M. (Intel) wrote:

Could you please share with us a test case which includes the definition for "Wrapper" to make sure we can inspect the optimizations under the hood?

 

I added the Wrapper definition. Is anything else necessary? I'm afraid the question's code will become too confusing if I add more.

Aditya_K_5
Novice

Your question gave me the hint, and I found a solution. This line is the problem:

    for (int r = par.border; r < (localWrappers.cur.rows - par.border); r++)

Instead, compute the bound once, above the loop:

    const int numRowsWithoutBorder = localWrappers.cur.rows - par.border;

Perhaps `size()` is also a problem, as you said, so the code should look like this:

    std::vector<Wrapper> localWrappers;
    std::vector<float> pixelDistancesNew;
    std::vector<float> curSigmas;
    //fill the 3 vectors

    // compute the loop bounds once, after the vectors have been filled
    const size_t localWrappersSize = localWrappers.size();
    const int numRowsWithoutBorder = localWrappers.cur.rows - par.border;

    #pragma omp parallel for collapse(2) schedule(dynamic, 1)
    for(int i=0; i<localWrappersSize; i++)
      for (int r = par.border; r < numRowsWithoutBorder; r++)
        for (int c = par.border; c < (localWrappers.cur.cols - par.border); c++) {
          ...
        }

If you want to use collapse(3), you'll have to do the same with the columns. This approach worked for me with the Intel 2018 compiler, although I couldn't easily find any documentation about this behavior.
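For reference, here is a minimal, self-contained sketch of the hoisting approach described above. It is not the original code: `Grid` and `countAbove` are hypothetical stand-ins for `Wrapper::cur` and the keypoint test, and it assumes that every grid has the same dimensions, so that none of the loop bounds depends on `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Wrapper::cur: a dense row-major float image.
struct Grid {
    int rows;
    int cols;
    std::vector<float> data;
    float at(int r, int c) const {
        return data[static_cast<std::size_t>(r) * cols + c];
    }
};

// Counts the interior pixels (excluding `border` on each side) that exceed
// `threshold`, over all grids. Assumes all grids share the same dimensions.
long countAbove(const std::vector<Grid>& grids, int border, float threshold) {
    if (grids.empty()) return 0;

    // Hoist every loop bound into a loop-invariant local before the pragma.
    const int numGrids = static_cast<int>(grids.size());
    const int rowEnd = grids[0].rows - border;
    const int colEnd = grids[0].cols - border;

    // Check the assumption that makes hoisting valid.
    for (const Grid& g : grids)
        assert(g.rows == grids[0].rows && g.cols == grids[0].cols);

    long count = 0;
    // The i and r loops are perfectly nested; the c loop is the collapsed body.
    #pragma omp parallel for collapse(2) schedule(dynamic, 1) reduction(+:count)
    for (int i = 0; i < numGrids; i++)
        for (int r = border; r < rowEnd; r++)
            for (int c = border; c < colEnd; c++)
                if (grids[i].at(r, c) > threshold) count++;
    return count;
}
```

The pragma only takes effect when OpenMP is enabled at compile time (for example `-qopenmp` with the Intel compiler or `-fopenmp` with GCC); otherwise the loops simply run serially. Because `colEnd` is hoisted as well, the clause could presumably be changed to `collapse(3)` here without further edits.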
