Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*
Announcements
The Intel sign-in experience has changed to support enhanced security controls. If you sign in, click here for more information.
467 Discussions

Parallel for is very slow compared to iterative solution

sidrakiyani
Beginner
197 Views

I am trying to accelerate an algorithm using DPC++. What happens is that the normal calculation takes 1.5 times faster than kernel parallel execution. The following code is for both calculations.

the num_items currently equals 16,000. I tried small values like 500 but the same thing, the CPU is way faster the kernel.

I am using visual studio 2022 that runs oneAPI dpc++ compiler, and trying to make emulation on an FPGA, but I don't know how to find the details of the FPGA emulator like what frequency it is running on. The full code is: https://ideone.com/iEHQHab

    // This is the normal iterative code.
    std::vector<double> distance_calculation(std::vector<std::vector<double>>& dataset, 
    std::vector<double>& curr_test) {
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<double>res;
    for (int i = 0; i < dataset.size(); ++i) {
        double dis = 0;
        for (int j = 0; j < dataset[i].size(); ++j) {
            dis += (curr_test[j] - dataset[i][j]) * (curr_test[j] - dataset[i][j]);
        }
        res.push_back(dis);
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    return res;
}
    // This is FPGA emulation code
    std::vector<double> distance_calculation_FPGA(queue& q, const  
    std::vector<std::vector<double>>& dataset, const std::vector<double>& curr_test) {
    std::vector<double>linear_dataset;
    for (int i = 0; i < dataset.size(); ++i) {
        for (int j = 0; j < dataset[i].size(); ++j) {
            linear_dataset.push_back(dataset[i][j]);
        }
    }
    range<1> num_items{dataset.size()};
    std::vector<double>res;
    //std::cout << "im in" << std::endl;

    res.resize(dataset.size());
    buffer dataset_buf(linear_dataset);
    buffer curr_test_buf(curr_test);
    buffer res_buf(res.data(), num_items);
    {
        auto start = std::chrono::high_resolution_clock::now();
        q.submit([&](handler& h) {
            accessor a(dataset_buf, h, read_only);
            accessor b(curr_test_buf, h, read_only);

            accessor dif(res_buf, h, read_write, no_init);
            h.parallel_for(range<1>(num_items), [=](id<1> i) {
                //  dif[i] = a[i].size() * 1.0;// a[i];
                for (int j = 0; j < 5; ++j) {
                    dif[i] += (b[j] - a[i * 5 + j]) * (b[j] - a[i * 5 + j]);

                }
                });
            });
            q.wait();
            auto finish = std::chrono::high_resolution_clock::now();
            std::chrono::duration<double> elapsed = finish - start;
            std::cout << "Elapsed time: " << elapsed.count() << " s\n";

    }
    /*
        for (int i = 0; i < dataset.size(); ++i) {
            double dis = 0;
            for (int j = 0; j < dataset[i].size(); ++j) {
                dis += (curr_test[j] - dataset[i][j]) * (curr_test[j] - dataset[i][j]);
            }
            res.push_back(dis);
        }
        */
    return res;
}

 

0 Kudos
2 Replies
Steve_Lionel
Black Belt Retired Employee
192 Views
Barbara_P_Intel
Moderator
188 Views

Definitely on the wrong forum. Moving to DPC++ Forum.

 

Reply