I am trying to accelerate an algorithm using DPC++. The problem is that the plain sequential calculation runs about 1.5 times faster than the parallel kernel. The code for both versions is below.
num_items is currently 16,000. I also tried smaller values such as 500, with the same result: the CPU version is much faster than the kernel.
I am using Visual Studio 2022 with the oneAPI DPC++ compiler and running on the FPGA emulator, but I don't know how to find details about the emulator, such as the frequency it runs at. The full code is: https://ideone.com/iEHQHab
// This is the normal iterative code.
std::vector<double> distance_calculation(std::vector<std::vector<double>>& dataset, std::vector<double>& curr_test) {
    auto start = std::chrono::high_resolution_clock::now();
    std::vector<double>res;
    for (int i = 0; i < dataset.size(); ++i) {
        double dis = 0;
        for (int j = 0; j < dataset[i].size(); ++j) {
            dis += (curr_test[j] - dataset[i][j]) * (curr_test[j] - dataset[i][j]);
        }
        res.push_back(dis);
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    return res;
}
// This is FPGA emulation code
std::vector<double> distance_calculation_FPGA(queue& q, const std::vector<std::vector<double>>& dataset, const std::vector<double>& curr_test) {
    std::vector<double>linear_dataset;
    for (int i = 0; i < dataset.size(); ++i) {
        for (int j = 0; j < dataset[i].size(); ++j) {
            linear_dataset.push_back(dataset[i][j]);
        }
    }
    range<1> num_items{dataset.size()};
    std::vector<double>res;
    //std::cout << "im in" << std::endl;
    res.resize(dataset.size());
    buffer dataset_buf(linear_dataset);
    buffer curr_test_buf(curr_test);
    buffer res_buf(res.data(), num_items);
    {
        auto start = std::chrono::high_resolution_clock::now();
        q.submit([&](handler& h) {
            accessor a(dataset_buf, h, read_only);
            accessor b(curr_test_buf, h, read_only);
            accessor dif(res_buf, h, read_write, no_init);
            h.parallel_for(range<1>(num_items), [=](id<1> i) {
                // dif[i] = a[i].size() * 1.0;// a[i];
                for (int j = 0; j < 5; ++j) {
                    dif[i] += (b[j] - a[i * 5 + j]) * (b[j] - a[i * 5 + j]);
                }
            });
        });
        q.wait();
        auto finish = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double> elapsed = finish - start;
        std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    }
    /*
    for (int i = 0; i < dataset.size(); ++i) {
        double dis = 0;
        for (int j = 0; j < dataset[i].size(); ++j) {
            dis += (curr_test[j] - dataset[i][j]) * (curr_test[j] - dataset[i][j]);
        }
        res.push_back(dis);
    }
    */
    return res;
}
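For reference, the kernel above reads the flattened dataset as a[i * 5 + j], with the row width hard-coded to 5. A host-only sketch of that same row-major indexing may make the layout easier to check against the iterative version. It is plain C++ with no SYCL, and the names flatten_rows and squared_distances_flat are hypothetical helpers introduced only for illustration. It takes the row width from the query vector instead of a literal 5 and accumulates into a local variable before writing each result once. It is not meant as a fix for the timing difference, only as a way to verify the indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy a vector of rows into one contiguous
// row-major array, so element (i, j) lives at index i * dims + j.
std::vector<double> flatten_rows(const std::vector<std::vector<double>>& rows,
                                 std::size_t dims) {
    std::vector<double> flat;
    flat.reserve(rows.size() * dims);
    for (const auto& row : rows) {
        assert(row.size() == dims);  // every row must have the same width
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Hypothetical helper: squared Euclidean distance from `query` to each row
// of the flattened dataset, using the same indexing as the kernel body.
std::vector<double> squared_distances_flat(const std::vector<double>& flat,
                                           const std::vector<double>& query) {
    const std::size_t dims = query.size();
    assert(dims > 0 && flat.size() % dims == 0);
    const std::size_t rows = flat.size() / dims;

    std::vector<double> out(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        double acc = 0.0;  // local accumulator, written to out[i] once
        for (std::size_t j = 0; j < dims; ++j) {
            const double d = query[j] - flat[i * dims + j];
            acc += d * d;
        }
        out[i] = acc;
    }
    return out;
}
```

Calling squared_distances_flat(flatten_rows(dataset, curr_test.size()), curr_test) should give the same values as distance_calculation, which gives a baseline to compare the kernel output against.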
Definitely on the wrong forum. Moving to DPC++ Forum.