Is there any official documentation on the DLA runtime or inference engine for managing the DLA from the ARM side? I need to develop a custom application for running inference, but so far I've only found the dla_benchmark (main.cpp) and streaming_inference_app.cpp example files. There should be some documentation covering the SDK. The only related documentation I have found is the Intel FPGA AI Suite PCIe-based design example: https://www.intel.com/content/www/us/en/docs/programmable/768977/2024-3/fpga-runtime-plugin.html
From what I understand, the general inference workflow involves the following steps (a rough sketch of how I picture these mapping onto the OpenVINO API follows the list):
- Identify the hardware architecture
- Deploy the model
- Prepare the input data
- Send inference requests to the DLA
- Retrieve the output data
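For context, this is roughly how I picture those steps mapping onto the OpenVINO C++ API that dla_benchmark builds on. It is only a sketch, not taken from any official DLA document: the model file name, the HETERO:FPGA,CPU device string, the FP32 output type and the single input/output are my assumptions.
#include <openvino/openvino.hpp>
#include <cstring>

int main() {
    // 1. Identify the hardware: ov::Core must be able to see the FPGA AI Suite runtime plugin
    ov::Core core;

    // 2. Deploy the model: compile the graph for the accelerator (assumed device string)
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");   // assumed file name
    ov::CompiledModel compiled = core.compile_model(model, "HETERO:FPGA,CPU");

    // 3. Prepare the input data
    ov::InferRequest request = compiled.create_infer_request();
    ov::Tensor input = request.get_input_tensor();
    std::memset(input.data(), 0, input.get_byte_size());               // real input data goes here

    // 4. Send the inference request to the DLA
    request.infer();                                                   // or start_async() + wait()

    // 5. Retrieve the output data
    ov::Tensor output = request.get_output_tensor();
    const float* result = output.data<float>();                        // assumes an FP32 output
    (void)result;                                                      // post-processing goes here
    return 0;
}
If there is official documentation describing this flow for the DLA runtime, that is what I am looking for.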
Hi Ruben,
From the output log file, I observed that it differs from your code: certain printouts are missing.
I do not see the printout below.
// Flip Vertically
flipHorizontally(processed_output);
std::cout << "Flipped Output Array:" << std::endl;
for (const auto& row : processed_output) {
    std::cout << "[ ";
    for (const auto& value : row) {
        std::cout << value << " ";
    }
    std::cout << "]" << std::endl;
}
// Group the data into 9 slaves
std::vector<std::vector<int>> grouped_output = groupData(processed_output);
std::cout << "\nGrouped Output Array:" << std::endl;
for (size_t i = 0; i < grouped_output.size(); ++i) {
    std::cout << "Group " << i + 1 << ": [ ";
    for (const auto& value : grouped_output[i]) {
        std::cout << value << " ";
    }
    std::cout << "]" << std::endl;
}
std::cout << "RIS Resolution: " << output_shape[3] << "-bit." << std::endl;
std::vector<int> flattened_output;
for (const auto& group : grouped_output) {
    flattened_output.insert(flattened_output.end(), group.begin(), group.end());
}
// std::vector<uint64_t> groups = prepareData(flattened_output, output_shape[3]);
std::vector<uint64_t> groups = prepareData(grouped_output, output_shape[3]);
std::cout << "Prepared Data for SPI:" << std::endl;
for (size_t i = 0; i < groups.size(); ++i) {
    std::cout << "Group " << i << ": 0x"
              << std::hex << groups[i]
              << std::dec << std::endl;
}
const std::string throughput_file_name = "throughput_report.txt";
std::ofstream throughput_file;
throughput_file.open(throughput_file_name);
throughput_file << "Throughput : " << totalFps << " fps" << std::endl;
throughput_file << "Batch Size : " << batchSize << std::endl;
throughput_file << "Graph number : " << exeNetworks.size() << std::endl;
throughput_file << "Num Batches : " << num_batches << std::endl;
throughput_file.close();
// Output Debug Network Info if COREDLA_TEST_DEBUG_NETWORK is set
ReadDebugNetworkInfo(ie);
if (return_code) return return_code;
Thanks.
John Tio
Hello @JohnT_Intel,
I only commented out the // Flip Vertically and // Group the data into 9 slaves sections to avoid excessively extending the log, as they only modify the already retrieved output. You can run the full code if you’d like. Do you have any questions about that?
The raw output is printed in the previous section, as you can see in the log file.
Again, do you have an example or pseudocode for properly handling the inference requests?
Hi Ruben,
Unfortunately I do not have the setup to set this up, and I am working with engineering to see how we can implement the flow that you requested.
I am sorry that I am not able to expedite this support; I am trying my best to help you resolve the issue.
Hello @JohnT_Intel,
Sorry. Since you suggested reusing the inference request instead of creating a new one for each inference, I thought the solution would be trivial and that the problem was in my implementation or understanding.
I look forward to a solution.
I believe the concept of using the DLA in a real application is correct: deploy the accelerator and configure it with the graph, then keep it configured and continuously feed it with new data for inference. Isn't that right? Of course, each new inference must wait for the previous one to finish. Is this correct, or have I misunderstood something about the working principle of the accelerator?
Hi,
Yes, your understanding of the accelerator is correct: you should be able to continuously feed it with new data for inference.
Hello @JohnT_Intel,
Is there any news about this topic?
I'm using the S2M design, in case that helps in finding an alternative solution based on the streaming app.
Hi Ruben,
Sorry for the delay. If you are using the benchmark source code, then you will need to include "wait_all" so that the inference completes before you proceed with new input.
You might want to refer to OpenVINO’s classes instead: https://docs.openvino.ai/2024/openvino-workflow/running-inference/integrate-openvino-with-your-application/inference-request.html
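In case it helps, the single-request version of the pattern on that page looks roughly like this. This is only a sketch, not code from dla_benchmark; compiled, fill_input, process and num_frames are placeholders.
#include <openvino/openvino.hpp>

void fill_input(ov::Tensor&);            // placeholder: write the next frame into the tensor
void process(const ov::Tensor&);         // placeholder: consume one result

void run(ov::CompiledModel& compiled, int num_frames) {
    ov::InferRequest request = compiled.create_infer_request();   // created once, reused below
    for (int frame = 0; frame < num_frames; ++frame) {
        ov::Tensor in = request.get_input_tensor();
        fill_input(in);                  // new input data for this iteration
        request.start_async();           // hand the frame to the accelerator
        request.wait();                  // the inference is complete once this returns
        process(request.get_output_tensor());
    }
}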
Hello @JohnT_Intel,
The following statement is present in the code I shared with you:
std::cout << "#Debug: 10. waitAll.\n";
// wait for the latest inference executions to finish
for (auto& inferRequestsQueue : inferRequestsQueues)
    inferRequestsQueue->waitAll();
Is this what you are referring to? It doesn't work. Maybe it is not used correctly. Do you have a pseudocode example?
Hi,
I think it should be different. You may refer to openvino.AsyncInferQueue — OpenVINO™ documentation (version 2024).
You may also refer to the example that contains wait_all: Throughput Benchmark Sample — OpenVINO™ documentation (version 2024).
Hello @JohnT_Intel,
dla_benchmark is implemented in C++.
The API documentation you shared in the previous comment is for Python. The example that uses wait_all is implemented in Python. There is also an example in C++, but it doesn't use wait_all, waitAll, or any similar function.
In addition, that OpenVINO documentation is for the 2024 release, but the OpenVINO version required by the latest FPGA AI Suite (2024.3) is 2023.3.
Hi Ruben,
You may also refer to the 2023.3 version of the OpenVINO documentation. The sample design can be run with the FPGA: Throughput Benchmark Sample — OpenVINO™ documentation (version 2023.3).
It has both Python and C++ sample code.
Hello @JohnT_Intel,
The same. It has a C++ example, but no "wait_all" or similar function is used in it; only in the Python example.
It uses:
for (ov::InferRequest& ireq : ireqs) {
    ireq.wait();
}
Similar to the code I shared with you.
Hi Ruben,
I think in C++ it uses the code below, which calls wait():
for (ov::InferRequest& ireq : ireqs) {
    ireq.wait();
}
Hello @JohnT_Intel,
As I said, it is also included in dla_benchmark, as well as in the application I shared with you. It doesn't work. The extracted code is below:
for (size_t iireq = 0; iireq < nireq; iireq++) {
    auto inferRequest = inferRequestsQueues.at(net_id)->getIdleRequest();
    if (!inferRequest) {
        THROW_IE_EXCEPTION << "No idle Infer Requests!";
    }
    if (niter != 0LL) {
        std::cout << "#Debug: 10. Set output blob.\n";
        for (auto & item : outputInfos.at(net_id)) {
            std::string currOutputName = item.first;
            auto currOutputBlob = ioBlobs.at(net_id).second[iterations.at(net_id)][currOutputName];
            inferRequest->SetBlob(currOutputName, currOutputBlob);
        }
        std::cout << "#Debug: 10. Set input blob.\n";
        for (auto & item : inputInfos.at(net_id)) {
            std::string currInputName = item.first;
            auto currInputBlob = ioBlobs.at(net_id).first[iterations.at(net_id)][currInputName];
            inferRequest->SetBlob(currInputName, currInputBlob);
        }
    }
    // Execute one request/batch
    if (FLAGS_api == "sync") {
        inferRequest->infer();
    } else {
        // As the inference request is currently idle, the wait() adds no additional overhead (and should return immediately).
        // The primary reason for calling the method is exception checking/re-throwing.
        // Callback, that governs the actual execution can handle errors as well,
        // but as it uses just error codes it has no details like ‘what()' method of `std::exception`
        // So, rechecking for any exceptions here.
        inferRequest->wait();
        inferRequest->startAsync();
    }
    iterations.at(net_id)++;
    if (net_id == exeNetworks.size() - 1) {
        execTime = std::chrono::duration_cast<ns>(Time::now() - startTime).count();
        if (niter > 0) {
            progressBar.addProgress(1);
        } else {
            // calculate how many progress intervals are covered by current iteration.
            // depends on the current iteration time and time of each progress interval.
            // Previously covered progress intervals must be skipped.
            auto progressIntervalTime = duration_nanoseconds / progressBarTotalCount;
            size_t newProgress = execTime / progressIntervalTime - progressCnt;
            progressBar.addProgress(newProgress);
            progressCnt += newProgress;
        }
    }
}
Hi Ruben,
I think you need to provide only the new input data and not change the blob; otherwise it will be treated as a new inference setup.
During the first run you should perform all the setup, and from the second run onwards you should only provide the input data.
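As a rough sketch of that idea with the OpenVINO 2.0 API (a single network with a single input is assumed; have_new_frame, next_frame and read_results are placeholders, not existing functions):
#include <openvino/openvino.hpp>
#include <cstring>

bool have_new_frame();                   // placeholder: true while new input keeps arriving
const void* next_frame();                // placeholder: pointer to the next input frame
void read_results(const ov::Tensor&);    // placeholder: handle one output

void run(ov::CompiledModel& compiled) {
    // First run: perform all the setup once.
    ov::InferRequest request = compiled.create_infer_request();
    ov::Tensor input = request.get_input_tensor();    // keep this same tensor for every run

    // Second run onwards: only overwrite the input data, do not re-create blobs/tensors.
    while (have_new_frame()) {
        std::memcpy(input.data(), next_frame(), input.get_byte_size());
        request.infer();                               // or start_async() + wait()
        read_results(request.get_output_tensor());
    }
}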
Hello @JohnT_Intel,
Same behaviour.
I changed the code to create the blobs before the loop and only fill them inside the loop:
// Create blobs only once before the loop
using Blob_t = std::vector<std::map<std::string, Blob::Ptr>>;
std::vector<std::pair<Blob_t, Blob_t>> ioBlobs = vectorMapWithIndex<std::pair<Blob_t, Blob_t>>(
    exeNetworks, [&](ExecutableNetwork* const& exeNetwork, uint32_t index) mutable {
        Blob_t inputBlobs;
        Blob_t outputBlobs;
        ConstInputsDataMap inputInfo = exeNetwork->GetInputsInfo();
        ConstOutputsDataMap outputInfo = exeNetwork->GetOutputsInfo();
        for (uint32_t batch = 0; batch < num_batches; batch++) {
            std::map<std::string, Blob::Ptr> outputBlobsMap;
            for (auto& item : outputInfo) {
                auto& precision = item.second->getTensorDesc().getPrecision();
                if (precision != Precision::FP32) {
                    THROW_IE_EXCEPTION << "Output blob creation only supports FP32 precision. Instead got: " + precision;
                }
                auto outputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                outputBlob->allocate();
                outputBlobsMap[item.first] = (outputBlob);
            }
            std::map<std::string, Blob::Ptr> inputBlobsMap;
            for (auto& item : inputInfo) {
                Blob::Ptr inputBlob = nullptr;
                auto& precision = item.second->getTensorDesc().getPrecision();
                if (precision == Precision::FP32) {
                    inputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                } else if (precision == Precision::U8) {
                    inputBlob = make_shared_blob<PrecisionTrait<Precision::U8>::value_type>(item.second->getTensorDesc());
                } else {
                    THROW_IE_EXCEPTION << "Input blob creation only supports FP32 and U8 precision. Instead got: " + precision;
                }
                inputBlob->allocate();
                inputBlobsMap[item.first] = (inputBlob);
            }
            inputBlobs.push_back(inputBlobsMap);
            outputBlobs.push_back(outputBlobsMap);
        }
        return std::make_pair(inputBlobs, outputBlobs);
    }
);
std::cout << "Blobs initialized once before the loop.\n";
while (1) {
    ...
    // Fill blobs with new input values (DO NOT re-create them)
    for (size_t i = 0; i < exeNetworks.size(); i++) {
        slog::info << "Filling input blobs for network ( " << topology_names[i] << " )" << slog::endl;
        fillBlobs(inputs, ioBlobs[i].first); // Only fill the existing blobs
    }
    ...
}
Error: dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance
Hi Ruben,
I think you might need to try an OpenVINO example design or another runtime example design to see if it works on your side (e.g. classification_sample_async or object_detection_demo).
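For reference, the error message reads as if more infer requests are being created than the DLA instance supports, so a loop in the spirit of those async samples, with a small fixed pool of requests created once and reused afterwards, may be worth trying. This is only a sketch; the pool size and the helper functions are assumptions.
#include <openvino/openvino.hpp>
#include <cstring>
#include <vector>

bool have_new_frame();                   // placeholder: true while new input keeps arriving
const void* next_frame();                // placeholder: pointer to the next input frame
void consume(const ov::Tensor&);         // placeholder: handle one output

void run(ov::CompiledModel& compiled) {
    const size_t nireq = 2;              // assumed pool size
    std::vector<ov::InferRequest> pool;
    std::vector<bool> started(nireq, false);
    for (size_t i = 0; i < nireq; ++i)
        pool.push_back(compiled.create_infer_request());   // created once, before the loop

    size_t next = 0;
    while (have_new_frame()) {
        ov::InferRequest& req = pool[next];
        if (started[next]) {
            req.wait();                                     // finish the previous inference on this request
            consume(req.get_output_tensor());               // its output is valid after wait()
        }
        ov::Tensor in = req.get_input_tensor();
        std::memcpy(in.data(), next_frame(), in.get_byte_size());
        req.start_async();
        started[next] = true;
        next = (next + 1) % nireq;                          // round-robin over the fixed pool
    }
    for (size_t i = 0; i < nireq; ++i) {                    // drain whatever is still in flight
        if (started[i]) {
            pool[i].wait();
            consume(pool[i].get_output_tensor());
        }
    }
}
The key point is that create_infer_request() is only called nireq times, before the processing loop, and the same requests are recycled for every new frame.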
