Hi.
I modified the sample code cnn_inference_f32.cpp so that it creates a network with only one convolution.
I also moved the part that creates the primitives into a separate function.
<Problem>
The following issues occur:
If I specify the CPU device, the program runs to completion:
C:\Test>.\_build\cnn-inference-f32-cpp.exe cpu
execute,start,end
Use time: 206 ms per iteration.
Example passed on CPU.
However, if I specify the GPU device, the program is terminated partway through:
C:\Test>.\_build\cnn-inference-f32-cpp.exe gpu
execute,start,
My investigation shows that the program terminates at the following line:
net.at(i).execute(s, net_args.at(i));
What am I doing wrong? Or is this a library bug?
Please advise me on the cause and a solution.
I have attached the code that can reproduce the problem.
<Information>
OS: Windows 10 Pro (21H1)
Toolkit:
Intel oneAPI 2021.3
cmake: ver.3.19.2
ninja: ver.1.8.2
CPU: Intel Core i7-1065G7 1.3GHz
Accelerator:Iris Xe Graphics
driver ver.: 27.20.100.9664
Best regards.
Hi,
Thanks for reaching out to us.
We are trying to reproduce the issue and will get back to you soon with an update.
Thanks.
Hi,
We were able to reproduce the issue using your code and are checking on it internally.
Thanks
Hello.
I am still waiting for your response regarding this issue.
I would like to know the cause of this issue and how to deal with it.
Best regards.
Hello.
Please let us know the current status regarding this issue.
Hi k_higashi,
I found the root cause of the crash: incorrect usage in the original code.
In the function create_net(), net.push(xxx) is called, but xxx is a local variable of create_net().
When execution leaves create_net(), xxx is released.
A later call to net.at(i).execute() then crashes when it accesses xxx, which is no longer available.
The crash may not be triggered on the CPU or on some GPUs.
In some cases the memory that held xxx is not overwritten after create_net() returns, so net.at(i).execute() still reads correct data at xxx's address.
But that is only luck.
If the system is busy, or on other hardware, the memory of xxx is overwritten sooner and the crash appears frequently.
Solution 1:
Change the code:
Call net.push() and net.execute() in the same function. The local variables are then used only while they are still alive, so there is no crash.
Solution 2:
Define xxx as a global variable.
I remember making the same mistake when I was learning oneDNN.
It took me some time to recall it.
Thank you!
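To make the lifetime issue described above concrete, here is a minimal, library-independent sketch. It does not use oneDNN: Buffer, Step, Network, and create_net_local() are hypothetical stand-ins invented for illustration, and std::weak_ptr is used to model a non-owning reference so that the effect can be observed safely instead of by crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a buffer that a network step works on.
struct Buffer {
    std::vector<float> data;
    explicit Buffer(std::size_t n) : data(n, 0.0f) {}
};

// Hypothetical step that refers to a buffer without owning it.
struct Step {
    std::weak_ptr<Buffer> target;

    // Returns false if the buffer has already been destroyed.
    bool execute() const {
        std::shared_ptr<Buffer> buf = target.lock();
        if (!buf) return false;
        for (float& v : buf->data) v += 1.0f;
        return true;
    }
};

// Problem pattern: the only owner of the buffer is a local variable,
// so the buffer is destroyed when this function returns.
std::vector<Step> create_net_local() {
    std::shared_ptr<Buffer> local = std::make_shared<Buffer>(16);
    std::vector<Step> net;
    net.push_back(Step{local});
    return net;  // 'local' goes out of scope here
}

// One possible fix: an object that owns the buffers and the steps
// together, so both live exactly as long as the object does.
struct Network {
    std::vector<std::shared_ptr<Buffer>> buffers;
    std::vector<Step> net;

    void create() {
        buffers.push_back(std::make_shared<Buffer>(16));
        net.push_back(Step{buffers.back()});
    }

    bool execute() const {
        bool ok = true;
        for (const Step& s : net) ok = s.execute() && ok;
        return ok;
    }
};
```

With these definitions, create_net_local()[0].execute() reports failure because the buffer is gone, while a Network that has called create() can call execute() repeatedly. With a raw pointer in place of std::weak_ptr, the first case would be undefined behavior, which may appear to work or may crash.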
Hi,
Your feedback did not appear in the community loop, so I have only just seen it.
Question 1:
>In function create_net(), net.push(xxx) are called. But the variable xxx is local variable of create_net().
Is "net.push(xxx)" actually "net.push_back(xxx)"?
[Intel] Yes, it's pseudocode. In your code, it's net.push_back(xxx).
Question 2:
> Solution 2:
> Define the xxx as global variable.
Are the following measures correct?
For example
net.push_back(convolution_forward(conv1_prim_desc));
-> I define "conv1_prim_desc" as a global variable.
For example
net.push_back(reorder(conv1_dst_memory, user_dst_memory));
-> I define "conv1_dst_memory" and "user_dst_memory" as global variables.
[Intel] Yes
---------------
Question 3:
I have doubts about the cause of the crash.
Regarding the specification of std::vector's push_back(),
I think "net.push_back(xxx)" reallocates memory for net and copies the value of xxx to the end of net.
I think net retains that value even after the original local variable xxx is released.
Therefore, I think execute uses net, and the address of xxx is not referenced directly.
What am I doing wrong?
I hope for good advice.
[Intel] If net.push_back() invoked a deep copy of xxx, this issue could be avoided on the CPU. But I guess that xxx does not provide a deep copy in most cases.
In the GPU case, xxx would include member variables that refer to GPU (device) memory, and implementing a deep copy of xxx is more complex there. Nobody wants to deep-copy GPU data; it wastes memory and time.
So it is hard to rely on the copy that std::vector makes in push_back() to keep that data alive.
Global variables, static variables, or members of a long-lived object are good ways to keep the data.
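As a generic illustration of the copy question above (not a statement about how any oneDNN type is implemented): push_back() copy-constructs the element, and what that copy shares with the original depends on the element type's copy constructor. The sketch below uses a hypothetical Handle type whose copy is shallow but shares ownership of its storage through std::shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type: copying a Handle does not duplicate the
// storage; both copies share ownership of the same underlying vector.
struct Handle {
    std::shared_ptr<std::vector<float>> storage;
    explicit Handle(std::size_t n)
        : storage(std::make_shared<std::vector<float>>(n, 0.0f)) {}
};

std::vector<Handle> make_handles() {
    Handle local(4);
    (*local.storage)[0] = 42.0f;

    std::vector<Handle> v;
    v.push_back(local);  // copy-constructs a Handle (shallow, shared)

    // The element and the local refer to the same storage.
    assert(v[0].storage == local.storage);
    assert(local.storage.use_count() == 2);
    return v;  // 'local' is destroyed, the storage survives in v[0]
}
```

Here the storage outlives the local variable because the copy stored in the vector co-owns it. A type that held a raw, non-owning pointer instead would be copied just as readily by push_back(), but the copy would not keep the pointed-to data alive.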
-- What I want to achieve ---
I want the inference execution time to be as fast as possible.
The process of [Create network] requires a considerable amount of time even if the oneDNN cache is used.
So I wondered if I could run [Create network] in advance, as a create_net function.
I want inference to run only the [Execute model] part.
[Intel] Yes, I understand your goal.
I have updated the sample code to meet your requirement; please refer to it.
I will publish it in the next post.
Thank you!
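The updated sample itself is not reproduced in this thread. As a rough, library-independent sketch of the "create once, execute many times" structure discussed above, the following uses a hypothetical Model class (not part of oneDNN) that does its setup in the constructor and keeps the buffers and the prepared steps as members, so that execute() only runs the steps.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: setup happens once, execution can be repeated.
class Model {
public:
    explicit Model(std::size_t n) : src_(n, 0.0f), dst_(n, 0.0f) {
        // Stand-in for the costly [Create network] part. The step
        // refers to member buffers, which live as long as the Model.
        net_.push_back([this] {
            for (std::size_t i = 0; i < src_.size(); ++i)
                dst_[i] = 2.0f * src_[i];
        });
        ++build_count_;
    }

    // The steps capture 'this', so copying a Model would be unsafe.
    Model(const Model&) = delete;
    Model& operator=(const Model&) = delete;

    // Copies new input into the existing buffer; the size must match.
    void set_input(const std::vector<float>& values) {
        assert(values.size() == src_.size());
        src_ = values;
    }

    // Stand-in for the [Execute model] part: no setup work here.
    void execute() {
        for (auto& step : net_) step();
    }

    const std::vector<float>& output() const { return dst_; }
    int build_count() const { return build_count_; }

private:
    std::vector<float> src_;
    std::vector<float> dst_;
    std::vector<std::function<void()>> net_;
    int build_count_ = 0;
};
```

A caller would construct one Model up front, then alternate set_input() and execute() for each inference; build_count() stays at 1 no matter how often execute() is called.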
Hi,
My answers to your questions are below:
Hello,
I was able to run it on the GPU using the code I got from you.
Thank you!
I want a safe implementation, so I have a question.
Question:
Is dnnl::memory the only variable I need to keep?
[Intel] No. The updated sample code also keeps the engine, stream, net, and net_args.
dnnl::memory is the key variable that led to the crash in the original code.
For example, L136:
net.push_back(convolution_forward(conv1_prim_desc));
The local variable conv1_prim_desc is used in the argument to push_back().
Is it safe for conv1_prim_desc to be a local variable?
[Intel] Yes.
My first reply yesterday included wrong information, and I have removed that post.
convolution_forward(conv1_prim_desc) doesn't include data allocated on the device (GPU), so there is no need to make it a global variable. net.push_back() will copy it into the new element, as you said.
But dnnl::memory includes data on the device (GPU), so it must be global.
Best regards.
Hi,
Thanks for accepting our solution. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
