Dear all,
I'm currently trying to implement and train a deep neural network using the oneDNN toolkit on DevCloud. So far I've been able to follow the example provided here: https://oneapi-src.github.io/oneDNN/cnn_training_bf16_cpp.html; however, one step in the backward convolution is not completely clear to me. Normally, when we reorder memory after declaring a primitive, we use the following syntax:
auto conv_bwd_src_memory = conv_src_memory;
if (conv_bwd_weights_pd.src_desc() != conv_src_memory.get_desc())
{
    conv_bwd_src_memory = memory(conv_bwd_weights_pd.src_desc(), eng);
    net_bwd.push_back(reorder(conv_src_memory, conv_bwd_src_memory));
    net_bwd_args.push_back({{DNNL_ARG_FROM, conv_src_memory},
            {DNNL_ARG_TO, conv_bwd_src_memory}});
}
(In this example we reorder the source memory for the backward-weights primitive, but many other reorders in the example follow exactly the same structure.)
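For reference, the primitives and their argument maps are later executed pairwise, in the order they were pushed, which is why the position of each reorder in the stream matters. A minimal sketch of that loop, assuming a dnnl::stream s as in the examples:

// run each backward primitive with the argument map pushed alongside it
for (std::size_t i = 0; i < net_bwd.size(); ++i)
    net_bwd.at(i).execute(s, net_bwd_args.at(i));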
However, in the last declared layer of the CNN f32 training example, the reorder operation is somehow "delayed":
net_bwd_args.push_back({{DNNL_ARG_SRC, conv_bwd_src_memory},
        {DNNL_ARG_DIFF_DST, conv_diff_dst_memory},
        // delay putting DIFF_WEIGHTS until reorder (if needed)
        {DNNL_ARG_DIFF_BIAS, conv_diff_bias_memory}});
auto conv_diff_weights_memory = conv_user_diff_weights_memory;
if (conv_bwd_weights_pd.diff_weights_desc()
        != conv_user_diff_weights_memory.get_desc())
{
    conv_diff_weights_memory
            = memory(conv_bwd_weights_pd.diff_weights_desc(), eng);
    net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});
    net_bwd.push_back(reorder(
            conv_diff_weights_memory, conv_user_diff_weights_memory));
    net_bwd_args.push_back({{DNNL_ARG_FROM, conv_diff_weights_memory},
            {DNNL_ARG_TO, conv_user_diff_weights_memory}});
}
else
{
    net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});
}
Looking at the code, it seems that the operation being performed is exactly the same. Am I missing some subtle difference that makes it impossible to use the previous approach here?
Best regards,
Francesco Brozzu
Hi,
Thanks for posting in the Intel forums.
We will check on this internally and let you know.
Regards,
Janani Chandran
Hi Brozzu,
Thanks for your interest in oneDNN and for raising this question.
Regarding delaying the DIFF_WEIGHTS argument until the reorder, both the f32 and bf16 examples use the same implementation; see the links below.
https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_bf16.cpp#L417
https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_f32.cpp#L387
There should be no difference in this delayed-reorder implementation between f32 and bf16.
Let us know whether line 417 of the bf16 implementation addresses your question.
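One way to read the shared pattern (a condensed sketch rather than a verbatim excerpt, using the names from the question above): reorders that feed a primitive's input must be pushed before it, while the DIFF_WEIGHTS reorder consumes the primitive's output and must be pushed after it, so the DIFF_WEIGHTS argument can only be inserted into the already-pushed argument map once it is known whether that trailing reorder is needed.

// input side: the reorder is pushed BEFORE the primitive that consumes it
net_bwd.push_back(reorder(conv_src_memory, conv_bwd_src_memory));
net_bwd_args.push_back({{DNNL_ARG_FROM, conv_src_memory},
        {DNNL_ARG_TO, conv_bwd_src_memory}});

// the backward-weights primitive itself comes next
net_bwd.push_back(convolution_backward_weights(conv_bwd_weights_pd));
net_bwd_args.push_back({{DNNL_ARG_SRC, conv_bwd_src_memory},
        {DNNL_ARG_DIFF_DST, conv_diff_dst_memory},
        {DNNL_ARG_DIFF_BIAS, conv_diff_bias_memory}});

// output side: the primitive writes DIFF_WEIGHTS, so that argument is
// inserted into the map just pushed, and the reorder back to the user's
// layout is pushed AFTER the primitive
net_bwd_args.back().insert(
        {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});
net_bwd.push_back(reorder(
        conv_diff_weights_memory, conv_user_diff_weights_memory));
net_bwd_args.push_back({{DNNL_ARG_FROM, conv_diff_weights_memory},
        {DNNL_ARG_TO, conv_user_diff_weights_memory}});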
Regards,
