Intel® oneAPI DL Framework Developer Toolkit
Gain insights from peers and Intel experts to develop new deep learning frameworks or to customize a framework using common APIs.

Reorder in CNN f32 training example

francescobrozzu
Beginner

Dear all,

I'm currently trying to implement and train a deep neural network architecture using the oneDNN toolkit on DevCloud. So far I've been able to follow the example provided here: https://oneapi-src.github.io/oneDNN/cnn_training_bf16_cpp.html; however, there is a step in the backward convolution that is not completely clear to me. Normally, when we reorder memory after declaring a primitive, we use the following syntax:

auto conv_bwd_src_memory = conv_src_memory;
if (conv_bwd_weights_pd.src_desc() != conv_src_memory.get_desc())
{
        conv_bwd_src_memory = memory(conv_bwd_weights_pd.src_desc(), eng);
        net_bwd.push_back(reorder(conv_src_memory, conv_bwd_src_memory));
        net_bwd_args.push_back({{DNNL_ARG_FROM, conv_src_memory},
                                {DNNL_ARG_TO, conv_bwd_src_memory}});
}

 (In this example we reorder the source memory for the backward pass, but there are many more blocks in the example that follow exactly the same structure; one of them is sketched below.)
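
For instance, the forward-pass weight reorder in the same example follows this pattern (the variable names here are approximate, so please read this as an illustrative sketch rather than an exact quote):

auto conv_weights_memory = conv_user_weights_memory;
if (conv_pd.weights_desc() != conv_user_weights_memory.get_desc())
{
        // The primitive prefers a different weights layout, so allocate a
        // matching memory object and schedule a reorder before the primitive.
        conv_weights_memory = memory(conv_pd.weights_desc(), eng);
        net_fwd.push_back(reorder(conv_user_weights_memory, conv_weights_memory));
        net_fwd_args.push_back({{DNNL_ARG_FROM, conv_user_weights_memory},
                                {DNNL_ARG_TO, conv_weights_memory}});
}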

However, in the last (declared) layer of the CNN f32 training example, the reorder operation is somehow "delayed":

net_bwd_args.push_back({{DNNL_ARG_SRC, conv_bwd_src_memory},
                        {DNNL_ARG_DIFF_DST, conv_diff_dst_memory},
                        // delay putting DIFF_WEIGHTS until reorder (if needed)
                        {DNNL_ARG_DIFF_BIAS, conv_diff_bias_memory}});

auto conv_diff_weights_memory = conv_user_diff_weights_memory;
if (conv_bwd_weights_pd.diff_weights_desc() != conv_user_diff_weights_memory.get_desc())
{
        conv_diff_weights_memory = memory(conv_bwd_weights_pd.diff_weights_desc(), eng);
        net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});

        net_bwd.push_back(reorder(
            conv_diff_weights_memory, conv_user_diff_weights_memory));
        net_bwd_args.push_back({{DNNL_ARG_FROM, conv_diff_weights_memory},
                                {DNNL_ARG_TO, conv_user_diff_weights_memory}});
}
else
{
        net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});
}

 Looking at the code, it seems that the operation being performed is exactly the same. Am I missing some slight difference that makes it impossible to use the previous solution?
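
For clarity, this untested sketch shows what I mean by reusing the earlier structure for the diff_weights memory (assuming, as in the example, that the backward-weights primitive itself has already been pushed to net_bwd before its arguments):

auto conv_diff_weights_memory = conv_user_diff_weights_memory;
if (conv_bwd_weights_pd.diff_weights_desc() != conv_user_diff_weights_memory.get_desc())
{
        // Let the primitive write DIFF_WEIGHTS in its preferred layout.
        conv_diff_weights_memory = memory(conv_bwd_weights_pd.diff_weights_desc(), eng);
}

// Push the convolution arguments once, already containing DIFF_WEIGHTS.
net_bwd_args.push_back({{DNNL_ARG_SRC, conv_bwd_src_memory},
                        {DNNL_ARG_DIFF_DST, conv_diff_dst_memory},
                        {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory},
                        {DNNL_ARG_DIFF_BIAS, conv_diff_bias_memory}});

if (conv_bwd_weights_pd.diff_weights_desc() != conv_user_diff_weights_memory.get_desc())
{
        // DIFF_WEIGHTS is an output, so the reorder back to the user's
        // layout still runs after the backward-weights primitive.
        net_bwd.push_back(reorder(conv_diff_weights_memory, conv_user_diff_weights_memory));
        net_bwd_args.push_back({{DNNL_ARG_FROM, conv_diff_weights_memory},
                                {DNNL_ARG_TO, conv_user_diff_weights_memory}});
}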

Best regards,

Francesco Brozzu

2 Replies
JananiC_Intel
Moderator

Hi,

Thanks for posting in the Intel forums.

We will check on this internally and let you know.

Regards,

Janani Chandran



Louie_T_Intel
Moderator

Hi Brozzu,

Thanks for your interest in oneDNN, and for raising this question.

Regarding delaying the insertion of DIFF_WEIGHTS until the reorder, both the f32 and bf16 examples have the same implementation; see the links below.

https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_bf16.cpp#L417

https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_f32.cpp#L387

There should be no difference in this delayed implementation between f32 and bf16.

Let us know whether line 417 in the bf16 implementation addresses your question.

Regards,

