Intel® oneAPI DL Framework Developer Toolkit

Reorder in CNN f32 training example

francescobrozzu
Beginner

Dear all,

I'm currently trying to implement and train a deep neural network architecture using the oneDNN toolkit on DevCloud. So far I've been able to follow the example provided here: https://oneapi-src.github.io/oneDNN/cnn_training_bf16_cpp.html. However, there is a step in the backward convolution that is not completely clear to me. Normally, when we reorder memory after declaring a primitive, we use the following pattern:

// Reorder the source data into the format the backward-weights
// primitive expects, if it differs from the current format. The
// reorder runs BEFORE the primitive that reads conv_bwd_src_memory.
auto conv_bwd_src_memory = conv_src_memory;
if (conv_bwd_weights_pd.src_desc() != conv_src_memory.get_desc())
{
        conv_bwd_src_memory = memory(conv_bwd_weights_pd.src_desc(), eng);
        net_bwd.push_back(reorder(conv_src_memory, conv_bwd_src_memory));
        net_bwd_args.push_back({{DNNL_ARG_FROM, conv_src_memory},
                                {DNNL_ARG_TO, conv_bwd_src_memory}});
}

 (in this example we reorder the backward source memory, but there are many more reorders that follow exactly the same structure)

However, in the last (declared) layer of the CNN f32 training example, the reorder operation is somehow "delayed":

net_bwd_args.push_back({{DNNL_ARG_SRC, conv_bwd_src_memory},
                        {DNNL_ARG_DIFF_DST, conv_diff_dst_memory},
                        // delay putting DIFF_WEIGHTS until reorder (if needed)
                        {DNNL_ARG_DIFF_BIAS, conv_diff_bias_memory}});

// If the primitive's preferred diff_weights format differs from the
// user's, let the primitive write into an intermediate memory and
// reorder back to the user format AFTER the primitive has run.
auto conv_diff_weights_memory = conv_user_diff_weights_memory;
if (conv_bwd_weights_pd.diff_weights_desc() != conv_user_diff_weights_memory.get_desc())
{
        conv_diff_weights_memory = memory(conv_bwd_weights_pd.diff_weights_desc(), eng);
        // the backward-weights primitive writes its output here ...
        net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});

        // ... and this reorder copies it into the user's memory afterwards
        net_bwd.push_back(reorder(
            conv_diff_weights_memory, conv_user_diff_weights_memory));
        net_bwd_args.push_back({{DNNL_ARG_FROM, conv_diff_weights_memory},
                                {DNNL_ARG_TO, conv_user_diff_weights_memory}});
}
else
{
        // formats already match: the primitive writes directly into
        // the user's diff_weights memory
        net_bwd_args.back().insert(
            {DNNL_ARG_DIFF_WEIGHTS, conv_diff_weights_memory});
}

 Looking at the code, it seems the operation being performed is exactly the same. Am I missing some subtle difference that makes it impossible to use the previous solution here?

Best regards,

Francesco Brozzu

JananiC_Intel
Moderator

Hi,

Thanks for posting in the Intel forums.

We will check on this internally and let you know.

Regards,

Janani Chandran

Louie_T_Intel
Moderator

Hi Brozzu,

Thanks for your interest in oneDNN and for raising this question.

Regarding delaying the DIFF_WEIGHTS argument until the reorder, the f32 and bf16 examples share the same implementation; see the links below.

https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_bf16.cpp#L417

https://github.com/oneapi-src/oneDNN/blob/master/examples/cnn_training_f32.cpp#L387

There is no difference in this delayed-argument implementation between f32 and bf16.
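
To make the delay concrete: for an input such as SRC, the data must be reordered into the primitive's preferred format before the primitive runs, so both the reorder and the argument entry can be pushed immediately. For an output such as DIFF_WEIGHTS, the primitive first writes into memory in its preferred format, and the reorder back to the user's format runs after it; the argument entry is therefore inserted only once we know which memory object the primitive should write into. Below is a minimal sketch of the two patterns, not code from the examples; pd, eng, net, net_args, and the user_* memory objects are placeholder names:

// Input pattern: the reorder runs BEFORE the primitive, so the
// argument entry can be added right away.
auto src_memory = user_src_memory;
if (pd.src_desc() != user_src_memory.get_desc())
{
        src_memory = memory(pd.src_desc(), eng);
        net.push_back(reorder(user_src_memory, src_memory));
        net_args.push_back({{DNNL_ARG_FROM, user_src_memory},
                            {DNNL_ARG_TO, src_memory}});
}
// ... the primitive is pushed next and reads src_memory ...

// Output pattern: the primitive writes DIFF_WEIGHTS in its own
// preferred format, and the reorder back to the user's format runs
// AFTER it, so the argument entry is delayed until we know which
// memory object the primitive should write into.
auto diff_weights_memory = user_diff_weights_memory;
if (pd.diff_weights_desc() != user_diff_weights_memory.get_desc())
{
        diff_weights_memory = memory(pd.diff_weights_desc(), eng);
        net_args.back().insert({DNNL_ARG_DIFF_WEIGHTS, diff_weights_memory});
        net.push_back(reorder(diff_weights_memory, user_diff_weights_memory));
        net_args.push_back({{DNNL_ARG_FROM, diff_weights_memory},
                            {DNNL_ARG_TO, user_diff_weights_memory}});
}
else
{
        net_args.back().insert({DNNL_ARG_DIFF_WEIGHTS, diff_weights_memory});
}

In other words, reusing the first pattern verbatim would reorder the intermediate memory before the primitive has produced anything, since DIFF_WEIGHTS is an output of the primitive rather than an input.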

Let us know whether L417 in the bf16 implementation addresses your question.

Regards

