Is there more complete information about the SGDFusion method defined in sgd_solver.cpp in the Intel-Caffe framework?

kala855 · ‎03-28-2019

Hi everyone,

I'm studying how Intel-Caffe calls some of its routines in order to evaluate the performance of the framework. However, I cannot understand when the Update method is called, or when axpy_axpby_copy_axpy is called, while one step of AlexNet training is executed. As far as I understand the process, each method should be called 8 times (once per layer); however, in my case axpy_axpby_copy_axpy is called 8 times and then Update is called 8 times too. Why is each routine called 8 times? I thought that if the Update method is called, then axpy_axpby_copy_axpy isn't called, and vice versa.

Thanks for your help in advance.

Nathan_G_Intel · ‎03-28-2019

Thank you for looking into intel-caffe for your work. I will contact the caffe dev team about this.

Tian_Feng · ‎03-29-2019

In IntelCaffe, the SGD weight update is executed at the end of each training iteration/step. Its logic is to walk through all learnable parameter blobs and perform the corresponding operations on each one: normalize, regularize, compute the update value, and update the weights.

As for the Update() function, SGD implements two ApplyUpdate() functions:

1. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate()

2. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate(int param_id)

The calling logic is:

SGD -> ApplyUpdate() -> for (all learnable param blobs) {ApplyUpdate(learnable_param_blob_id);}

axpy_axpby_copy_axpy() is invoked when doing L2-norm regularization in ApplyUpdate(learnable_param_blob_id). (Note: it is only triggered when you use the ICC compiler or manually enable the ENABLE_SGD_FUSION build flag.)

That’s why you saw 8 axpy_axpby_copy_axpy() invocations and 8 ApplyUpdate(learnable_param_blob_id) invocations. It means you have 8 learnable parameter blobs.
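The calling logic described above can be sketched as follows. This is a minimal illustration, not the Intel-Caffe source: ParamBlob, SketchSolver and count_per_blob_calls are hypothetical names, and the per-blob body is a placeholder. It only shows that one call to ApplyUpdate() produces exactly one ApplyUpdate(param_id) call per learnable blob.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a learnable parameter blob (not Caffe's Blob class).
struct ParamBlob {
  std::vector<float> data;
  std::vector<float> diff;
};

// Hypothetical, simplified solver that mirrors only the call structure.
class SketchSolver {
 public:
  explicit SketchSolver(std::size_t num_blobs) : params_(num_blobs) {}

  // Called once per training iteration: visits every learnable blob.
  void ApplyUpdate() {
    for (std::size_t param_id = 0; param_id < params_.size(); ++param_id) {
      ApplyUpdate(param_id);
    }
  }

  // Called once per learnable blob.
  void ApplyUpdate(std::size_t param_id) {
    ++per_blob_calls_;
    ParamBlob& p = params_[param_id];
    for (std::size_t i = 0; i < p.data.size(); ++i) {
      p.data[i] -= p.diff[i];  // placeholder for the real per-blob update
    }
  }

  int per_blob_calls() const { return per_blob_calls_; }

 private:
  std::vector<ParamBlob> params_;
  int per_blob_calls_ = 0;
};

// Runs one simulated iteration and reports how many per-blob calls were made.
int count_per_blob_calls(std::size_t num_blobs) {
  SketchSolver solver(num_blobs);
  solver.ApplyUpdate();
  return solver.per_blob_calls();
}
```

With this structure, the number of per-blob invocations observed in one iteration is simply the number of learnable blobs the net reports.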

kala855 · ‎03-29-2019

Thanks @FTian for your answer; everything you explained helps me a lot. What I had not understood was why, inside ApplyUpdate(learnable_param_blob_id) -> SGDFusion, axpy_axpby_copy_axpy is called 8 times (now I understand). But why, inside the same SGDFusion, does my analysis show net_params[param_id]->Update() being called 8 times too? I mean, if axpy_axpby_copy_axpy (which does normalize/regularize/compute update value/weight update all inside the same routine) is called, why is it necessary to call net_params[param_id]->Update() (which only does the weight update)? Reading the source code, you would conclude that the two routines are probably not each called 8 times, but debugging the code shows what I'm describing. Please refer to: net_params[param_id]->Update

I don't know if I'm missing something or I have some kind of misunderstanding.

Thanks for your help. :)

Tian_Feng · ‎03-29-2019

axpy_axpby_copy_axpy() in fact fuses ComputeUpdateValue and the weight update. If axpy_axpby_copy_axpy() is executed, net_params[param_id]->Update() will not be called, because is_separate_ComputeUpdateValue_Update is set to false.
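To illustrate what such a fusion can look like, here is a hedged sketch. It assumes the fused routine mirrors the usual Caffe SGD sequence with L2 regularization and momentum (axpy for the decay term, axpby for the momentum history, a copy of history into diff, and a final axpy for the weight update). The function names and signatures below are hypothetical and are not the Intel-Caffe API.

```cpp
#include <cstddef>
#include <vector>

// Separate path (assumed sequence): four passes over the arrays.
void separate_update(std::vector<float>& data, std::vector<float>& diff,
                     std::vector<float>& history,
                     float decay, float rate, float momentum) {
  const std::size_t n = data.size();
  // axpy: L2 regularization, diff += decay * data
  for (std::size_t i = 0; i < n; ++i) diff[i] += decay * data[i];
  // axpby: momentum, history = rate * diff + momentum * history
  for (std::size_t i = 0; i < n; ++i)
    history[i] = rate * diff[i] + momentum * history[i];
  // copy: diff = history
  for (std::size_t i = 0; i < n; ++i) diff[i] = history[i];
  // axpy: weight update, data -= diff (the job of Update() on the separate path)
  for (std::size_t i = 0; i < n; ++i) data[i] -= diff[i];
}

// Fused path (assumed): the same four steps in a single pass per element.
void fused_update(std::vector<float>& data, std::vector<float>& diff,
                  std::vector<float>& history,
                  float decay, float rate, float momentum) {
  const std::size_t n = data.size();
  for (std::size_t i = 0; i < n; ++i) {
    diff[i] += decay * data[i];
    history[i] = rate * diff[i] + momentum * history[i];
    diff[i] = history[i];
    data[i] -= diff[i];
  }
}
```

Because the fused loop already applies the final data -= diff step, a separate Update() call after it would apply the update twice; that is why only one of the two paths should run for a given blob.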

kala855 · ‎03-29-2019

I see, but @FTian, I'm using a very simple debugging process to count the number of times a routine is invoked, and net_params[param_id]->Update() is called even when axpy_axpby_copy_axpy() was called. I'm testing all of this with PIN and, to be sure, with some prints added to the Intel-Caffe source code and a recompile. I'm executing one step of the AlexNet training process. That's why I'm a little confused.

Tian_Feng · ‎03-29-2019

I would suggest that you compile a debug version of IntelCaffe and use GDB to trace those execution steps. net_params[param_id]->Update() should not be executed when axpy_axpby_copy_axpy() was called. If you see that, please paste the call stack.

kala855 · ‎03-29-2019

Now, looking at the source code, it is indeed as you say: when net_params[param_id]->Update() is called, the axpy_axpby_copy_axpy() routine isn't called. However, I now see that in this part of the code

for (int param_id = 0; param_id < this->net_->learnable_params().size(); ++param_id) {
  ApplyUpdate(param_id);
}

ApplyUpdate(param_id) is called 16 times. In 8 of those calls axpy_axpby_copy_axpy() is called, and in the other 8 net_params[param_id]->Update() is called. Do you know why learnable_params().size() is equal to 16? Note that I'm using AlexNet for the tests and running everything with the Caffe binary and this .prototxt:

random_seed: 30
test_initialization: false
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 1
momentum: 0.9
weight_decay: 0.0005
snapshot: 0
snapshot_prefix: "./alexnet"
solver_mode: CPU

Thanks for your help.

Tian_Feng · ‎04-02-2019

>> Do you know why learnable_params().size() is equal to 16?

Just like I said, it means there are 16 learnable parameter blobs.

>> The ApplyUpdate(param_id) is called 16 times. For sure 8 of this times the axpy_axpby_copy_axpy() is called and then net_params[param_id]->update() during the other 8 times

This may be because some of the learnable parameters (8 in your case) have the same layout/count in the CPU format and the MKL-DNN prv format, while the others do not. As for why there is such a reorder, you have to refer to the MKL-DNN documentation for details. In short, MKL-DNN converts nchw to better-performing formats, such as nchw16c or nchw16c8i, to boost the cache hit rate.
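As a rough illustration of how 16 per-blob calls can split into 8 fused and 8 separate updates: standard AlexNet has 8 learnable layers (5 convolution and 3 fully connected), each with a weight blob and a bias blob, which gives 16 learnable blobs. The sketch below is hypothetical. BlobInfo, its flag, and the choice of which blobs take which path are assumptions made purely for illustration; the thread does not say which blobs go through the fused routine.

```cpp
#include <vector>

// Hypothetical per-blob descriptor. The flag stands in for whatever layout
// check the solver performs; it is not an Intel-Caffe field.
struct BlobInfo {
  bool takes_fused_path;
};

struct PathCounts {
  int fused = 0;
  int separate = 0;
};

// Counts how many blobs would go through each update path.
PathCounts count_update_paths(const std::vector<BlobInfo>& blobs) {
  PathCounts counts;
  for (const BlobInfo& blob : blobs) {
    if (blob.takes_fused_path) {
      ++counts.fused;     // fused routine handles the weight update itself
    } else {
      ++counts.separate;  // update value computed first, then Update()
    }
  }
  return counts;
}

// AlexNet-like parameter list: 8 learnable layers, weight + bias each.
// ASSUMPTION for illustration only: one blob of each pair takes the fused path.
std::vector<BlobInfo> make_alexnet_like_blobs() {
  std::vector<BlobInfo> blobs;
  for (int layer = 0; layer < 8; ++layer) {
    blobs.push_back(BlobInfo{false});  // weight blob (assumed separate path)
    blobs.push_back(BlobInfo{true});   // bias blob (assumed fused path)
  }
  return blobs;
}
```

Under these assumptions the loop over learnable_params() runs 16 times, and each routine is reached 8 times, which matches the counts reported above.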