OpenCL* for CPU

Is there more complete information about the SGDFusion method defined in sgd_solver.cpp in the Intel-Caffe framework?

kala855
Beginner
1,094 Views

Hi everyone,

I'm studying how Intel-Caffe calls certain routines in order to evaluate the performance of the framework. However, I cannot work out when the Update method is called, or when axpy_axpby_copy_axpy is called, during one step of AlexNet training. As far as I understand the process, each method should be called 8 times (once per layer). In my case, however, axpy_axpby_copy_axpy is called 8 times and then Update is called 8 times too. Why is each routine called 8 times? I thought that if the Update method is called then axpy_axpby_copy_axpy isn't, and vice versa.

Thanks in advance for your help.

8 Replies
Nathan_G_Intel
Employee
509 Views

Thank you for looking into intel-caffe for your work. I will contact the caffe dev team about this.

Tian_Feng
Employee
509 Views

In IntelCaffe, the SGD weight update is executed at the end of each training iteration (step). It walks over all learnable parameter blobs and applies the corresponding operations to each one: normalize, regularize, compute the update value, and update the weights.

As for the Update() function, the SGD solver implements two ApplyUpdate() functions:

1. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate()

2. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate(int param_id)

The calling logic is:

SGD -> ApplyUpdate() -> for (all learnable param blobs) {ApplyUpdate(learnable_param_blob_id);}

axpy_axpby_copy_axpy() is invoked while doing L2 norm regularization in ApplyUpdate(learnable_param_blob_id). (Note: it is only triggered when you build with the ICC compiler or manually enable the ENABLE_SGD_FUSION build flag.)

That's why you saw 8 invocations of axpy_axpby_copy_axpy() and 8 invocations of ApplyUpdate(learnable_param_blob_id): you have 8 learnable parameter blobs.
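The calling logic above can be sketched roughly as follows. This is only an illustration, not the IntelCaffe source: SketchSolver, ParamBlob and per_blob_calls() are hypothetical names, and the per-blob work is reduced to a counter so the number of invocations is easy to see.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a learnable parameter blob (not the Caffe Blob class).
struct ParamBlob {
  std::vector<float> data;
  std::vector<float> diff;
};

// Simplified model of a solver with the two ApplyUpdate() overloads.
class SketchSolver {
 public:
  explicit SketchSolver(std::size_t num_params) : params_(num_params) {}

  // Called once at the end of every training iteration.
  void ApplyUpdate() {
    for (std::size_t param_id = 0; param_id < params_.size(); ++param_id) {
      ApplyUpdate(param_id);
    }
  }

  // Called once per learnable parameter blob. In the real solver the
  // normalize / regularize / compute-update / weight-update work happens here;
  // this sketch only counts the call.
  void ApplyUpdate(std::size_t param_id) {
    (void)param_id;
    ++per_blob_calls_;
  }

  std::size_t per_blob_calls() const { return per_blob_calls_; }

 private:
  std::vector<ParamBlob> params_;
  std::size_t per_blob_calls_ = 0;
};
```

With this model, constructing SketchSolver with N blobs and calling ApplyUpdate() once results in N per-blob calls, which is the relationship described above between the blob count and the number of invocations.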

kala855
Beginner
509 Views

Thanks @FTian for your answer; everything you have explained helps a lot. What I had not understood was why axpy_axpby_copy_axpy is called 8 times inside ApplyUpdate(learnable_param_blob_id) -> SGDFusion (now I understand that part). But when I run my analysis, net_params[param_id]->Update() is also called 8 times inside the same SGDFusion. If axpy_axpby_copy_axpy is called (and it does normalize/regularize/compute update value/weight update all in one routine), why is it necessary to call net_params[param_id]->Update() as well (which only does the weight update)? Reading the source code, it looks as if the two routines should not both be called 8 times, but debugging the code shows the behavior I'm describing. Please refer to: net_params[param_id]->Update

I don't know whether I'm missing something or have misunderstood something.

Thanks for your help. :)

Tian_Feng
Employee
509 Views

axpy_axpby_copy_axpy() in fact fuses ComputeUpdateValue and the weight update. If you see axpy_axpby_copy_axpy() being executed, net_params[param_id]->Update() will not be called, because is_separate_ComputeUpdateValue_Update is set to false.
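To make the fused versus separate distinction concrete, here is a minimal sketch. It assumes the routine name maps onto the standard Caffe SGD steps (axpy for L2 regularization, axpby for the momentum update, copy of history into diff, and a final axpy for the weight update); that mapping is inferred from the name, not taken from the IntelCaffe source. SgdState, SeparateUpdate and FusedUpdate are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-blob state: weights, gradient, and momentum history.
struct SgdState {
  std::vector<float> data;
  std::vector<float> diff;
  std::vector<float> history;
};

// Separate path: each step is its own pass over the blob.
void SeparateUpdate(SgdState& s, float rate, float momentum, float decay) {
  const std::size_t n = s.data.size();
  for (std::size_t i = 0; i < n; ++i) {
    s.diff[i] += decay * s.data[i];  // axpy: L2 regularization
  }
  for (std::size_t i = 0; i < n; ++i) {
    s.history[i] = rate * s.diff[i] + momentum * s.history[i];  // axpby
  }
  s.diff = s.history;  // copy
  for (std::size_t i = 0; i < n; ++i) {
    s.data[i] -= s.diff[i];  // axpy with alpha = -1: the Update() step
  }
}

// Fused path: the same four steps in a single pass over the blob,
// so no separate Update() call is needed afterwards.
void FusedUpdate(SgdState& s, float rate, float momentum, float decay) {
  const std::size_t n = s.data.size();
  for (std::size_t i = 0; i < n; ++i) {
    s.diff[i] += decay * s.data[i];
    s.history[i] = rate * s.diff[i] + momentum * s.history[i];
    s.diff[i] = s.history[i];
    s.data[i] -= s.diff[i];
  }
}
```

Both functions leave the blob in the same state; the fused form simply touches each element once instead of four times, which is why calling Update() after it would apply the weight update twice.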

kala855
Beginner
509 Views

I see. But @FTian, I'm using a very simple debugging process to count the number of times each routine is invoked, and net_params[param_id]->Update() is called even when axpy_axpby_copy_axpy() has been called. I'm checking this with PIN and, to be sure, with some prints added to the Intel-Caffe source code and a recompile. I'm executing one step of the AlexNet training process. That's why I'm a little confused.

Tian_Feng
Employee
509 Views

I would suggest compiling a debug build of IntelCaffe and using GDB to trace those execution steps. net_params[param_id]->Update() should not be executed when axpy_axpby_copy_axpy() has been called. If you do see that, please paste the call stack.

kala855
Beginner
509 Views

Now, looking at the source code, I see that you are right: when net_params[param_id]->Update() is called, axpy_axpby_copy_axpy() isn't. However, I now see that in this part of the code

for (int param_id = 0; param_id < this->net_->learnable_params().size(); ++param_id) {
  ApplyUpdate(param_id);
}

ApplyUpdate(param_id) is called 16 times. In 8 of those calls axpy_axpby_copy_axpy() is invoked, and in the other 8 net_params[param_id]->Update() is invoked. Do you know why learnable_params().size() is equal to 16? Note that I'm using AlexNet for the tests and running everything with the Caffe binary and this solver .prototxt:

 

random_seed: 30
test_initialization: false
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 1
momentum: 0.9
weight_decay: 0.0005
snapshot: 0
snapshot_prefix: "./alexnet"
solver_mode: CPU

 

Thanks for your help.

Tian_Feng
Employee
509 Views

>> Do you know why learnable_params().size() is equal to 16?

 

As I said, it means there are 16 learnable parameters.

 

>> The ApplyUpdate(param_id) is called 16 times. For sure 8 of this times the axpy_axpby_copy_axpy() is called and then net_params[param_id]->update() during the other 8 times

 

This may be because some of the learnable parameters (8 in your case) have the same layout/count in the CPU format and the MKL-DNN prv format, while the others do not. As for why there is such a reorder, you will have to refer to the MKL-DNN documentation for details. In short, MKL-DNN converts nchw to formats that perform best, such as nchw16c or nchw16c8i, to improve the cache hit rate.
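As a side note on where 16 comes from, here is a small sketch. It assumes the standard Caffe AlexNet definition, in which the eight parameterized layers (conv1 to conv5, fc6 to fc8) each own a weight blob and a bias blob. LayerInfo, AlexNetLearnableLayers and CountLearnableParams are hypothetical helpers for illustration only. The sketch does not say which 8 blobs take the fused path; that depends on the layout comparison described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a layer: its name and how many learnable blobs it owns.
struct LayerInfo {
  std::string name;
  std::size_t num_learnable_blobs;
};

// The eight AlexNet layers that carry parameters, assuming one weight blob
// and one bias blob per layer.
std::vector<LayerInfo> AlexNetLearnableLayers() {
  return {{"conv1", 2}, {"conv2", 2}, {"conv3", 2}, {"conv4", 2},
          {"conv5", 2}, {"fc6", 2},   {"fc7", 2},   {"fc8", 2}};
}

// Sums the learnable blobs over all layers; this models what
// learnable_params().size() reports for the whole net.
std::size_t CountLearnableParams(const std::vector<LayerInfo>& layers) {
  std::size_t total = 0;
  for (const LayerInfo& layer : layers) {
    total += layer.num_learnable_blobs;
  }
  return total;
}
```

Under that assumption, 8 layers with 2 blobs each give 16 learnable parameter blobs, so ApplyUpdate(param_id) runs 16 times per iteration.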
