Hi everyone,
I'm studying how Intel-Caffe makes certain routine calls in order to evaluate the performance of the framework. However, I can't figure out when the Update method is called, or when axpy_axpby_copy_axpy is called, during one step of AlexNet training. As far as I understand the process, each method should be called 8 times (once per layer), yet in my case axpy_axpby_copy_axpy is called 8 times and then Update is called 8 times as well. Why is each routine called 8 times? I thought that if the Update method is called, then axpy_axpby_copy_axpy isn't called, and vice versa.
Thanks for your help in advance.
Thank you for looking into Intel-Caffe for your work. I will contact the Caffe dev team about this.
In Intel-Caffe, the SGD weight update process is executed at the end of each training iteration/step. Its logic is to go over all learnable parameter blobs and perform the corresponding operations: normalize, regularize, compute the update value, and apply the weight update.
As for the Update() function, SGD implements two ApplyUpdate() functions:
1. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate()
2. template <typename Dtype> void SGDSolver<Dtype>::ApplyUpdate(int param_id)
The calling logic is:
SGD -> ApplyUpdate() -> for (all learnable param blobs) {ApplyUpdate(learnable_param_blob_id);}
axpy_axpby_copy_axpy() gets invoked when doing L2-norm regularization in ApplyUpdate(learnable_param_blob_id). (Note: it is only triggered when you use the ICC compiler or manually enable the ENABLE_SGD_FUSION build flag.)
That's why you saw 8 invocations of axpy_axpby_copy_axpy() and 8 invocations of ApplyUpdate(learnable_param_blob_id): it means you have 8 learnable parameter blobs.
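To make the control flow concrete, here is a minimal, self-contained C++ sketch of the calling pattern described above. The names mirror this thread (Blob, ApplyUpdate and axpy_axpby_copy_axpy are stand-ins, not the real Intel-Caffe types), and the real implementation in sgd_solver.cpp differs in detail:

#include <cstdio>
#include <vector>

// Stand-in for a learnable parameter blob; the real Blob is far richer.
struct Blob {
  bool use_fused_path;  // fused build -> axpy_axpby_copy_axpy()
};

// Eight blobs, all taking the fused path, as in the scenario above.
std::vector<Blob> learnable_params(8, Blob{true});

void axpy_axpby_copy_axpy() { std::puts("axpy_axpby_copy_axpy()"); }

// Per-parameter update: normalize/regularize/compute update value/apply.
void ApplyUpdate(int param_id) {
  if (learnable_params[param_id].use_fused_path) {
    axpy_axpby_copy_axpy();                       // fused path
  } else {
    std::puts("net_params[param_id]->Update()");  // separate path
  }
}

// Top-level update, run once at the end of each training iteration.
void ApplyUpdate() {
  for (int param_id = 0; param_id < (int)learnable_params.size(); ++param_id) {
    ApplyUpdate(param_id);
  }
}

int main() {
  ApplyUpdate();  // prints 8 fused invocations, one per parameter blob
  return 0;
}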
Thanks @FTian for your answer. Everything you've explained helps a lot, but specifically I hadn't understood why axpy_axpby_copy_axpy is called 8 times inside ApplyUpdate(learnable_param_blob_id) -> SGDFusion (now I do). What I still don't get is why, inside the same SGDFusion, net_params[param_id]->Update() is also called 8 times in my analysis. I mean, if axpy_axpby_copy_axpy (which does normalize/regularize/compute update value/weight update all in the same routine) is called, why is it necessary to call net_params[param_id]->Update() (which only does the weight update)? Reading the source code, you would conclude that each routine probably isn't called 8 times, but debugging the code shows what I'm describing. Please refer to: net_params[param_id]->Update
I don't know if I'm missing something or have some kind of misunderstanding.
Thanks for your help. :)
axpy_axpby_copy_axpy() in fact fuses ComputeUpdateValue and the weight update. If you see axpy_axpby_copy_axpy() being executed, net_params[param_id]->Update() will not be called, because is_separate_ComputeUpdateValue_Update is set to false.
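Schematically, the per-parameter dispatch looks like this (a sketch only: the flag name follows this thread, and the stub functions below just print what the real routines would do):

#include <cstdio>

// Hypothetical stand-ins that only print what the real routines do.
void axpy_axpby_copy_axpy() { std::puts("fused: compute update value + weight update"); }
void ComputeUpdateValue()   { std::puts("separate step 1: compute update value"); }
void Update()               { std::puts("separate step 2: weight update"); }

int main() {
  // In a fused build (ICC or ENABLE_SGD_FUSION) this is false, so the
  // two-step branch is never taken for that parameter blob.
  bool is_separate_ComputeUpdateValue_Update = false;

  if (!is_separate_ComputeUpdateValue_Update) {
    axpy_axpby_copy_axpy();  // one call does both jobs
  } else {
    ComputeUpdateValue();    // would be SGDSolver::ComputeUpdateValue(...)
    Update();                // would be net_params[param_id]->Update()
  }
  return 0;
}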
I see, but then @FTian, I'm using a very simple debugging process to count the number of times a routine is invoked, and net_params[param_id]->Update() is called even when axpy_axpby_copy_axpy() was called. I'm testing all of this with PIN and, to double-check, with some prints added to the Intel-Caffe source code before recompiling. I'm executing one step of the AlexNet training process. That's why I'm a little confused.
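For reference, a print-based counter like the one mentioned above can be as simple as the following self-contained sketch (hypothetical counter names; in the real experiment the increments would sit at the top of axpy_axpby_copy_axpy() and Blob::Update() inside Intel-Caffe):

#include <atomic>
#include <cstdio>
#include <cstdlib>

// Hypothetical global counters, dumped when the process exits.
static std::atomic<long> g_fused_calls{0};
static std::atomic<long> g_update_calls{0};

static void dump_counts() {
  std::printf("axpy_axpby_copy_axpy      : %ld\n", g_fused_calls.load());
  std::printf("net_params[...]->Update() : %ld\n", g_update_calls.load());
}

int main() {
  std::atexit(dump_counts);
  // Simulate one training step that takes the fused path for 8 blobs and
  // the separate path for another 8, matching the observation in question.
  for (int i = 0; i < 8; ++i) ++g_fused_calls;
  for (int i = 0; i < 8; ++i) ++g_update_calls;
  return 0;
}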
I would suggest you compile a debug version of Intel-Caffe and use GDB to trace those execution steps. There should be no net_params[param_id]->Update() executed when axpy_axpby_copy_axpy() was called. If you do see that, please paste me the call stack.
Now, looking at the source code: effectively, as you say, when net_params[param_id]->Update() is called, axpy_axpby_copy_axpy() isn't called. However, I now understand that in this part of the code
for (int param_id = 0; param_id < this->net_->learnable_params().size(); ++param_id) {
  ApplyUpdate(param_id);
}
ApplyUpdate(param_id) is called 16 times. In 8 of those calls axpy_axpby_copy_axpy() is invoked, and in the other 8 net_params[param_id]->Update() is. Do you know why learnable_params().size() equals 16? Take into account that I'm using AlexNet for the tests and executing everything with the Caffe binary and this solver .prototxt:
random_seed: 30
test_initialization: false
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 20
max_iter: 1
momentum: 0.9
weight_decay: 0.0005
snapshot: 0
snapshot_prefix: "./alexnet"
solver_mode: CPU
Thanks for your help.
>> Do you know why learnable_params().size() is equal to 16?
Just like I said, it means there are 16 learnable parameter blobs.
>> ApplyUpdate(param_id) is called 16 times. In 8 of those calls axpy_axpby_copy_axpy() is invoked, and in the other 8 net_params[param_id]->Update() is.
It may be because some of the learnable parameters (in your case, 8 of them) have the same layout/count in the CPU format and the MKL-DNN private format, while the others do not. As for why there is such a reorder, you have to refer to the MKL-DNN documentation for details. In short, MKL-DNN converts nchw into higher-performance formats, such as nchw16c or nchw16c8i, to boost the cache hit rate.
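To illustrate what such a blocked layout means, here is a small self-contained C++ example. The nChw16c offset formula below is assumed from the general MKL-DNN layout documentation (channels split into blocks of 16), not taken from the Intel-Caffe source:

#include <cstdio>

// Offset of element (n, c, h, w) in the plain nchw layout.
long nchw_offset(long C, long H, long W, long n, long c, long h, long w) {
  return ((n * C + c) * H + h) * W + w;
}

// Offset of the same element in the blocked nChw16c layout: the channel
// dimension is split into blocks of 16, so 16 consecutive channels of one
// pixel sit next to each other in memory.
long nChw16c_offset(long C, long H, long W, long n, long c, long h, long w) {
  long Cb = (C + 15) / 16;  // number of channel blocks
  return (((n * Cb + c / 16) * H + h) * W + w) * 16 + c % 16;
}

int main() {
  // A 1x32x4x4 tensor: in nchw, channels of pixel (0,0) are 16 elements
  // apart; in nChw16c they are adjacent, which improves cache hit rates.
  for (long c = 0; c < 4; ++c) {
    std::printf("c=%ld  nchw=%ld  nChw16c=%ld\n", c,
                nchw_offset(32, 4, 4, 0, c, 0, 0),
                nChw16c_offset(32, 4, 4, 0, c, 0, 0));
  }
  return 0;
}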
