Hello,
I have some doubts about how to define CAFFE_PATH= and the data path I have to specify within a Python program in order to train on data in my Colfax home directory.
Do you have an example?
Thank you
Hi Cosma,
Thanks for reaching out to us.
We can train Caffe from within a Python program. A sample Python program is given below.
import caffe
import os
solver = caffe.SGDSolver('/models/intel_optimized_models/ssd/VGGNet/VOC0712/SSD_300x300/solver.prototxt')
solver.solve()
Here we have used the SSD solver.prototxt from Intel Caffe; please specify the path to your own solver.prototxt.
You will need to comment out the line below in solver.prototxt:
type: "SGD"
Kindly revert in case of any issues.
Regards,
Deepthi Raj.
Hi Cosma,
Could you please confirm whether the solution provided worked for you?
Regards,
Deepthi Raj
Hi Deepthi Raj,
I'm sorry for the late reply.
Thank you for your suggestion. I used your example without changes, with the following path:
/glob/intel-python/python3/bin/caffe/
I received the following error:
"
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0911 10:30:07.087224 129990 io.cpp:82] Check failed: fd != -1 (-1 vs. -1) File$
*** Check failure stack trace: ***
/var/spool/torque/mom_priv/jobs/162226.c009.SC: line 3: 129990 Aborted
"
I cannot access the directory ../jobs: permission denied.
What could be wrong?
Additional questions:
1 - I would like to use my own solver.prototxt; to which folder do I have to move it?
'/models/intel_optimized_models/ssd/VGGNet/
2 - Is it possible to use the GoogLeNet model?
FYI: Sorry, unfortunately I will not be able to answer before Thursday.
Thank you for your support!
Best Regards
Hi Cosma,
Please specify your solver.prototxt path instead of "/models/intel_optimized_models/ssd/VGGNet/VOC0712/SSD_300x300/solver.prototxt"
We can use the GoogLeNet topology for Caffe training as well, but you need to provide the corresponding solver.prototxt.
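For example, a minimal sketch (the path below is a placeholder for wherever your GoogLeNet solver.prototxt lives; its net: field should point to the matching train_val.prototxt):
import caffe

caffe.set_mode_cpu()  # assumption: CPU training on the DevCloud nodes
# Placeholder path: replace with the location of your own GoogLeNet solver.prototxt
solver = caffe.SGDSolver('/path/to/bvlc_googlenet/solver.prototxt')
solver.solve()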
Regards,
Dona
Hi Dona,
the workaround works; the system started initializing the solver using my parameters, thank you!
Now there are two errors in the output saved in the .py.e12345 file that I would like to clarify:
1 - At the beginning of the process:
WARNING: Logging before InitGoogleLogging() is written to STDERR
2 - At the end of the process:
/var/spool/torque/mom_priv/jobs/163020.c009.SC: line 3: 95903 Bus error python3 myapplication1.py
How can I fix these?
Thank you
Hi Cosma,
Please share the Python file and the dependent files so that we can run and check it from our side.
Regards,
Deepthi Raj
Hi Cosma,
Thanks for sharing.
We need the Python file that you used so that we can run it from our side. Please share the files listed below.
1. myapplication2.py
2. train_val.prototxt
3. train and test lmdb files
4. labelmap.prototxt
5. test_name_size.txt
Regards,
Deepthi Raj
Hi Deepthi Raj,
the data.mdb files are too big for an scp transfer; the system drops the connection after some time.
Anyway, below is the content of myapplication2.py; it's really simple:
import caffe
import os
solver=caffe.SGDSolver('/home/u18921/workspaceCZOO/ncappzoo/apps/dogsvscats/bvlc_googlenet/org/solver.prototxt')
solver.solve()
Please let me know if there is something to add to improve the process.
At the same time, I have another question: how can I execute "make" to compile data saved in my home directory (on Colfax)?
This question is linked to the case, because running make for the application I'm using generates the lmdb files and other files.
Thank you
Hi Deepthi Raj,
I have the train and test lmdb files; please let me know where I can find a repository for sharing them, because there is no option to attach them to this message.
Thank you
Hi Deepthi Raj,
I found the option to attach files simply by changing the visualization mode :-)
OK, please see the attached files as requested.
Exceptions (files too big to attach):
train_lmdb_data.mdb
val_lmdb_data.mdb
Please let me know your feedback.
Thank you
Hi Cosma,
Thanks for sharing :)
As mentioned in the previous conversation, please share the files below as well so that we can check where exactly the problem comes from.
1. labelmap.prototxt
2. train_val.prototxt
3. test_name_size.txt
A bus error mainly occurs when we are trying to access an invalid memory address.
Regards,
Deepthi Raj.
Hi Deepthi Raj,
thanks for your reply.
Sorry, what kind of memory do you mean?
I would like to clarify the context: I'm trying to follow the document below to train on custom data using Caffe for the NCS:
https://movidius.github.io/blog/deploying-custom-caffe-models/
Below are my answers to your request:
2. train_val.prototxt attached
1. labelmap.prototxt
How can I identify the labelmap.prototxt that I have used for data preparation?
As I mentioned, I'm using NCSDK v2.05.
Below is the list of labelmap files on the computer I used to prepare the data:
/opt/movidius/ssd-caffe/data/VOC0712/labelmap_voc.prototxt
/opt/movidius/ssd-caffe/data/coco/labelmap_coco.prototxt
/opt/movidius/ssd-caffe/data/ILSVRC2016/labelmap_ilsvrc_det.prototxt
/opt/movidius/ssd-caffe/data/ILSVRC2016/labelmap_ilsvrc_clsloc.prototxt
3. test_name_size.txt
I do not have a test_name_size.txt.
Do I have to create one?
Do you have some suggestions about that?
Attached is the Makefile I used for dataset preparation and to create the lmdb files.
Attached again is the solver.prototxt.
I hope this helps.
Many thanks again
Hi Cosma,
Could you please clarify the queries below:
1. Is the Makefile that you shared the same one that you used to create the lmdb? It contains sudo commands, and the Caffe path points to /opt/movidius/caffe, which is not present on the DevCloud.
2. Are you creating the lmdb on another machine and trying to train on the DevCloud?
Also, I am attaching a Makefile and create-lmdb.sh here which worked for us. Could you please try these on the DevCloud and train Caffe using the generated lmdb (a quick sanity check is sketched below)?
If this also does not help, please let us know a convenient time for you so that we can set up a Skype call and see what exactly the issue is.
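As a quick sanity check before training, the sketch below (assuming the Python lmdb package and pycaffe are importable in your DevCloud environment; the path is a placeholder) reads the first record of an lmdb to confirm that a database created on another machine is readable:
import lmdb
from caffe.proto import caffe_pb2

env = lmdb.open('/path/to/train_lmdb', readonly=True)  # placeholder path
with env.begin() as txn:
    key, value = next(txn.cursor().iternext())  # first (key, value) pair
    datum = caffe_pb2.Datum()
    datum.ParseFromString(value)                # decode the Caffe Datum
    print(key, datum.channels, datum.height, datum.width, datum.label)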
Regards,
Deepthi
Hi Deepthi,
2. Are you creating the lmdb on another machine and trying to train on the DevCloud?
Yes, that's right!
Many thanks for the new Makefile; it addresses one of my doubts!
I have only changed CAFFE_PATH to the new Colfax home directory path assigned to my user:
/home/u18921/workspaceCZOO/ncappzoo/apps/dogsvscats/bvlc_googlenet/org
This is the same path I used within myapplication2.py.
I already ran your Makefile, but there is an error: [unzip] Error 9.
The data paths in the Makefile are OK.
I suppose the system does not detect the existing train.zip and test1.zip as ZIP files.
So I'm transferring the original files to Colfax again, replacing them.
Then I'll run make again and keep you informed.
Many thanks again for your useful suggestions.
Cosma
Hi Deepthi,
one last question: to execute the Makefile for dataset preparation, I use the Colfax login server.
Is that right?
Is it also possible to use a Colfax acceleration node for compilation and data preparation?
Thank you
Cosma
Hi Cosma,
We have to execute the Makefile from a compute node, not from the login node. Otherwise, it will throw a memory error while creating the lmdb.
Enter the compute node using the command "qsub -I" and then execute the Makefile.
Please let me know if this works for you.
Regards,
Deepthi Raj
Hi Deepthi,
I ran: qsub -I Makefile
Now I'm automatically at the prompt [u18921@c009-n089 ~]$
In this state, commands such as "qstat" and "nano" are not found.
Questions (sorry, the Colfax documentation gives instructions for Jupyter Notebook, but I'm using the terminal!):
1 - How can I verify the status of the job?
2 - Why does the "ls" command show /home/u18921 instead of the folder where I started the job?
I'm available for a conference call today at 4:30 pm CET.
Below is the actual output:
[u18921@c009 dogsvscats]$
[u18921@c009 dogsvscats]$ qsub -I Makefile
qsub: waiting for job 165638.c009 to start
qsub: job 165638.c009 ready
#
# Date: Wed Sep 19 00:54:28 PDT 2018
# Job ID: 165638.c009
# User: u18921
# Resources: neednodes=1:ppn=2,nodes=1:ppn=2,vmem=92gb,walltime=06:00:00
#
[u18921@c009-n089 ~]$
[u18921@c009-n089 ~]$ qstat
-bash: qstat: command not found
[u18921@c009-n089 ~]$ nano Makefile
-bash: nano: command not found
[u18921@c009-n089 ~]$
Thank you!
Cosma
Hi Cosma,
You can execute the Makefile inside a compute node in two ways.
A.) Submit the job via qsub
1. In the same folder as the Makefile, create a file "myjob" and add the following lines:
#PBS -l nodes=1
cd $PBS_O_WORKDIR
make
The first line is a special PBS directive that requests one compute node.
The second line ensures that the script runs in the directory from which you submitted it, and the third line executes the Makefile.
2. You can now submit this job as shown below:
[u100@c009 ~]# qsub myjob
This command will return a Job ID, which is the tracking number for your job.
You can track the job with:
[u100@c009 ~]# qstat
Once the job is completed, the output will be in the files:
[u100@c009 ~]# cat myjob.oXXXXXX
[u100@c009 ~]# cat myjob.eXXXXXX
Here 'XXXXXX' is the Job ID. The .o file contains the standard output stream, and the .e file contains the error stream.
B.) Go to a compute node and then execute the Makefile
1. Enter a compute node using the command "qsub -I". This will take you to your home folder on the compute node.
2. Go to the folder which contains the Makefile.
3. Run the command "make".
Please check whether this helps; otherwise we can have the call.
Regards,
Deepthi Raj
Hi Deepthi,
I have applied option A).
The make command started on the Colfax nodes and finished without issues, generating the expected files.
Attached are the Makefile and the myjob3_Make.py.e165699 output file.
No error! :-)
After this step, I ran the training using the solver on the Colfax nodes.
Attached are myapplication2.py (the file that executes the operation), myappl, and the generated error file myjob2.py.e165747.
The error is related to the train_val.prototxt file, which I do not have.
Do I have to take this file from the GoogLeNet model I have on my PC and save it to CAFFE_PATH=/home/u18921/workspaceCZOO/ncappzoo/apps/dogsvscats/bvlc_googlenet/org/ (the same path used in the Makefile as CAFFE_PATH)?
or
Do I have to change CAFFE_PATH in the Makefile to CAFFE_PATH?=/glob/intel-python/python3/bin (the Caffe path on the Colfax systems) and recompile everything?
Thank you again
Cosma