Good day,
How would I reconnect to an interactive session? I started some processing and my connection dropped and I would like to reconnect.
Kind Regards
Hi,
Thanks for reaching out to us.
If you were running Python code on a compute node and the connection dropped because of a network issue, the program has probably been killed, and you cannot reconnect to the same interactive session.
If your question is how to connect to the same compute node again, you can do so with the following command, provided you remember the node name:
qsub -I -l nodes=<Node>:ppn=2
Eg: qsub -I -l nodes=s001-n106:ppn=2
However, if you submitted a job in DevCloud, it keeps running in the background and a connection issue will not affect it.
You can view the submitted jobs using the "qstat" command.
This command shows all submitted jobs and their state, such as running (R) or queued (Q). Please find the attachment.
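As a rough illustration of reading that listing, here is a minimal Python sketch that pulls the job ID, name, user and state out of plain `qstat` output. The function name `parse_qstat` and the sample text are hypothetical; the sketch assumes the six-column Torque layout (Job ID, Name, User, Time Use, S, Queue) and may need adjusting if your `qstat` prints different columns.

```python
def parse_qstat(text):
    """Return a list of (job_id, name, user, state) tuples from plain `qstat` output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Job rows have six columns and a job ID that starts with a digit;
        # the header and the dashed separator row are skipped.
        if len(fields) == 6 and fields[0][0].isdigit():
            job_id, name, user, _time_used, state, _queue = fields
            jobs.append((job_id, name, user, state))
    return jobs


# Illustrative sample only (assumed layout, not real account data).
SAMPLE = """\
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          01:33:24 R batch
438552.v-qsvr-1           script           uxxxxx          00:34:21 R batch
"""

for job_id, name, user, state in parse_qstat(SAMPLE):
    print(job_id, name, user, state)
```

A state of `R` means the job is running and `Q` means it is still queued.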
Hope this clarifies your question. Feel free to reach out to us if you have any further queries.
Thanks.
Thank you Lakshmi.
Thus
- if an interactive session has been started
- and some processing is being performed
- if the internet connection drops (losing the ssh session)
- the processing session is closed
- and the generated data will be copied back to the user profile, overriding older data?
Is there a method to read the output stream or error stream of a submitted job that is being executed?
Thanks for the response.
Yes. If an interactive session has been started and some processing is being performed, and the internet connection drops (losing the ssh session), the generated data will be copied back to the user profile, overwriting the older data.
- To submit a job in DevCloud:
qsub <job_script>.job
- Once the job is submitted, you can track it using the command below:
qstat
- To read the output and error streams of the executing job, you can use the qpeek command as below:
qpeek -o <job_id>
qpeek -e <job_id>
Please note that an output and error file will be created once the execution is completed.
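If you prefer to drive these steps from a script, the following is a minimal Python sketch, not an official DevCloud tool. The helper names `submit` and `peek_command` are hypothetical; the sketch assumes `qsub` prints the job ID on stdout and that `qpeek` accepts `-o` and `-e` as described above.

```python
import subprocess


def submit(job_script):
    """Submit a job script with qsub and return the job ID that qsub prints."""
    result = subprocess.run(
        ["qsub", job_script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def peek_command(job_id, stream="o"):
    """Build the argument list for `qpeek -o <job_id>` (stdout) or `qpeek -e <job_id>` (stderr)."""
    if stream not in ("o", "e"):
        raise ValueError("stream must be 'o' (stdout) or 'e' (stderr)")
    return ["qpeek", "-" + stream, job_id]


# Only builds the command; nothing is executed here.
print(peek_command("438530"))
```

On a login node you could then pass the list returned by `peek_command` to `subprocess.run` to see the current output of the job.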
For more information on submitting jobs and using Intel Optimized frameworks in Intel® AI DevCloud, you can refer to the attached document.
Hope this clarifies your query. Feel free to reach out to us if you have any further queries.
Hi,
Could you please confirm whether the details provided were helpful?
Thanks.
Hi Lakshmi,
It did clarify the specific operations, thank you. However, qpeek does not seem to work as indicated.
I run
uxxxxx@login-1:~/$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          01:33:24 R batch
438552.v-qsvr-1           script           uxxxxx          00:34:21 R batch
then
uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1
qstat: Unknown Job Id Error 438530.v-qsvr-1
but using qstat with the adjusted ID works:
uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1.aidevcloud
Job Id: 438530.v-qsvr-1.aidevcloud
    Job_Name = script
    Job_Owner = u33434@login-1
    resources_used.cput = 01:47:02
    ...
but qpeek still does not work:
uxxxxx@login-1:~/$ qpeek -o 438530.v-qsvr-1.aidevcloud
qstat: Unknown Job Id Error 438530.v-qsvr-1
Job 438530.v-qsvr-1 is not running!
Am I missing something?
Kind Regards,
Hi,
You just need to give the numeric job ID to the qpeek command.
Eg: qpeek -o 438530
qpeek -e 438530
Hope this clarifies your query. Please confirm whether the solution provided was helpful.
Thanks.
Unfortunately it's not working:
uxxxxx@login-1:~$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          03:08:46 R batch
438552.v-qsvr-1           script           uxxxxx          02:09:01 R batch
uxxxxx@login-1:~$ qpeek -o 438530
qstat: Unknown Job Id Error 438530.v-qsvr-1
Job 438530 is not running!
uxxxxx@login-1:~$
Hi,
The DevCloud team has created a new account for you. Could you please verify and confirm whether the qpeek command works for you in the new account?
Meanwhile, we will try to fix the qpeek issue in the old account as soon as possible.
Thanks.
Hi,
@Lakshmi, the new account has the same problem, but I found a solution.
It seems there is a bug in the qpeek command (I cannot update it, as I do not have the appropriate privileges). One can manually inspect the output as follows:
When submitting a job, qsub prints the <JobID>:
uxxxxx@login-1:~/$ qsub script
444583.v-qsvr-1.aidevcloud
Run qstat -f and read exec_host to find the <node> on which the job is being executed:
uxxxxx@login-1:~$ qstat -f 444583.v-qsvr-1.aidevcloud
Job Id: 444583.v-qsvr-1.aidevcloud
    Job_Name = script
    Job_Owner = uxxxxx@login-1
    ...
    exec_host = s001-n102/0-1
    ...
    submit_host = login-1
Use ssh and cat to get the current stdout stream:
uxxxxx@login-1:~$ ssh s001-n102 cat /var/spool/torque/spool/444583.v-qsvr-1.aidevcloud.OU
##############################################################
# Date: Thu Dec 12 12:01:26 PST 2019
# Job ID: 444580.v-qsvr-1.aidevcloud
# User: uxxxxx
# Resources: neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=24:00:00
##############################################################
stdout of "script" etc.
To get stderr:
ssh <node> cat /var/spool/torque/spool/<JobID>.ER
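The manual steps above could be scripted roughly as follows. This is only a sketch: the helper names `exec_node` and `spool_command` are hypothetical, the spool directory is the one shown in this post, and the sketch assumes a single-node job whose `exec_host` line looks like `s001-n102/0-1`.

```python
import re

# Spool directory as shown in the commands above.
SPOOL_DIR = "/var/spool/torque/spool"


def exec_node(qstat_f_output):
    """Extract the node name from the exec_host line of `qstat -f` output."""
    match = re.search(r"^\s*exec_host\s*=\s*([^/\s]+)", qstat_f_output, re.MULTILINE)
    if match is None:
        raise ValueError("no exec_host line found; the job may not be running yet")
    # For multi-node jobs this returns only the first node listed.
    return match.group(1)


def spool_command(job_id, node, stream="OU"):
    """Build the ssh argument list that prints a spool file (OU = stdout, ER = stderr)."""
    if stream not in ("OU", "ER"):
        raise ValueError("stream must be 'OU' or 'ER'")
    return ["ssh", node, "cat", SPOOL_DIR + "/" + job_id + "." + stream]


# Illustrative `qstat -f` excerpt (assumed layout).
SAMPLE = """\
Job Id: 444583.v-qsvr-1.aidevcloud
    Job_Name = script
    exec_host = s001-n102/0-1
    submit_host = login-1
"""

node = exec_node(SAMPLE)
print(spool_command("444583.v-qsvr-1.aidevcloud", node))
```

The returned list can be passed to `subprocess.run` on the login node; nothing is executed by the sketch itself.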
IMPORTANT:
When executing a Python script, the output of Python's print calls may only be written to the parent stdout once the script has finished executing. To push text to the parent stdout, or the spool that can be queried, use the following:
os.system("echo Your python output.")
or the preferred method:
subprocess.run(["echo","Your python output"])
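A possible alternative, offered as a suggestion rather than something tested on DevCloud: the delay is most likely caused by Python buffering stdout when it is not attached to a terminal, so flushing after each print should have the same effect without spawning an `echo` process. The helper name `report` below is hypothetical.

```python
def report(message):
    """Print a progress message and flush stdout so it reaches the spool file right away."""
    print(message, flush=True)


report("Your python output")
```

Running the script with `python -u` or setting the `PYTHONUNBUFFERED` environment variable in the job script turns off this buffering for the whole program.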
Thank you.
To make qpeek work, do:
cp /usr/local/bin/qpeek $HOME/bin/
then change 4 strings in the qpeek script. This is the diff of my changes:
73d72
<
83,84c82
< my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
< my $qstat = "qstat -f $jb | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
---
> my $qstat = "qstat -f $jobid | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
143,145c141
< my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
< open(QSTAT, "qstat -f $jb | ");
<
---
> open(QSTAT, "qstat -f $jobid |");
Check in $PATH that your "$HOME/bin" comes before the system "/usr/local/bin".
Now you have a local qpeek that works correctly.
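To check the $PATH ordering mentioned above, a small Python sketch like the following could be used. The function name `comes_first` is hypothetical, and the sketch assumes a colon-separated PATH as on the Linux login nodes.

```python
import os


def comes_first(path_value, preferred, other):
    """Return True if `preferred` appears before `other` in a colon-separated PATH string."""
    # Drop trailing slashes so "/home/u/bin/" and "/home/u/bin" compare equal.
    entries = [entry.rstrip("/") for entry in path_value.split(":")]
    preferred = preferred.rstrip("/")
    other = other.rstrip("/")
    if preferred not in entries:
        return False
    if other not in entries:
        return True
    return entries.index(preferred) < entries.index(other)


home_bin = os.path.join(os.path.expanduser("~"), "bin")
print(comes_first(os.environ.get("PATH", ""), home_bin, "/usr/local/bin"))
```

If this prints False, the system copy in /usr/local/bin will still be picked up first, so $HOME/bin needs to be moved earlier in PATH.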
Hi,
Thanks for confirming that you found a workaround for the issue you faced. We have informed the concerned team about the qpeek issue. Do you want to keep this thread open until the qpeek command issue is resolved, or can we go ahead and close this thread?
Thanks.
Hi Lakshmi,
Yes, I believe you can close the thread.
Kind Regards,
Thanks for confirming. We are closing this thread. Please feel free to reach out to us in case you face any further issues.