Good day,
How would I reconnect to an interactive session? I started some processing and my connection dropped and I would like to reconnect.
Kind Regards
Hi,
Thanks for reaching out to us.
If you were running Python code on a compute node and the connection dropped because of a network issue, the program has probably been killed, and you cannot reconnect to the same interactive session.
However, if your question is how to connect to the same compute node again, you can do so with the following command, provided you remember the node name:
qsub -I -l nodes=<Node>:ppn=2
Eg: qsub -I -l nodes=s001-n106:ppn=2
If you submitted a job on DevCloud instead, it keeps running in the background and a connection issue will not affect it.
You can view your submitted jobs using the "qstat" command.
This command lists all submitted jobs and their state, such as running or queued. Please find the attachment.
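As a rough illustration of reading job states from qstat, here is a minimal Python sketch. It assumes the default column layout shown in the qstat output quoted later in this thread (job ID first, state letter second to last); the function name job_states and the sample text are only for illustration and are not part of DevCloud.

```python
def job_states(qstat_output):
    """Map each job ID to its state letter (e.g. 'R' running, 'Q' queued).

    Assumes the default `qstat` table: job ID in the first column and the
    state in the second-to-last column. Header and separator lines are
    skipped because they do not start with a digit.
    """
    states = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0][0].isdigit():
            continue
        states[fields[0]] = fields[-2]
    return states


# Sample text copied from the qstat output quoted in this thread.
sample = """\
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          01:33:24 R batch
438552.v-qsvr-1           script           uxxxxx          00:34:21 R batch
"""

print(job_states(sample))
```

On a login node the same function could be fed the text captured from running qstat, for example through subprocess.run(["qstat"], capture_output=True, text=True).stdout.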
Hope this clarifies your question. Feel free to reach out to us if you have any further queries.
Thanks.
Thank you Lakshmi.
Thus
- if an interactive session has been started
- and some processing is being performed
- if the internet connection drops (losing the ssh session)
- the processing session is closed
- and the generated data will be copied back to the user profile, overwriting older data?
Is there a method to read the output stream or error stream of a submitted job that is being executed?
Thanks for the response.
Yes. If an interactive session has been started and some processing is being performed, and the internet connection drops (losing the ssh session), the generated data will be copied back to the user profile, overwriting the older data.
To read the output or error stream of a submitted job while it is running, submit the job, check its status, and then use qpeek:
qsub <job_script>.job
qstat
qpeek -o <job_id>
qpeek -e <job_id>
Please note that the output and error files are created once execution is complete.
For more information about submitting jobs and using Intel Optimized frameworks in Intel® AI DevCloud, you can refer to the attached document.
Hope this clarifies your query. Feel free to reach out to us if you have any further queries.
Hi,
Could you please confirm whether the details provided were helpful?
Thanks.
Hi Lakshmi,
It did clarify the specific operations, thank you. However, qpeek does not seem to work as indicated.
I run
uxxxxx@login-1:~/$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          01:33:24 R batch
438552.v-qsvr-1           script           uxxxxx          00:34:21 R batch
then
uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1
qstat: Unknown Job Id Error 438530.v-qsvr-1
but using qstat with the adjusted ID works:
uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1.aidevcloud
Job Id: 438530.v-qsvr-1.aidevcloud
    Job_Name = script
    Job_Owner = u33434@login-1
    resources_used.cput = 01:47:02
    ...
but qpeek still does not work:
uxxxxx@login-1:~/$ qpeek -o 438530.v-qsvr-1.aidevcloud
qstat: Unknown Job Id Error 438530.v-qsvr-1
Job 438530.v-qsvr-1 is not running!
Am I missing something?
Kind Regards,
Hi,
You only need to pass the numeric job ID to the qpeek command.
Eg: qpeek -o 438530
qpeek -e 438530
Hope this clarifies your query. Please confirm whether the solution provided was helpful.
Thanks.
Unfortunately it's not working:
uxxxxx@login-1:~$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
438530.v-qsvr-1           script           uxxxxx          03:08:46 R batch
438552.v-qsvr-1           script           uxxxxx          02:09:01 R batch
uxxxxx@login-1:~$ qpeek -o 438530
qstat: Unknown Job Id Error 438530.v-qsvr-1
Job 438530 is not running!
uxxxxx@login-1:~$
Hi,
A new account has been created for you by the DevCloud team. Could you please verify and confirm whether the qpeek command works in the new account?
Meanwhile, we will try to fix the qpeek issue in the old account as soon as possible.
Thanks.
Hi,
@Lakshmi, the new account has the same problem, but I found a solution.
It seems there is a bug in the qpeek command (I cannot update it, as I do not have the appropriate privileges). One can manually inspect the output as follows:
When submitting a job, qsub prints the <JobID>:
uxxxxx@login-1:~/$ qsub script
444583.v-qsvr-1.aidevcloud
Run qstat to find the <node> on which the job is being executed, from exec_host:
uxxxxx@login-1:~$ qstat -f 444583.v-qsvr-1.aidevcloud
Job Id: 444583.v-qsvr-1.aidevcloud
    Job_Name = script
    Job_Owner = uxxxxx@login-1
    ...
    exec_host = s001-n102/0-1
    ...
    submit_host = login-1
Use ssh and the cat command to get the current stdout stream:
uxxxxx@login-1:~$ ssh s001-n102 cat /var/spool/torque/spool/444583.v-qsvr-1.aidevcloud.OU
##############################################################
# Date: Thu Dec 12 12:01:26 PST 2019
# Job ID: 444580.v-qsvr-1.aidevcloud
# User: uxxxxx
# Resources: neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=24:00:00
##############################################################
stdout of "script" etc.
To get stderr:
ssh <node> cat /var/spool/torque/spool/<JobID>.ER
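The manual steps above could be scripted roughly as follows. This is only a sketch: the function names exec_node and spool_cat_command are made up for illustration, the spool path is the one quoted in this post, and for a multi-node job only the first node listed in exec_host is returned.

```python
import re


def exec_node(qstat_full_output):
    """Return the node name from the exec_host line of `qstat -f` output.

    For "exec_host = s001-n102/0-1" this yields "s001-n102".
    Returns None when no exec_host line is present.
    """
    match = re.search(r"exec_host\s*=\s*([^/\s]+)", qstat_full_output)
    return match.group(1) if match else None


def spool_cat_command(node, job_id, stream="OU"):
    """Build the ssh command that prints a running job's spool file.

    stream is "OU" for stdout or "ER" for stderr, as used in this post.
    """
    if stream not in ("OU", "ER"):
        raise ValueError("stream must be 'OU' or 'ER'")
    return ["ssh", node, "cat", f"/var/spool/torque/spool/{job_id}.{stream}"]


# Sample text based on the qstat -f output quoted in this post.
sample = """\
Job Id: 444583.v-qsvr-1.aidevcloud
    Job_Name = script
    exec_host = s001-n102/0-1
    submit_host = login-1
"""

node = exec_node(sample)
print(spool_cat_command(node, "444583.v-qsvr-1.aidevcloud"))
print(spool_cat_command(node, "444583.v-qsvr-1.aidevcloud", stream="ER"))
```

The returned list can be passed to subprocess.run on the login node to actually fetch the stream.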
IMPORTANT:
When executing a Python script, the output of Python's print is only written to the parent stdout once the script has finished executing. To push text to the parent stdout, and so to the spool file that can be queried, use the following:
os.system("echo Your python output.")
or the preferred method:
subprocess.run(["echo","Your python output"])
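A possible alternative, assuming the delay comes from Python buffering stdout when it is not attached to a terminal (this thread does not confirm the cause): print accepts a flush argument that forces the text out immediately. The helper name report below is only illustrative.

```python
def report(message):
    """Print a progress message and flush stdout right away.

    flush=True asks Python to write the text out immediately instead of
    holding it in the stdout buffer until the buffer fills or the script exits.
    """
    print(message, flush=True)


report("Your python output")
```

Under the same assumption, starting the script with python -u, or setting the PYTHONUNBUFFERED environment variable, would have a similar effect without changing the code.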
Thank you.
To make qpeek work, do:
cp /usr/local/bin/qpeek $HOME/bin/
Then change 4 lines in the qpeek script. This is the diff of my changes:
73d72
<
83,84c82
< my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
< my $qstat = "qstat -f $jb | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
---
> my $qstat = "qstat -f $jobid | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
143,145c141
< my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
< open(QSTAT, "qstat -f $jb | ");
<
---
> open(QSTAT, "qstat -f $jobid |");
Check that "$HOME/bin" comes before the system "/usr/local/bin" in your $PATH.
Now you have a local qpeek that works correctly.
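The idea behind the patch is to turn the short job ID into the full ID that qstat -f accepts. A small Python sketch of that normalization, for illustration only: the function name full_job_id is hypothetical, and the default suffix is the one seen in this thread and may differ on other systems.

```python
def full_job_id(job_id, suffix="v-qsvr-1.aidevcloud"):
    """Expand a job ID to the full form, e.g. 438530 -> 438530.v-qsvr-1.aidevcloud.

    Accepts the bare number, the short form printed by `qstat`
    (438530.v-qsvr-1), or an ID that is already complete.
    """
    if job_id.endswith("." + suffix):
        return job_id
    # Keep only the numeric part before the first dot, then add the suffix.
    number = job_id.split(".", 1)[0]
    return f"{number}.{suffix}"


for candidate in ("438530", "438530.v-qsvr-1", "438530.v-qsvr-1.aidevcloud"):
    print(full_job_id(candidate))
```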
Hi,
Thanks for confirming that you found a workaround for the issue you faced. We have informed the concerned team about the qpeek issue. Do you want to keep this thread open until the qpeek issue is resolved, or can we go ahead and close it?
Thanks.
Hi Lakshmi,
Yes, I believe you can close the thread.
Kind Regards,
Thanks for confirming. We are closing this thread. Please feel free to reach out to us in case you face any further issues.