Reconnecting an Interactive Session

Good day,

How would I reconnect to an interactive session? I started some processing and my connection dropped and I would like to reconnect.

Kind Regards

 

0 Kudos
14 Replies
Highlighted
227 Views

Hi,

Thanks for reaching out to us.

If you were running Python code on a compute node and the connection dropped because of a network issue, the program has most likely been killed, and you cannot reconnect to the same interactive session.

If your question is how to get back onto the same compute node, you can request it by name, provided you remember the node name:

qsub -I -l nodes=<Node>:ppn=2
E.g.: qsub -I -l nodes=s001-n106:ppn=2

This starts a new interactive session on that node; it does not reattach to the old one.

However, if you submitted a job to the DevCloud queue, it runs in the background and a dropped connection does not affect it.

You can view your submitted jobs with the "qstat" command.

It lists every job you have submitted along with its state, for example running (R) or queued (Q). Please find the attachment.
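
If you want to check job states from a script, the qstat table can be parsed in a few lines of Python. This is only a minimal sketch, assuming the default Torque column layout (Job ID, Name, User, Time Use, S, Queue). The helper names and the sample job numbers are made up for illustration.

```python
def parse_qstat(text):
    """Parse default `qstat` table output into a list of dicts."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly six whitespace-separated fields.
        # The header row splits into more, and the separator row is all dashes.
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        job_id, name, user, time_used, state, queue = fields
        jobs.append({
            "id": job_id,
            "name": name,
            "user": user,
            "time_used": time_used,
            "state": state,
            "queue": queue,
        })
    return jobs


def running_jobs(text):
    """Return the IDs of jobs whose state column is 'R' (running)."""
    return [job["id"] for job in parse_qstat(text) if job["state"] == "R"]


# Hypothetical sample in the default layout; not real DevCloud output.
SAMPLE = """\
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
100001.v-qsvr-1            script           uxxxxx          01:33:24 R batch
100002.v-qsvr-1            script           uxxxxx          0        Q batch
"""

for job in parse_qstat(SAMPLE):
    print(job["id"], job["state"])
```

On the login node you would feed it the real output, for example from subprocess.run(["qstat"], capture_output=True, text=True).stdout, instead of SAMPLE.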

Hope this clarifies your question. Feel free to reach out to us if you have any further queries.

Thanks.

0 Kudos
Highlighted
Innovator
227 Views

Thank you Lakshmi.

Thus
- if an interactive session has been started
- and some processing is being performed
- and the internet connection drops (losing the ssh session)
- the processing session is closed
- and the generated data will be copied back to the user profile, overwriting older data?

Is there a method to read the output stream or error stream of a submitted job that is being executed?
 

0 Kudos
Highlighted
227 Views

Thanks for the response.

Yes. If an interactive session has been started and some processing is being performed when the internet connection drops (losing the ssh session), the generated data will be copied back to the user profile, overwriting the older data.

  • To submit a job in DevCloud:

      qsub <job_script>.job

  • Once the job is submitted, you can track it with:

      qstat

  • To read the output and error streams of a running job, use the qpeek command:

      qpeek -o <job_id>
      qpeek -e <job_id>

    Please note that the output and error files are created once execution is complete.
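
The same three steps can be driven from Python if that is more convenient. This is a rough sketch, assuming qsub prints the job ID on stdout and that qpeek accepts the ID in the form shown above. The function names are illustrative, and capture_output needs Python 3.7 or later.

```python
import subprocess


def submit(job_script):
    """Submit a job script with qsub and return the job ID that qsub prints."""
    result = subprocess.run(
        ["qsub", job_script], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def follow_up_commands(job_id):
    """Build the tracking commands from this post as argument lists."""
    return {
        "status": ["qstat", job_id],
        "stdout": ["qpeek", "-o", job_id],
        "stderr": ["qpeek", "-e", job_id],
    }


def run(command):
    """Run one of the commands above and return its stdout as text."""
    return subprocess.run(command, capture_output=True, text=True).stdout
```

Typical use would be job_id = submit("my_job.job") followed by print(run(follow_up_commands(job_id)["status"])), where my_job.job is a placeholder script name.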

    For more information on submitting jobs and using Intel-optimized frameworks in Intel® AI DevCloud, please refer to the attached document.

    Hope this clarifies your query. Feel free to reach out to us if you have any further queries.

    0 Kudos
    Highlighted
    227 Views

    Hi, 

    Could you please confirm whether the details provided were helpful?

    Thanks.

    0 Kudos
    Highlighted
    Innovator
    227 Views

    Hi Lakshmi,

    It did clarify the specific operations, thank you. However, qpeek does not seem to work as indicated.

    I ran

    uxxxxx@login-1:~/$ qstat
    Job ID                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    438530.v-qsvr-1            script           uxxxxx          01:33:24 R batch          
    438552.v-qsvr-1            script           uxxxxx          00:34:21 R batch  

    then

    uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1
    qstat: Unknown Job Id Error 438530.v-qsvr-1
    

    but qstat with the fully qualified ID works:

    uxxxxx@login-1:~/$ qstat -f 438530.v-qsvr-1.aidevcloud
    Job Id: 438530.v-qsvr-1.aidevcloud
        Job_Name = script
        Job_Owner = u33434@login-1
        resources_used.cput = 01:47:02
        ...
    

    but qpeek still does not work:

    uxxxxx@login-1:~/$ qpeek -o 438530.v-qsvr-1.aidevcloud
    qstat: Unknown Job Id Error 438530.v-qsvr-1
    Job 438530.v-qsvr-1 is not running!
    

    Am I missing something?

    Kind Regards,

     

    0 Kudos
    Highlighted
    227 Views

    Hi,

     

    You just need to give the numeric job ID to the qpeek command.

    E.g.: qpeek -o 438530
          qpeek -e 438530

    Hope this clarifies your query. Please confirm whether the solution provided was helpful.

    Thanks.

    0 Kudos
    Highlighted
    Innovator
    227 Views

    Unfortunately it's not working:

    uxxxxx@login-1:~$ qstat
    Job ID                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    438530.v-qsvr-1            script           uxxxxx          03:08:46 R batch          
    438552.v-qsvr-1            script           uxxxxx          02:09:01 R batch          
    uxxxxx@login-1:~$ qpeek -o 438530
    qstat: Unknown Job Id Error 438530.v-qsvr-1
    Job 438530 is not running!
    uxxxxx@login-1:~$ 
    
    0 Kudos
    Highlighted
    227 Views

    Hi, we are now able to reproduce the issue on our end. We need to contact the DevCloud admin team about it, as this appears to be a temporary issue. We will get back to you as soon as possible. Thanks.
    0 Kudos
    Highlighted
    227 Views

    Hi,

    The DevCloud team has created a new account for you. Could you please verify whether the qpeek command works in the new account?

    Meanwhile, we will try to fix the qpeek issue in the old account as soon as possible.

    Thanks.

     

     

    0 Kudos
    Highlighted
    Innovator
    227 Views

    Hi,

    @Lakshmi, the new account has the same problem, but I found a solution.

    It seems there is a bug in the qpeek command (I cannot fix it myself, as I do not have the appropriate privileges). One can inspect the output manually as follows:

    When submitting a job, qsub prints the <JobID>:

    uxxxxx@login-1:~/$ qsub script
    444583.v-qsvr-1.aidevcloud

    Run qstat -f and read the <node> on which the job is executing from exec_host:

    uxxxxx@login-1:~$ qstat -f 444583.v-qsvr-1.aidevcloud
    Job Id: 444583.v-qsvr-1.aidevcloud
        Job_Name = script
        Job_Owner = uxxxxx@login-1
        ...
        exec_host = s001-n102/0-1
        ...
        submit_host = login-1
    

    Use ssh and cat to read the current stdout stream:

    uxxxxx@login-1:~$ ssh s001-n102 cat /var/spool/torque/spool/444583.v-qsvr-1.aidevcloud.OU
    ##############################################################
    #      Date:           Thu Dec 12 12:01:26 PST 2019
    #    Job ID:           444580.v-qsvr-1.aidevcloud
    #      User:           uxxxxx
    # Resources:           neednodes=1:gpu:ppn=2,nodes=1:gpu:ppn=2,walltime=24:00:00
    ##############################################################
    
    stdout of "script" etc.
    

    To get stderr:

    ssh <node> cat /var/spool/torque/spool/<JobID>.ER
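
The manual steps above can be sketched in Python as follows. This assumes the exec_host format and spool path shown in this post, which may differ on other Torque installations. The function names are my own and not part of any tool.

```python
import re

# Spool directory as observed in this post; an assumption on other systems.
SPOOL_DIR = "/var/spool/torque/spool"


def exec_node(qstat_f_output):
    """Return the node from the exec_host line of `qstat -f` output, or None."""
    match = re.search(r"^\s*exec_host\s*=\s*(\S+)", qstat_f_output, re.MULTILINE)
    if match is None:
        return None  # no exec_host line, e.g. the job has not started yet
    # exec_host looks like 's001-n102/0-1'; the part before '/' is the node.
    return match.group(1).split("/", 1)[0]


def peek_command(node, job_id, stream="OU"):
    """Build the ssh command that prints the job's stdout (OU) or stderr (ER)."""
    if stream not in ("OU", "ER"):
        raise ValueError("stream must be 'OU' or 'ER'")
    return ["ssh", node, "cat", f"{SPOOL_DIR}/{job_id}.{stream}"]


# Trimmed copy of the qstat -f output shown above.
SAMPLE = """\
Job Id: 444583.v-qsvr-1.aidevcloud
    Job_Name = script
    exec_host = s001-n102/0-1
    submit_host = login-1
"""

node = exec_node(SAMPLE)
print(" ".join(peek_command(node, "444583.v-qsvr-1.aidevcloud")))
```

The returned list can be passed straight to subprocess.run on the login node.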

    IMPORTANT:

    When executing a Python script, the output of print may not appear in the spool file until the script has finished executing, most likely because Python buffers stdout when it is not attached to a terminal. To push text to the parent stdout, and so to the spool file that can be queried, use the following:

     import os
     os.system("echo Your python output.")

    or the preferred method:

     import subprocess
     subprocess.run(["echo", "Your python output"])
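
A lighter alternative, which I have not verified on DevCloud itself: the delay is probably output buffering, since Python block-buffers stdout when it is not connected to a terminal. A minimal sketch using the standard flush=True argument of print follows; the function name report is just illustrative.

```python
import sys


def report(message, stream=None):
    """Print a message and flush at once so it is written out immediately."""
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)


for step in range(3):
    report(f"step {step} done")
```

Starting the script with python -u, or setting PYTHONUNBUFFERED=1 in the job script, should have a similar effect without changing the code.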

    Thank you.

     

     

    0 Kudos
    Highlighted
    New Contributor I
    227 Views

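    To make qpeek work, copy it to your own bin directory:

    cp /usr/local/bin/qpeek $HOME/bin/

    Then change four lines in the qpeek script. This is the diff of my changes (lines marked "<" are my modified copy, lines marked ">" are the original):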

    73d72
    <   
    83,84c82
    <   my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
    <   my $qstat = "qstat -f $jb | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
    ---
    >   my $qstat = "qstat -f $jobid | grep '^[ ]*server =' | head -1 | awk '{print \$3}'";
    143,145c141
    <   my $jb = "$jobid\.v-qsvr-1\.aidevcloud";
    <   open(QSTAT, "qstat -f $jb | ");
    <   
    ---
    >   open(QSTAT, "qstat -f $jobid |");

    Check that "$HOME/bin" comes before the system "/usr/local/bin" in your $PATH.

    You now have a local qpeek that works correctly.
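
For anyone who would rather not patch the Perl script, the idea behind the change (always hand qstat the fully qualified job ID) can be sketched in Python. The server suffix is the one seen in this thread and is an assumption anywhere else; the function name is made up, and job arrays are not handled.

```python
# Server suffix as it appears in this thread; adjust for other systems.
SERVER_SUFFIX = "v-qsvr-1.aidevcloud"


def qualify_job_id(job_id, suffix=SERVER_SUFFIX):
    """Turn '438530' or '438530.v-qsvr-1' into '438530.v-qsvr-1.aidevcloud'."""
    # Keep only the numeric sequence number in front of the first dot.
    number = job_id.split(".", 1)[0]
    if not number.isdigit():
        raise ValueError(f"unexpected job ID: {job_id!r}")
    return f"{number}.{suffix}"


print(qualify_job_id("438530"))
```

The result can then be passed to qstat -f, which accepted the long form earlier in this thread.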

    0 Kudos
    Highlighted
    227 Views

    Stoltz, Gene wrote:

    @Lakshmi, the new account has the same problem but I found a solution. [...]

    Hi,

    Thanks for confirming that you found a workaround for the issue. We have also informed the team concerned about the qpeek issue. Would you like to keep this thread open until the qpeek issue is resolved, or can we go ahead and close it?

    Thanks.

     

    0 Kudos
    Highlighted
    Innovator
    227 Views

    Hi Lakshmi,

    Yes, I believe you can close the thread.

    Kind Regards,

    0 Kudos
    Highlighted
    227 Views

    Thanks for confirming. We are closing this thread. Please feel free to reach out to us if you face any further issues.

    0 Kudos