On https://devcloud.intel.com/oneapi/get-started/base-toolkit/, the build and run scripts both give a PBS error: /var/spool/torque/mom_priv/epilogue.parallel: line 12: /var/spool/torque/mom_priv/epilogue.d//95-nvdir.epilogue: Permission denied
We are unable to reproduce the issue. The vector add sample works fine for us in DevCloud. Did you follow the exact steps given at https://devcloud.intel.com/oneapi/get-started/base-toolkit/?
Please find attached a screenshot for reference.
You are running interactively. Please read my question: I am making PBS scripts as requested. Those don't work.
Build and run the sample in batch mode
The following describes the process of submitting build and run jobs to PBS.
A job is a script that is submitted to PBS through the qsub utility. By default, the qsub utility does not inherit the current environment variables or your current working directory. For this reason, it is necessary to submit jobs as scripts that handle the setup of the environment variables. In order to address the working directory issue, you can either use absolute paths or pass the -d <dir> option to qsub to set the working directory.
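A minimal sketch of the absolute-path alternative described above, assuming the scheduler is Torque (as the /var/spool/torque paths in this thread suggest), which exports PBS_O_WORKDIR, the directory qsub was invoked from, into the job's environment. The helper name job_workdir is hypothetical and not part of the DevCloud guide.

```shell
#!/bin/bash
# Hypothetical helper: print the directory the job should run in.
# Under Torque, PBS_O_WORKDIR holds the absolute path of the directory
# where qsub was invoked. Outside PBS the variable is unset, so fall
# back to the current directory.
job_workdir() {
    printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

# Change to that directory before doing anything else, so the script
# does not depend on qsub being called with -d.
cd "$(job_workdir)" || exit 1

# The rest of the job script would follow here, for example:
#   source /opt/intel/inteloneapi/setvars.sh
#   make all
```

With these lines at the top of a job script, the script should behave the same whether it is submitted with -d . or without it.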
Create the job scripts
Create a build.sh script with the following contents.
#!/bin/bash
source /opt/intel/inteloneapi/setvars.sh
make clean
make all
Create a run.sh script with the following contents for executing the sample.
#!/bin/bash
source /opt/intel/inteloneapi/setvars.sh
make run
Build and run
Jobs submitted in batch mode are placed in a queue, where they wait for the necessary resources (compute nodes) to become available. The jobs are executed on a first-come, first-served basis on the first available node(s) having the requested property or label.
Build the sample on a gpu node.
qsub -l nodes=1:gpu:ppn=2 -d . build.sh
Note: -l nodes=1:gpu:ppn=2 (lower case L) is used to assign one full GPU node to the job.
Note: The -d . is used to configure the current folder as the working directory for the task.
In batch mode, the commands return immediately; however, the job itself may take longer to complete. In order to inspect the job progress, use the qstat utility.
watch -n 1 qstat -n -1
Note: The watch -n 1 command is used to run qstat -n -1 and display its results every second.
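For scripted use, the same qstat check can be polled instead of watched. The sketch below assumes Torque's qstat -f output contains a line of the form "job_state = X", with C meaning completed. The function names job_finished and wait_for_job are hypothetical and not part of the DevCloud guide.

```shell
#!/bin/bash
# job_finished JOB_ID
# Succeeds when qstat reports state C (completed) for the job, or when
# qstat prints no job_state line at all (the job has left the queue).
# Caveat: a qstat failure for any other reason also yields an empty
# state and is therefore treated as "finished".
job_finished() {
    local state
    state=$(qstat -f "$1" 2>/dev/null | sed -n 's/^[[:space:]]*job_state = //p')
    [ -z "$state" ] || [ "$state" = "C" ]
}

# wait_for_job JOB_ID [INTERVAL_SECONDS]
# Polls job_finished until it succeeds, sleeping between checks
# (5 seconds unless a different interval is given).
wait_for_job() {
    until job_finished "$1"; do
        sleep "${2:-5}"
    done
}
```

Since qsub prints the new job's ID on standard output, that ID can be captured in a variable at submission time and passed to wait_for_job. The helper only waits for the job to leave the queue; it does not tell you whether the job succeeded.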
Run the sample on a gpu node after the build job completes successfully.
qsub -l nodes=1:gpu:ppn=2 -d . run.sh
The best way to determine whether a job completed or not is by using the qstat utility. When a job terminates, a couple of files are written to the disk:
We are able to build and run the vector-add sample in non-interactive mode (PBS/batch mode). Please find attached screenshots for reference (zip file). We suggest that you try running the sample in interactive mode and let us know whether it works. Also make sure that you are following the exact steps from this link: https://devcloud.intel.com/oneapi/get-started/base-toolkit/#cpu-gpu-vector-add-sample-walkthrough.
Could you please let us know whether you were able to run the sample code using the steps Rahul suggested?
Please let us know whether your issue is resolved or not.
Batch jobs still give an error:
u37709@login-2:~/sycltrain/9_sycl_of_hell$ cat build.sh
u37709@login-2:~/sycltrain/9_sycl_of_hell$ cat *e*94
/var/spool/torque/mom_priv/epilogue.parallel: line 12: /var/spool/torque/mom_priv/epilogue.d//95-nvdir.epilogue: Permission denied
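To see whether a job's stderr file contains anything besides the epilogue message, the file can be filtered. This is only a sketch: the function name other_stderr_lines is hypothetical, and the pattern simply matches the epilogue path quoted in this thread. It says nothing about whether the epilogue failure itself affected the job.

```shell
#!/bin/bash
# other_stderr_lines FILE
# Print every line of FILE that does not mention the Torque epilogue
# path. grep exits non-zero when no such lines remain, so the exit
# status shows whether anything else was written to stderr.
other_stderr_lines() {
    grep -v 'mom_priv/epilogue' "$1"
}
```

Run against the stderr file of the failing job, empty output with a non-zero exit status would mean the epilogue message is the only thing the job wrote to stderr.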
(how does this markup system work? is there a way to mark a block of text as literal/monospace/no-html?)
The epilogue error, which had been persisting on a few of the nodes, has now been fixed. Could you try the sample again and check whether it works? Let us know if you face any issues.