Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

Stratix 10 Job Submission Issues (DevCloud for FPGA)

Wes

I am trying to migrate from using the Arria 10 nodes to using the Stratix 10 nodes. I'm not sure if there is a bug with the job submission mechanism for S10 nodes, or if I am doing something wrong. When I submit a batch script requesting an S10 node, it gets run on an Arria 10 node.

According to the `pbsnodes` command, there are 6 nodes with the `stratix10` property in the pool on the `v-qsvr-fpga` server:

 

u76671@login-2:~/git/gp-devcloud$ pbsnodes -s v-qsvr-fpga | grep stratix10
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby
     properties = xeon,skl,gold6130,ram192gb,net1gbe,fpga_opencl,x2go,fpga,stratix10,darby

 

The nodes listed are s005-[n005,n006,n008,n009,n010,n011]. 
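The `grep` above prints only the property lines, so the node names have to be read from the full `pbsnodes` output. A small filter can do that mapping. This is a sketch that assumes Torque's usual layout: an unindented node-name line followed by indented `key = value` attribute lines. `nodes_with_property` is a hypothetical helper, not a Torque command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose "properties" list
# contains an exact match for the given property.
# Assumes Torque-style pbsnodes output on stdin: an unindented node-name
# line followed by indented "key = value" attribute lines.
nodes_with_property() {
  local prop="$1"
  awk -v prop="$prop" '
    /^[^ \t]/ { node = $1 }                     # unindented line starts a node record
    $1 == "properties" && $2 == "=" {
      n = split($3, props, ",")                 # comma-separated property list
      for (i = 1; i <= n; i++)
        if (props[i] == prop) { print node; break }
    }
  '
}

# Usage on the login node:
#   pbsnodes -s v-qsvr-fpga | nodes_with_property stratix10
```

Matching whole comma-separated entries, rather than using `grep`, avoids false hits on properties that merely contain the search string.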

When I submit a job using the following command, it is always assigned to an Arria 10 node (in this case, node s001-n138):

 

qsub -q batch@v-qsvr-fpga -l nodes=stratix10:ppn=2 -l nodes=stratix10:ppn=2 s10_run.sh

 

The output from the beginning of my script is this:

 

########################################################################
#      Date:           Thu Oct  6 11:16:28 PDT 2022
#    Job ID:           36890.v-qsvr-fpga.aidevcloud
#      User:           u76671
# Resources:           neednodes=stratix10:ppn=2,nodes=stratix10:ppn=2,walltime=06:00:00
########################################################################

Logged into node: s001-n138.aidevcloud
Running lspci
3b:00.0 Processing accelerators: Intel Corporation Device 09c4

 

This shows that the job was assigned to `s001-n138`, which is an Arria 10 node; the PCI device ID of the attached PAC (`09c4`) confirms it.
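To catch this early, a job script can refuse to run when it lands on the wrong card. A minimal sketch, assuming `lspci -n` prints the numeric ID `8086:09c4` for the Arria 10 PAC shown above (`8086` is Intel's PCI vendor ID). `has_arria10_pac` is a hypothetical helper, and it only rejects the Arria 10 case; it does not positively identify a Stratix 10 card.

```shell
#!/usr/bin/env bash
# Hypothetical guard for the top of a job script such as s10_run.sh.
# Succeeds (exit 0) if the PCI listing on stdin contains the Arria 10 PAC ID
# seen in this thread: vendor 8086 (Intel), device 09c4.
has_arria10_pac() {
  grep -qi '8086:09c4'
}

# Usage inside the job script:
#   if lspci -n | has_arria10_pac; then
#     echo "Landed on $(hostname), which has an Arria 10 PAC; expected Stratix 10" >&2
#     exit 1
#   fi
```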

Accepted solution
Christoph9

Hey,

 

I think I already had a similar problem. Try replacing

nodes=stratix10:ppn=2

with

nodes=1:stratix10:ppn=2

 

Hope this helps,
Christoph
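For reference, the corrected request puts the node count before the property. The helper below is a hypothetical convenience that only formats that resource string; the queue name and script name are taken from the original question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Torque "nodes=" request in the order
# <count>:<property>:ppn=<N>, e.g. nodes=1:stratix10:ppn=2.
# Defaults: one node, two processors per node.
s10_nodes_request() {
  local count="${1:-1}" ppn="${2:-2}"
  printf 'nodes=%s:stratix10:ppn=%s\n' "$count" "$ppn"
}

# Batch submission:
#   qsub -q batch@v-qsvr-fpga -l "$(s10_nodes_request 1 2)" s10_run.sh
# Interactive session (the -I test mentioned in the replies):
#   qsub -I -q batch@v-qsvr-fpga -l "$(s10_nodes_request 1 2)"
```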


Wes

Tested this out with the `-I` flag to start an interactive job, and it looks like this worked, so I'll try a regular batch job now. Thanks!

Any idea what the `1:` implies?

Christoph9
Happy that this also worked for you!

The 1 tells qsub that you want one node for your job. What qsub does internally when you omit it is a mystery to me. It seems to just take the first free node it finds and ignore the property (maybe because it tries to convert the property to a number?).
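Because the missing count silently produced the wrong placement here, a local check before calling `qsub` can flag it. This is a hypothetical sketch, not part of Torque: it only tests whether the first field of a `nodes=` spec is a positive integer. Torque's documented syntax also allows a host name in that position, so this check is deliberately stricter than `qsub` itself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check: succeed only if the first ':'-separated
# field of a nodes= spec is a positive integer node count.
nodespec_has_count() {
  local spec="${1#nodes=}"      # drop a leading "nodes=" if present
  local first="${spec%%:*}"     # text before the first ':'
  [[ "$first" =~ ^[1-9][0-9]*$ ]]
}

# Example:
#   nodespec_has_count "nodes=stratix10:ppn=2" || echo "missing node count" >&2
```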
BoonBengT_Intel

Hi @Wes,


Greetings, thank you for posting in the Intel community forum; I hope all is well.

Good to know that the interactive job works for you.


Just checking in to see if there are any further doubts regarding this matter.

I hope your doubts have been clarified.


Best Wishes

BB


BoonBengT_Intel

Hi @Wes,


Greetings, just checking in to see if there are any further doubts regarding this matter.

I hope we have clarified your doubts.


Best Wishes

BB


BoonBengT_Intel

Hi @Wes,


Greetings, as we have not received any further clarification or updates on this matter, we will assume the challenge has been overcome. Please log in to ‘https://supporttickets.intel.com’, view the details of the request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. For new queries, please feel free to open a new thread and we will be right with you. Pleasure having you here.


Best Wishes

BB

