Intel® Quartus® Prime Software

Parallel thread execution based on thread id

Altera_Forum

Hi,

I would like different threads to execute different instructions in parallel, selected by thread ID.

For example, given four instructions I1, I2, I3, and I4, I want thread 1 to execute I1, thread 2 to execute I2, and so on, all in parallel. How can I do this with this tool?

Thanks.
3 Replies
Altera_Forum

 

--- Quote Start ---

Hi,

For example, given four instructions I1, I2, I3, and I4, I want thread 1 to execute I1, thread 2 to execute I2, and so on, all in parallel. How can I do this with this tool?

Thanks.

--- Quote End ---

It's definitely possible if you're running an NDRange kernel. Look at the available work-item functions: get_global_id(uint D), get_global_size(uint D), get_local_id(uint D), get_local_size(uint D), etc.

Once you have the thread ID, you can branch on it and execute a different instruction in each branch.
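
A minimal sketch of that branching approach as an OpenCL C NDRange kernel. The kernel name `dispatch_by_id`, the buffer names `in` and `out`, and the four placeholder operations standing in for I1 to I4 are all hypothetical. The `#ifndef __OPENCL_VERSION__` section is only a host-side stand-in (an assumption for illustration) so the same text can be compiled and checked as plain C; it is not part of the kernel.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins so the sketch also builds as plain C (illustration only). */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t model_gid;   /* work-item ID that the host model pretends to be */
static size_t get_global_id(unsigned dim)
{
    assert(dim == 0);      /* this sketch only models a 1-D NDRange */
    return model_gid;
}
#endif

/* Hypothetical kernel: each work-item selects one operation from its global ID. */
__kernel void dispatch_by_id(__global const int *in, __global int *out)
{
    size_t gid = get_global_id(0);
    int x = in[gid];

    /* The low two bits of the ID select the operation, so IDs above 3 wrap around. */
    switch (gid & 3) {
    case 0:  out[gid] = x + 1; break;   /* placeholder for I1 */
    case 1:  out[gid] = x * 2; break;   /* placeholder for I2 */
    case 2:  out[gid] = x - 3; break;   /* placeholder for I3 */
    default: out[gid] = x ^ 5; break;   /* placeholder for I4 */
    }
}
```

Launched with a global size of 4, work-items 0 through 3 would each take a different branch.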
Altera_Forum

Hi,

I have tried that. The problem is that the resulting design carries out all of the instructions, and only the output is muxed according to the thread ID. This increases the latency of the final output. Any other suggestions?

--- Quote Start ---

It's definitely possible if you're running an NDRange kernel. Look at the available work-item functions: get_global_id(uint D), get_global_size(uint D), get_local_id(uint D), get_local_size(uint D), etc.

Once you have the thread ID, you can branch on it and execute a different instruction in each branch.

--- Quote End ---

Altera_Forum

 

--- Quote Start ---

Hi,

I have tried that. The problem is that the resulting design carries out all of the instructions, and only the output is muxed according to the thread ID. This increases the latency of the final output. Any other suggestions?

--- Quote End ---

Hmm, I can't think of anything branching-wise that would result in lower latency, but have you tried creating separate kernels and passing the data between them via channels?
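
A rough sketch of the separate-kernels idea, assuming the Intel FPGA channels extension (`cl_intel_channels`, with `write_channel_intel` and `read_channel_intel`). Older Altera SDK releases used an `_altera` suffix instead, so check your version. The kernel names `producer` and `consumer`, the channel name `to_consumer`, and the two placeholder operations are hypothetical. The `#else` section is only a host-side, single-slot stand-in (an assumption for illustration) so the text can be checked as plain C; it does not model real channel blocking or depth.

```c
#ifdef __OPENCL_VERSION__
#pragma OPENCL EXTENSION cl_intel_channels : enable
channel int to_consumer;   /* hypothetical channel linking the two kernels */
#else
/* Host-only stand-ins: a single int plays the role of the channel. */
#define __kernel
#define __global
static int to_consumer;
#define write_channel_intel(ch, v) ((ch) = (v))
#define read_channel_intel(ch)     (ch)
#endif

/* First kernel: applies its own operation and hands the result downstream. */
__kernel void producer(__global const int *in)
{
    int x = in[0];
    write_channel_intel(to_consumer, x + 1);    /* placeholder for I1 */
}

/* Second kernel: receives the value and applies a different operation. */
__kernel void consumer(__global int *out)
{
    int y = read_channel_intel(to_consumer);
    out[0] = y * 2;                             /* placeholder for I2 */
}
```

Each kernel here is a separate piece of hardware with only its own instruction in it. For the two to run concurrently, the host would typically enqueue them on separate command queues.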