Intel® Quartus® Prime Software

Monte Carlo Black-Scholes Asian Options Pricing Design Example

Altera_Forum
Honored Contributor II
1,031 Views

After several false starts, I have a working OpenCL environment, thanks to immense help from Nallatech. Since the kind of problem I want to solve is similar to Intel's Monte Carlo Black-Scholes pricing example, I wanted to try that example on a Nallatech 395AB board, which has a Stratix V GXAB FPGA and 32 GB of memory. Compilation fails with an error saying the design cannot fit. Does anyone have any idea why? It's a simple example, so I am surprised that it fails on this board.

 

https://www.altera.com/support/support-resources/design-examples/design-software/opencl/black-scholes.html 

 

Thanks, 

 

QG
3 Replies
Altera_Forum

Please attach the *kernel_name*.log and quartus_sh_compile.log files from the compilation folder.

Altera_Forum

 

--- Quote Start ---  

Please attach the *kernel_name*.log and quartus_sh_compile.log files from the compilation folder. 

--- Quote End ---  

 

 

It seems I ran out of DSPs :(. DSP utilization came out at 162%. Apparently, the example specifically targets the D8 chip rather than the AB chip. Can I remove the loop unrolling to reduce DSP utilization? Here is a snippet of the code:

 

 

__kernel
__attribute__((reqd_work_group_size(NUM_THREADS,1,1)))
void black_scholes( int m, int n,
                    float drift,
                    float vol,
                    float S_0,
                    float K)
{
    // running statistics -- use double precision for the accumulator
    double sum = 0.0;

    // loop over all simulations
    for (int path=0; path<m; path++) {
        float S = S_0;
        float arithmetic_average = 0.0f; // We're not including the initial price in the average

        for (int t_i=0; t_i<n/VECTOR; t_i++) {
            float U[VECTOR], Z[VECTOR];

            // read four vectors of uniform random numbers from the channels
            vec_float_ty U0 = read_channel_intel(RANDOM_STREAM_0);
            vec_float_ty U1 = read_channel_intel(RANDOM_STREAM_1);
            vec_float_ty U2 = read_channel_intel(RANDOM_STREAM_2);
            vec_float_ty U3 = read_channel_intel(RANDOM_STREAM_3);

            #pragma unroll VECTOR_DIV4
            for (int i=0; i<VECTOR_DIV4; i++) {
                U[i]=U0[i];
                U[i+1*VECTOR_DIV4]=U1[i];
                U[i+2*VECTOR_DIV4]=U2[i];
                U[i+3*VECTOR_DIV4]=U3[i];
            }

            #pragma unroll VECTOR_DIV2
            for (int i=0; i<VECTOR_DIV2; i++) {
                float2 z = box_muller(U[2*i], U[2*i+1]);
                Z[2*i] = z.x;
                Z[2*i+1] = z.y;
            }

            #pragma unroll VECTOR
            for (int i=0; i<VECTOR; i++) {
                // convert uniform distribution to normal
                float gauss_rnd = Z[i];

                // Simulate the path movement using geometric brownian motion
                S *= drift * exp(vol * gauss_rnd);
                arithmetic_average += S;
            }
            // ... (the posted snippet ends here)

 

It took close to 24 hours to compile the example on a 16-core, 3.3 GHz machine with 128 GB of RAM! :o

 

Thanks, 

 

QG
Altera_Forum

Rather than removing the unrolls, it is probably best to just decrease the unroll factors until the compiler's predicted DSP utilization drops below 100%. When the DSPs are overutilized, the mapper tries to implement the extra functions in logic instead. This significantly increases logic usage, complicates placement and routing, and results in a much longer place-and-route time.
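
To pick a first unroll factor to try, a back-of-the-envelope estimate can help. The plain C sketch below is not part of the design example. The function name `scaled_unroll_factor` and the starting factor of 64 are hypothetical. It also assumes that DSP usage scales linearly with the unroll factor, which is only a rough model, since parts of the design outside the unrolled loops may use DSPs as well. The compiler's own utilization estimate remains the number to trust.

```c
#include <assert.h>

/* Hypothetical helper: repeatedly halve an unroll factor until the
 * predicted DSP utilization is at or below 100%.
 *
 * ASSUMPTION: DSP usage is proportional to the unroll factor. This is a
 * rough model for choosing a starting point, not what the compiler does. */
int scaled_unroll_factor(int current_factor, double current_dsp_percent)
{
    assert(current_factor >= 1);
    assert(current_dsp_percent > 0.0);

    int factor = current_factor;

    /* Predicted utilization for a candidate factor under the linear model:
     *   current_dsp_percent * factor / current_factor                      */
    while (factor > 1 &&
           current_dsp_percent * factor / current_factor > 100.0) {
        factor /= 2;  /* halving keeps a power-of-two factor a power of two */
    }
    return factor;
}
```

For example, with the reported 162% and a hypothetical factor of 64, `scaled_unroll_factor(64, 162.0)` returns 32, which the linear model puts at about 81%. Whatever value comes out, the loop bounds and the `VECTOR`-style macros in the kernel have to be changed consistently, and the compiler's area report should be rechecked before starting a full compile.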
