cmdPolyEval is a command-line tool for generating math functions. It offers an extended library of floating-point functions, together with a restricted set of fixed-point functions. It can be found in quartus\dspba\backend\<platform>.
Type cmdPolyEval to get the latest version of the information below.
cmdPolyEval.exe -correctRounding -target StratixV -speedgrade 2 -frequency 250 -name myModel FPDiv 8 23 0
Where:
-correctRounding
produces a correctly rounded result (IEEE-754 compliant) if the operator supports it. If this option is not passed, the result is faithfully rounded,
which uses slightly fewer resources.
-target StratixV -speedgrade 2 -frequency 250
(these should be passed in this order) targets the StratixV device, speedgrade 2, with pipelining to achieve 250 MHz
-name myModel
the model name; this should come before the function to be generated
FPDiv 8 23 0
generates a floating-point divider with 8 bits of exponent and 23 bits of mantissa (single precision),
using polynomial approximation, version 0 (see below)
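When many cores are generated from a script, the ordering constraints described above (device, speed grade and frequency in that order, and -name before the function) can be captured in a small helper. The following is a minimal Python sketch: the helper name `build_cmdpolyeval_args` and its parameters are hypothetical, and it only assembles the argument list without running the tool.

```python
def build_cmdpolyeval_args(function, params, target, frequency_mhz,
                           speedgrade=None, name=None,
                           correct_rounding=False, exe="cmdPolyEval.exe"):
    """Assemble a cmdPolyEval argument list in the order the text asks for.

    Hypothetical helper: it only uses options shown in the example above.
    """
    args = [exe]
    if correct_rounding:
        args.append("-correctRounding")
    # Device, speed grade and frequency are kept in this order.
    args += ["-target", target]
    if speedgrade is not None:
        args += ["-speedgrade", str(speedgrade)]
    args += ["-frequency", str(frequency_mhz)]
    # The name must come before the function to be generated.
    if name is not None:
        args += ["-name", name]
    args.append(function)
    args += [str(p) for p in params]
    return args


example_args = build_cmdpolyeval_args(
    "FPDiv", [8, 23, 0], target="StratixV", frequency_mhz=250,
    speedgrade=2, name="myModel", correct_rounding=True)
print(" ".join(example_args))
```

The resulting list can be passed to `subprocess.run` on a machine where the executable is installed.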
-correctRounding
-faithfulRounding
-error u
u is the number of ulps of error that is acceptable (correctRounding = 0.5, faithfulRounding = 1.0)
-errors N u1 u2 ... uN
if the component has N ports, individual error bounds can be specified for each port.
This influences the test-bench generation process.
It is the user's responsibility to set N to match the number of output ports.
-target <Device>
e.g. ArriaV, CycloneIVGX, StratixIV
-speedgrade S
-frequency MHz
-pipelining type
type = 0 -> combinatorial
| 1 -> subcycle DAG-based
| 2 -> subcycle DAG-based gen2
| 3 -> large granularity (no subcycle - old way)
-name N
-enable (generates global enable signal)
-wrapper
add input and output registers
-qnan
-noTruncMult
disables the use of truncated multipliers
-fuseCoefTables
with piecewise polynomial approximation, a number of tables are created, one for each coefficient.
When this option is passed, the tables are stitched together width-wise before mapping them to memory blocks.
It allows reducing the number of blocks used, at the expense of some synchronization logic.
For instance, for a degree-2 polynomial with coefficient widths of 11, 18 and 27 bits, separate tables use 3 M20K blocks, whereas the fused table is 11+18+27 = 56 bits wide and uses 2 M20K blocks.
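The block counts in the -fuseCoefTables example can be reproduced with a little arithmetic. This is only a sketch under two assumptions that the text does not state: that the widest M20K configuration is 40 bits (512 x 40), and that the table depth fits within a single block at that width.

```python
import math

# Assumption: an M20K block is at most 40 bits wide (512 x 40 configuration).
M20K_MAX_WIDTH = 40


def blocks_for_width(width_bits, max_width=M20K_MAX_WIDTH):
    """Number of blocks needed side by side to hold one table of this width."""
    return math.ceil(width_bits / max_width)


coef_widths = [11, 18, 27]  # widths from the degree-2 example in the text

# Without -fuseCoefTables: every coefficient table gets its own block(s).
separate_blocks = sum(blocks_for_width(w) for w in coef_widths)

# With -fuseCoefTables: the tables are stitched into one 56-bit-wide table.
fused_blocks = blocks_for_width(sum(coef_widths))

print(separate_blocks, fused_blocks)  # 3 2
```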
cmdPolyEval now defaults to creating flat file structures. This means it generates files into the current directory. This mode has been enhanced in order to omit the safe_path files. With it compiled in release mode, the only file now generated is the top level .vhd file.
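The -error option listed above expresses accuracy in ulps (units in the last place). As a rough illustration of what 0.5 ulp and 1.0 ulp mean for a format with wF fraction bits, here is a Python sketch. The function names are hypothetical, and the model ignores subnormals and the exponent range set by wE.

```python
import math


def ulp(x, wF):
    """Unit in the last place of a finite, non-zero x with wF fraction bits."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("this sketch only handles finite non-zero values")
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, (e - 1) - wF)


def within_ulps(result, exact, u, wF):
    """True if result is within u ulps of the exact value."""
    return abs(result - exact) <= u * ulp(exact, wF)


# Single precision (wF = 23): one ulp at 1.0 is 2**-23.
one_ulp_off = 1.0 + 2.0 ** -23
print(within_ulps(one_ulp_off, 1.0, 1.0, 23))  # acceptable for faithful rounding
print(within_ulps(one_ulp_off, 1.0, 0.5, 23))  # not acceptable for correct rounding
```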
-testbench N
/-> a testbench of N test cases can be generated.
/-> if no testbench type (for example -randomTests) is selected, no vectors will actually be generated.
-randomTests
/-> runs the test bench with N test vectors (N is an input to -testbench)
-expRange eMin eMax
/-> the stimuli are generated in the eMin eMax range for floating-point inputs
-positiveStimuli
/-> the stimuli are positive
-negativeStimuli
/-> the stimuli are negative
-specialCaseTests
/-> the floating-point special values are automatically generated including zeros, inf, NaN
-cancellationTests
/-> test addition for cancellations
-nearInfinityTests
/-> tests close to the max FP number
-nearZeroTests
/-> tests close to the min FP number
-handTesting
/-> runs the associated hand built test vectors
-piTesting n w
/-> runs tests around the k*(pi/2) regions, where k=[0,n-1]. 1024 values are tested around each value
/-> the number of values around each multiple is 2*w
/-> this is useful for stressing trigonometric functions
-noChanValid
/-> no channel and valid data are generated in the stimuli and response files
-noFileGenerate
Do not write out any files
-printMachineReadable
Print information such as latency in the following machine readable format:
@@start
@field1_name field1_value@
@field2_name field2_value@
@@end
-allTests
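The -printMachineReadable format described above is easy to pick out of the tool's console output. A minimal Python sketch of a parser follows; the field names in the sample text are invented for illustration (the text only says that information such as latency is printed), and `parse_machine_readable` is a hypothetical helper.

```python
def parse_machine_readable(text):
    """Collect '@name value@' lines found between '@@start' and '@@end'."""
    fields = {}
    inside = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "@@start":
            inside = True
        elif line == "@@end":
            inside = False
        elif inside and len(line) > 2 and line.startswith("@") and line.endswith("@"):
            # Drop the surrounding '@' and split the name from the value.
            name, _, value = line[1:-1].partition(" ")
            fields[name] = value.strip()
    return fields


# Made-up sample output; real field names may differ.
sample_output = """some other console output
@@start
@latency 17@
@name myModel@
@@end
"""

print(parse_machine_readable(sample_output))
```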
An example that generates the test-bench:
cmdPolyEval.exe -pipelining 1 -correctRounding -target StratixIV -frequency 250 -name myModel FPDiv 8 23 0 -testbench 1000 -randomTests
To run the test-bench, first set
set QUARTUS_ROOTDIR_OVERRIDE=%QUARTUS_ROOTDIR%
then, in modelName_atb.do, make sure that
quietly set compile(altera) 0
quietly set compile(altera_mf) 0
quietly set compile(lpm) 0
quietly set compile(wysiwyg) 0
is changed to
quietly set compile(altera) 1
quietly set compile(altera_mf) 1
quietly set compile(lpm) 1
quietly set compile(wysiwyg) 1
Next, you can run your test:
vsim -do modelName/modelName_atb.do
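If the .do file has to be patched repeatedly, the four compile flags shown above can be flipped with a short script instead of by hand. This is a Python sketch that works on the file contents as text; `enable_library_compile` is a hypothetical helper, and it assumes the lines look exactly like the ones quoted above.

```python
import re

# Matches the four "quietly set compile(<lib>) 0" lines quoted in the text.
_COMPILE_FLAG = re.compile(
    r"^([ \t]*quietly[ \t]+set[ \t]+compile"
    r"\((?:altera|altera_mf|lpm|wysiwyg)\)[ \t]+)0(?=[ \t]*\r?$)",
    re.MULTILINE,
)


def enable_library_compile(do_text):
    """Return the .do file text with the four library compile flags set to 1."""
    return _COMPILE_FLAG.sub(r"\g<1>1", do_text)


before = (
    "quietly set compile(altera) 0\n"
    "quietly set compile(altera_mf) 0\n"
    "quietly set compile(lpm) 0\n"
    "quietly set compile(wysiwyg) 0\n"
)
print(enable_library_compile(before))
```

To patch a real file, read it with `pathlib.Path(...).read_text()`, pass the text through the function, and write the result back.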
FPAdd wE wF
FPAddExpert wE wF tieBreaksToEven architecture degradeAccuracy
tieBreaksToEven = 1 (IEEE-754 RNE, works only with architecture = 0)
| 0 (IEEE-754 RNA)
architecture = 0 single-path low resources
| 1 dual-path low latency
degradeAccuracy {0|1} when 1, 2's complement is replaced by 1's complement
FPAddN wE wF
FPSubExpert wE wF tieBreaksToEven architecture degradeAccuracy
tieBreaksToEven = 1 (IEEE-754 RNE, works only with architecture = 0)
| 0 (IEEE-754 RNA)
architecture = 0 single-path low resources
| 1 dual-path low latency
degradeAccuracy {0|1} when 1, 2's complement is replaced by 1's complement
FPAddSub wE wF
FPAddSubExpert wE wF tieBreaksToEven architecture degradeAccuracy
tieBreaksToEven = 1 (IEEE-754 RNE, works only with architecture = 0)
| 0 (IEEE-754 RNA)
architecture = 0 single-path low resources
| 1 dual-path low latency
degradeAccuracy {0|1} when 1, 2's complement is replaced by 1's complement
FPFusedAddSub wE wF
FPMul wE wF
FPMulExpert wEA wFA wEB wFB wER wFR ieeeTieBreakRule
for correctRounding, the sticky bits are not computed. Rnd=1
FPConstMul wE wF constant
FPAcc wE wF lsbA msbA maxMSBX
FPSqrt wE wF
FPDivSqrt wE wF
FPRecipSqrt wE wF
FPCbrt wE wF
FPDiv wE wF version
version = 0 -> polynomial approximation
version = 1 -> polynomial approximation + Newton-Raphson (DP only)
version = 2 -> Newton-Raphson (NYA)
FPInverse wE wF
FPFloor wE wF
FPCeil wE wF
FPRound wE wF
FPRint wE wF
FPFrac wE wF
FPMod wE wF
FPDim wE wF
FPAbs wE wF
FPMin wE wF
FPMax wE wF
FPMinAbs wE wF
FPMaxAbs wE wF
FPMinMaxFused wE wF
FPMinMaxAbsFused wE wF
FPCompare wE wF type
type: -2=LT -1=LE 0=EQ 1=GE 2=GT 3=NEQ
FPCompareFused wE wF
(select line will select among LT, LE, EQ, GE, GT)
FPLn wE wF
FPLn1px wE wF
implements ln(1+x)
FPLog10 wE wF
FPLog2 wE wF
FPExp wE wF
FPExpFPC wE wF
FPExpM1 wE wF
FPExp2 wE wF
FPExp10 wE wF
FPPowr wE wF
FPSinX wE wF
FPCosX wE wF
FPSinCosX wE wF
FPTanX wE wF
FPCotX wE wF
FPArcsinX wE wF
FPArcsinPi wE wF
FPArccosX wE wF
FPArccosPi wE wF
FPArctanX wE wF
FPArctanPi wE wF
FPArctan2 wE wF
FPSinPiX wE wF
FPCosPiX wE wF
FPTanPiX wE wF
FPCotPiX wE wF
FPHypot wE wF
FPRangeReduction wE wF
FPFusedHorner wE wF r d a_{0} a_{1} ... a_{d}
FPFusedHornerExpert wE wF r g pOut pIn maxInExp maxCS d a_{0} a_{1} ... a_{d}
maxCS maximum cancellation size
g number of guard bits in tables
pOut res is positive (avoids final 2's complement)
pIn input is positive (avoids initial 2's complement)
maxInExp (-8/-9 typically)
FPFusedHornerMulti wE wF r d m a_{0} a_{1} ... a_{d}
m polynomials will be implemented using mults and adds
coefficient values are (d+1)*m coefs are read
r {0|1} restricted range x<=1
FPFusedMultiFunction wE wF
builds a multifunction block with Min/Max/MinMag/MaxMag
<=/</==/>=/>/!=/Saturate/Mux3:1
FXPSin precIn precOut
FXPTruncMult precInX precInY precOut
FXPTruncMultSigned precInX precInY precOut
FXPTruncMultSignedUnsinged precInX precInY precOut
FXPConstMult precInX constant precOut
FXPConstMultSigned precInX constant precOut
FXPFusedMultiFunction w
builds a multifunction block with: and, or, xor, nand, nor,
xnor, inv, bitrev, EQ, NE, GE, LT, min, max, neg, abs,
redAnd, redOr, mux3:1, bitextract
FXPDivUI w
unsigned integer divider, w-bit inputs, 2w-bit output
FXPDivU wX fX wY fY wR fR
unsigned fixed-point divider, input and output formats provided
FXPToFP w f s wE wF
FPToFXP wE wF w f s
FPToFXPExpert wE wF w f s r
r = 1 for rounding to nearest, r = 0 for truncation
FPToFXPFused wE wF w f s
a dynamic input line selects between truncation = 0 and
round to nearest integer = 1
FPToFP wEIn wFIn wEOut wFOut
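Most of the floating-point functions listed above take wE (exponent width) and wF (fraction width). The sketch below shows what a given pair implies, assuming an IEEE-754 style encoding with one sign bit, a biased exponent and reserved all-zeros and all-ones exponents. That holds for the 8/23 single-precision case used in this article, but it is an assumption for other widths. `fp_format_info` is a hypothetical helper.

```python
def fp_format_info(wE, wF):
    """Basic properties of an IEEE-754 style format with wE/wF bits."""
    bias = (1 << (wE - 1)) - 1
    e_max = bias          # largest unbiased exponent of a normal number
    e_min = 1 - bias      # smallest unbiased exponent of a normal number
    return {
        "total_bits": 1 + wE + wF,
        "bias": bias,
        "max_finite": (2.0 - 2.0 ** -wF) * 2.0 ** e_max,
        "min_normal": 2.0 ** e_min,
    }


single = fp_format_info(8, 23)    # e.g. "FPDiv 8 23 0"
double = fp_format_info(11, 52)
print(single)
print(double)
```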
The component is generated as if it were a DspBuilderAdvanced primitive subsystem, so it has several DspBuilderAdvanced ports that aren't strictly necessary when the component is used on its own.
xIn_v : this is the DspBuilderAdvanced valid input signal. In this example it just passes through with a delay of 17 cycles, i.e. if this signal goes high when you put your first data through, the output valid goes high 17 cycles later, when the result comes out. If you don't want it, it is safe to remove it along with the corresponding output and delay registers.
xIn_c : this is the DspBuilderAdvanced channel input signal. Likewise, it is safe to remove along with the delay (implemented in a memory + registers) and the output. It is there to help you keep track of channelized data flowing through the divide.
xIn_0 : first data input (recent components have the first data port named a)
xIn_1 : second data input (if it exists, will probably be called b)
xOut_v : this is the DspBuilderAdvanced valid output signal (see notes on xIn_v above)
xOut_c : this is the DspBuilderAdvanced channel output signal (see notes on xIn_c above)
xOut_0 : result (probably set to q)
clk : clock signal
areset : asynchronous clear
bus_clk : this is a DspBuilderAdvanced bus clock port for when the design contains Avalon-MM slave interfaces. Here it’s not connected to anything, so can be removed.
h_areset : this is a DspBuilderAdvanced bus reset port for when the design contains Avalon-MM slave interfaces. Here it’s not connected to anything, so can be removed.
Using a Windows installation of ACDS 12.1(+) with DspBuilder, you should go to the folder
…\quartus\dspba\Blocksets\BaseBlocks\windows64
where you will find the executable CmdPolyEval.exe
Using a Linux installation of ACDS 12.1(+) with DspBuilder, you should go to the folder
$QUARTUS_ROOTDIR/dspba/Blocksets/BaseBlocks/linux64
where you will find the executable cmdPolyEval. You must invoke cmdPolyEval in a directory for which you have write permission, and you must ensure that LD_LIBRARY_PATH contains the directories holding the shared libraries used by cmdPolyEval. The following Bash command ensures that LD_LIBRARY_PATH includes the required directories:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$QUARTUS_ROOTDIR/dspba/Blocksets/BaseBlocks/linux64:$QUARTUS_ROOTDIR/linux64
Using the command-line tool:
> cmdPolyEval
should bring up the help, which explains the cores that can be generated as well as some details on the generation options. When generating a component with the corresponding test-vectors you will execute something like:
> cmdPolyEval.exe -subcycle -target CycloneIVE -frequency 150 FPMin 8 23 -testbench 100
which will generate the topModel/ folder in the current folder containing the VHDL code for the components and 100 test-vectors. In this example we are generating the floating point Minimum function. For the test-vectors, you should be able to find:
topModel_xIn.stm
topModel_xOut.stm
The first file contains the input stimuli and the second one contains the output response.
One line of the input stimuli file looks like this for FPMin:
Valid_Line Channel_Line X Y
1 00000000 01000000100000000000000000000000 11000000100000000000000000000000
Here 01000000100000000000000000000000 and 11000000100000000000000000000000 are the input stimuli for this test-vector, in binary IEEE-754 single-precision notation. For example, X here is interpreted as
0 10000001 00000000000000000000000
S EXP FRAC
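To read stimulus values by eye it helps to decode the bit strings. The following Python sketch splits a binary string into sign, exponent and fraction fields and computes its value, assuming the IEEE-754 style layout shown above. `decode_fp_bits` is a hypothetical helper, not part of the tool.

```python
def decode_fp_bits(bits, wE=8, wF=23):
    """Split a binary string into (sign, exponent, fraction, value)."""
    if len(bits) != 1 + wE + wF or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of %d binary digits" % (1 + wE + wF))
    sign = int(bits[0])
    exp = int(bits[1:1 + wE], 2)
    frac = int(bits[1 + wE:], 2)
    bias = (1 << (wE - 1)) - 1
    if exp == (1 << wE) - 1:                  # all-ones exponent: inf or NaN
        value = float("nan") if frac else float("inf")
    elif exp == 0:                            # zero or subnormal
        value = frac * 2.0 ** (1 - bias - wF)
    else:                                     # normal number: 1.frac * 2^(exp - bias)
        value = (1 + frac / (1 << wF)) * 2.0 ** (exp - bias)
    return sign, exp, frac, (-value if sign else value)


x_fields = decode_fp_bits("01000000100000000000000000000000")
y_fields = decode_fp_bits("11000000100000000000000000000000")
print(x_fields)  # (0, 129, 0, 4.0)
print(y_fields)  # (1, 129, 0, -4.0)
```

So the example line feeds X = 4.0 and Y = -4.0 to FPMin.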
The second file, topModel_xOut.stm, contains the outputs corresponding to the inputs. If the operator is pipelined and has a latency of m cycles, the first m lines of the file are just zeros, corresponding to the time needed for the first set of inputs to reach the output. The first output line you should be interested in has a leading one:
Valid_Line Channel_Line R_low R_high
1 00000000 11000000100000000000000000000000 11000000100000000000000000000000
Again, you may ignore the first two chunks (valid and channel) once you have detected the leading one. The next two chunks (identical for this operator, but differing by one unit in the last place for faithfully rounded functions) represent the acceptable output values for the corresponding test-case in the input file. Essentially, the operator passes the test-vector if its output is either of these two values. This is as close as possible to obtaining the test-vectors without going through the Simulink interface (without using a license).
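The pass criterion just described can be sketched in a few lines of Python. The column layout (valid, channel, R_low, R_high) is taken from the example line above; the helper names are hypothetical, and the sketch assumes that the latency padding lines have a valid flag of 0.

```python
def expected_outputs(lines):
    """Yield (r_low, r_high) for valid response lines, skipping latency padding."""
    for line in lines:
        if not line.strip():
            continue
        valid, _channel, r_low, r_high = line.split()
        if valid == "1":
            yield r_low, r_high


def output_passes(dut_bits, r_low, r_high):
    """A result passes if it equals either of the two acceptable values."""
    return dut_bits in (r_low, r_high)


zeros = "0" * 32
response_lines = [
    "0 00000000 %s %s" % (zeros, zeros),  # padding while the pipeline fills
    "1 00000000 11000000100000000000000000000000 11000000100000000000000000000000",
]

expected = list(expected_outputs(response_lines))
print(expected)
print(output_passes("11000000100000000000000000000000", *expected[0]))
```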
Note: when generating test-vectors you may also restrict the exponent range:
cmdPolyEval.exe -pipelining 1 -target CycloneIVE -frequency 150 FPMin 8 23 -testbench 100 -expRange -5 5 -randomTests
This generates inputs roughly in the range (-64, 64), with the closest to zero being 2^-5 * 1.000000XXX. To generate positive stimuli only, you can use:
cmdPolyEval.exe -pipelining 1 -target CycloneIVE -frequency 150 FPMin 8 23 -testbench 100 -expRange -5 5 -positiveStimuli -randomTests
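The magnitude range implied by -expRange can be worked out directly. The Python sketch below assumes that eMin and eMax are unbiased exponents, that both ends are inclusive, and that the significand ranges over [1, 2), which matches the 2^-5 * 1.000000XXX remark above. `magnitude_range` is a hypothetical helper.

```python
def magnitude_range(e_min, e_max, wF=23):
    """Smallest and largest magnitudes for exponents in [e_min, e_max]."""
    smallest = 2.0 ** e_min                       # 1.000...0 * 2^e_min
    largest = (2.0 - 2.0 ** -wF) * 2.0 ** e_max   # 1.111...1 * 2^e_max
    return smallest, largest


smallest, largest = magnitude_range(-5, 5)
print(smallest, largest)  # smallest is 0.03125; largest is just under 64
```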