Intel® oneAPI Math Kernel Library

IMSL routine substitute required

anthonyrichards
New Contributor III
I have a CVF application that uses the IMSL routine DZPORC to solve for polynomial roots. It also uses the IMSL routine ERSET to suppress unwanted warnings etc. I do not have IMSL installed with IVF. Is there a routine in the Intel MKL library that will substitute?

Where can I find help on the MKL library, which supposedly is installed with IVF? I cannot see any reference to it in the reference guide I have located.
Steven_L_Intel1
Employee
Since you're having trouble with the help, start at file:///C:/Program%20Files/Intel/Composer%20XE%202011%20SP1/Documentation/en_US/mkl/mkl_documentation.htm
mecej4
Honored Contributor III
You can continue to use the IMSL 4 that came with CVF, using the Intel compiler option /iface:cvf, or by declaring STDCALL linkage for those IMSL routines, but this will work only with F77-style calls, and you cannot USE any CVF-IMSL modules.

In this particular instance, you can also use Alan Miller's rpoly.f90.
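
A minimal calling sketch follows. It assumes the TOMS 493 argument list (coefficients in DECREASING powers of x, separate real and imaginary output arrays, and a logical failure flag), which I believe Miller's translation keeps; if your copy of rpoly.f90 wraps the routine in a module, add the corresponding USE statement.

[fxfortran]PROGRAM try_rpoly
! Finds the zeros of x**3 - 3*x**2 + 4*x - 2, which are 1 and 1 +/- i.
  IMPLICIT NONE
  INTEGER, PARAMETER :: maxdeg = 3
  DOUBLE PRECISION :: op(maxdeg+1), zeror(maxdeg), zeroi(maxdeg)
  LOGICAL :: fail
  INTEGER :: ndeg, i

  ndeg = maxdeg
  op = (/ 1.0D0, -3.0D0, 4.0D0, -2.0D0 /)   ! coefficients in DECREASING powers of x
  CALL rpoly(op, ndeg, zeror, zeroi, fail)
  IF (fail) STOP 'rpoly reports failure'
  DO i = 1, ndeg
    WRITE (*, '(2F20.14)') zeror(i), zeroi(i)
  END DO
END PROGRAM try_rpoly
[/fxfortran]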
anthonyrichards
New Contributor III
Really? I can use something that came with CVF in IVF? I am very surprised, as I thought modules are not compatible between compilers. In CVF I access the routines using USE NUMERICAL_LIBRARIES, indicating a module, which normally would need to be recompiled from the original source. Are you sure?
Steven_L_Intel1
Employee
No, you can't use the IMSL from CVF. The library is not link-compatible as it was compiled with CVF. You can't use the modules either.
mecej4
Honored Contributor III
If you use F77-style calls and do not use any of the IMSL-4 modules, you may get the old IMSL routines to work for you sometimes. Here is the example for ZPORC from the IMSL-4 manual:

[fxfortran]      INTEGER NDEG
      PARAMETER (NDEG=3)
C
      REAL COEFF(NDEG+1)
      COMPLEX ZERO(NDEG)
      EXTERNAL WRCRN, ZPORC
      DATA COEFF/-2.0, 4.0, -3.0, 1.0/
C
      CALL ZPORC (NDEG, COEFF, ZERO)
C
      CALL WRCRN ('The zeros found are', 1, NDEG, ZERO, 1, 0)
C
      END
[/fxfortran]
I compiled and linked using

S:\lang > ifort /iface:cvf zrx.f imsl.lib imsls_err.lib dformd.lib /link /force:multiple

Note that I had to include the CVF runtime library (dformd.lib), which can have symbol clashes with the IVF runtime.

This example works, but you have no guarantee against failure.
anthonyrichards
New Contributor III
I have downloaded the RPOLY code you referenced and tested it.
Here are some results for RPOLY compared to the results for the IMSL routine DZPORC.
The test program in the download gives the coefficients of the expanded form of the following polynomial as a test for RPOLY:

F(x) = (x-1)(x-2)(x-3)(x-4)(x-5)(x-6)(x-7)(x-8)(x-9)(x-10)=0

The solutions have no imaginary part.
I evaluated the roots and computed the (real part of the) function value at each root (the residual).

RPOLY gives:

RPOLY EXAMPLE 1. POLYNOMIAL WITH ZEROS 1,2,...,10.
degree= 10
Real1 Imaginary1 RESIDUAL1
1.00000000236858 0.00000000000000 -0.859507825225592E-03
2.00002195897126 0.00000000000000 0.885352318640798
3.00046637606215 0.00000000000000 -4.69867384294048
3.99858534916397 0.00000000000000 -6.11660781502724
5.01275188234334 0.00000000000000 -36.6232777899131
5.94435077399900 0.00000000000000 -157.787998303305
7.13492931023351 0.00000000000000 -617.135774260387
7.85730317808534 0.00000000000000 -1196.21611317340
9.06783881416795 0.00000000000000 -3055.54442214733
9.98375235460490 0.00000000000000 -5629.92364508007

SOLUTION FROM DZPORC
Real1 Imaginary1 RESIDPORC
1.00000000000000 0.00000000000000 0.465661287307739E-09
1.99999999999978 0.00000000000000 -0.372529029846191E-08
2.99999999999955 0.00000000000000 0.931322574615479E-09
4.00000000002234 0.00000000000000 0.754371285438538E-07
4.99999999987061 0.00000000000000 0.565778464078903E-06
6.00000000035088 0.00000000000000 0.113574787974358E-05
6.99999999947034 0.00000000000000 0.332156196236610E-05
8.00000000045782 0.00000000000000 0.282796099781990E-05
8.99999999978774 0.00000000000000 0.161821953952312E-04
10.0000000000409 0.00000000000000 0.153565779328346E-04

Both routines give zero imaginary components, but clearly DZPORC gives much better accuracy straightaway. Note that the order of coefficients required for DZPORC is the exact reverse of the order required for RPOLY (the latter requiring coefficients in order of DECREASING powers of x).

The RPOLY values can be improved by expanding F(x) as a Taylor series about the zero.
Keeping only the first two terms, if Ri is the (real part of the) residual of the function evaluated at the ith root xi, and dF/dxi is the (real part of the) gradient of the function evaluated at x = xi, then a better root value xi' is obtained using

xi' = xi - Ri/(dF/dxi)

Doing this three times for each successive approximation to the roots gives the following set of root values:

Real1 Real2 Real3 Real4
1.00000000236858 1.00000000000000 1.00000000000000 1.00000000000000
2.00002195897126 1.99999999917169 2.00000000000019 1.99999999999998
3.00046637606215 2.99999976189563 2.99999999999995 3.00000000000115
3.99858534916397 3.99999877505027 3.99999999999404 3.99999999999340
5.01275188234334 4.99996137344039 4.99999999965393 5.00000000002791
5.94435077399900 6.00114110229331 6.00000025614736 5.99999999969058
7.13492931023351 7.00400832753991 7.00000970043653 7.00000000022358
7.85730317808534 8.03868722619185 8.00142150729159 8.00000219723833
9.06783881416795 9.00644192286231 9.00006984753153 9.00000000826967
9.98375235460490 10.0007901402308 10.0000017614324 10.0000000000019

The residual function values at successive values of the roots are:

Resid1 Resid2 Resid3 Resid4
-0.859507825225592E-03 -0.465661287307739E-09 0.465661287307739E-09 -0.465661287307739E-09
0.885352318640798 -0.334051437675953E-04 0.838190317153931E-08 -0.186264514923096E-08
-4.69867384294048 0.240009278059006E-02 0.121071934700012E-07 -0.162981450557709E-07
-6.11660781502724 -0.529176509007812E-02 0.279396772384644E-08 -0.363215804100037E-07
-36.6232777899131 0.111245213076472 0.107707455754280E-05 0.137835741043091E-06
-157.787998303305 3.28711832221597 0.738595612347126E-03 -0.131968408823013E-05
-617.135774260387 -17.3584323525429 -0.419054212979972E-01 -0.747852027416229E-06
-1196.21611317340 405.968537265901 14.3510280316696 0.221468377858400E-01
-3055.54442214733 -262.614858402405 -2.81659480044618 -0.330557115375996E-03
-5629.92364508007 287.367600691039 0.639194277580827 -0.119977630674839E-04

So the RPOLY initial values require about 3 iterations of the Taylor series expansion approximation to get close to the DZPORC results. In practice, once a correction is less than a defined small amount, iterations can be stopped earlier for some roots and more iterations may be done for other roots that are slower to converge on the exact value.

I have written my own wrapper for RPOLY that uses the above method to refine its solutions; I have used it in my application and get acceptable results. However, an MKL polynomial root solver that comes close to the DZPORC accuracy would be of use.
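
For reference, a bare sketch of that refinement step (names are illustrative, not RPOLY's; it assumes the coefficients are stored in INCREASING powers of x, i.e. DZPORC order, so reverse RPOLY's coefficient array before calling it):

[fxfortran]SUBROUTINE polish_real_roots(ndeg, a, roots, niter)
! One or more Newton steps x' = x - F(x)/F'(x) on each real root,
! with F and dF/dx evaluated together by Horner's scheme.
! a(1:ndeg+1) holds the coefficients in INCREASING powers of x.
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: ndeg, niter
  DOUBLE PRECISION, INTENT(IN) :: a(ndeg+1)
  DOUBLE PRECISION, INTENT(IN OUT) :: roots(ndeg)
  DOUBLE PRECISION :: f, df, x
  INTEGER :: i, j, k

  DO i = 1, ndeg
    x = roots(i)
    DO k = 1, niter
      f = a(ndeg+1)              ! Horner recurrence for F(x) ...
      df = 0.0D0                 ! ... and for dF/dx at the same time
      DO j = ndeg, 1, -1
        df = df*x + f
        f = f*x + a(j)
      END DO
      IF (df == 0.0D0) EXIT      ! cannot take a Newton step
      x = x - f/df
    END DO
    roots(i) = x
  END DO
END SUBROUTINE polish_real_roots
[/fxfortran]
In practice the inner loop exits as soon as ABS(f/df) falls below a tolerance, rather than always running niter times.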
mecej4
Honored Contributor III
The Fortran-77 version, from which rpoly.f90 was presumably constructed, is at Netlib. That file, in fact, has your test problem, and produces acceptable results with no need for polishing:

[bash]COEFFICIENTS
   0.1000000000000000d+01    0.0000000000000000d+00
  -0.5500000000000000d+02    0.0000000000000000d+00
   0.1320000000000000d+04    0.0000000000000000d+00
  -0.1815000000000000d+05    0.0000000000000000d+00
   0.1577730000000000d+06    0.0000000000000000d+00
  -0.9020550000000000d+06    0.0000000000000000d+00
   0.3416930000000000d+07    0.0000000000000000d+00
  -0.8409500000000000d+07    0.0000000000000000d+00
   0.1275357600000000d+08    0.0000000000000000d+00
  -0.1062864000000000d+08    0.0000000000000000d+00
   0.3628800000000000d+07    0.0000000000000000d+00

ZEROS
   0.1000000000000002d+01    0.7792703114739563d-19
   0.1999999999999929d+01   -0.7013417293629121d-18
   0.3000000000000861d+01    0.4112170258374257d-16
   0.3999999999994268d+01    0.4839294773690021d-12
   0.5000000000015782d+01   -0.2904410975693509d-11
   0.6999999999949581d+01   -0.9682737253035049d-11
   0.5999999999993388d+01    0.7260798235880263d-11
   0.8000000000104230d+01    0.7097097174811780d-11
   0.8999999999919025d+01   -0.2570980448390771d-11
   0.1000000000002294d+02    0.3162632907703995d-12[/bash]
Something is amiss in the Fortran 90 translation.
anthonyrichards
New Contributor III
I agree, something is amiss, but what exactly?
What you posted is impressive.
mecej4
Honored Contributor III
I did a little more peeking and found that there are actually two TOMS algorithms for the Jenkins-Traub root finder: TOMS 419, which is the one I mentioned, is for polynomials with complex coefficients, and TOMS 493, from which Miller's F90 translation was prepared, is for polynomials with real coefficients.

I just tried the Fortran 77 version of TOMS 493, and it has the same problems that you pointed out. In this instance, then, 'new' is not 'improved'. Miller's quadruple precision version, q_rpoly.f90, does work.

As with most TOMS algorithms, one has to adapt the code by inserting machine parameters (Base, Epsilon, Tiny, Huge). I did this before running the examples.

The two codes are significantly different, and the erroneous, more recent one has the author listed on Netlib as Jenkins!

If I find out more about why there are two routines and a respectable journal accepted buggy code, and why this bug has gone unnoticed for years, I'll let you know.

It is possible that the TOMS 493 algorithm needs higher precision than double precision. A hint of that is mention of a Burroughs machine, with machine epsilon corresponding to 75 bits, in the source code. Miller does produce a quadruple precision version.
anthonyrichards
New Contributor III
Thanks.
What values did you use for the machine-dependent constants, which are listed below with the settings for the IBM 360 (!!!)?
      SUBROUTINE MCON(ETA,INFINY,SMALNO,BASE)                           MCON4840
C MCON PROVIDES MACHINE CONSTANTS USED IN VARIOUS PARTS OF THE
C PROGRAM. THE USER MAY EITHER SET THEM DIRECTLY OR USE THE
C STATEMENTS BELOW TO COMPUTE THEM. THE MEANING OF THE FOUR
C CONSTANTS ARE -
C ETA THE MAXIMUM RELATIVE REPRESENTATION ERROR
C WHICH CAN BE DESCRIBED AS THE SMALLEST POSITIVE
C FLOATING-POINT NUMBER SUCH THAT 1.0D0 + ETA IS
C GREATER THAN 1.0D0.
C INFINY THE LARGEST FLOATING-POINT NUMBER
C SMALNO THE SMALLEST POSITIVE FLOATING-POINT NUMBER
C BASE THE BASE OF THE FLOATING-POINT NUMBER SYSTEM USED
C LET T BE THE NUMBER OF BASE-DIGITS IN EACH FLOATING-POINT
C NUMBER(DOUBLE PRECISION). THEN ETA IS EITHER .5*B**(1-T)
C OR B**(1-T) DEPENDING ON WHETHER ROUNDING OR TRUNCATION
C IS USED.
C LET M BE THE LARGEST EXPONENT AND N THE SMALLEST EXPONENT
C IN THE NUMBER SYSTEM. THEN INFINY IS (1-BASE**(-T))*BASE**M
C AND SMALNO IS BASE**N.
C THE VALUES FOR BASE,T,M,N BELOW CORRESPOND TO THE IBM/360.
      DOUBLE PRECISION ETA,INFINY,SMALNO,BASE
      INTEGER M,N,T
      BASE = 16.0D0
      T = 14
      M = 63
      N = -65
      ETA = BASE**(1-T)
      INFINY = BASE*(1.0D0-BASE**(-T))*BASE**(M-1)
      SMALNO = (BASE**(N+3))/BASE**3
      RETURN
      END
mecej4
Honored Contributor III
I cheated, and used the facilities of Fortran 9x:

BASE = 2.0D0
ETA = Epsilon(1d0)
INFINY = Huge(1d0)
SMALNO = Tiny(1d0)
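
Put together as a drop-in replacement for the TOMS subroutine (same argument list as the listing above), that would look something like:

[fxfortran]      SUBROUTINE MCON (ETA, INFINY, SMALNO, BASE)
C MCON WITH THE MACHINE CONSTANTS TAKEN FROM THE FORTRAN 90
C INTRINSICS INSTEAD OF THE HARD-WIRED IBM/360 VALUES.
      DOUBLE PRECISION ETA, INFINY, SMALNO, BASE
      BASE = 2.0D0
      ETA = EPSILON(1.0D0)
      INFINY = HUGE(1.0D0)
      SMALNO = TINY(1.0D0)
      RETURN
      END
[/fxfortran]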

mecej4
Honored Contributor III
Here are two modifications to Alan Miller's rpoly.f90 that will improve its performance on the two examples that come with it.

1. Tighten the tolerance by setting eta = Epsilon(1d0)*1D-5 at line 73 in subroutine RPOLY.

2. Declare the variable pv, used for cumulative sums in subroutine REALIT, as REAL*16 (or, if you prefer, real(kind=SELECTED_REAL_KIND(30,300))) at line 500.
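
In the source, the two edits look roughly like this (the exact line numbers and surrounding context may differ between copies of rpoly.f90):

[fxfortran]! 1. In subroutine rpoly (near line 73): tighter convergence tolerance
eta = Epsilon(1.0D0)*1.0D-5

! 2. In subroutine realit (near line 500): accumulate pv in extended precision
REAL (KIND=SELECTED_REAL_KIND(30,300)) :: pv
[/fxfortran]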

With these changes, for the two examples I get

[bash] EXAMPLE 1. POLYNOMIAL WITH ZEROS 1,2,...,10.
 Real part           Imaginary part
   1.00000000000000       0.00000000000000
   2.00000000000000       0.00000000000000
   3.00000000000000       0.00000000000000
   4.00000000000000       0.00000000000000
   5.00000000000000       0.00000000000000
   6.00000000000000       0.00000000000000
   7.00000000000000       0.00000000000000
   8.00000000000000       0.00000000000000
   9.00000000000000       0.00000000000000
   10.0000000000000       0.00000000000000

 Now try case where 1 is an obvious root
  Real part           Imaginary part
 -0.110018257580103E-10   1.00000000001015
 -0.110018257580103E-10  -1.00000000001015
  0.110018781586402E-10  0.999999999989854
  0.110018781586402E-10 -0.999999999989854
   1.00000000000000       0.00000000000000
[/bash]
Please let me know if these modifications work in your application (I do not know what degree of polynomials you need to solve or what the distribution of their roots is like).
anthonyrichards
New Contributor III
Thanks for taking the trouble. I will give that a trial when I get a chance at work tomorrow. I only require solutions to 5th-degree polynomials, and only the root with the smallest absolute value.

Meanwhile, while changing my project from CVF to IVF, I used a makefile supplied by my third-party application that contained the /Qauto option. This exposed a gotcha in RPOLY: it does not explicitly deallocate all of its allocated arrays, and the treatment of both scalars AND arrays forced by /Qauto meant the allocated arrays were saved. So when the next call to RPOLY required the allocation of two particular arrays, they were found to be already allocated. The test in RPOLY for ALLOCATED is not done for all of the allocated arrays. I changed the RPOLY code to deallocate explicitly, even though switching to /Qauto_scalars prevented the problem from occurring.
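
The change amounts to guarding each allocation and cleaning up explicitly, along these lines (illustrative names, not RPOLY's actual variables):

[fxfortran]SUBROUTINE demo_alloc(degree)
! Illustration only: if a saved allocatable work array is not deallocated
! before return, it is still allocated on the next call, so guard the
! allocation and clean up explicitly.
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: degree
  DOUBLE PRECISION, ALLOCATABLE, SAVE :: temp(:)

  IF (ALLOCATED(temp)) DEALLOCATE(temp)   ! guard against a leftover allocation
  ALLOCATE(temp(degree+1))
  ! ... work using temp ...
  DEALLOCATE(temp)                        ! explicit clean-up before returning
END SUBROUTINE demo_alloc
[/fxfortran]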

Thanks again.
anthonyrichards
New Contributor III
With the same changes to the code, I find the same impressive increase in accuracy for the roots of the first polynomial: zero residuals for all 10 roots.

However, I get slightly different results for the other test polynomial:

Now try case where 1 is an obvious root
degree= 5
Real part Imaginary part RESIDUAL
0.197756430118995E-08 0.999999997240704 0.00000000000000
0.197756430118995E-08 -0.999999997240704 0.00000000000000
-0.197756428827899E-08 1.00000000275930 0.00000000000000
-0.197756428827899E-08 -1.00000000275930 0.00000000000000
1.00000000000000 0.00000000000000 0.00000000000000

The errors for the real parts are small, but still ~100 times the errors you post. I don't know why.
mecej4
Honored Contributor III
The program has a number of places where single precision reals are used (constants and variables). Again, the precision of IEEE real*4 may be less than that of the Burroughs computer used in the original Jenkins-Traub program.

It will take me some time to look through and change things as needed. For now, please try the /real-size:64 option of the Intel compiler.