Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.
6877 Discussions

## Student's t Test Valued Contributor II
975 Views

Is the Student's t Test included in mkl?

11 Replies Moderator
973 Views

what do you mean by a Student Test? Black Belt
962 Views

It would be useful to have a function in MKL to obtain the inverse of the cumulative Student t-distribution function:

t = tinv(p, n_f)

where p is the probability and n_f is the number of degrees of freedom. This function is available in Matlab, R (tinv), IMSL (tin), etc. It is useful in obtaining the p-percent confidence intervals of the parameter values obtained from fitting an expression to data using nonlinear least squares (MKL TRNLSP). Moderator
955 Views

Thanks for the clarification, mecej. I see what John wanted to see. We will try to add this topic to further plans and will keep this thread informed. Employee
939 Views

Hi John,

in oneMKL we have Random Number Generators component where we provide distribution generators (routines to generate random numbers sequences with the specific statistical properties).

Could you please clarify – are you interested in tinv(p, n_f) like function only? Or also in Student t-distribution generator?

Best regards,
Pavel Valued Contributor II
932 Views

Pavel:

The most common statistical test in experimental work after the mean and the standard deviation is the Student's t Test.  It is a very simple test to use - essentially it takes 2 sets of numbers and tells you if they are different and how different.  Guinness  Brewery invented for quality control.

An example, the average age of a kindergarten class is 5.  And the average age of the people who use this forum would be 30+. The two are distinct.  Student's t Test allows you to quantify it. It has a fairly simple number >2 different <2 same, is a good approximation.

For large data sets it is the best test.   The reason we need it in Fortran is where there are more than 1.2 million data points, EXCEL will not do an analysis.  At the moment I am looking at sets with 6.5 million records, only Fortran or C will handle this quickly.

In searching for MKL - I was surprised how many of the many standard programs like MAPLE, NASTRAN, AutoCAD use mkl. Valued Contributor II
926 Views

A quick and dirty attempt at the Student's t Test for million element sets, where I pick out the first peak between indices of the count array from 89 to 101 provides a set test data to determine if the natural frequency changes with time.  It appears to - each million elements is 100 days and each million elements contains about 4% density in the required range.

``````  subroutine mean(i,j,countFZ, meanA, stdevA,countFZA,meanFZA, standevFZA, flag)
implicit none

integer i, j, k, l
integer countFZ(kl)
integer countFZA(kl, mn_9)
REAL (KIND=dp) meanFZA(mn_9),standevFZA(mn_9)
integer flag
REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1
REAL (KIND=dp) meanA,  variance, stdevA, meanold, varianceold

counter = 0.0
count = 0.0
counterV = 0.0
meanA = 0.0
variance = 0.0

do 100 k = i,j

counter = counter + real(countFZ(k)) * real(k)
count = count + real(countFZ(k))
! write(*,1000)k, countFZ(k), counter, count
1000    Format(i7,3('  ',f12.0) )

100 end do

meanA = real(counter)/real(count)
count = 0.0
do 101 k = i,j

counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0)
count = count + real(countFZ(k))

! write(*,1001)k, countFZ(k), counter, counterV, count
1001    Format(i7,4('  ',f12.0) )

101 end do

variance = real(counterV)/real(count)

write(*,500)0, meanA, variance, sqrt(variance), count
meanold = meanA
varianceold = variance
countold = count

500 Format(" Data set  :: ", i4,3('  ',f10.3), ' Count :: ',f7.0, ' Sp : ', f10.3, ' sp1 : ',f10.3,' t-Stat : ',f10.3)
stdevA = sqrt(variance)

write(*,120)flag
120 Format("       Flag for the data size :: ", i3)

do 300 l = 1, flag
counter = 0
count = 0
do 200 k = i,j

counter = counter + real(countFZA(k,l)) * real(k)
count = count + real(countFZA(k,l))
!  write(*,1000)k, countFZA(k,l), counter, count

200     end do
meanFZA(l) = counter/count

count = 0.0
counterV = 0.0
do 400 k = i,j

counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0)
count = count + real(countFZA(k,l))

!   write(*,1001)k, countFZA(k,l), counter, counterV, count

400     end do

variance = counterV/count
standevFZA(l) = sqrt(variance)
sp = sqrt((((countold-1.0)*varianceold)+((count-1.0)*variance))/(countold+count-2))
sp1 = sp*sqrt((1.0/countold)+(1.0/count))
t = (meanFZA(l) - meanold)/sp1

write(*,500)l,meanFZA(l),variance,sqrt(variance),count,sp,sp1,t
meanold = meanFZA(l)
varianceold = variance
countold = count

300 end do

return
end subroutine`````` I took the standard algorithm from wikipedia -- not equal means or variances.

A t-Stat of 31 to 80 represents an almost zero chance they are the same - mean is 95 column and the variance is the 1.15 column.  The counts make is so accurate.

Got to check it and put in error checking.

There is a temperature dependence on the stiffness, but it is impossible to discern at 100 day blocks with any accuracy. Employee
908 Views

John,

thanks a lot for the detailed description.

In oneMKL we haven’t the specific routine for Student’s t-test.
But oneMKL contains Summary Statistics component which can be useful here.

oneMKL Summary Statistics provides routines that compute basic statistical estimates for single and double precision multi-dimensional datasets.

As I understand mean and variance computation routines (supported in oneMKL Summary Statistics) could help to reduce the code and increase performance of the Student’s t-test application.

Fortran APIs description can be found here: https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/statistical-functions/summary-statistics.html

Best regards,
Pavel Valued Contributor II
903 Views

Pavel:

Thank you for the explanation.

The data is in a format where is is simpler to take the first moment and then the second moment to get the variance.

The interesting issue is the amplitude of the t-Stat -- it is way beyond the published tables. So I will need to encode estimating the probability as well.  A 1:1000 p value has a t-Stat of about 3.5 for this number of elements, but we are at 37.  Interestingly the same problem arises in EXCEL with large data sets when you start looking for 5 and 6 standard deviations p estimates.

I am pretty sure you have the Box Mueller Random Number generator.  There is a standard reference that some have used for discussing the Box Mueller Theory from the 1950s.  When I was writing a paper, I thought I should get a copy of this original paper if I wanted to reference it - you are supposed to read what you reference.  It came from a famous statistical group at a major Eastern University.  I sent them a message and asked for a pdf of the paper as I could not find it online.  I wanted to see the original math, so I could follow the Fortran.

They came back and said they had no idea, but they finally tracked down a copy and scanned it for me.  Apparently I was the first person to ask for it in a long time.  There is a brief version in a journal in 1958, but not the entire mathematics, if I remember correctly.

Everyone assumes that someone else has seen it so it is ok to reference.

The same thing happens with a famous book on Seismology in Japan. This book is quoted often, a friend once found there is one acknowledged copy of this book in a Scottish University  and no others. There are no pdf's.  Valued Contributor II
900 Views

List of Applied Statistics Algorithms ASA Algorithm 5 is the Student's test - with a non-central parameter.

It is under a Public License - https://people.sc.fsu.edu/~jburkardt/f_src/asa005/asa005.html

Professor Burkardt publishes a lot of interesting code example. Black Belt
874 Views

John,

I am not sure that this has anything to do with your issue...

>>A quick and dirty attempt at the Student's t Test for million element sets

``````integer countFZ(kl)
...
REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1
...
counter = counter + real(countFZ(k)) * real(k)
...
meanA = real(counter)/real(count)
...
counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0)
count = count + real(countFZ(k))
...
counter = counter + real(countFZA(k,l)) * real(k)
count = count + real(countFZA(k,l))
...
counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0)
count = count + real(countFZA(k,l))
``````

The code is using real(dp) for variable types of the results (lhs) contained within loops where k iterates a million (millions?) of times.

the use of real(something) as opposed to real(something, kind=dp) may introduce undesirable round off errors.

Jim Dempsey Moderator
763 Views

This request will not be implemented. The issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only. 