It would be useful to have a function in MKL to obtain the inverse of the cumulative Student t-distribution function:
t = tinv(p, n_f)
where p is the probability and n_f is the number of degrees of freedom. This function is available in Matlab, R (tinv), IMSL (tin), etc. It is useful in obtaining the p-percent confidence intervals of the parameter values obtained from fitting an expression to data using nonlinear least squares (MKL TRNLSP).
in oneMKL we have Random Number Generators component where we provide distribution generators (routines to generate random numbers sequences with the specific statistical properties).
Could you please clarify – are you interested in tinv(p, n_f) like function only? Or also in Student t-distribution generator?
The most common statistical test in experimental work after the mean and the standard deviation is the Student's t Test. It is a very simple test to use - essentially it takes 2 sets of numbers and tells you if they are different and how different. Guinness Brewery invented for quality control.
An example, the average age of a kindergarten class is 5. And the average age of the people who use this forum would be 30+. The two are distinct. Student's t Test allows you to quantify it. It has a fairly simple number >2 different <2 same, is a good approximation.
For large data sets it is the best test. The reason we need it in Fortran is where there are more than 1.2 million data points, EXCEL will not do an analysis. At the moment I am looking at sets with 6.5 million records, only Fortran or C will handle this quickly.
In searching for MKL - I was surprised how many of the many standard programs like MAPLE, NASTRAN, AutoCAD use mkl.
A quick and dirty attempt at the Student's t Test for million element sets, where I pick out the first peak between indices of the count array from 89 to 101 provides a set test data to determine if the natural frequency changes with time. It appears to - each million elements is 100 days and each million elements contains about 4% density in the required range.
subroutine mean(i,j,countFZ, meanA, stdevA,countFZA,meanFZA, standevFZA, flag) implicit none integer i, j, k, l integer countFZ(kl) integer countFZA(kl, mn_9) REAL (KIND=dp) meanFZA(mn_9),standevFZA(mn_9) integer flag REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1 REAL (KIND=dp) meanA, variance, stdevA, meanold, varianceold counter = 0.0 count = 0.0 counterV = 0.0 meanA = 0.0 variance = 0.0 do 100 k = i,j counter = counter + real(countFZ(k)) * real(k) count = count + real(countFZ(k)) ! write(*,1000)k, countFZ(k), counter, count 1000 Format(i7,3(' ',f12.0) ) 100 end do meanA = real(counter)/real(count) count = 0.0 do 101 k = i,j counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0) count = count + real(countFZ(k)) ! write(*,1001)k, countFZ(k), counter, counterV, count 1001 Format(i7,4(' ',f12.0) ) 101 end do variance = real(counterV)/real(count) write(*,500)0, meanA, variance, sqrt(variance), count meanold = meanA varianceold = variance countold = count 500 Format(" Data set :: ", i4,3(' ',f10.3), ' Count :: ',f7.0, ' Sp : ', f10.3, ' sp1 : ',f10.3,' t-Stat : ',f10.3) stdevA = sqrt(variance) write(*,120)flag 120 Format(" Flag for the data size :: ", i3) do 300 l = 1, flag counter = 0 count = 0 do 200 k = i,j counter = counter + real(countFZA(k,l)) * real(k) count = count + real(countFZA(k,l)) ! write(*,1000)k, countFZA(k,l), counter, count 200 end do meanFZA(l) = counter/count count = 0.0 counterV = 0.0 do 400 k = i,j counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0) count = count + real(countFZA(k,l)) ! write(*,1001)k, countFZA(k,l), counter, counterV, count 400 end do variance = counterV/count standevFZA(l) = sqrt(variance) sp = sqrt((((countold-1.0)*varianceold)+((count-1.0)*variance))/(countold+count-2)) sp1 = sp*sqrt((1.0/countold)+(1.0/count)) t = (meanFZA(l) - meanold)/sp1 write(*,500)l,meanFZA(l),variance,sqrt(variance),count,sp,sp1,t meanold = meanFZA(l) varianceold = variance countold = count 300 end do return end subroutine
I took the standard algorithm from wikipedia -- not equal means or variances.
A t-Stat of 31 to 80 represents an almost zero chance they are the same - mean is 95 column and the variance is the 1.15 column. The counts make is so accurate.
Got to check it and put in error checking.
There is a temperature dependence on the stiffness, but it is impossible to discern at 100 day blocks with any accuracy.
thanks a lot for the detailed description.
In oneMKL we haven’t the specific routine for Student’s t-test.
But oneMKL contains Summary Statistics component which can be useful here.
oneMKL Summary Statistics provides routines that compute basic statistical estimates for single and double precision multi-dimensional datasets.
As I understand mean and variance computation routines (supported in oneMKL Summary Statistics) could help to reduce the code and increase performance of the Student’s t-test application.
Fortran APIs description can be found here: https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/statistical-functions/summary-statistics.html
Please let me know if I could help you more here.
Thank you for the explanation.
The data is in a format where is is simpler to take the first moment and then the second moment to get the variance.
The interesting issue is the amplitude of the t-Stat -- it is way beyond the published tables. So I will need to encode estimating the probability as well. A 1:1000 p value has a t-Stat of about 3.5 for this number of elements, but we are at 37. Interestingly the same problem arises in EXCEL with large data sets when you start looking for 5 and 6 standard deviations p estimates.
I am pretty sure you have the Box Mueller Random Number generator. There is a standard reference that some have used for discussing the Box Mueller Theory from the 1950s. When I was writing a paper, I thought I should get a copy of this original paper if I wanted to reference it - you are supposed to read what you reference. It came from a famous statistical group at a major Eastern University. I sent them a message and asked for a pdf of the paper as I could not find it online. I wanted to see the original math, so I could follow the Fortran.
They came back and said they had no idea, but they finally tracked down a copy and scanned it for me. Apparently I was the first person to ask for it in a long time. There is a brief version in a journal in 1958, but not the entire mathematics, if I remember correctly.
Everyone assumes that someone else has seen it so it is ok to reference.
The same thing happens with a famous book on Seismology in Japan. This book is quoted often, a friend once found there is one acknowledged copy of this book in a Scottish University and no others. There are no pdf's.
List of Applied Statistics Algorithms ASA Algorithm 5 is the Student's test - with a non-central parameter.
It is under a Public License - https://people.sc.fsu.edu/~jburkardt/f_src/asa005/asa005.html
Professor Burkardt publishes a lot of interesting code example.
I am not sure that this has anything to do with your issue...
>>A quick and dirty attempt at the Student's t Test for million element sets
integer countFZ(kl) ... REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1 ... counter = counter + real(countFZ(k)) * real(k) ... meanA = real(counter)/real(count) ... counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0) count = count + real(countFZ(k)) ... counter = counter + real(countFZA(k,l)) * real(k) count = count + real(countFZA(k,l)) ... counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0) count = count + real(countFZA(k,l))
The code is using real(dp) for variable types of the results (lhs) contained within loops where k iterates a million (millions?) of times.
the use of real(something) as opposed to real(something, kind=dp) may introduce undesirable round off errors.
This request will not be implemented. The issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.