- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is the Student's t Test included in mkl?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
what do you mean by a Student Test?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It would be useful to have a function in MKL to obtain the inverse of the cumulative Student t-distribution function:
t = tinv(p, n_f)
where p is the probability and n_f is the number of degrees of freedom. This function is available in Matlab, R (tinv), IMSL (tin), etc. It is useful in obtaining the p-percent confidence intervals of the parameter values obtained from fitting an expression to data using nonlinear least squares (MKL TRNLSP).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the clarification, mecej. I see what John wanted to see. We will try to add this topic to further plans and will keep this thread informed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi John,
in oneMKL we have Random Number Generators component where we provide distribution generators (routines to generate random numbers sequences with the specific statistical properties).
Could you please clarify – are you interested in tinv(p, n_f) like function only? Or also in Student t-distribution generator?
Best regards,
Pavel
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Pavel:
The most common statistical test in experimental work after the mean and the standard deviation is the Student's t Test. It is a very simple test to use - essentially it takes 2 sets of numbers and tells you if they are different and how different. Guinness Brewery invented for quality control.
An example, the average age of a kindergarten class is 5. And the average age of the people who use this forum would be 30+. The two are distinct. Student's t Test allows you to quantify it. It has a fairly simple number >2 different <2 same, is a good approximation.
For large data sets it is the best test. The reason we need it in Fortran is where there are more than 1.2 million data points, EXCEL will not do an analysis. At the moment I am looking at sets with 6.5 million records, only Fortran or C will handle this quickly.
In searching for MKL - I was surprised how many of the many standard programs like MAPLE, NASTRAN, AutoCAD use mkl.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A quick and dirty attempt at the Student's t Test for million element sets, where I pick out the first peak between indices of the count array from 89 to 101 provides a set test data to determine if the natural frequency changes with time. It appears to - each million elements is 100 days and each million elements contains about 4% density in the required range.
subroutine mean(i,j,countFZ, meanA, stdevA,countFZA,meanFZA, standevFZA, flag)
implicit none
integer i, j, k, l
integer countFZ(kl)
integer countFZA(kl, mn_9)
REAL (KIND=dp) meanFZA(mn_9),standevFZA(mn_9)
integer flag
REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1
REAL (KIND=dp) meanA, variance, stdevA, meanold, varianceold
counter = 0.0
count = 0.0
counterV = 0.0
meanA = 0.0
variance = 0.0
do 100 k = i,j
counter = counter + real(countFZ(k)) * real(k)
count = count + real(countFZ(k))
! write(*,1000)k, countFZ(k), counter, count
1000 Format(i7,3(' ',f12.0) )
100 end do
meanA = real(counter)/real(count)
count = 0.0
do 101 k = i,j
counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0)
count = count + real(countFZ(k))
! write(*,1001)k, countFZ(k), counter, counterV, count
1001 Format(i7,4(' ',f12.0) )
101 end do
variance = real(counterV)/real(count)
write(*,500)0, meanA, variance, sqrt(variance), count
meanold = meanA
varianceold = variance
countold = count
500 Format(" Data set :: ", i4,3(' ',f10.3), ' Count :: ',f7.0, ' Sp : ', f10.3, ' sp1 : ',f10.3,' t-Stat : ',f10.3)
stdevA = sqrt(variance)
write(*,120)flag
120 Format(" Flag for the data size :: ", i3)
do 300 l = 1, flag
counter = 0
count = 0
do 200 k = i,j
counter = counter + real(countFZA(k,l)) * real(k)
count = count + real(countFZA(k,l))
! write(*,1000)k, countFZA(k,l), counter, count
200 end do
meanFZA(l) = counter/count
count = 0.0
counterV = 0.0
do 400 k = i,j
counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0)
count = count + real(countFZA(k,l))
! write(*,1001)k, countFZA(k,l), counter, counterV, count
400 end do
variance = counterV/count
standevFZA(l) = sqrt(variance)
sp = sqrt((((countold-1.0)*varianceold)+((count-1.0)*variance))/(countold+count-2))
sp1 = sp*sqrt((1.0/countold)+(1.0/count))
t = (meanFZA(l) - meanold)/sp1
write(*,500)l,meanFZA(l),variance,sqrt(variance),count,sp,sp1,t
meanold = meanFZA(l)
varianceold = variance
countold = count
300 end do
return
end subroutine
I took the standard algorithm from wikipedia -- not equal means or variances.
A t-Stat of 31 to 80 represents an almost zero chance they are the same - mean is 95 column and the variance is the 1.15 column. The counts make is so accurate.
Got to check it and put in error checking.
There is a temperature dependence on the stiffness, but it is impossible to discern at 100 day blocks with any accuracy.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
John,
thanks a lot for the detailed description.
In oneMKL we haven’t the specific routine for Student’s t-test.
But oneMKL contains Summary Statistics component which can be useful here.
oneMKL Summary Statistics provides routines that compute basic statistical estimates for single and double precision multi-dimensional datasets.
As I understand mean and variance computation routines (supported in oneMKL Summary Statistics) could help to reduce the code and increase performance of the Student’s t-test application.
Fortran APIs description can be found here: https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/statistical-functions/summary-statistics.html
Please let me know if I could help you more here.
Best regards,
Pavel
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Pavel:
Thank you for the explanation.
The data is in a format where is is simpler to take the first moment and then the second moment to get the variance.
The interesting issue is the amplitude of the t-Stat -- it is way beyond the published tables. So I will need to encode estimating the probability as well. A 1:1000 p value has a t-Stat of about 3.5 for this number of elements, but we are at 37. Interestingly the same problem arises in EXCEL with large data sets when you start looking for 5 and 6 standard deviations p estimates.
I am pretty sure you have the Box Mueller Random Number generator. There is a standard reference that some have used for discussing the Box Mueller Theory from the 1950s. When I was writing a paper, I thought I should get a copy of this original paper if I wanted to reference it - you are supposed to read what you reference. It came from a famous statistical group at a major Eastern University. I sent them a message and asked for a pdf of the paper as I could not find it online. I wanted to see the original math, so I could follow the Fortran.
They came back and said they had no idea, but they finally tracked down a copy and scanned it for me. Apparently I was the first person to ask for it in a long time. There is a brief version in a journal in 1958, but not the entire mathematics, if I remember correctly.
Everyone assumes that someone else has seen it so it is ok to reference.
The same thing happens with a famous book on Seismology in Japan. This book is quoted often, a friend once found there is one acknowledged copy of this book in a Scottish University and no others. There are no pdf's.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
List of Applied Statistics Algorithms ASA Algorithm 5 is the Student's test - with a non-central parameter.
It is under a Public License - https://people.sc.fsu.edu/~jburkardt/f_src/asa005/asa005.html
Professor Burkardt publishes a lot of interesting code example.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
John,
I am not sure that this has anything to do with your issue...
>>A quick and dirty attempt at the Student's t Test for million element sets
integer countFZ(kl)
...
REAL (KIND=dp) counter, count, counterV, countold,t, sp,sp1
...
counter = counter + real(countFZ(k)) * real(k)
...
meanA = real(counter)/real(count)
...
counterV = counterV + (real(countFZ(k))) * ((real(k) - meanA) ** 2.0)
count = count + real(countFZ(k))
...
counter = counter + real(countFZA(k,l)) * real(k)
count = count + real(countFZA(k,l))
...
counterV = counterV + (real(countFZA(k,l))) * ((real(k) - meanFZA(l)) ** 2.0)
count = count + real(countFZA(k,l))
The code is using real(dp) for variable types of the results (lhs) contained within loops where k iterates a million (millions?) of times.
the use of real(something) as opposed to real(something, kind=dp) may introduce undesirable round off errors.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This request will not be implemented. The issue is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page