- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
I have the following problem with the LAPACK driver routines used for solving symmetric eigenvalue problems in C++. As the input, I have complex Hermitian matrices with the dimension ~30X30. On these matrices, I run the routines provided by the MKL. In particular I tried the routines "heev", "hpev" and "hpevd" to compute the eigenvalue decompositions including the eigenvectors.
Normally, there is no problem with theses routines. The run time is stable around a certain mean value. However, using one routine many times in a row causes problems:
- After ~ 2 000 000 function calls, at a certain point, the runtime increases to its double.
- After this peak in the run time, there are tremendous variations in the run time and peaks in run times occur regularly.
- After restarting the application, the run time gets back to normal, however, after many function calls, the run time gets unstable again.
These problems occur with the routines "heev", "hpev" and "hpevd". However, if I use only diagonal matrices for input, the runtime is stable.
What can I do to fix this issiu?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If possible, can you please upload a test program showing how you make the two million calls and how you measure the run time? I'd also suggest you run it through a profiler (such as VTune) to get some hint on where the time was spent. How is the memory space for the matrices allocated, and how is it de-allocated?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Zhang.
Let me answer your questions:
>> If possible, can you please upload a test program showing how you make the two million calls and how you measure the run time?
Making a test program is not that easy since my c++ application is just part of a larger application. I measure the run time within calling the routine zhpev ~6000 times.:
zhpev (&jobz, &uplo, &dim, (double_complex *)&par->A_H_MKL[0], (double *)&D[0], (double_complex *)&U[0][0], &ldz, par->work,
par->rwork, &info);
There is no doubt that the increase in run time after many calls is due to zhpev. After replacing this routine by some dummy routine, there are no run time issues any more.
>> How is the memory space for the matrices allocated, and how is it de-allocated?
In my opinion memory is not the issue here. All arrays are static. Checking the memory consumption in the task manager tells me that the ram is only occupied for 1/6 and only 500 MB out of 3 GB are used.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I'm trying to reproduce your problem. Can you please let me know a few details:
- Your OS, compiler, MKL version?
- CPU information.
- Which timing function do you use?
- How do you link MKL into your code, dynamic, static, single dynamic library?
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Zhang
let me answer your questions:
- Your OS, compiler, MKL version: OS: Windows Embedded Standard SP3. I use the MKL version that comes with the Code Composer XE 2013 SP1.
- CPU information. Intel Xeon E5530 @2,4 GHz
- Which timing function do you use? I use the timing function that is provided in our company.
- How do you link MKL into your code, dynamic, static, single dynamic library? I link the MKL static into my code.
All the best,
Adrian
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Zhang,
Thank you for all you effort.
I found out that my problem occures only if I call the eigendecomposition routine inside a thread. To create a thread, I use the CreateThread-Routine from Windows.
I tried "mkl_set_num_threads( 1 );" but the run time issues are still present.
Are there any rules that I have to obay if I call the eigendecomposition routine inside a thread?
All the best,
Adrian
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here is some small code example to reproduce the run time problems. For two threads, the run time increases after ~500 time measurements (for my computer).
Let me explain my code
- I have used the multi-thread example of http://msdn.microsoft.com/en-us/library/windows/desktop/ms682516(v=vs.85).aspx
- In the thread function, I inserted the function "eigendecomposition" which contains "zhpev"
- I also included functions to measure the run time.
This is my code
#include <windows.h>
#include <tchar.h>
#include <strsafe.h>
#include "stdafx.h"
#include "mkl_lapack.h"
#include "mkl_service.h"
#include <time.h>
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <iostream>
using namespace std;
#define CACHE_LINE 32
#define CACHE_ALIGN __declspec(align(CACHE_LINE))
#define MAX_THREADS 2
#define BUF_SIZE 255
DWORD WINAPI MyThreadFunction( LPVOID lpParam );
void ErrorHandler(LPTSTR lpszFunction);
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// This is the critical function.
void Eigendecomposition();
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
typedef struct MyData {
int val1;
int val2;
} MYDATA, *PMYDATA;
int _tmain()
{
PMYDATA pDataArray[MAX_THREADS];
DWORD dwThreadIdArray[MAX_THREADS];
HANDLE hThreadArray[MAX_THREADS];
std::ofstream ofs;
double tstart;
double tend;
double proc_time_pure;
for(int j=0;j<10000;j++){
// Start one iteration
tstart = clock();
// Create MAX_THREADS worker threads.
for( int i=0; i<MAX_THREADS; i++ )
{
pDataArray[i] = (PMYDATA) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY,
sizeof(MYDATA));
if( pDataArray[i] == NULL )
{
ExitProcess(2);
}
pDataArray[i]->val1 = i;
pDataArray[i]->val2 = i+100;
// Create the thread to begin execution on its own.
hThreadArray[i] = CreateThread(
NULL, // default security attributes
0, // use default stack size
MyThreadFunction, // thread function name
pDataArray[i], // argument to thread function
0, // use default creation flags
&dwThreadIdArray[i]); // returns the thread identifier
if (hThreadArray[i] == NULL)
{
ErrorHandler(TEXT("CreateThread"));
ExitProcess(3);
}
} // End of main thread creation loop.
// Wait until all threads have terminated.
WaitForMultipleObjects(MAX_THREADS, hThreadArray, TRUE, INFINITE);
for(int i=0; i<MAX_THREADS; i++)
{
CloseHandle(hThreadArray[i]);
if(pDataArray[i] != NULL)
{
HeapFree(GetProcessHeap(), 0, pDataArray[i]);
pDataArray[i] = NULL; // Ensure address is not reused.
}
}
tend = clock();
proc_time_pure = tend-tstart;
// Print processing time into console and write it into a file
printf(" Processing time: %4.3f \n", proc_time_pure/1000.0);
ofs.open ("Processing_time.txt", std::ofstream::out | std::ofstream::app);
ofs << proc_time_pure/1000.0 << " ";
ofs.close();
}
return 0;
}
DWORD WINAPI MyThreadFunction( LPVOID lpParam )
{
HANDLE hStdout;
PMYDATA pDataArray;
TCHAR msgBuf[BUF_SIZE];
size_t cchStringSize;
DWORD dwChars;
hStdout = GetStdHandle(STD_OUTPUT_HANDLE);
if( hStdout == INVALID_HANDLE_VALUE )
return 1;
pDataArray = (PMYDATA)lpParam;
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// Critical function
Eigendecomposition();
// !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
return 0;
}
void ErrorHandler(LPTSTR lpszFunction)
{
// Retrieve the system error message for the last-error code.
LPVOID lpMsgBuf;
LPVOID lpDisplayBuf;
DWORD dw = GetLastError();
FormatMessage(
FORMAT_MESSAGE_ALLOCATE_BUFFER |
FORMAT_MESSAGE_FROM_SYSTEM |
FORMAT_MESSAGE_IGNORE_INSERTS,
NULL,
dw,
MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
(LPTSTR) &lpMsgBuf,
0, NULL );
// Display the error message.
lpDisplayBuf = (LPVOID)LocalAlloc(LMEM_ZEROINIT,
(lstrlen((LPCTSTR) lpMsgBuf) + lstrlen((LPCTSTR) lpszFunction) + 40) * sizeof(TCHAR));
StringCchPrintf((LPTSTR)lpDisplayBuf,
LocalSize(lpDisplayBuf) / sizeof(TCHAR),
TEXT("%s failed with error %d: %s"),
lpszFunction, dw, lpMsgBuf);
MessageBox(NULL, (LPCTSTR) lpDisplayBuf, TEXT("Error"), MB_OK);
// Free error-handling buffer allocations.
LocalFree(lpMsgBuf);
LocalFree(lpDisplayBuf);
}
void Eigendecomposition(){
const int M = 32;
typedef MKL_Complex16 double_complex;
const char jobz = 'V';
const char uplo = 'L'; // lower triangular part of input matrix is used
const MKL_INT dim = M;
const MKL_INT ldz = M;
const MKL_INT LWORK = (2*M-1);
const MKL_INT LRWORK = (3*M-2);
MKL_INT info = 0;
double_complex A_H_MKL[(M*M+M)/2];
CACHE_ALIGN double_complex work[LWORK];
CACHE_ALIGN double rwork[LRWORK];
double D[M];
double_complex U[M][M];
for(int i=0;i<500;i++ ){
// Create the input matrix
for (int tmp=0; tmp < (M*M+M)/2; tmp++){
A_H_MKL[tmp].real = 1 ;
A_H_MKL[tmp].imag = 0;}
// This is the mkl function
zhpev(&jobz, // const char* jobz,
&uplo, // const char* uplo,
&dim, // const MKL_INT* n,
(double_complex *)&A_H_MKL[0], // double_complex* ap,
(double *)&D[0], // double* w,
(double_complex *)&U[0][0], // double_complex* z,
&ldz, // const MKL_INT* ldz,
work, // double_complex* work,
rwork, // double* rwork,
&info); // MKL_INT* info
}
}

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page