Hi,
I wrote a program that generates random data files and used it to create three files filled with random numbers (0, 1).
The sizes of these files are 5000 * 1000, 10000 * 3000, and 10000 * 5000.
I ran two of the example programs with these files.
I use cout << eigenvectors->getNumberOfRows() << " " << eigenvectors->getNumberOfColumns() << endl; to check whether the size of the eigenvector table is correct.
The first two files produce eigenvectors normally, but both example programs crash when running PCA on the 10000 * 5000 file.
Here's the error message.
Unhandled exception at 0x7588C54F in pca_cor_dense_batch.exe: Microsoft C++ exception: daal::services::interface1::Exception at memory location 0x0035F8E8.
Is there a limit on the number of features? What are the limits on the number of samples and features?
Or do I need to use distributed computation instead of batch computation for large files?
I really appreciate your help.
Hi Hanxi,
I'm afraid there may be a 2 GB limit, either in the memory buffer or in the stack. Which OS are you using, is your program 32-bit or Intel 64-bit, and on which source line does the exception occur?
Best Regards,
Ying
Thanks, the file is about 336 MB.
My OS is Windows 7 64-bit and I'm using the 32-bit library.
The exception happened here:
algorithm.compute();
Then how about using the x64 platform and the 64-bit library?
Best Regards,
Ying
Wow, I've tested again with 10000 * 5000, 20000 * 5000, and 10000 * 10000,
and all of them work fine with the 64-bit library.
Would you tell me the reason? Thanks.
Hi Hanxi,
Thanks for the reply. I just tried pca_svd_dense_batch.cpp with a 10000x5000 data set on a Linux machine. Both ia32 and intel64 seem to run fine.
I don't have a 32-bit install on Windows, so I only tried x64, which runs fine too. So there seems to be a problem with the 32-bit Windows build. We need to check with the developer team.
Do you build in the MSVC environment? You could try increasing the Heap Reserve Size and Stack Reserve Size under Properties -> Linker -> System to see if that works around the problem.
Best Regards,
Ying
Hi Ying,
I followed the steps in your comment. I'm using Visual Studio 2015, and here is what I did:
Configuration Properties -> Linker -> System -> Enable Large Addresses (Yes)
But the error message is still the same:
Unhandled exception at 0x76E1C54F in pca_cor_dense_batch.exe: Microsoft C++ exception: daal::services::interface1::Exception at memory location 0x003BF790.
Thanks for your help.
Hi Hanxi,
I found a Win7 64-bit machine (2 cores, 4 hyper-threads) and installed 32-bit DAAL there. I tested both PCA_Cor and PCA_SVD using the Win32 Debug.dynamic.threaded configuration. Since the total memory used by the program is less than 2 GB (3.1x the data set), the code runs without the large address option enabled.
With Debug.static.threaded, the total memory used by PCA_SVD goes beyond 2 GB (~4x the data set, which is 0.55 GB), so I switched the option on. Both of them seem to run OK on the machine. I attached a screenshot for your reference. Could you show how many threads and how much memory the program uses in Task Manager?
The test code is as below:
/* Input data set parameters */
//const string dataFileName = "../data/batch/pca_normalized.csv";
const string dataFileName = "pca_dataset_5000_10000_0.csv";
const size_t nVectors = 10000;
services::SharedPtr<pca::Result> result = algorithm.getResult();
printNumericTable(result->get(pca::eigenvalues), "Eigenvalues:",1,1);
// printNumericTable(result->get(pca::eigenvectors), "Eigenvectors:");
return 0;
As you can see, the memory an application can use on Win32 is limited by the memory and address space that the Windows platform allows. So yes, please consider distributed computation.
Best Regards
Ying
Hi Ying,
Thanks a lot.
Now I'm clear about this problem.
