Hi, I have an OpenCL simulation program that consists of a loop that launches 4 kernels per iteration. A single run can last for hours.
I've run this same application on NVIDIA Fermi, ATI Radeon HD, Intel CPU X5650, Intel CPU E5... Now I'm running it on a Xeon Phi.
The problem is this: I run the application on the Xeon Phi node, using the Xeon Phi as an OpenCL device (ACCELERATOR device type). At roughly the second minute of execution, the mic_server process starts to consume more and more memory (the RES column in the Linux top command), and when this memory reaches 1 GB the mic_server process dies. The compiler is ICPC 13.1.1 and the Intel OpenCL version is 1.2-3.2.1.16712.
Has anyone had this same problem? I'd appreciate any help.
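Instead of watching `top` by hand, the RES growth can be logged from code and correlated with the iteration count. The following is a minimal sketch, assuming a Linux `/proc` filesystem on whichever system `top` shows `mic_server`. `readVmRssKb` is a hypothetical helper, not part of any Intel or OpenCL tool, and to watch `mic_server` it would have to run on that same system and be given that process's PID.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Returns the resident set size (what `top` shows as RES) of process `pid`
// in kB, parsed from the "VmRSS:" line of /proc/<pid>/status.
// Returns -1 if the file cannot be opened or has no VmRSS line.
// Pass "self" to measure the calling process.
long readVmRssKb(const std::string& pid = "self") {
    std::ifstream status("/proc/" + pid + "/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {   // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;                      // the value is followed by "kB"
            return kb;
        }
    }
    return -1;
}
```

Calling it every few thousand iterations and printing the iteration number next to the result gives a growth log that can be attached to a bug report.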
Thank you in advance
Hi,
What version of MPSS are you using? Please note that the officially supported version for this release is MPSS 3.1.1.
In case this doesn't help (or you're already using exactly this version), I'd like to ask for a reproducer for this issue. It could be either your entire application or a minimal, stripped-down version that exposes the problem.
Thanks, Yuri
Thank you Yuri,
The MPSS version is 3.1.1. I forgot to mention earlier that, for the smallest problem size, the program runs 160K iterations (4 kernels per iteration), and the problem appears after roughly 80K iterations. There are no mallocs, allocs, or any other memory allocations inside the iterations.
I will try to write a reproducer, but today it's impossible because I don't have time :-/ Probably at the weekend.
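One way to double-check the "no allocations inside the iterations" claim on the host side is to count calls to the global `operator new` around one loop iteration. This is a sketch assuming standard C++, where replacing the global `operator new` is allowed; `countNewCalls` and `g_newCalls` are hypothetical names. It only sees host-side C++ allocations (for example, ones made by a C++ wrapper), not `malloc()` calls or memory allocated inside the OpenCL runtime or by `mic_server`. A zero count therefore narrows the search rather than ruling out a leak.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of calls to the replaced global operator new so far.
static std::atomic<std::size_t> g_newCalls{0};

// Replacement global operator new: counts the call, then allocates with malloc.
// The default operator new[] forwards here, so array new is counted too.
void* operator new(std::size_t size) {
    ++g_newCalls;
    if (size == 0) size = 1;  // must still return a unique non-null pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions for memory obtained from malloc above.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Runs `body` once and returns how many operator new calls it made.
template <typename F>
std::size_t countNewCalls(F&& body) {
    const std::size_t before = g_newCalls.load();
    body();
    return g_newCalls.load() - before;
}
```

In the simulation, `countNewCalls` would wrap the code that enqueues the 4 kernels of one iteration, measured after a few warm-up iterations so that one-time setup is not counted.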
Thank you again :)
Hi Yuri,
I've attached a simplified version of my simulator. I used the OpenCL C++ wrapper. I've stripped the program down a lot: in each iteration of the loop I now launch only one kernel and transfer 4 bytes from device to host. The code includes a makefile and a dummy kernel that does nothing.
Currently, this code aborts at 544K iterations, and the cause is that mic_server starts consuming more and more memory until it dies.
Could it be that the driver fails to free temporary memory that is probably used by clEnqueueReadBuffer?
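These numbers give a rough bound on how fast memory grows. The following sketch takes 1 GB as 2^30 bytes, and `maxAvgGrowthPerIteration` is a hypothetical helper. Because RES was already above zero when the loop started, dividing the limit by the iteration count gives an upper bound on the average growth per iteration, not the exact rate.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the average RES growth per loop iteration, in bytes, for a
// process that died when RES reached `limitBytes` after `iterations`
// iterations. Only an upper bound: RES was already above zero at the start.
double maxAvgGrowthPerIteration(std::uint64_t limitBytes, std::uint64_t iterations) {
    assert(iterations > 0);
    return static_cast<double>(limitBytes) / static_cast<double>(iterations);
}
```

With 2^30 bytes and 544K iterations the bound is about 1974 bytes per iteration, or roughly 1 KB per enqueued command (one kernel launch plus one 4-byte read). If RES started far below 1 GB, the real rate would be of the same order, which is much larger than the 4 bytes being read back.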
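One way to test this hypothesis would be to run the reproducer twice, once with and once without the 4-byte read. In each run, sample mic_server's RES every few thousand iterations and compare how fast it grows per iteration. If the growth comes from the read path, the slope should drop markedly without it. The following is a minimal least-squares sketch. `growthSlope` is a hypothetical helper, and the sampling itself (e.g. from `top`) is assumed to happen separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares slope of y (e.g. RES in bytes) against x (iteration count).
// Returns 0 if there are fewer than two samples or all x values are equal.
double growthSlope(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    if (n < 2) return 0.0;
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - meanX) * (y[i] - meanY);
        sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    return sxx == 0.0 ? 0.0 : sxy / sxx;
}
```

If RES is recorded in bytes, the slope is in bytes per iteration. A slope near zero for the run without the read would point at clEnqueueReadBuffer.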
https://www.dropbox.com/s/ropi10clyhgg48i/moises_break.tar.gz
Thank you for the help,
Moisés Viñas Buceta
http://gac.udc.es/~moises/index_en.html