Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

CPU not fully utilized under different memory usage situations

calmag
Beginner
294 Views
Hi all,

I have been puzzled by the threading behavior. In the example below, when the parameters and data arrays are small, CPU usage is usually consistent with the number of threads I specify in nThread. When I stress-tested with a much larger data structure (10-20 GB of memory, which is what I eventually need), CPU usage dropped dramatically to about 15-16% (on an 8-core computer). In this example I have to allocate/deallocate the arrays inside the loop because of their massive size. Does this allocate/deallocate cause the problem? If so, why was it not noticeable in the small case but a problem in the larger case?

Any suggestion would be much appreciated.
!$OMP PARALLEL PRIVATE(iLooper) FIRSTPRIVATE(pSize) NUM_THREADS(nThread)
!$OMP DO SCHEDULE(DYNAMIC)
DO iLooper = 1, UniqCT1-1
   ALLOCATE(Ejd(iLooper)%unit(Noofarcs))
   ALLOCATE(Pred(iLooper)%unit(Noofarcs))
   ALLOCATE(pathtmp(iLooper)%unit(maxnu_pa))
   CALL RETRIEVE_VEH_PATH(Arg_OriginSet(iLooper), &
                          Arg_DestSet(iLooper),   &
                          Arg_TimeSet(iLooper),   &
                          iLooper, 1, pSize)
   DEALLOCATE(Ejd(iLooper)%unit)
   DEALLOCATE(Pred(iLooper)%unit)
ENDDO
!$OMP END DO
!$OMP END PARALLEL
CalmagC
SergeyKostrov
Valued Contributor II
Hi,

Quoting calmag
...When I stress-tested with a much larger data structure (10-20 GB of memory, which is what I eventually need), CPU usage dropped dramatically to about 15-16% (on an 8-core computer)...

[SergeyK] This drop in performance is expected when a Virtual Memory (VM) file is used.
Because of this, the CPUs sit mostly idle: they are waiting for I/O operations
to complete.

I have the same problem when I try to allocate memory for a very large matrix: I see
a significant drop in performance until the memory is completely allocated.

...I have to allocate/deallocate the arrays inside the loop because of their massive size. Does this allocate/deallocate cause the problem?

[SergeyK] Yes, if the VM file is being used. Try looking at the Task Manager for more details.

If so, why was it not noticeable in the small case but a problem in the larger case?...

[SergeyK] Please see the comment above.

Best regards,
Sergey
0 Kudos
jimdempseyatthecove
Honored Contributor III
See if you can re-use the allocations. I know your dummy stress code is likely not representative of what your application will use, but you can modify the dummy stress code to simulate re-use of the allocations.

For example:

Change Ejd(iLooper)%unit(:) from ALLOCATABLE to POINTER (and do the same for Pred(iLooper)%unit(:) and pathtmp).
Create a separate cache of these unit(:) allocations that represents your working set.
Then have your parallel DO loop rotate the use of the pointers, allocating or reallocating only when necessary.
Note that each thread can have its own cache of these allocations.
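
The per-thread reuse idea can be sketched as follows. This is a minimal illustration and not the original application: it uses a PRIVATE allocatable array as a simpler stand-in for the POINTER cache described above, and the names work, n_items and buf_len are hypothetical. Each thread allocates its buffer once before the work-shared loop and reuses it on every iteration, so no ALLOCATE/DEALLOCATE happens inside the loop.

```fortran
program reuse_buffers
  implicit none
  integer, parameter :: n_items = 64       ! hypothetical loop trip count
  integer, parameter :: buf_len = 100000   ! hypothetical working-array length
  integer, allocatable :: work(:)          ! PRIVATE below: one copy per thread
  integer :: i, total

  total = 0
  !$omp parallel private(work) reduction(+:total)
  allocate(work(buf_len))                  ! allocated once per thread, not per iteration
  !$omp do schedule(dynamic)
  do i = 1, n_items
     work = i                              ! overwrite and reuse the same buffer
     total = total + work(buf_len)         ! stand-in for the real per-iteration work
  end do
  !$omp end do
  deallocate(work)                         ! released once per thread, after the loop
  !$omp end parallel

  print *, 'total =', total
end program reuse_buffers
```

Built with OpenMP enabled (for example gfortran -fopenmp), the number of allocations is one per thread instead of one per iteration. Whether this helps in the real code depends on whether results such as pathtmp must outlive the loop, in which case only the scratch arrays are candidates for reuse.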

Jim Dempsey

