Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Scalable Parallel implementation of Conjugate Gradient Linear System solver library that is NUMA-aware and cache-aware

aminer10

Hello,


My scalable parallel implementation of a Conjugate Gradient linear system solver library, which is NUMA-aware and cache-aware, is here. You no longer need to allocate your arrays on different NUMA nodes yourself, because I have implemented all the NUMA functions for you. This new algorithm is NUMA-aware and cache-aware, and it scales well on NUMA architectures and on multicores. If you have a NUMA machine, just run the "test.pas" example that I have included in the zip file, and you will see how my new algorithm scales on it.

Frankly, I think I would have to write something like a PhD paper to explain my new algorithm in more detail, but I will leave it as it is for the moment. Perhaps I will do that in the near future.

This scalable parallel library is designed especially for the large-scale problems found in industrial engineering, such as finite element problems. It has been ported to both FreePascal and all the Delphi XE versions. I hope you will find it really good.

My new algorithm contains two parts that are the most expensive: a multiplication of a vector by the transpose of a matrix, and a multiplication of a vector by a matrix. When I parallelized my previous algorithm, I parallelized only the data transfers from the L2 cache to the CPU, where an L2 cache-line hit costs around 10 CPU cycles for every double, along with the multiplications and additions of doubles. That was not enough, because we also have to parallelize the data transfers from memory to the L2 cache; this is what makes an algorithm NUMA-aware and lets it really scale on NUMA architectures. This is what I have done in my new algorithm: the data transfers from memory to the L2 cache are now parallelized as well, which makes my new algorithm NUMA-aware and really scalable on NUMA architectures, and it is also cache-aware.
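To make the two expensive kernels concrete, here is a minimal Python sketch of how a matrix-vector product and a transposed-matrix-vector product can each be split into contiguous row blocks, one per worker. It is an illustration only, not the library's code: the function names (`row_blocks`, `parallel_matvec`, `parallel_rmatvec`) are hypothetical, the matrix is a plain dense list of lists, and nothing here pins threads or memory to NUMA nodes. The post does not describe how the library does that. Python threads also do not speed up this arithmetic, so the sketch shows the partitioning idea rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows, n_workers):
    """Split range(n_rows) into contiguous, near-equal blocks, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def block_matvec(A, x, start, stop):
    """Entries start..stop-1 of y = A x, reading only this worker's rows of A."""
    return [sum(a * xj for a, xj in zip(A[i], x)) for i in range(start, stop)]


def block_rmatvec(A, v, start, stop):
    """Partial result of z = A^T v, using only rows start..stop-1 of A."""
    z = [0.0] * len(A[0])
    for i in range(start, stop):
        vi = v[i]
        for j, a in enumerate(A[i]):
            z[j] += a * vi
    return z


def parallel_matvec(A, x, n_workers=2):
    """y = A x: each worker produces its own slice of y; slices are concatenated."""
    blocks = row_blocks(len(A), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda b: block_matvec(A, x, *b), blocks)
        return [yi for part in parts for yi in part]


def parallel_rmatvec(A, v, n_workers=2):
    """z = A^T v: each worker produces a full-length partial z; partials are summed."""
    blocks = row_blocks(len(A), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: block_rmatvec(A, v, *b), blocks))
    return [sum(col) for col in zip(*parts)]
```

In both kernels a worker reads only its own block of rows. That is the property a NUMA-aware layout would exploit, by keeping each block in the memory local to the node that processes it.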

You can download my Scalable Parallel implementation of Conjugate Gradient Linear System solver library that is NUMA-aware and cache-aware from:


https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware


Thank you,
Amine Moulay Ramdane. 
 
