Scalable Parallel implementation of Conjugate Gradient Linear System solver library that is NUMA-aware and cache-aware in Intel® Moderncode for Parallel Architectures
https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Scalable-Parallel-implementation-of-Conjugate-Gradient-Linear/m-p/1029175#M6658
<P>Hello,<BR />
<BR />
<BR />
My Scalable Parallel implementation of Conjugate Gradient Linear System solver library, which is NUMA-aware and cache-aware, is here. You no longer need to allocate your arrays on different NUMA nodes yourself, because I have implemented all the NUMA functions for you. This new algorithm is NUMA-aware and cache-aware, and it scales well on NUMA architectures and on multicores. If you have a NUMA machine, just run the "test.pas" example included in the zip file and you will see that the new algorithm scales well on NUMA architecture.<BR />
<BR />
Frankly, I think I would need to write something like a PhD paper to explain my new algorithm in more detail, but I will leave it as it is for now; perhaps I will do it in the near future.<BR />
<BR />
This scalable parallel library is especially designed for large-scale industrial engineering problems, such as those arising in industrial finite element analysis. It has been ported to both FreePascal and all the Delphi XE versions. I hope you will find it useful.</P>
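<P>To illustrate what spreading the data across NUMA nodes means for a matrix-vector product, here is a minimal Python sketch. It is not code from the library, whose internals are not described in this post. It assumes a dense matrix stored as a list of rows and splits the rows into contiguous blocks, one per worker standing in for a NUMA node. Each worker reads only its own rows of A and writes only its own slice of y, which is the access pattern that lets each block live in the memory of the node that processes it. The names row_blocks and parallel_matvec are hypothetical. Python threads do not control where memory is placed, so in native code each block would also have to be allocated on the right node. Common ways to do that are first-touch initialization by the owning thread, libnuma's numa_alloc_onnode on Linux, or VirtualAllocExNuma on Windows.</P>

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n, parts):
    """Split rows 0..n-1 into `parts` contiguous, nearly equal (lo, hi) blocks."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)  # first `extra` blocks get one more row
        blocks.append((start, start + size))
        start += size
    return blocks


def parallel_matvec(A, x, parts=4):
    """y = A x, where worker k touches only the rows in its own block.

    Hypothetical sketch: each block stands in for the part of A that would be
    stored in one NUMA node's local memory in a native implementation.
    """
    y = [0.0] * len(A)

    def work(block):
        lo, hi = block
        for i in range(lo, hi):  # rows owned by this worker only
            y[i] = sum(a * xj for a, xj in zip(A[i], x))

    with ThreadPoolExecutor(max_workers=parts) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(work, row_blocks(len(A), parts)))
    return y
```

<P>For example, parallel_matvec([[1, 2], [3, 4], [5, 6]], [1, 1], parts=2) returns [3, 7, 11]. Contiguous blocks are used rather than interleaved rows so that each worker streams through one compact region of memory, which also keeps its cache lines fully used.</P>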
<P>The two most expensive parts of my new algorithm are multiplying a vector by the transpose of a matrix and multiplying a vector by a matrix. When I parallelized my previous algorithm, I parallelized only the transfer of data from the L2 cache to the CPU on an L2 cache-line hit, which costs around 10 CPU cycles per double, together with the multiplications and additions of doubles. That was not enough. To scale on a NUMA architecture you must also parallelize the data transfers from main memory into the L2 cache, so that the memory of every NUMA node feeds its share of the data into the caches at the same time; this is what we call a NUMA-aware algorithm. That is what I have done in my new algorithm: the transfers from memory to the L2 cache are parallelized as well, which makes the algorithm NUMA-aware and scalable on NUMA architectures, and the algorithm is also cache-aware.</P>
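<P>The post does not say which Conjugate Gradient variant the library uses. However, the two kernels it names, y = A·x and y = Aᵀ·x, both appear in every iteration of CG applied to the normal equations AᵀA·x = Aᵀb (CGNR), which also handles non-symmetric matrices. The following Python sketch assumes that variant, is not the library's code, and only shows where the two kernels sit. matvec_t computes Aᵀ·x by scaling and accumulating whole rows (y += x[i]·A[i,:]) instead of walking down columns, so A is read row by row in the same order as in matvec. In a parallel version, each row block would produce a partial y, and the partial results would be summed at the end. All function names here are hypothetical.</P>

```python
def matvec(A, x):
    """Kernel 1: y = A x, reading A row by row."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, x):
    """Kernel 2: y = A^T x, also reading A row by row.

    Accumulates y += x[i] * A[i, :] instead of strided column access.
    A row-block parallel version would sum one partial y per block.
    """
    y = [0.0] * len(A[0])
    for xi, row in zip(x, A):
        for j, a in enumerate(row):
            y[j] += xi * a
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgnr(A, b, tol=1e-10, max_iter=1000):
    """Conjugate Gradient on the normal equations A^T A x = A^T b (CGNR)."""
    x = [0.0] * len(A[0])
    r = list(b)                 # r = b - A x, with x = 0
    z = matvec_t(A, r)          # z = A^T r
    p = list(z)
    zz = dot(z, z)
    for _ in range(max_iter):
        if zz ** 0.5 < tol:     # stop when ||A^T r|| is small
            break
        w = matvec(A, p)        # expensive kernel 1: A p
        alpha = zz / dot(w, w)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * wi for ri, wi in zip(r, w)]
        z = matvec_t(A, r)      # expensive kernel 2: A^T r
        zz_new = dot(z, z)
        beta = zz_new / zz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        zz = zz_new
    return x
```

<P>For the non-symmetric system [[4, 1], [2, 3]]·x = [1, 2], cgnr returns values close to the exact solution [0.1, 0.6]. Keep in mind that forming the normal equations squares the condition number, so CGNR can converge slowly on ill-conditioned matrices.</P>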
<P><SPAN style="font-size: 1em; line-height: 1.5;">You can download my Scalable Parallel implementation of Conjugate Gradient Linear System solver library that is NUMA-aware and cache-aware from:</SPAN></P>
<P><BR />
<A class="moz-txt-link-freetext" href="https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware">https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware</A><BR />
<BR />
<BR />
Thank you,<BR />
Amine Moulay Ramdane. <BR />
</P>aminer10, Fri, 19 Dec 2014 19:49:09 GMT