Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
1696 Discussions

A new algorithm of Parallel implementation of Conjugate Gradient Sparse Linear System Solver library

aminer10
Novice
689 Views

Hello,

I have just implemented today a new parallel algorithm of a Parallel implementation of Conjugate Gradient Sparse Linear System Solver library.. this library is designed for sparse matrices of linear equations arising from industrial Finite element problems and such, and my new parallel algorithm is cache-aware and very fast..

So as you have noticed, i have implemented now two parallel algorithms, one that is cache-aware an NUMA-aware and that is scalable on NUMA architecture, and this scalable Parallel algorithm is designed for dense matrices that you find on Linear Equations arising from Integral Equation Formulations, here it is:

https://sites.google.com/site/aminer68/scalable-parallel-implementation-of-conjugate-gradient-linear-system-solver-library-that-is-numa-aware-and-cache-aware

And my new parallel algorithm that i have just implemented today is designed for sparse matrices of linear equations arising from industrial Finite element problems and such:


Read here:

https://en.wikipedia.org/wiki/Sparse_matrix
 

As you have noticed it says:


"When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus require significantly less storage. Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms."

I have taken care of that on my new algorithm, i have used my ParallelHashList datastructure to store the sparse matrices of the linear systems so that it become very fast and so that it doesn't waste on the zeros, in fact my new algorithm doesn't store the zeros of the sparse matrix of the linear system.

And my new parallel algorithm that i have just implemented today is designed for sparse matrices of linear equations arising from
industrial Finite element problems and such..

Here is my new library of my new parallel algorithm:

https://sites.google.com/site/aminer68/parallel-implementation-of-conjugate-gradient-sparse-linear-system-solver

Author: Amine Moulay Ramdane

Description:

I have come up with a new algorithm of my Parallel Conjugate gradient sparse solver library, now it has become cache-aware, but you have to notice that this new cache-aware algorithm is more efficient on multicores, since i have benchmarked it against my previous algorithm and it has given a scalability of 5X on a Quadcore over the single thread of my previous algorithm , that's a really a big improvement !.

This Parallel library is especially designed for large scale industrial engineering problems that you find on industrial Finite element problems and such, this scalable Parallel library was ported to FreePascal and all the Delphi XE versions and even to Delphi 7, hope you will find it really good.

The Parallel implementation of Conjugate Gradient Sparse Linear System Solver that i programmed here is designed to be used to solve large sparse systems of linear equations where the direct methods can exceed available machine memory and/or be extremely time-consuming. for example the direct method of the Gauss algorithm takes O(n^2) in the back substitution process and is dominated by the O(n^3) forward elimination process, that means, if for example an operation takes 10^-9 second and we have 1000 equations , the elimination process in the Gauss algorithm will takes 0.7 second, but if we have 10000 equations in the system , the elimination process in the Gauss algorithm will take 11 minutes !. This is why i have develloped for you the Parallel implementation of Conjugate Gradient Sparse Linear System Solver in Object Pascal, that is very fast.
You have only one method to use that is Solve()

function TParallelConjugateGradient.Solve(var A: arrarrext;var B,X:VECT;var RSQ:DOUBLE;nbr_iter:integer;show_iter:boolean):boolean;

The system: A*x = b

The important parameters in the Solve() method are:

A is the matrix , B is the b vector, X the initial vector x,

nbr_iter is the number of iterations that you want and show_iter to show the number of iteration on the screen.

RSQ is the sum of the squares of the components of the residual vector A.x - b.

I have got over 5X scalability on a quad core.

The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Conjugate gradient is the most popular iterative method for solving large systems of linear equations. CG is effective for systems of the form A.x = b where x is an unknown vector, b is a known vector, A is a known square, symmetric, positive-definite (or positive-indefinite) matrix. These systems arise in many important settings, such as finite difference and finite element methods for solving partial differential equations, structural analysis, circuit analysis, and math homework

The Conjugate gradient method can also be applied to non-linear problems, but with much less success since the non-linear functions have multiple minimums. The Conjugate gradient method will indeed find a minimum of such a nonlinear function, but it is in no way guaranteed to be a global minimum, or the minimum that is desired. But the conjugate gradient method is great iterative method for solving large, sparse linear systems with a symmetric, positive, definite matrix.

In the method of conjugate gradients the residuals are not used as search directions, as in the steepest decent method, cause searching can require a large number of iterations as the residuals zig zag towards the minimum value for ill-conditioned matrices. But instead conjugate gradient method uses the residuals as a basis to form conjugate search directions . In this manner, the conjugated gradients (residuals) form a basis of search directions to minimize the quadratic function f(x)=1/2*Transpose(x)*A*x + Transpose(b)*x and to achieve faster speed and result of dim(N) convergence.

Language: FPC Pascal v2.2.0+ / Delphi 7+: http://www.freepascal.org/

Operating Systems: Windows, Mac OSX , Linux...

Required FPC switches: -O3 -Sd -dFPC -dFreePascal

-Sd for delphi mode....

Required Delphi switches: -$H+ -DDelphi

{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems

{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems


Thank you,
Amine Moulay Ramdane.

0 Kudos
3 Replies
aminer10
Novice
689 Views

Hello,

I have updated my new Parallel implementation of Conjugate Gradient Linear Sparse System Solver library to version 1.24, i have corrected a bug, and i have thoroughly tested it and i think it's stable and very fast now.

You can download it from here:

https://sites.google.com/site/aminer68/parallel-implementation-of-conjugate-gradient-sparse-linear-system-solver

 

Thank you,

Amine Moulay Ramdane.

0 Kudos
aminer10
Novice
689 Views

Hello,

I have updated my new Parallel implementation of Conjugate Gradient Linear Sparse System Solver library to version 1.3, i have added the support for the scalable distributed reader-writer mutex, and please comment the MREW define's option and uncomment the DRWLOCK define's option inside defines.inc or defines2.inc to use the scalable distributed reader-writer mutex that scales better, and when you set it to DRWLOCK , please set the number of rwlocks parameter of the TSparseMatrix constructor to around 2000 for example, this will use less memory and this will be scalable. 

You can download my new Parallel implementation of Conjugate Gradient Linear Sparse System Solver library version 1.3 from here: 

https://sites.google.com/site/aminer68/parallel-implementation-of-conjugate-gradient-sparse-linear-system-solver

Thank you,
Amine Moulay Ramdane.

 

0 Kudos
Bernard
Valued Contributor I
689 Views

I wonder why nobody here engages in discussion with you.

0 Kudos
Reply