I have an incomplete Cholesky preconditioner and run CG through the RCI interface, and it performed very poorly. There is only a very small improvement going from sequential to parallel mode: it took 80 s in parallel and 86 s sequentially. I am using an Intel Xeon X5650 at 2.67 GHz. Is this normal for an iterative solver? I used the latest MKL 11. By contrast, the direct solver (PARDISO) scaled almost linearly.
I also noticed that when linking against the parallel MKL libraries, the forward and backward substitutions (A*x = L*U*x = b <==> L*y = b, U*x = y) take almost the same time as in the sequential version. Although CPU usage is close to 100%, solving A*x = b is not accelerated at all.
I also hope the triangular solver will be parallelized in the near future.
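To make the two triangular solves concrete, here is a minimal sketch in plain C using small dense matrices. It is only an illustration of the L*y = b, U*x = y steps. It is not the MKL sparse routine, and the names forward_subst, backward_subst and the fixed size N are made up for this example.

```c
#define N 3  /* illustrative fixed size, not taken from the real problem */

/* Forward substitution: solve L*y = b for a lower-triangular L. */
static void forward_subst(const double L[N][N], const double b[N], double y[N])
{
    for (int i = 0; i < N; ++i) {
        double s = b[i];
        for (int j = 0; j < i; ++j)
            s -= L[i][j] * y[j];   /* row i needs y[0..i-1] already computed */
        y[i] = s / L[i][i];
    }
}

/* Backward substitution: solve U*x = y for an upper-triangular U. */
static void backward_subst(const double U[N][N], const double y[N], double x[N])
{
    for (int i = N - 1; i >= 0; --i) {
        double s = y[i];
        for (int j = i + 1; j < N; ++j)
            s -= U[i][j] * x[j];   /* row i needs x[i+1..N-1] already computed */
        x[i] = s / U[i][i];
    }
}
```

In both loops, row i depends on results from the rows processed before it, so the outer loop cannot simply be split across threads. That dependency may be part of why the preconditioner step shows no speedup, although I have not confirmed how MKL handles it internally.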
Thanks very much!