There's a storage conflict on array c inside the parallel region. Each thread has a private copy of the loop index j, but the threads can still access the same elements of c. This could be causing cache conflicts.
I suggest analyzing this code with the Intel Threading Tools. Thread Checker will help you find race conditions. Thread Profiler will help you tune threaded performance.
Message Edited by hagabb on 12-22-2005 07:28 AM
Message Edited by izryu on 12-22-2005 06:37 PM
Try placing the results of the MATMUL into an explicitly declared storage area that is private to each thread. I think what is happening is that the compiler is allocating a common static temporary array to hold the results of the MATMUL, and this temporary array is being used by both threads (resulting in cache problems).
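Something along these lines might be worth trying. This is only a sketch: the original code isn't shown here, so the array shapes, the names a, b, tmp, n and nblocks, and the loop over j are all my assumptions. The idea is just that tmp is declared explicitly and listed in the PRIVATE clause, so each thread gets its own copy to receive the MATMUL result.

```fortran
program private_matmul_sketch
  implicit none
  ! Hypothetical sizes and array names, chosen only for illustration.
  integer, parameter :: n = 4, nblocks = 8
  real :: a(n, n, nblocks), b(n, n, nblocks), c(n, n, nblocks)
  real :: tmp(n, n)   ! explicitly declared scratch array, made private below
  integer :: j

  call random_number(a)
  call random_number(b)

  ! Each thread gets its own j and its own tmp; a, b and c are shared.
  !$omp parallel do private(j, tmp) shared(a, b, c)
  do j = 1, nblocks
     tmp = matmul(a(:, :, j), b(:, :, j))   ! result goes into this thread's copy
     c(:, :, j) = tmp                       ! each j writes a distinct slice of c
  end do
  !$omp end parallel do

  ! Serial check that the threaded results agree with a plain MATMUL.
  do j = 1, nblocks
     if (maxval(abs(c(:, :, j) - matmul(a(:, :, j), b(:, :, j)))) > 1.0e-5) then
        error stop "mismatch between threaded and serial results"
     end if
  end do
  print *, "all blocks match"
end program private_matmul_sketch
```

Whether this actually stops the compiler from creating its own hidden temporary for the MATMUL depends on the compiler and its options, so I would check the timings (or look at it in Thread Profiler) rather than assume it helps. Without the OpenMP option enabled, the directives are treated as comments and the program runs serially.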