My platform is a KNL 7250 (1.4 GHz, 68 cores).
I used Intel VTune Amplifier to check the total number of threads.
I want to run HPL 2.3 with all hardware threads (272 threads) but cannot. I tried two ways: setting the environment variable OMP_NUM_THREADS=272, and modifying HPL_pdgesv.c by adding a #pragma omp parallel line:
#pragma omp parallel
if( ( ALGO->depth == 0 ) || ( GRID->npcol == 1 ) )
   HPL_pdgesv0( GRID, ALGO, A );
else
   HPL_pdgesvK2( GRID, ALGO, A );
/* Solve upper triangular system */
if( A->info == 0 ) HPL_pdtrsv( GRID, A );
But HPL only runs with a maximum of 68 threads. If I set the number of threads below 68, HPL uses the number I set, but any value greater than 68 has no effect.
How can I run HPL with all 272 threads?