My Machine
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6252N CPU @ 2.30GHz
Stepping: 7
CPU MHz: 1699.871
CPU max MHz: 3600.0000
CPU min MHz: 1000.0000
BogoMIPS: 4600.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 36608K
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Core topology: two sockets, 24 cores per socket, 48 cores total
SMT status: enabled, but not utilized
Max clock rate: 1.7GHz (single-core and multicore)
Peak performance:
-- single-core: 54.4 GFLOPS (double-precision)
-- multicore: 54.4 GFLOPS/core (double-precision), 2611.2 GFLOPS over 48 cores (double-precision)
I pinned the CPU frequency to 1.7 GHz by setting both the upper and lower limits with these commands: sudo cpupower -c all frequency-set -u 1.7GHz and sudo cpupower -c all frequency-set -d 1.7GHz.
Code sample
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>     // gettimeofday
#include <unistd.h>       // sleep
#include <mkl_lapacke.h>  // LAPACKE_dgesv, LAPACKE_dlarnv (the post omitted its includes; MKL's header is assumed)

int main(int argc, const char *argv[])
{
    // matrix parameters: A * X = B, column major
    int N, NRHS, LDA, LDB;
    // test parameters
    int N_START = 1000, N_END = 30000, NRHS_START = 1000, NRHS_END = 1000, INC = 1000, REPEAT = 3;
    N = N_START, LDA = N, NRHS = NRHS_START, LDB = N;
    // gflops[N index][NRHS index][repeat]
    double gflops[50][50][10];
    while (N <= N_END) {
        while (NRHS <= NRHS_END) {
            for (int re_count = 0; re_count < REPEAT; ++re_count) {
                // dgesv overwrites A and B, so they are regenerated for every repeat
                double *A = (double *) malloc(sizeof(double) * N * N);
                double *B = (double *) malloc(sizeof(double) * N * NRHS);
                int *IPIV = (int *) malloc(sizeof(int) * N);
                if (A == NULL || B == NULL || IPIV == NULL) {
                    fprintf(stderr, "[ERROR]: malloc failed\n");
                    exit(EXIT_FAILURE);
                }
                int seed[] = {0, 0, 0, 1};
                LAPACKE_dlarnv(1, seed, N * N, A);
                LAPACKE_dlarnv(1, seed, N * NRHS, B);
                struct timeval start, finish;
                gettimeofday(&start, NULL);
                int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, N, NRHS, A, LDA, IPIV, B, LDB);
                gettimeofday(&finish, NULL);
                if (info == 0) {
                    double d_n = N, d_nrhs = NRHS;
                    // flop count in units of 1e9: LU factorization plus the triangular solves
                    double ops = ((2.0*d_n*d_n*d_n/3.0 - d_n*d_n/2.0 + 5.0*d_n/6.0) + (d_nrhs * (2*d_n*d_n - d_n))) * 1.0e-9;
                    double seconds = (finish.tv_sec - start.tv_sec) * 1.0 + (finish.tv_usec - start.tv_usec) * 1.0e-6;
                    gflops[N/INC - 1][NRHS/INC - 1][re_count] = ops / seconds;
                    // print added so the measurements are visible; the posted code only stored them
                    printf("N=%d NRHS=%d repeat=%d: %.3f GFLOPS\n", N, NRHS, re_count, ops / seconds);
                }
                else {
                    fprintf(stderr, "[ERROR]: LAPACKE_dgesv failed\n");
                    exit(EXIT_FAILURE);
                }
                free(A), free(B), free(IPIV);
            }
            NRHS += INC;
        }
        N += INC, LDA = N, NRHS = NRHS_START, LDB = N;
        sleep(10);  // pause between problem sizes
    }
    return 0;
}
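The sample above times the solve with gettimeofday, which reads the wall clock and can be affected if the system time is adjusted during a run. As a minimal sketch of an alternative, assuming a POSIX system, a monotonic clock could be read instead. The helper name wall_seconds is hypothetical and not part of the original post.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* exposes clock_gettime when compiling in strict ISO C mode */
#endif
#include <time.h>

/* Hypothetical helper: seconds from a monotonic clock, suitable for interval timing. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1.0e-9;
}
```

Calling wall_seconds() immediately before and after LAPACKE_dgesv and subtracting the two values would give the elapsed time in seconds.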
[xx@cn0 code]$ export OMP_NUM_THREADS=48 GOMP_CPU_AFFINITY="0-47:1"
[xx@cn0 code]$ make test_dgesv_mkl.x
gcc -O2 -fopenmp -fPIC -o test_dgesv.o -c test_dgesv.c
gcc test_dgesv.o -L/home/xx/lib/intel/oneapi/mkl/2022.1.0/lib/intel64/ -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -lm -ldl -fopenmp -o test_dgesv_mkl.x -lm -fopenmp -fPIC
[xx@cn0 code]$ numactl --interleave=all ./test_dgesv_mkl.x
In the best case (N = 27,000, NRHS = 1,000), MKL reaches 65.66% (1714.609/2611.2) of the theoretical peak. Are these results reasonable? Where can I find comparable published measurements?
Regards,
lianchen.
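For readers checking the arithmetic in this post, here is a small sketch of the two calculations involved: the operation count the code sample uses for dgesv, and the ratio of a measured rate to a peak rate. The function names dgesv_gflop and efficiency_percent are hypothetical, introduced only for illustration.

```c
/* Floating-point operations for dgesv in units of 1e9 (GFLOP):
   LU factorization plus the triangular solves for nrhs right-hand sides.
   Same formula as in the posted code sample. */
double dgesv_gflop(double n, double nrhs)
{
    double factor = 2.0 * n * n * n / 3.0 - n * n / 2.0 + 5.0 * n / 6.0;
    double solve  = nrhs * (2.0 * n * n - n);
    return (factor + solve) * 1.0e-9;
}

/* Measured rate as a percentage of a given peak (both in GFLOPS). */
double efficiency_percent(double measured_gflops, double peak_gflops)
{
    return 100.0 * measured_gflops / peak_gflops;
}
```

Dividing dgesv_gflop(27000, 1000) by the elapsed time in seconds gives the GFLOPS rate for the best case, and efficiency_percent(1714.609, 2611.2) gives that rate as a share of the stated peak.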
Hi Lianchen,
Thanks for reaching out to us.
>>Peak performance:
-- single-core: 54.4 GFLOPS (double-precision)
-- multicore: 54.4 GFLOPS/core (double-precision), 2611.2 GFLOPS over 48 cores (double-precision)
Could you please let us know how you calculated the single-core and multicore GFLOPS figures in this case?
Regards,
Vidya.
Hi Vidya,
single-core:
1.7 (GHz) * 8 (doubles per AVX-512 register) * 2 (an FMA counts as two flops) * 2 (FMA units per core) = 54.4 GFLOPS.
multicore:
54.4 (GFLOPS) * 48 (physical cores, SMT not used) = 2611.2 GFLOPS.
Regards,
Lianchen.
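The peak calculation in this reply can be written as a short sketch. It assumes two AVX-512 FMA units per core and that the pinned 1.7 GHz clock holds under AVX-512 load, which is what the post states. The function names are hypothetical.

```c
/* Per-core double-precision peak in GFLOPS:
   clock (GHz) x doubles per SIMD register x flops per FMA x FMA units per core. */
double peak_gflops_per_core(double ghz, int doubles_per_vector,
                            int flops_per_fma, int fma_units)
{
    return ghz * doubles_per_vector * flops_per_fma * fma_units;
}

/* Aggregate peak over the physical cores actually used. */
double peak_gflops_total(double per_core_gflops, int cores)
{
    return per_core_gflops * cores;
}
```

With the values from the post, peak_gflops_per_core(1.7, 8, 2, 2) yields the 54.4 GFLOPS per-core figure, and multiplying by 48 cores yields 2611.2 GFLOPS.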
Hi Lianchen,
I ran the code on an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz and am attaching the results (the GFLOPS values computed by the code) here.
Could you please check them and confirm whether similar behaviour is reproduced on this CPU model, so that we can proceed further with this case?
Regards,
Vidya.
Hi Lianchen,
As we haven't heard back from you, could you please provide us with an update regarding the issue?
Regards,
Vidya.
Hi Lianchen,
As we haven't heard back from you, we are closing this thread. It will no longer be monitored, so please post a new question if you need any additional assistance from Intel.
Regards,
Vidya.