@Allen_1215
GROMACS is a special case: almost all of its hot kernels are hand-coded, so there is not much left for the compiler to optimize.
I would, however, recommend not using OMP threads but just MPI.
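For illustration, a minimal sketch of what such an MPI-only launch might look like; the rank count of 16 and the input name topol.tpr are placeholders, not taken from this thread:
# Hypothetical MPI-only launch: one MPI rank per core, one OpenMP thread per rank
mpirun -np 16 gmx_mpi mdrun -pin on -ntomp 1 -s topol.tpr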
Hi @TobiasK
Thank you for your response. While waiting, I continued studying and analyzing related issues and found another benchmark set here, which provides many free benchmarks. The benchBFC and benchBFI cases gave me the idea that the water_GMX50_bare benchmark might be too lightweight to meaningfully compare the GROMACS builds (Intel MPI vs. OpenMPI). To address this, I adjusted the .mdp file and regenerated the benchmark inputs.
In particular, I modified pme.mdp as follows:
nstcalcenergy = 100 ; !autogen => nstcalcenergy = 1 ; !autogen
I then regenerated the benchmark inputs, naming them *TI.tpr (a sketch of this step follows).
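A minimal sketch of how the .tpr files might be regenerated after editing pme.mdp; the input names conf.gro and topol.top are assumptions based on a typical water benchmark directory, not taken from my actual run:
# Hypothetical regeneration of the run input after editing pme.mdp
gmx grompp -f pme.mdp -c conf.gro -p topol.top -o 0384_TI.tpr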
Based on this adjustment, I noticed a more pronounced performance difference. I think more intensive or complex calculations in the GROMACS benchmarks may highlight the performance advantage of the oneAPI build. (I'm not sure whether this is correct; anyone with more knowledge of GROMACS is welcome to weigh in.)
/mnt/mount_test/gromacs/exec/openmpi/fftw/bin/gmx_mpi mdrun -pin on -v --noconfout -ntomp 16 -s /mnt/mount_test/gromacs/benchmark/waterTI/0384_TI.tpr -dlb yes -nsteps 3000 -resetstep 1000
========
Benchmark: 0384_TI.tpr (OpenMPI + FFTW build)
Using 1 MPI process
Using 16 OpenMP threads
                Core t (s)   Wall t (s)      (%)
Time:              846.390       52.899   1600.0
                  (ns/day)    (hour/ns)
Performance:         6.536        3.672
/mnt/mount_test/gromacs/exec/oneapi/mkl/bin/gmx_mpi mdrun -pin on -v --noconfout -ntomp 16 -s /mnt/mount_test/gromacs/benchmark/waterTI/0384_TI.tpr -dlb yes -nsteps 3000 -resetstep 1000
========
Benchmark: 0384_TI.tpr (oneAPI + MKL build)
Using 1 MPI process
Using 16 OpenMP threads
                Core t (s)   Wall t (s)      (%)
Time:              743.222       46.451   1600.0
                  (ns/day)    (hour/ns)
Performance:         7.444        3.224
Other benchmarks show the same phenomenon.
I would, however, recommend not using OMP threads but just MPI
=> Could you explain the reason? In my tests, performance is better with OpenMP threads enabled.
I would, however, recommend not using OMP threads but just MPI
=> Could you explain the reason? In my tests, performance is better with OpenMP threads enabled.
I have not run these GROMACS benchmarks myself, but the last time I checked, MPI-only parallelization was still superior to OpenMP-only parallelization. In your case, running with OpenMP only, you don't even need Intel MPI or OpenMPI: for single-node runs GROMACS also offers an internal (thread-)MPI implementation. Please ask the GROMACS developers for guidance on how to run the benchmarks.
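For a single node, a minimal sketch of such a run, assuming a thread-MPI (non-MPI) build whose binary is named gmx rather than gmx_mpi, and reusing the .tpr path from the commands above:
# Hypothetical single-node run using GROMACS' built-in thread-MPI: 16 ranks, 1 OpenMP thread per rank
gmx mdrun -ntmpi 16 -ntomp 1 -pin on -s /mnt/mount_test/gromacs/benchmark/waterTI/0384_TI.tpr -dlb yes -nsteps 3000 -resetstep 1000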
Hi @TobiasK
Thank you for your input, and I understand your point. You are correct for the current single-node case, where an external MPI library is not strictly needed. However, we aim to deploy GROMACS across multiple nodes with MPI in the future (a sketch of such a launch follows); for now we are experimenting on a single node to evaluate the performance improvements, and we have confirmed that building GROMACS with DPC++ is efficient.
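When we move to multiple nodes, the launch would presumably look something like the sketch below; the two-node layout, the rank count, and the hostfile named hosts are assumptions for illustration, while the binary and .tpr paths are reused from the single-node runs above.
# Hypothetical multi-node OpenMPI launch: 2 nodes x 16 MPI ranks, 1 OpenMP thread per rank
# The hostfile "hosts" listing the node names is an assumption, not from this thread
mpirun -np 32 --hostfile hosts /mnt/mount_test/gromacs/exec/openmpi/fftw/bin/gmx_mpi mdrun -pin on -ntomp 1 -s /mnt/mount_test/gromacs/benchmark/waterTI/0384_TI.tpr -dlb yes -nsteps 3000 -resetstep 1000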
Thank you for your help, and have a great day.
Best regards