I have developed an application that is parallelized with MPI and uses a parallel HDF5 library for I/O. The application writes several 3D arrays collectively, and its performance is largely I/O bound. While developing and testing on small systems, e.g. an i7 with 8 processors, I/O scaled well, but when I switched to DevCloud the I/O performance hit a wall. It takes around 5-6 minutes to write a simple 250 MB file. Ideally this should take a fraction of a second (it took 0.01 s on the i7 with 8 MPI processes). On DevCloud I have tested the application only on 1 node with 24 MPI processes. I don't know which parallel file system DevCloud uses in the background or the maximum block size it can handle. The test above used a small grid, for which the file size is just 250 MB, but in production the file size may go up to 3-5 GB, and that would be a huge problem.
Please find the module that I'm using for parallel I/O in the attachment. I can't figure out whether the slowdown comes from something I'm doing incorrectly or from the parallel file system on DevCloud.
Any suggestions on how to improve the parallel I/O efficiency would be very welcome.
Thanks and Regards,
I understand the performance issue you are seeing on DevCloud. One thing I want to add is that DevCloud is not built for compute-intensive or high-end HPC applications. It is an environment for testing and exploring the toolkits provided by Intel.
You will face this performance issue on DevCloud even if you obtain the environment details, because the fabrics are not designed for such applications. You can still run your application and obtain its output on all nodes.
The issue is still not resolved; parallel HDF5 I/O remains a huge bottleneck. I had to switch to ASCII output, with each process writing its own file.
We suggest that, if possible, you load the entire file into main memory and then do the processing. Alternatively, you can place your data in the "/tmp" directory.
Note that data in /tmp may be lost, since the directory is shared.
You will still face performance issues on DevCloud, as it is not designed for I/O-intensive applications.
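As a rough illustration of the /tmp suggestion, the output can be written to a scratch directory first and then copied to its permanent location in one step. This is only a sketch in Python using the standard library. The function name `write_via_scratch` and its parameters are made up for this example, and whether /tmp on a DevCloud node is actually faster than your home directory is an assumption you would need to measure.

```python
import os
import shutil
import tempfile


def write_via_scratch(data, final_path, scratch_dir=None):
    """Write `data` (bytes) to a scratch file, then copy it to `final_path`.

    `scratch_dir` defaults to the system temp directory (usually /tmp).
    Hypothetical helper for illustration; not part of any DevCloud tooling.
    """
    scratch_dir = scratch_dir or tempfile.gettempdir()
    fd, scratch_path = tempfile.mkstemp(dir=scratch_dir, suffix=".part")
    try:
        # Do the (possibly many small) writes against the scratch file.
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Copy the finished file to its permanent location in one pass.
        target_dir = os.path.dirname(os.path.abspath(final_path))
        os.makedirs(target_dir, exist_ok=True)
        shutil.copyfile(scratch_path, final_path)
    finally:
        # Do not leave data behind in the shared scratch directory.
        os.remove(scratch_path)
    return final_path
```

The copy to the permanent location happens before the scratch file is removed, so the result does not depend on /tmp contents surviving after the job ends.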
Could you please send us a small reproducer for your scenario?
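Alongside the reproducer, a plain write timing can help separate file system speed from HDF5 usage. The sketch below is a minimal Python probe using only the standard library. It does not exercise MPI or HDF5, so it is not a substitute for the reproducer, and the function name `write_throughput_mb_s` and its default sizes are chosen just for this example.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=16, chunk_mb=1):
    """Time a plain sequential write into `directory` and return MB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory, suffix=".probe")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # include the time to push data to storage
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the probe file
    written_mb = n_chunks * chunk_mb
    return written_mb / max(elapsed, 1e-9)
```

Calling it once with your home directory and once with /tmp gives two numbers to compare. If both are low, the limit is more likely the storage itself than the way the HDF5 calls are made.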