I'm trying to figure out how to use APS to help benchmark/characterize a code I help maintain. However, it seems that when the program gets too big, APS breaks down. Well, aps-report breaks down.
First, the code I'm working on is GEOS, a climate model, which we can run at various resolutions from C24 (~400 km, mainly for debugging/regression) to C720 (~12 km) and beyond. These runs use anywhere from 96 processors up to 1500+ for C720. For the lower resolutions, C24 through C180, aps-report seems perfectly happy: C180 runs on 216 processors, the others on 96.
But, when we get to C360 (864 processors) and C720 (1536), aps-report starts failing. APS is definitely making a stat-XXX._bin file for each process, but when aps-report tries to process them, it takes a long time and eventually just fails.
To wit, the script I wrote to make a report essentially runs:
aps-report --all $APS_OUTPUT_DIR > aps_report.txt 2> /dev/null
where the /dev/null bit is to keep that stderr progress percentage print from writing a bajillion lines in the output file.
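One downside of sending stderr to /dev/null is that any real error message goes with it. A minimal sketch of an alternative, assuming the progress output uses carriage returns (I have not verified that), is to send stderr to its own file and only show the last few lines when the command fails. The function name run_report and the file name aps_report.err are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run a command, save stdout to one file and stderr
# to another, and on failure print the exit status plus the tail of stderr.
run_report() {
    out_file=$1
    err_file=$2
    shift 2

    # Run the remaining arguments as the command, keeping both streams.
    "$@" > "$out_file" 2> "$err_file"
    status=$?

    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status" >&2
        # Assumption: progress updates are separated by carriage returns,
        # so turn them into newlines before taking the last few lines.
        tr '\r' '\n' < "$err_file" | tail -n 5 >&2
    fi
    return "$status"
}

# Usage, with the same aps-report invocation as above:
# run_report aps_report.txt aps_report.err aps-report --all "$APS_OUTPUT_DIR"
```

This keeps the report file clean while still leaving something to look at when the run dies.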
Now with 96 processes in the run, aps-report takes about 8 seconds. With 216, it takes about 19. With 864, aps-report runs for 200 seconds and then just crashes out. The C720 job obviously fails as well.
So, does anyone know if I'm hitting some limit? Am I filling up a TMPDIR silently? I can't seem to find any debugging flags for aps or aps-report that would report more information.
I also tried:
(4774) $ /usr/bin/time -p aps --report=aps_result_20180316/
ERROR: Cannot parse directory: aps_result_20180316//hwmetrics
aps Error: Failed to generate the report.
Command exited with non-zero status 2
real 182.68
user 141.68
sys 5.23
which is perhaps a clue? But I'm not calling aps any differently in the failing cases than in the ones that work.