You're probably better off writing this in the language of your choice. Even for the uniformly strided cases, which ?amax does cover, you are likely to get better results with any modern compiler, with the possible exception of Microsoft's.
Intel C++ certainly qualifies as a modern auto-vectorizing compiler, and should do a good job with a reasonable source code implementation.
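
For illustration, here is a minimal sketch of what such a source implementation might look like for the uniformly strided case. The function name index_of_abs_max and its signature are hypothetical (they are not part of BLAS or Intel MKL), it handles real double data only, and it returns a zero-based element position. Whether a given compiler vectorizes this loop well is something to check with its optimization report rather than assume.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not a BLAS routine.
// Scans x[0], x[stride], ..., x[(n-1)*stride] and returns the zero-based
// position (counted in elements, not in memory offsets) of the first element
// with the largest absolute value. Returns 0 when n == 0.
std::size_t index_of_abs_max(const double* x, std::size_t n, std::size_t stride)
{
    if (n == 0) return 0;

    std::size_t best = 0;
    double best_val = std::fabs(x[0]);

    for (std::size_t i = 1; i < n; ++i) {
        const double v = std::fabs(x[i * stride]);
        // Strict comparison keeps the first occurrence when values tie.
        if (v > best_val) {
            best_val = v;
            best = i;
        }
    }
    return best;
}
```

A non-uniform access pattern could be handled the same way by replacing x[i * stride] with a lookup through your own index array, which is exactly the flexibility the library routine does not give you.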
Level 1 BLAS functions aren't particularly good candidates for PBLAS, so I'm not surprised the developers extended only a small group of them and didn't add any capabilities.
Evidently, if ?amax fits your definition of "distributed," and your problem is large enough, you might be interested in PBLAS.
p?amax and i?amax are part of the standard PBLAS and BLAS, respectively, but p?amin and i?amin are not (though we added i?amin to Intel MKL when a customer requested it quite a while back).
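
Since i?amin is not in the standard BLAS, code that has to run against other BLAS libraries may want a plain fallback. The following is only a sketch under my own assumptions: index_of_abs_min is a hypothetical name, it covers real float and double data (not the complex variants), and it returns a zero-based position, so add 1 if you need a Fortran-style index.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical portable fallback, not the Intel MKL routine.
// Scans x[0], x[stride], ..., x[(n-1)*stride] and returns the zero-based
// position of the first element with the smallest absolute value.
// Intended for real floating-point T (float or double). Returns 0 when n == 0.
template <typename T>
std::size_t index_of_abs_min(const T* x, std::size_t n, std::size_t stride)
{
    if (n == 0) return 0;

    std::size_t best = 0;
    T best_val = std::abs(x[0]);

    for (std::size_t i = 1; i < n; ++i) {
        const T v = std::abs(x[i * stride]);
        // Strict comparison keeps the first occurrence when values tie.
        if (v < best_val) {
            best_val = v;
            best = i;
        }
    }
    return best;
}
```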