Certainly, MPI (possibly with MPI_THREAD_FUNNELED, where only the main thread of each process makes MPI calls) is a proven method for managing all the cores on a cluster. You seem undecided between a cluster and a single shared-memory machine. On the latter, OpenMP by itself would be another option: simpler, but with more limitations.
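To give a feel for how little code the OpenMP-only route needs on a single shared-memory machine, here is a minimal sketch in C. The function name sum_of_squares is made up for illustration and is not from this thread. It assumes a compiler with OpenMP support enabled (for example -fopenmp with gcc).

```c
/* Sketch only: OpenMP-only parallelism on one shared-memory machine.
 * sum_of_squares is a hypothetical example, not code from this thread. */
long long sum_of_squares(int n)
{
    long long sum = 0;

    /* Each thread accumulates a private partial sum, and OpenMP combines
     * the partial sums when the loop ends. If OpenMP is not enabled at
     * compile time, the pragma is ignored and the loop runs serially
     * with the same result. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; ++i) {
        sum += (long long)i * i;
    }
    return sum;
}
```

Calling sum_of_squares(10) from your own main gives the same answer with one thread or many, since integer addition does not depend on the order of summation. The limitation is that this only uses the cores of one machine; spreading work across cluster nodes is where MPI comes in.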
Over the last year, OpenMPI and Intel MPI have been the most successful MPIs in my experience. Both have excellent built-in schemes for processor affinity. As Jim suggested, you can find the experts on Intel MPI on the Intel HPC forum, while OpenMPI has an excellent public forum that you would find useful to follow.
With so many issues on your plate, I wonder whether you have time to deal with the low-level memory management issues where 32- and 64-bit builds might differ. Beyond the vastly expanded upper limit on addressable memory, you are unlikely to see a difference unless you have gone out of your way to write non-portable code. You have raised too many issues to expect adequate coverage in a single book.
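As one illustration of what portable code looks like here: a common kind of non-portability (my assumption about what might apply, not something you have described) is holding byte counts in int instead of size_t. The sketch below uses a hypothetical helper, checked_bytes, which computes an allocation size in size_t and reports overflow, so the same source behaves sensibly in both 32- and 64-bit builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Computes count * elem_size into *out_bytes using size_t, which has the
 * width of the platform's object sizes in both 32- and 64-bit builds.
 * Returns 1 on success, 0 if the product would overflow size_t
 * (in which case *out_bytes is left unchanged). */
int checked_bytes(size_t count, size_t elem_size, size_t *out_bytes)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return 0;
    }
    *out_bytes = count * elem_size;
    return 1;
}
```

A request that overflows in a 32-bit build may be perfectly valid in a 64-bit one, and code written this way picks up the larger limit without any source changes.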