Your question is quite broad, but I will try to address it.
First, internal memory transfers do depend on how the initial matrix is distributed among processes, though it is hard to predict whether communication will decrease or increase with such changes. The goal of reordering in general is to reduce fill-in, so the algorithm tries to decrease the overall memory consumption.
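To make the fill-in point concrete, here is a toy sketch in plain Python. It is not how the solver works internally; it only counts the fill-in produced by symbolic elimination on a small "arrow" matrix (one node coupled to all others) under two orderings. The helper names `count_fill_in` and `arrow_graph` are made up for this illustration.

```python
def count_fill_in(adjacency, order):
    """Count the fill-in edges created by symbolic elimination in `order`."""
    # Work on a copy so the caller's graph is left untouched.
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))
        for u in nbrs:
            graph[u].discard(v)
        # Eliminating v couples all of its remaining neighbours pairwise;
        # every coupling that did not exist before is one fill-in entry.
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return fill


def arrow_graph(n):
    """Sparsity graph of an n x n arrow matrix: node 0 touches every node."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


n = 6
adj = arrow_graph(n)
natural = list(range(n))              # dense row/column eliminated first
reordered = list(range(1, n)) + [0]   # dense row/column eliminated last

fill_natural = count_fill_in(adj, natural)
fill_reordered = count_fill_in(adj, reordered)
print(fill_natural, fill_reordered)   # prints "10 0"
```

Eliminating the dense node first fills in the whole remaining block, while eliminating it last creates no new entries at all. Real reordering algorithms use heuristics to find such orderings on much larger matrices.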
Second, I don't think that using the permutation returned from the analysis phase can affect the overall performance, since presumably most of the effort will still be spent in factorization and solving.
Third, your questions are quite theoretical, and these aspects are handled by complicated internal algorithms, so it is hard to give any definitive answers. Is there a practical reason behind your questions? When you run our Cluster Sparse Solver, do you face any problems with memory consumption, performance, or scaling?