Seg-faults were not a problem for me, at least under Linux (I cannot speak for Windows), until I increased the matrix dimension to 1024.
In that case, I found that the seg-faults occurred at line 105 of the main program, right after the diagonalization, where the overlap matrix is calculated.
Replacing the matmul line with an equivalent do-loop got rid of those seg-faults.
For some reason, increasing the stack size to the order of gigabytes did not seem to make any difference in that respect.
The code I provided uses a dimension of 256, which was the smallest value for which I was able to reproduce the previously reported behavior.
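
For reference, here is a minimal sketch of the kind of do-loop replacement I mean. It is not the actual line 105: I am assuming the overlap is a plain product of two square matrices, and the names a, b, s and the program itself are made up for illustration.

```fortran
program overlap_loops
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1024        ! dimension at which the seg-faults appeared
  real(dp), allocatable :: a(:,:), b(:,:), s(:,:)   ! hypothetical names
  integer :: i, j, k

  allocate(a(n,n), b(n,n), s(n,n))
  call random_number(a)                 ! placeholder data, not the real matrices
  call random_number(b)

  ! Explicit loops in place of  s = matmul(a, b)
  s = 0.0_dp
  do j = 1, n
     do k = 1, n
        do i = 1, n                     ! innermost loop runs down a column
           s(i,j) = s(i,j) + a(i,k) * b(k,j)
        end do
     end do
  end do

  print *, 's(1,1) =', s(1,1)
end program overlap_loops
```

The loop order (j, k, i) keeps the innermost index on the first array dimension, which matches Fortran's column-major storage. If the real overlap involves a transpose, the indices in the inner statement would have to be adapted accordingly.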