There is a significant performance slowdown when I run multiple instances of a simple zgemv program on the same machine.
If a single a.exe takes time x, then running 4 copies of a.exe at the same time makes each one take 2x or more. This is on a Linux machine with 12 cores and a large amount of memory. The matrix size is about 1400x1400.
What can I do to improve performance when running multiple instances?
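Perhaps you want to link against the sequential version of MKL so that zgemv runs on a single thread, and then launch multiple instances. Alternatively, you could call mkl_set_num_threads() or set one of the available environment variables, such as MKL_NUM_THREADS, to limit how many threads each instance uses.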
Perhaps you're already aware of all this and are running into something else, but in that case we would need a little more information about how you are linking your program and how you expect parallelism to be handled.