Can anybody help me to understand the following situation:
I am trying to understand the processor performance for a parallel application. I use an Intel quad core with hyperthreading (4 physical cores and 4 logical cores). I get three responses. The first response, I get the maximum performance when I use the physical cores (it is normal), the second response is the minimum performance when the number of cores increase from the maximum number of physical cores to logical cores (for example from the fourth core to fifth core) and third when I use the logical cores, the performance decrease around 30% to 10% when the number of cores are greater than six cores (it is normal).
I don’t understand the second response (see attachment), usually the performance does not increase or increase a little bit more. What happen with the processor in this situation? I get this response in different intel processors.
I use the time command and GNU compiler on linux OS to get the execution time.
Thanks in advance!
abdule m. wrote:Is the output file required to be written in order? (t1, t2, t3, ...) If so, and if additional speed-up is desired, then consider a parallel pipeline approach. This will also help the unordered output file to some extent. Create a list of buffers, greater than number of cores, that are a shared resource (with thread-safe queue), Or for each thread, create N buffers (N>1), which need not be thread-safe. Then, oversubscribe by 1 thread. Learning something new about OpenMP, use the OpenMP task, in a loop to enqueue either a task to write (when results present) or compute (when buffer available). I will let you experiment with your first attempt(s). When you get stuck, come back. Note, this may benefit from using !$OMP ATOMIC for counter. Jim Dempsey
]Write( Z(t1)) | Write( Z(t2)) | Write(Z(t3))