Hi, I'm working on a project where I have to change a program so it can take advantage of multi-core technology. I'm using C++ and I would like to know if there is any way to find out how many cores a processor has. I mean, if it is a single-core processor the program will run without extra threads, and if it is a quad-core it will run with 4. I am asking because I see that using, for example, 4 threads on a single-core system gives no advantage, and using 2 threads on a quad-core does not take full advantage of it.
Multi-processor and multi-core operating systems provide system calls for obtaining information about the architecture of the hardware platform. If you are running on Windows, check the Platform SDK to see which calls are available. For an easier route, most C++ compilers support threading through a common interface called OpenMP. OpenMP is less specific than the underlying operating system calls, but its calls are the same regardless of the operating system. OpenMP is also easier to program.
To obtain the number of processors under OpenMP use omp_get_num_procs (check the case on the spelling for your implementation).
The number of processors returned by this function includes HT (Hyper-Threading) "processors". So if you are on a dual-processor system with HT on each processor, omp_get_num_procs will return 4. Your application may want to know 2. If you require this knowledge, use the underlying operating system calls.
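As a rough illustration of the OpenMP route, here is a minimal sketch. The helper name logical_processor_count is made up for this example. It assumes the compiler defines _OPENMP when OpenMP is switched on (for example with -fopenmp or /openmp), and it falls back to std::thread::hardware_concurrency otherwise. Either way the result counts logical processors, so HT siblings are included.

```cpp
#include <cassert>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: number of logical processors visible to the program.
// The count includes Hyper-Threading siblings, as described above.
inline int logical_processor_count() {
#ifdef _OPENMP
    // OpenMP build: ask the OpenMP runtime.
    int n = omp_get_num_procs();
#else
    // No OpenMP: the standard library hint, which may be 0 when unknown.
    int n = static_cast<int>(std::thread::hardware_concurrency());
#endif
    int result = n > 0 ? n : 1;  // never report fewer than one processor
    assert(result >= 1);
    return result;
}
```

A program could call logical_processor_count() once at startup and stay single-threaded when it returns 1.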
Additionally, depending on circumstances beyond the control of the application, the operating system may restrict the application to running on fewer than the total available processors and the application itself may specify a subset of the available processors to run on. Look under Processor Affinity in both sets of documentation.
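To make the affinity point concrete, the sketch below counts only the processors the current process is allowed to run on. The function name usable_processor_count is invented for illustration. It uses GetProcessAffinityMask on Windows and sched_getaffinity on Linux, and falls back to the standard library hint elsewhere. On Windows systems with more than 64 logical processors the mask covers only one processor group, so treat this as a starting point rather than a complete answer.

```cpp
#include <cassert>
#include <thread>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: how many logical processors this process may use,
// taking any affinity restriction into account.
inline int usable_processor_count() {
#if defined(_WIN32)
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask)) {
        int count = 0;
        for (DWORD_PTR m = process_mask; m != 0; m &= m - 1) ++count;  // count set bits
        if (count > 0) return count;
    }
#elif defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {  // 0 = calling thread
        int count = CPU_COUNT(&set);
        if (count > 0) return count;
    }
#endif
    // Fallback: total logical processors, or 1 when even that is unknown.
    unsigned n = std::thread::hardware_concurrency();
    int result = n > 0 ? static_cast<int>(n) : 1;
    assert(result >= 1);
    return result;
}
```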
Multithreading adds some overhead for starting, stopping, and managing the cooperation of threads. Sections of your application must consume enough processing time for it to be advantageous to program them using multiple threads.
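One simple way to act on this is to cap the thread count by the amount of work available. The sketch below is only an assumption about how you might do it: threads_for_work and the min_items_per_thread threshold are hypothetical, and the right threshold has to be found by measuring your own program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: use no more threads than there are processors, and no
// more than the work can keep busy with at least min_items_per_thread each.
inline int threads_for_work(std::size_t work_items,
                            std::size_t min_items_per_thread,
                            int available_processors) {
    assert(min_items_per_thread > 0);
    assert(available_processors >= 1);
    std::size_t by_work = work_items / min_items_per_thread;
    std::size_t n = std::min(by_work, static_cast<std::size_t>(available_processors));
    // Small jobs fall through to a single thread (no threading overhead).
    return static_cast<int>(std::max<std::size_t>(n, 1));
}
```

With a threshold of 1000 items per thread on a quad-core, a 100-item job stays on one thread while a 10000-item job gets all four.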
I suggest you experiment using OpenMP and explore the example programs included with your compiler or sample programs obtained from the internet.
For most users, obtaining the number of processors or reading an environment variable is satisfactory for configuring the threads within an application.
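Reading the thread count from an environment variable could look something like the sketch below. The function names and the upper limit of 1024 are assumptions made for this example. OpenMP itself reads OMP_NUM_THREADS, so an OpenMP program gets this behaviour without any code; the sketch is for a variable of your own choosing.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parse a positive thread count from text. Returns
// fallback when the text is missing, not a plain number, or out of range.
inline int parse_thread_count(const char* text, int fallback) {
    assert(fallback >= 1);
    if (text == nullptr || *text == '\0') return fallback;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value < 1 || value > 1024) return fallback;  // 1024 is an arbitrary sanity limit
    return static_cast<int>(value);
}

// Hypothetical helper: read the count from the named environment variable.
inline int configured_thread_count(const char* variable_name, int fallback) {
    return parse_thread_count(std::getenv(variable_name), fallback);
}
```

A typical call would pass the detected processor count as the fallback, so the variable only overrides the default when the user sets it.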
If you need to go to the extra effort, consider using GetLogicalProcessorInformation.
When programming on large SMP systems, there are many things that affect performance:
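For reference, a minimal sketch of calling GetLogicalProcessorInformation to separate physical cores from logical processors might look like this. The ProcessorCounts struct and the function names are invented for the example, and on anything other than Windows the survey simply reports 0 for "unknown". It only looks at the RelationProcessorCore entries; the cache, package, and NUMA entries are ignored here.

```cpp
#include <cassert>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

struct ProcessorCounts {
    int physical_cores = 0;      // 0 means "unknown"
    int logical_processors = 0;  // 0 means "unknown"
};

// Counts the bits set in an affinity-style mask.
inline int count_set_bits(unsigned long long mask) {
    int count = 0;
    for (; mask != 0; mask &= mask - 1) ++count;
    return count;
}

inline ProcessorCounts survey_processors() {
    ProcessorCounts counts;
#ifdef _WIN32
    DWORD bytes = 0;
    // First call is expected to fail and report the buffer size needed.
    GetLogicalProcessorInformation(nullptr, &bytes);
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER || bytes == 0) return counts;
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &bytes)) return counts;
    for (const auto& entry : info) {
        if (entry.Relationship == RelationProcessorCore) {
            ++counts.physical_cores;  // one entry per physical core
            counts.logical_processors += count_set_bits(entry.ProcessorMask);
        }
    }
#endif
    assert(counts.logical_processors >= counts.physical_cores);
    return counts;
}
```

On a dual-processor system with HT, one would expect this to report 2 physical cores and 4 logical processors.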
logical processors which share a single core (Hyper Threading)
logical processors which share a cache (e.g. multi-core die)
logical processors which share a package (multiple dies within package)
logical processors that share a NUMA node (and at which level)
Cache association can be complicated: is the cache within the core, within the die, within the package, or between the package and the memory bus? There are potentially other issues as well.
An "advanced" application's initialization code would perform a system survey; then, with foreknowledge of the relationships between the data and the threads, it would use affinity masking to restrict threads to (or, conversely, exclude threads from) groupings of logical processors.
If an application has threads that perform mostly integer-based computations, then Hyper-Threading is not a concern for those threads. If the threads perform mostly floating-point computations, then you would want to exclude from use all but one of the logical processors within an HT collection.
It may be advantageous to place threads which have a high degree of data sharing on logical processors that share the same cache as a first choice, then as a second choice on the same NUMA node, then on adjacent NUMA nodes, then on more distant NUMA nodes. NUMA node distance is typically expressed in "hops" (memory next to the processor package 1 hop, additional memory (typically on the same motherboard) 2 hops, memory on an adjacent mother/daughter board 2 hops, etc.).
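The "all but one logical processor per HT collection" rule can be sketched as a small mask computation. This assumes the system survey has already produced one bit mask per physical core, with a bit set for each logical processor on that core. The function name one_logical_per_core is made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// core_masks: one entry per physical core; each bit marks a logical processor
// that lives on that core (HT siblings share an entry).
// Returns a mask that keeps only the lowest-numbered logical processor of
// each core, which is one way to exclude all but one HT sibling.
inline std::uint64_t one_logical_per_core(const std::vector<std::uint64_t>& core_masks) {
    std::uint64_t result = 0;
    std::uint64_t seen = 0;
    for (std::uint64_t mask : core_masks) {
        assert((seen & mask) == 0);    // cores must not overlap
        seen |= mask;
        result |= mask & (~mask + 1);  // isolate the lowest set bit
    }
    return result;
}
```

For two HT cores described by the masks 0x3 and 0xC this keeps logical processors 0 and 2, that is, the mask 0x5.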
It may be advantageous to place threads with a low (or no) degree of data sharing in the reverse priority order of the previous paragraph.
I would suggest that you get your feet wet with some basic OpenMP programming skills as opposed to addressing the advanced practices.
I would venture to guess that your system has just one quad-core processor, e.g. a Q6600. This is one processor package with two physical processor dies within the package. All 4 cores reside on a single memory bus (only one memory node). Each processor die has two cores sharing a cache. The associations of the cores to the cache system will tend to be constant and can be hard-wired for now, e.g. logical processors 0 and 1 share a cache, logical processors 2 and 3 share a different cache, and all 4 processors share a memory bus.
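The two placement orders can be expressed as one small ranking function. Everything here is a simplified assumption for illustration: the LogicalProcessor struct, the idea that cache ids are unique across the whole system, and the node_hops table (hops between NUMA nodes) that a system survey would have to fill in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LogicalProcessor {
    int id;         // logical processor number
    int cache;      // id of the shared cache (assumed unique system-wide)
    int numa_node;  // NUMA node the processor belongs to
};

// Smaller is closer: 0 = same cache, 1 = same NUMA node,
// otherwise 1 + number of hops between the two nodes.
inline int sharing_distance(const LogicalProcessor& a, const LogicalProcessor& b,
                            const std::vector<std::vector<int>>& node_hops) {
    if (a.cache == b.cache) return 0;
    if (a.numa_node == b.numa_node) return 1;
    return 1 + node_hops.at(static_cast<std::size_t>(a.numa_node))
                        .at(static_cast<std::size_t>(b.numa_node));
}

// Picks where to put a second thread relative to `anchor`: the closest
// candidate when the threads share data, the farthest one when they do not.
// Ties go to the earlier candidate.
inline int pick_partner(const LogicalProcessor& anchor,
                        const std::vector<LogicalProcessor>& candidates,
                        const std::vector<std::vector<int>>& node_hops,
                        bool threads_share_data) {
    assert(!candidates.empty());
    int best_id = candidates.front().id;
    int best = sharing_distance(anchor, candidates.front(), node_hops);
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        int d = sharing_distance(anchor, candidates[i], node_hops);
        bool better = threads_share_data ? d < best : d > best;
        if (better) {
            best = d;
            best_id = candidates[i].id;
        }
    }
    return best_id;
}
```

The returned id would then be turned into an affinity mask for the second thread.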
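Hard-wiring that layout could be as simple as the table below. It is only valid under the assumption stated above (logical processors 0 and 1 on one cache, 2 and 3 on the other); the names kCacheGroupMasks and cache_group_mask_for are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed quad-core layout: bits 0-1 share one cache, bits 2-3 share the other.
constexpr std::uint64_t kCacheGroupMasks[] = {0x3, 0xC};

// Returns the affinity mask of the cache group containing the given
// logical processor (0..3).
inline std::uint64_t cache_group_mask_for(int logical_processor) {
    assert(logical_processor >= 0 && logical_processor < 4);
    return kCacheGroupMasks[logical_processor / 2];
}
```

On Windows, a mask like this is the kind of value you would hand to SetThreadAffinityMask to keep two data-sharing threads on the same cache.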
Experimenting with processor affinities should be the last area to address for program optimizations.