Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

How can I scale only one core's frequency in a multi-core processor? (Monitor through PMU)

ke__yang
Beginner

Hi,

I am trying to scale only one core's frequency on a multi-core processor: an Intel i7 3610QM running a Linux 3.8 kernel.

I set each core's frequency with program A (listed below), and got a result like this:

    cat  /proc/cpuinfo  | grep MHz

    cpu MHz        : 1200.000
    cpu MHz        : 1200.000    //core 0
    cpu MHz        : 1600.000
    cpu MHz        : 1600.000      //core 1
    cpu MHz        : 2000.000
    cpu MHz        : 2000.000      //core 2
    cpu MHz        : 2300.000
    cpu MHz        : 2300.000      //core 3

Then I ran an OpenMP test program with 6 threads (program B, listed below)
and monitored the event UNHALTED_CORE_CYCLES on each core, using the PMU through a Linux kernel module.

    |         core 0          |         core 1          |         core 2          |         core 3          |
    |  thread0   |  thread1   |  thread2   |  thread3   |  thread4   |  thread5   |  thread6   |  thread7   |
    | 2294805446 |   76295634 | 2294803945 | 2294804700 | 2294803738 | 2294803508 |   38357212 | 2294805673 |
    | 2294766152 |   70455048 | 2294766097 | 2294766570 | 2294766156 | 2294765874 |   30905408 | 2294766779 |
    | 2294781461 |   80676090 | 2294780031 | 2294780267 | 2294781153 | 2294780650 |   37552709 | 2294780340 |
    | 2294789708 |   69952866 | 2294786896 | 2294786492 | 2294786812 | 2294786996 |   33818385 | 2294788832 |
    | 2294783242 |   78313422 | 2294782071 | 2294782291 | 2294781417 | 2294781856 |   32139270 | 2294783222 |
    | 2294790337 |   71224453 | 2294788792 | 2294789465 | 2294789916 | 2294787994 |   37572423 | 2294789002 |
    |     ^      |     x      |     ^      |     ^      |     ^      |     ^      |     x      |     ^      |
   

The problem is that all four cores are actually running at the same frequency, 2.3 GHz, instead of 1.2 GHz, 1.6 GHz, 2.0 GHz and 2.3 GHz respectively. Is there a way to run each core of a multi-core processor at a different frequency?

/****************************************  program A   *********************************************/

[cpp]
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>    /* write(), close() */

#define NUM_CPUS 8     /* intel i7 3610QM  4 cores \ 8 threads */

/* Frequencies in kHz: index 0 is the turbo entry, index 12 is 1.2 GHz */
const char *convtab[] = {"2301000","2300000","2200000","2100000","2000000","1900000","1800000", "1700000", "1600000", "1500000", "1400000", "1300000", "1200000" };

/* Write a string to one cpufreq attribute of a logical CPU; returns 0 on success */
static int cpufreq_write(int cpu, const char *attr, const char *val)
{
    int fd;
    ssize_t n;
    char file[128];

    snprintf(file, sizeof(file), "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, attr);
    fd = open(file, O_WRONLY);
    if (fd < 0) {
        perror(file);
        return -1;
    }
    n = write(fd, val, strlen(val));
    if (n < 0)
        perror(file);
    close(fd);
    return n < 0 ? -1 : 0;
}

/***********freq_config**********/
void freq_scale0(int core, int lev)
{
    cpufreq_write(core, "scaling_setspeed", convtab[lev]);
}

void freq_scale(int core, int lev)
{
    freq_scale0(core, lev);
}

void freq_mode(int core, const char *mode)
{
    cpufreq_write(core, "scaling_governor", mode);
}

int main(int argc, char *argv[])
{
    int cpu;
    /* "s" selects fixed speeds under the userspace governor; anything else restores ondemand */
    int fixed = (argc > 1 && argv[1][0] == 's');

    for (cpu = 0; cpu < NUM_CPUS; cpu++)
        freq_mode(cpu, fixed ? "userspace" : "ondemand");

    if (fixed)
    {
        /* Assumes logical CPUs 2n and 2n+1 are the two hardware threads of core n;
           topology/thread_siblings_list under each cpuN directory shows the real pairing */
        freq_scale0(0, 12);    //1.2GHz  core 0
        freq_scale0(1, 12);    //1.2GHz  core 0
        freq_scale0(2, 8);     //1.6GHz  core 1
        freq_scale0(3, 8);     //1.6GHz  core 1
        freq_scale0(4, 4);     //2.0GHz  core 2
        freq_scale0(5, 4);     //2.0GHz  core 2
        freq_scale0(6, 1);     //2.3GHz  core 3
        freq_scale0(7, 1);     //2.3GHz  core 3
    }
    return 0;
}
[/cpp]

/****************************************  program B   *********************************************/
[cpp]
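#include <stdio.h>
#include <omp.h>

/* Build without optimization (e.g. gcc -O0 -fopenmp), otherwise the
   compiler may remove the loop because its result is never used. */
int main(void)
{
    int i;
    unsigned int x = 1;    /* unsigned: overflow wraps instead of being undefined */

    omp_set_num_threads(6);    /* 6 of 8 threads */
    /* firstprivate: each thread's copy of x starts at 1 (private would leave it uninitialized) */
#pragma omp parallel for firstprivate(x) schedule(dynamic)
    for (i = 0; i < 1000000000; i++)
        x = x * x + 1;

    return 0;
}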

[/cpp]

9 Replies
Patrick_F_Intel1
Employee

Here are the instructions I've used before. I've not tried changing just one core. The instructions look like they require changing all the cpus. That is, if HT is enabled, you probably need to change both cpus on the core.

1.) Is the system capable of software CPU speed control? If the "directory" /sys/devices/system/cpu/cpu0/cpufreq exists, speed is controllable. -- If it does not exist, you need to go to the BIOS and turn on EIST and any other C and P state control and visibility, then reboot

2.) What speed is the box set to now? Do the following:

$ cd /sys/devices/system/cpu
$ cat ./cpu0/cpufreq/cpuinfo_max_freq
3193000
$ cat ./cpu0/cpufreq/cpuinfo_min_freq
1596000

3.) What speeds can I set to? Do
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
It will list highest settable to lowest; example from my NHM "Smackover" DX58SO HEDT board, I see:
3193000 3192000 3059000 2926000 2793000 2660000 2527000 2394000 2261000 2128000 1995000 1862000 1729000 1596000
You can choose from among those numbers to set the "high water" mark and "low water" mark for speed. If you set "high" and "low" to the same thing, it will run only at that speed.

4.) Show me how to set all to highest settable speed! Use the following little sh/ksh/bash script:
$ cd /sys/devices/system/cpu # a virtual directory made visible by device drivers
$ newSpeedTop=`awk '{print $1}' ./cpu0/cpufreq/scaling_available_frequencies`
$ newSpeedLow=$newSpeedTop  # make them the same in this example
$ for c in ./cpu[0-9]* ; do
>   echo $newSpeedTop > ${c}/cpufreq/scaling_max_freq
>   echo $newSpeedLow >${c}/cpufreq/scaling_min_freq
> done
$

5.) How do I return to the default - i.e. allow machine to vary from highest to lowest? Edit line # 3 of the script above, and re-run it.  Change the line:
$ newSpeedLow=$newSpeedTop  # make them the same in this example
To read
$ newSpeedLow=`awk '{print $NF}' ./cpu0/cpufreq/scaling_available_frequencies`
And then re-run the script.

 Hopefully these instructions give you enough info to resolve your questions.

Pat

ke__yang
Beginner

Hi Pat,
Thank you for your reply!

I have tried your method, but the same problem still occurs. In fact, whenever I change the frequency of any one of the eight logical CPUs, every core changes to that same frequency.

So I wonder whether my processor has the hardware capability to run each core at a different frequency.

Songtao

Bernard
Valued Contributor I

Could that be done by writing to the MSR_TURBO_RATIO_LIMIT register?

Zhihui_D_
Beginner

When we change the frequency of a given core with a Linux command like

echo 1200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed

will the voltage of that core change as well?

McCalpinJohn
Honored Contributor III

If I recall correctly, the Sandy Bridge processors with the "client" uncore only supported a single clock frequency, which was applied to all cores.  This processor is an Ivy Bridge with the "client" uncore, so it probably has the same limitation.

The processors with the "server" uncore support independent frequencies on each core, but the Power Control Unit (PCU) in the uncore may override your frequency requests if it thinks that having some cores running slowly is hurting performance.  The Xeon E5-2600 v2 series uncore performance monitoring guide provides descriptions of some counters in the PCU that can track when the PCU has overridden your frequency requests for various reasons.

Patrick_F_Intel1
Employee

Hello Zhihui,

I'm not sure if the voltage changes with the frequency but you can probably figure it out.

MSR IA32_PERF_STATUS (0x198) shows the current voltage ID and frequency multiplier in bits 0:15. Intel doesn't document which bits are the voltage and which bits are the frequency multiplier, but it is easy to see the multiplier. If you set the frequency to 2 GHz and the base clock is 100 MHz (Sandy Bridge and later chips), then the multiplier will be 20 (0x14). The bits which are not 0x14 are the voltage ID.

So you can try changing the frequency (via the Linux scaling_setspeed api) and track the bits in IA32_PERF_STATUS.

Pat

Amit_H_
Beginner

If a single clock is shared between multiple processors or cores, then changing the frequency of one core will not increase or decrease the overall performance, because the source of the frequency is the same for all cores and so all cores run at the same frequency. If the system has a separate clock for each core, then we can actually change the frequency of one core and see the performance difference between cores.

Amit_H_
Beginner

Is there any way to change a core's clock frequency even though all cores are using the same hardware clock?

McCalpinJohn
Honored Contributor III

In some sense there is a single "clock" on most of these systems -- the 100 MHz reference clock.   The other clocks are derived as multiples of this reference clock.

So there are three issues:

  1. Can you request different frequencies on different physical cores?
    • On most systems the answer is "yes".  The MSR IA32_PERF_CTL (0x199) is a per-core or per-thread register on most recent processors.
    • This does not mean that the hardware will honor your request -- just that the interface exists to make the requests.
  2. Can you request different voltages on different physical cores?
    • On systems before Haswell, the "performance state" requested with MSR IA32_PERF_CTL is a combination of a core frequency multiplier and a voltage.
    • Most systems have fewer independent power supply voltages than they have cores, so the cores share a voltage "plane".
    • Even if independent frequencies are allowed, the processor would have to request the highest voltage needed by any of the cores, so any slower cores would get only a fraction of the power savings that would be possible from reducing both frequency and voltage.
    • Haswell has on-chip voltage regulators to allow different voltages to different cores.
    • This does not mean that the hardware will honor your request...
  3. Will the hardware honor my requests for different frequencies?
    • It is important to remember that the IA32_PERF_CTL MSR is a "request", not a direct setting.  The Power Control Unit has the final say over the actual frequencies.
    • I have never been able to get a chip to actually run the cores at different frequencies at the same time while running user code, though I have not tried this on a Haswell (Xeon E5 v3) part yet.   On prior systems, it appears that the Power Control Unit overrides my requests and runs all the cores at the same frequency as the fastest core.
    • There are some counters in the Power Control Unit of the Uncore of the Xeon E5 (v1, v2, v3) processors that can provide information on *why* the Power Control Unit overrode the user's frequency selections.  These are described in the Uncore Performance Monitoring Manuals for the various Xeon E5 products.