Hello,
I have an i7-4700EQ processor and want to use its 4 cores in parallel. I compiled the code below and ran it. With only 1 core the measured time was 7198.2 us, but with 4 cores I saw roughly 18150 to 18290 us on each core. How is that possible? I expected about 7198 us per core, because the tasks are independent and each one uses its own memory buffers.
Build specs: CC_ARCH_SPEC = -march=core2 -nostdlib -fno-builtin -fno-defer-pop -m64 -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone -mavx2 -fno-implicit-fp
Code:
#include <vxWorks.h>
#include <stdio.h>
#include <string.h>
#include <memLib.h>     /* memalign() */
#include <taskLib.h>    /* taskCreate(), taskCpuAffinitySet(), taskActivate() */
#include <sysLib.h>     /* sysClkRateGet() */
#include <cpuset.h>     /* cpuset_t, CPUSET_ZERO, CPUSET_SET */

extern double getTimeDouble(int);   /* the poster's own timer helper (not shown); values printed as us */

void MultiCoresExample(int iA, int iB, int affin);
void TempMultiCoreCopy(int iA, int iB, int idx);
double dtime1[4], dtime2[4];        /* start / finish time of each task */
typedef struct
{
    float *vInput[4];
    float *vOutput[4];
} tempStruct;
tempStruct tmpStr;

void MultiCoresExample(int iA, int iB, int affin)
{
    TASK_ID tids[4];                /* one task ID per core */
    char taskName[32];
    int cpuIx[] = {0,1,2,3};        /* CPU IDs the tasks are pinned to */
    int i, j;
    size_t nBytes = (size_t)iA * iB * sizeof(float);
    cpuset_t affinity;

    /* a private, 128-byte aligned input and output buffer for each task */
    for (i = 0; i < 4; i++)
    {
        tmpStr.vInput[i]  = memalign(128, nBytes);
        tmpStr.vOutput[i] = memalign(128, nBytes);
    }

    /******* init ***************/
    /* The original initialization values did not survive in the post;
       filling each input with j is only a placeholder. */
    for (i = 0; i < affin; i++)
    {
        for (j = 0; j < iA * iB; j++)
        {
            tmpStr.vInput[i][j] = (float)j;
        }
    }
    /****************************/

    printf("Cores are setting...\n");
    for (i = 0; i < affin; i++)
    {
        CPUSET_ZERO(affinity);
        CPUSET_SET(affinity, cpuIx[i]);
        sprintf(taskName, "t%s%d", "task", i);
        /* TASK_OPTIONS is the poster's own macro (not shown).
           Pass the task index i so each task uses its own buffers and time slots. */
        tids[i] = taskCreate(taskName, 120, TASK_OPTIONS, 65536,
                             (FUNCPTR)TempMultiCoreCopy, iA, iB, i,
                             0, 0, 0, 0, 0, 0, 0);
        printf("Task created:%p\n", (void *)tids[i]);
        if (tids[i] == NULL)
        {
            printf("Task create error\n");
            continue;
        }
        if (affin != -1)
        {
            /* set the CPU affinity while the task is still suspended */
            if (taskCpuAffinitySet(tids[i], affinity) == ERROR)
            {
                /* Either CPUs are not enabled or we are in UP mode */
                printf("Affinity error \n");
                taskDelete(tids[i]);
                tids[i] = NULL;     /* do not activate or delete it again */
                continue;
            }
            taskDelay(sysClkRateGet() / 10);
            taskCpuAffinityGet(tids[i], &affinity);
            printf("Task Affinity:%d\n", affinity);
        }
    }
    for (i = 0; i < affin; i++)
    {
        if (tids[i] != NULL)
            taskActivate(tids[i]);
    }
    taskDelay(sysClkRateGet() * 4); /* wait for all tasks to finish */
    for (i = 0; i < affin; i++)
    {
        printf("\nStartTime[%d]=%f FinishTime[%d]=%f ExecutionTimeForCore[%d]=%f us\n",
               i, dtime1[i], i, dtime2[i], i, dtime2[i] - dtime1[i]);
    }
    for (i = 0; i < affin; i++)
    {
        if (tids[i] != NULL)
            taskDelete(tids[i]);
    }
}

void TempMultiCoreCopy(int iA, int iB, int idx)
{
    int kk;
    size_t nBytes = (size_t)iA * iB * sizeof(float);
    /* Index by task number rather than vxCpuIdGet(): with cpuIx = {0,2,4,6}
       the CPU ID would be 4 or 6 and overrun the 4-element arrays. */
    dtime1[idx] = getTimeDouble(2);
    for (kk = 0; kk < 1000; kk++)
    {
        memcpy(tmpStr.vOutput[idx], tmpStr.vInput[idx], nBytes);
    }
    dtime2[idx] = getTimeDouble(2);
}
Screen output:
sp MultiCoresExample,16,2048,1
Task spawned: id = 0xffff80000efd1510, name = t1
value = -140737236888304 = 0xffff80000efd1510
A->Cores are setting...
Task created:0x0efe2020
Task Affinity:1
StartTime[0]=218186962.911667 FinishTime[0]=218194161.111667 ExecutionTimeForCore[0]=7198.200000 us
sp MultiCoresExample,16,2048,2
Task spawned: id = 0xffff80000efd1510, name = t2
value = -140737236888304 = 0xffff80000efd1510
A->Cores are setting...
Task created:0x0efe2020
Task Affinity:1
Task created:0x0f1ea810
Task Affinity:2
StartTime[0]=264755500.995000 FinishTime[0]=264773712.746667 ExecutionTimeForCore[0]=18211.751667 us
StartTime[1]=264755514.550000 FinishTime[1]=264773614.643333 ExecutionTimeForCore[1]=18100.093333 us
sp MultiCoresExample,16,2048,3
Task spawned: id = 0xffff80000efd1510, name = t3
value = -140737236888304 = 0xffff80000efd1510
A->Cores are setting...
Task created:0x0efe2020
Task Affinity:1
Task created:0x0f1ea810
Task Affinity:2
Task created:0x0efe2510
Task Affinity:4
StartTime[0]=288507258.976667 FinishTime[0]=288525447.206667 ExecutionTimeForCore[0]=18188.230000 us
StartTime[1]=288507271.261667 FinishTime[1]=288525387.871667 ExecutionTimeForCore[1]=18116.610000 us
StartTime[2]=288507259.561667 FinishTime[2]=288514408.870000 ExecutionTimeForCore[2]=7149.308333 us
sp MultiCoresExample,16,2048,4
Task spawned: id = 0xffff80000efd1510, name = t4
value = -140737236888304 = 0xffff80000efd1510
A->Cores are setting...
Task created:0x0efe2020
Task Affinity:1
Task created:0x0f1ea810
Task Affinity:2
Task created:0x0f413610
Task Affinity:4
Task created:0x0f413b00
Task Affinity:8
StartTime[0]=307985065.768333 FinishTime[0]=308003355.990000 ExecutionTimeForCore[0]=18290.221667 us
StartTime[1]=307985078.606667 FinishTime[1]=308003284.243333 ExecutionTimeForCore[1]=18205.636667 us
StartTime[2]=307985064.923333 FinishTime[2]=308003229.746667 ExecutionTimeForCore[2]=18164.823333 us
StartTime[3]=307985066.711667 FinishTime[3]=308003220.956667 ExecutionTimeForCore[3]=18154.245000 us
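To put these timings in perspective, here is a small C sketch (the helper names are hypothetical, not part of the original code) that converts a measured time into copy throughput per task. It assumes, as in TempMultiCoreCopy, that each task copies iA*iB*sizeof(float) bytes 1000 times. For iA=16, iB=2048 the 1-core run works out to about 18.2 GB/s copied, while about 18290 us per task corresponds to about 7.17 GB/s, a slowdown of roughly 2.5x per task.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes copied by one task: reps copies of an iA x iB float buffer
   (mirrors memcpy(..., iA*iB*sizeof(float)) inside a 1000-iteration loop). */
static double bytes_copied(int iA, int iB, int reps)
{
    assert(iA > 0 && iB > 0 && reps > 0);
    return (double)iA * (double)iB * (double)sizeof(float) * (double)reps;
}

/* Copy throughput in GB/s (1 GB = 1e9 bytes) for an elapsed time in
   microseconds. Each byte is counted once; the actual memory traffic is
   about twice this, since every byte is both read and written. */
static double copy_gbps(int iA, int iB, int reps, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return bytes_copied(iA, iB, reps) / (elapsed_us * 1e3);
}
```

For example, copy_gbps(16, 2048, 1000, 7198.2) gives roughly 18.21, and copy_gbps(16, 2048, 1000, 18290.221667) roughly 7.17.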
What are the values of iA and iB? It is not possible to figure out which parts of the memory hierarchy are being used if the sizes of the arrays are not known.
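Which level of the hierarchy is involved follows from simple arithmetic. The C sketch below (hypothetical helper names) computes the per-task working set, one input plus one output buffer of iA*iB floats, and compares it with cache sizes that are only assumed here for a Haswell i7-4700EQ: 32 KiB L1D and 256 KiB L2 per physical core, 6 MiB of shared L3. Under those assumptions, iA=16, iB=2048 gives 256 KiB per task, right at the L2 size, so two tasks sharing one physical core under HyperThreading (512 KiB together) would no longer fit in that core's L2; iA=64 gives 1 MiB per task, which only fits in the shared L3.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED Haswell (i7-4700EQ) cache sizes; verify them for the actual part. */
#define ASSUMED_L1D_BYTES ((size_t)32 * 1024)          /* per physical core */
#define ASSUMED_L2_BYTES  ((size_t)256 * 1024)         /* per physical core */
#define ASSUMED_L3_BYTES  ((size_t)6 * 1024 * 1024)   /* shared by all cores */

/* One task touches one input and one output buffer of iA*iB floats. */
static size_t task_working_set(int iA, int iB)
{
    assert(iA > 0 && iB > 0);
    return 2u * (size_t)iA * (size_t)iB * sizeof(float);
}

/* Smallest cache level whose capacity is >= bytes (4 means "DRAM").
   Ignores code, stack and other data that also compete for the cache. */
static int smallest_fitting_level(size_t bytes)
{
    if (bytes <= ASSUMED_L1D_BYTES) return 1;
    if (bytes <= ASSUMED_L2_BYTES)  return 2;
    if (bytes <= ASSUMED_L3_BYTES)  return 3;
    return 4;
}
```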
Your processor supports HyperThreading. If HyperThreading is enabled, the system might map logical processors 0,1,2,3 to different physical cores, or it might map logical processors 0,2,4,6 to different physical cores.
Your 3-thread result suggests that you do have HyperThreading enabled and that logical processors 0,1 are mapped to physical core 0, 2,3 are mapped to physical core 1, 4,5 are mapped to physical core 2, and 6,7 are mapped to physical core 3. So in the 3-thread case, threads 0 and 1 are sharing physical core 0 (and therefore running slowly), while thread 2 is running by itself on physical core 1 (and running at full speed).
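The mapping described above can be written down as a small model. This C sketch assumes, as inferred from the 3-thread result, that hyperthread siblings have adjacent logical IDs, so logical CPU n belongs to physical core n / threads_per_core. It is not a VxWorks API and does not query the hardware; the function names and MAX_CORES are hypothetical. It counts how many tasks in a CPU list land on a physical core that another task also uses.

```c
#include <assert.h>

#define MAX_CORES 64   /* hypothetical upper bound for this sketch */

/* ASSUMED layout: siblings are adjacent (0,1 -> core 0; 2,3 -> core 1; ...). */
static int physical_core(int logical_cpu, int threads_per_core)
{
    assert(logical_cpu >= 0 && threads_per_core > 0);
    return logical_cpu / threads_per_core;
}

/* Number of tasks whose physical core also runs at least one other task. */
static int tasks_sharing_a_core(const int *cpus, int ntasks, int threads_per_core)
{
    int per_core[MAX_CORES] = {0};
    int i, shared = 0;

    for (i = 0; i < ntasks; i++) {
        int core = physical_core(cpus[i], threads_per_core);
        assert(core < MAX_CORES);
        per_core[core]++;
    }
    for (i = 0; i < ntasks; i++) {
        if (per_core[physical_core(cpus[i], threads_per_core)] > 1)
            shared++;
    }
    return shared;
}
```

With threads_per_core = 2, the list {0,1,2} yields 2 (two tasks share core 0, one runs alone), matching the 3-thread timings, while {0,2,4,6} yields 0.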
Thanks John,
I disabled HyperThreading and now it works. iA=64 and iB=2048.
What about keeping HyperThreading enabled? Can this code work with HyperThreading on? When I set cores 0, 2, 4 and 6, I got "Affinity error" on screen for cores 4 and 6.
Based on the output of your first run, it looks like you should try changing
int cpuIx[] = {0,1,2,3}; /* core ID's*/
to
int cpuIx[] = {0,2,4,6}; /* core ID's*/
When HyperThreading is enabled, this should place one thread on each physical core.
If HyperThreading is disabled, then this won't work, since the available cores will be [0,1,2,3], so you will need the code to be able to compensate.
It gets trickier if you need to programmatically determine whether or not HyperThreading is enabled and how the "logical processors" are mapped to the physical cores, and I don't know how to attempt to do this on VxWorks.
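One way to let the code compensate is to build the CPU list from a threads-per-core value instead of hard-coding it. This C sketch assumes the adjacent-sibling numbering inferred from the 3-thread run, and that CPUSET_SET(set, n) sets bit n, which is consistent with the printed affinities 1, 2, 4, 8. The threads_per_core value still has to come from configuration, since detecting it on VxWorks is not covered here; the helper names are hypothetical.

```c
#include <assert.h>

/* Pick one logical CPU per physical core: 0,2,4,6 with HyperThreading
   (threads_per_core = 2), 0,1,2,3 without it (threads_per_core = 1).
   ASSUMES hyperthread siblings have adjacent logical IDs. */
static void fill_cpu_list(int *cpuIx, int ntasks, int threads_per_core)
{
    int i;
    assert(ntasks > 0 && threads_per_core > 0);
    for (i = 0; i < ntasks; i++)
        cpuIx[i] = i * threads_per_core;
}

/* Affinity bitmask with only bit `cpu` set, i.e. the value printed by
   "Task Affinity:%d" (1, 2, 4, 8 for CPUs 0..3). */
static unsigned int cpu_mask(int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return 1u << cpu;
}
```

In MultiCoresExample this would replace the fixed initializer, for example by calling fill_cpu_list(cpuIx, affin, 2) before the task-creation loop when HyperThreading is enabled.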
Thanks John,
I tried cpuIx[] = {0,2,4,6}; with HyperThreading enabled, but it didn't work correctly: for CPU IDs 4 and 6 my code printed "Affinity error". I tried other IDs as well, and none of them worked. As soon as I disabled HyperThreading, it worked correctly with cpuIx[] = {0,1,2,3};.
I don't understand why it didn't work. I also tried mapping logical cores to physical cores.
The other problem is data size (each buffer is iA*iB*sizeof(float) bytes), with HyperThreading enabled:
When iA=16 and iB=2048, the tasks run in parallel on all cores.
When iA=64 and iB=2048, they do not run in parallel on all cores.
Do you have any idea?
I don't have any experience with VxWorks, so I can't really speculate on what is going on with the affinity calls.
I just noticed that your compilation options include several flags that are specific to generating code that runs inside the kernel (for example -mcmodel=kernel and -mno-red-zone), but the rest of the code does not look like it is set up as a kernel module. Could this be the cause of some of the trouble?