combining pthread and offload

songlinhai
Beginner
661 Views

Hi,

    I have some old code written with pthreads. I want to simply add a few #pragma directives and offload some hot loops to the MIC coprocessor, but I ran into some problems. For example, the following code creates 5 threads, and each of them sums all the numbers in pArray. I see two problems: 1) only one thread prints the correct sum, and the other 4 just print 0; 2) the program hangs at the "free(pArray)" statement. Any hints that would explain these two problems?

    Thanks a lot!

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, free, exit */
#define NUM_THREADS 5

#define SIZE 1000000
int * pArray;

void * PrintHello(void *threadid)
{
    long tid;
    tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

void * pCount(void * threadid)
{
    int i;
    int iSum = 0;

    for(i=0; i<SIZE; i++ )
    {
        pArray[i] = pArray[i] + 1;
    }

    printf("done!\n");
    pthread_exit(NULL);
}

int main()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    int i;

    pArray = (int *)malloc(SIZE * sizeof(int));
    for(i=0; i<SIZE; i++)
    {
        pArray[i] = i;
    }

    for(t=0; t<NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        /* each thread gets its own handle, threads[t] */
        rc = pthread_create(&threads[t], NULL, pCount, (void *)t);
        if(rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for(t=0; t<NUM_THREADS; t++)
    {
        pthread_join(threads[t], NULL);
    }

    free(pArray);

    pthread_exit(NULL);
}

5 Replies
songlinhai
Beginner

I forgot to add #pragma offload in my post. The code I tried is attached to this post.

songlinhai
Beginner

#pragma offload target(mic:0) \
inout(pArray:length(SIZE))
#pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)
for(i=0; i<SIZE; i++ )
{
    iSum += pArray[i];
}

I attached the wrong program again. Sorry about these two mistakes.

Sumedh_N_Intel
Employee

I am still not quite sure what you are trying to accomplish with this example. Could you please give us some background and the correct code for your example?

songlinhai
Beginner

Thanks a lot for the reply!

I just want to test how to mix pthreads and offload. The "correct" code I am using is as follows:


#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, free, exit */
#define NUM_THREADS 5

#define SIZE 1000000
__attribute__((target(mic))) int * pArray;

void * PrintHello(void *threadid)
{
    long tid;
    tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

void * pCount(void * threadid)
{
    int i;
    int iSum = 0;

    /* every thread offloads the same host array with inout */
    #pragma offload target(mic:0) \
    inout(pArray:length(SIZE))
    #pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)
    for(i=0; i<SIZE; i++ )
    {
        iSum += pArray[i];
    }

    printf("%d\n", iSum);
    pthread_exit(NULL);
}

int main()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    int i;

    pArray = (int *)malloc(SIZE * sizeof(int));
    for(i=0; i<SIZE; i++)
    {
        pArray[i] = i;
    }

    for(t=0; t<NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, pCount, (void *)t);
        if(rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for(t=0; t<NUM_THREADS; t++)
    {
        pthread_join(threads[t], NULL);
    }

    printf("after join\n");

    free(pArray);

    printf("after free");
    pthread_exit(NULL);
}

The program hangs at the "free(pArray)" statement. I do not understand why it hangs.

Best,

Sumedh_N_Intel
Employee

Only one thread in your code produces the correct answer because there is a race condition. The offload runtime links the host-side and coprocessor-side arrays based on the host-side pointer. In your case all five offloads (one per pthread) use the same host-side array, so the offload runtime does not create a separate copy of the array for each of them. Because the runtime is asked to allocate the same array again and again, only the first thread succeeds, whereas the others fail. This results in the incorrect answers.

If you reorder your code so that the allocation and the free of the array each happen only once, all of your threads will report the correct answer. Your code should look similar to this:

[cpp]

#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#define NUM_THREADS 5

#define SIZE 1000000
__attribute__((target(mic))) int * pArray;

void * PrintHello(void *threadid)
{
    long tid;
    tid = (long)threadid;
    printf("Hello World! It's me, thread #%ld!\n", tid);
    pthread_exit(NULL);
}

void * pCount(void * threadid)
{
    int i;
    int iSum = 0;

    /* Reuse the copy already on the coprocessor: length(0) with
       alloc_if(0) free_if(0) transfers no data and neither allocates nor frees. */
    #pragma offload target(mic:0) \
    in(pArray:length(0) alloc_if(0) free_if(0))
    #pragma omp parallel for private(i) num_threads(100) reduction(+:iSum)
    for(i=0; i<SIZE; i++ )
    {
        iSum += pArray[i];
    }

    printf("%d\n", iSum);
    pthread_exit(NULL);
}

int main()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    long t;
    int i;

    pArray = (int *)malloc(SIZE * sizeof(int));
    for(i=0; i<SIZE; i++)
    {
        pArray[i] = i;
    }

    /* Allocate and copy the array to the coprocessor once, before any thread starts.
       mic:0 matches the device used by the offload in pCount. */
    #pragma offload_transfer target(mic:0) in(pArray:length(SIZE) alloc_if(1) free_if(0))

    for(t=0; t<NUM_THREADS; t++) {
        printf("In main: creating thread %ld\n", t);
        rc = pthread_create(&threads[t], NULL, pCount, (void *)t);
        if(rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(-1);
        }
    }

    for(t=0; t<NUM_THREADS; t++)
    {
        pthread_join(threads[t], NULL);
    }

    /* Copy back and free the coprocessor copy once, after all threads have finished. */
    #pragma offload_transfer target(mic:0) out(pArray:length(SIZE) alloc_if(0) free_if(1))

    printf("after join\n");

    free(pArray);

    printf("after free");
    //pthread_exit(NULL);
}

[/cpp]

I must admit that I am unsure why you are spawning 5 pthreads and starting 5 offloads simultaneously on the same coprocessor. Spawning multiple threads on the host makes more sense when each thread offloads to a different coprocessor. You should also note that your 5 offloads together try to spawn about 500 threads (5 offloads with num_threads(100) each), which is more than the number of hardware threads available on the coprocessor.

Lastly, on further inspection I noticed that your code hangs at the pthread_exit() call in main and not at the free. If you comment it out, your code will work just fine. However, I am still unsure why this causes your code to hang. I will investigate further and get back to you with what I find.
