
Intel Composer XE2013 - Cilk not linked?

Marek_C_
Beginner
837 Views

Hello everyone,

Yesterday I got a fresh copy of Composer XE 2013 (finally...). After a painless installation, I switched the project over to the Intel compiler [Intel Composer XE 2013 -> Use Intel C++]. I also set up additional Include Directories: .../mkl/include & ...compiler/include.

Of course I haven't forgotten about the headers -> #include <cilk\cilk.h>.

So after all of this, Visual Studio 2010 is still showing me the error - "_Cilk_for" is undefined, the same for "spawn" and "sync". Why?

Additionally, since I am writing my code in C, I wanted to use the C99 standard, but there is a problem there too. I set "Enable C99 support -> Yes (/Qstd = c99)".
When I write a loop:
for(int i, ...)
{
...
}
VS tells me that I cannot declare int i inside the loop header, so C99 is not working.

What have I done wrong? Thank you for all replies.

Regards
Marek

PS. Previously I was using Intel MKL on its own, without the rest of the Composer XE package, and had no problems there.

0 Kudos
18 Replies
Barry_T_Intel
Employee
837 Views

I just did the following in VS2010:

  1. Created a new Win32 console application named "fib2010"
  2. Converted the application to build with Intel C++
  3. Typed in the application
  4. Built it.  It built fine.

I've zipped up the project and attached it for you to try.

    - Barry

0 Kudos
TimP
Honored Contributor III
837 Views

As Barry pointed out, one of the necessary steps was to select Intel C++ in your project properties.  Lack of recognition of C99 and cilk_for is an intended property of Microsoft CL (the default for a VS C++ project).

You are right to prefer C99 syntax in cilk_for().   I didn't see it documented, but the induction variable has to have local scope in order to be treated as private (unlike OpenMP syntax for C89).
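To illustrate the point about the loop-local induction variable, here is a minimal sketch. It assumes that `__cilk` is the macro a Cilk Plus compiler predefines; when it is absent, the sketch falls back to a serial loop so it still builds with other compilers. The helper name `scale_in_place` is hypothetical and not from this thread.

```c
/* Assumption: a Cilk Plus compiler predefines __cilk. Without it, fall back
   to a plain serial loop so this sketch still builds elsewhere. */
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

/* Hypothetical helper: multiply every element of x by factor. */
void scale_in_place(double *x, int n, double factor)
{
    /* C99 style: i is declared in the loop header, so it is local to the
       loop rather than a shared variable declared outside it. */
    cilk_for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

Each iteration writes only its own element, so there is nothing shared between iterations apart from read-only inputs.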

MKL, on the other hand, works equally well with CL or ICL compilers.

0 Kudos
Marek_C_
Beginner
837 Views

Hey everyone,

Thanks for your answers. I have seen the Fib example on the internet (in the manuals, I think), so I know it is supposed to work. As far as I can tell, I have already "told" VS to use the Intel C++ compiler and everything. I have even noticed a bit of a speedup in my program, since I am using a few Intel MKL functions.

While browsing the internet for solutions to my problem, I found a document that told me to add an additional preprocessor macro (I will probably misspell it here): "AOS_CILK_FOR". I did that, and there is no longer an error about cilk_for, so I can build my program and run it. But...
1) I didn't notice any improvement in speed and (sorry if I am wrong, I am a pretty basic programmer) I don't think it works. For example:
cilk_for(i = 0; i < 5; ++i)
{
...here I have some operations on arrays, calling mkl functions, and there is no operation like A = 5 + A[i-1] (so no conflicts between loop iterations working in parallel) ...
printf("Finished run number %d\n", i+1);
}

Here I was kind of expecting to see something like "Finished run number 5, 2, 4, 1, 3" since I am using Cilk, but I still get the same 1, 2, 3, 4, 5 pattern as without it.

2) Still no C99 and no array notation A[:][0:10] = ...

My guess is that this preprocessor directive forces VS to accept the cilk_for call, but it doesn't change anything.

I am 99% sure (there is never 100%) that I did everything exactly as I was told in the user manuals and on other websites, so I find it strange that I have these kinds of problems.

Marek

0 Kudos
TimP
Honored Contributor III
837 Views

Recognition of array notation and /Qstd=c99 (no embedded spaces) would come automatically when Intel C++ is switched into your project.

There has been some documentation of the cilk_for worker assignment algorithm.  Admittedly, spending the effort to understand that is contrary to the simplification hoped for from Cilk(tm) Plus.   I suppose it's entirely possible with only 5 iterations they may all be assigned to a single worker, as the algorithm attempts to avoid multiple workers when it would be slower.

0 Kudos
Jim_S_Intel
Employee
837 Views

Do you see speedups with the "fib" example that Barry attached?    If you are trying to figure out whether you've set up Cilk Plus on your system properly, it is easiest to start with a program that is known to exhibit speedups.

As far as I know, there is no generic "AOS_CILK_FOR" macro that needs to be defined.   The only document I found on the web with that macro describes a particular application (Sepia filter), so it seems to be an application-specific macro.   I don't quite know what the problem you are seeing is... perhaps if you post a code example with the problem, someone will be able to spot something?

If you are still seeing "1 2 3 4 5", then it suggests that no steals are happening from the cilk_for.  Several possibilities come to mind:

1.  If the work of the entire cilk_for loop is too small, then a steal is unlikely to happen before the worker that starts the loop also ends up finishing the loop.   This behavior is the expected one for a work stealing scheduler, which is what the Cilk Plus runtime uses.
How long does the entire loop take to execute serially?  Do you know how many cores are available on the system you are running on?

2. You could have problems with false sharing in your arrays, especially if your arrays are small.

3.  Are the MKL functions that you are calling themselves multithreaded?   If so, then it is possible that the MKL threads could be using all the cores on the system, and that might be keeping the Cilk Plus runtime workers from stealing in the cilk_for loop.

There could be other reasons, but those are the ones I can think of at the moment.
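One way to answer the "how long does the loop take serially" question from point 1 is to time a plain serial version of the loop. The sketch below uses only the standard `clock()` function; `body` and `time_serial_loop` are hypothetical names, and `body` is a stand-in for the real per-iteration work, not the code from this thread.

```c
#include <time.h>

/* Hypothetical stand-in for the work done in one loop iteration. */
static double body(int i)
{
    double acc = 0.0;
    for (int k = 1; k <= 1000; ++k)
        acc += (double)(i + k) / k;
    return acc;
}

/* Runs n iterations serially and returns the elapsed time in seconds as
   reported by clock(). The checksum is stored through the pointer so the
   compiler cannot discard the work being timed. */
double time_serial_loop(int n, double *checksum)
{
    clock_t start = clock();
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += body(i);
    *checksum = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

If the time reported for the whole loop is tiny, the first worker will probably finish all iterations before another worker has a chance to steal any.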
Cheers,

Jim

0 Kudos
Marek_C_
Beginner
837 Views

Hello all,

Thanks for all those answers. You really helped me a lot.

On one hand, Cilk is turned on and I can use array notation; on the other hand, sometimes it doesn't work.

A piece of sample code:

Here is how I operate on vectors and matrices (row-major):
typedef struct Matrix{
    int width;
    int height;
    double ** pMatrix;
    double * pContinous;
} Matrix;

typedef struct Vector{
    int height;
    double * pVector;
} Vector;

And here are sample operations:

int vector_log(Vector * pOutput, Vector * pInput)
{
    if(initVectorDim(pOutput, pInput->height) != 0)
        return 1;
    pOutput->pVector[:] = log(pInput->pVector[:]);
    return 0;
}

This code works nicely. VS still underlines the [:] and says "Error: expected an expression", but I can compile and run it.

But...

int matrix_add_scalar(Matrix * pInput, double scalar)
{
    for(int i = 0; i < pInput->height; ++i)
    {
        pInput->pMatrix[:] += scalar;
    }
    return 0;
}

Here VS reports a compilation error:
IntelliSense: expected an expression

The same problem occurs with an even simpler operation:

int matrix_zeros(Matrix * pMatrix)
{
    pMatrix->pContinous[:] = 0;
    return 0;
}

I have no idea how to deal with that problem. It is like Cilk is working but not working. Any ideas?

Marek

EDIT:

New observations so far. My program has many *.c files. vector.c and matrix.c are libraries I have created, with many vector / matrix operations. When I changed every loop in vector.c to Cilk array notation, the program compiled and ran without any problem. So I tried to do the same with matrix.c. When I am in the vector.c tab, I get IntelliSense errors about vector.c plus error #10298. When I go to matrix.c, it is the same situation, but the IntelliSense errors are about matrix.c, plus #10298. When I am in the main.c tab, I only get #10298.

I have commented every array notation operation in matrix.c - program compiles and is ready to run.
In main.c I have made simple operation:

Matrix temporary;

    if(initMatrixDim(&temporary, 2, 10) != 0)
        return 1;

    temporary.pMatrix[0][:] = 1;
    temporary.pMatrix[1][:] = 2;

Again, IntelliSense errors. It looks like VS and the compiler allow me to use array notation in only one file.

EDIT2:

When I use only #include <cilk\cilk.h>, I cannot use cilk_for. Only when I add one more include, #include <cilk\cilk_stub.h>, does cilk_for become available.

Something here is very wrong, and I do not know what.

0 Kudos
Jim_S_Intel
Employee
837 Views

I believe implicit arguments for the indexes of arrays (e.g., a[:]) are only allowed in cases when the compiler is able to figure out the dimensions of the array statically, at compile time.   I don't know that that is true for your Matrix or Vector classes.
You might try specifying the offset / length arguments explicitly.

Not sure if that is the specific error the compiler is complaining about in your case, but it may be something to watch out for.

Cheers,

Jim

0 Kudos
Barry_T_Intel
Employee
837 Views

You should not be using cilk_stub.h.  That's using macros to replace _Cilk_for with an ordinary serial for loop.  Needing to use cilk_stub.h is a sign that you aren't using the Intel compiler.
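The effect of such a stub can be sketched with a hypothetical macro. `stub_cilk_for` below is an illustrative name chosen to avoid clashing with the real headers, and it is assumed to mimic what the stub does: turn the keyword into a plain serial `for`. `record_order` is also a hypothetical helper.

```c
/* Hypothetical stand-in for a stub header: the parallel keyword becomes an
   ordinary serial for. */
#define stub_cilk_for for

/* Records the order in which iterations run and returns how many ran.
   This is only safe because the loop is serial; with a genuinely parallel
   cilk_for, the shared counter `next` would be a data race. */
int record_order(int *order, int n)
{
    int next = 0;
    stub_cilk_for (int i = 0; i < n; ++i) {
        order[next++] = i + 1;
    }
    return next;
}
```

With the stub in place, the iterations always run as 1, 2, 3, 4, 5 on a single thread, which matches the output reported earlier in the thread.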

Do you have main.c and matrix.c in different projects?  If so, you'll need to modify both projects to use the Intel compiler.

    - Barry

0 Kudos
Marek_C_
Beginner
837 Views

@Jim Sukha

Hmmm, it is true that the compiler may not know the dimensions, because they are held in the structure as integer variables, like (for matrix):
int height;
int width;

So what if I try:     pInput->pVector[0:pInput->height] ?

@Barry Tannenbaum

When I wrote EDIT2, I noticed that cilk_stub.h also defines _cilk_for, _cilk_spawn and _cilk_sync, so those may override the same defines in cilk.h.

BUT, without cilk_stub.h the compiler doesn't even see those three in cilk.h. I have no idea how the Intel compiler could be turned off, since under Properties I have clicked "Use Intel C++", I see the additional options, and the build output says that Intel C++ is used. I have also set up some of the Intel optimization options, and diagnostics to get a more thorough report.

When I don't use array notation (but use cilk_for with cilk_stub.h, which I now know was a mistake), I get many warnings that my loops cannot be vectorized or that parallelization is not efficient. That is a sign that the compiler is working. C99 has also started to work: I can declare private variables in loops.

The Intel compiler should be working fully, not only a few customizations like C99 and diagnostics.

This is a very strange problem to me, and since I am beginner in C, not mentioning Intel composer, I have no idea what I have done wrong and how to fix it.

Edit:

Oh, I forgot. I have only one project with many header and code files. In headers I just declare the functions, in *.c files I wrote the definitions. So there is no possibility that intel compiler works for one file, but not for others since they are in the same project.

0 Kudos
Jim_S_Intel
Employee
837 Views

I believe something like "pInput->pVector[0:pInput->height]" should work.
You might also try code from one of the sample programs on the website to see if that works for you.

http://www.cilkplus.org/tutorial-set-sequence
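As a minimal sketch of the explicit-length form applied to the `Matrix` struct posted earlier: the function below zeroes the contiguous buffer. It assumes that `pContinous` points at `width * height` contiguous doubles and that a Cilk Plus compiler predefines `__cilk`; with any other compiler it falls back to an equivalent plain loop. The length is copied into a local `int` first, which is a cautious choice rather than a documented requirement.

```c
typedef struct Matrix {
    int width;
    int height;
    double **pMatrix;
    double *pContinous;   /* spelling kept from the original post */
} Matrix;

/* Sets every element of the contiguous buffer to zero. */
int matrix_zeros(Matrix *pMatrix)
{
    int count = pMatrix->width * pMatrix->height;
#if defined(__cilk)
    /* Array-section form with an explicit start and length. */
    pMatrix->pContinous[0:count] = 0.0;
#else
    /* Equivalent plain loop for compilers without array notation. */
    for (int i = 0; i < count; ++i)
        pMatrix->pContinous[i] = 0.0;
#endif
    return 0;
}
```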

Jim

0 Kudos
ARCH_R_Intel
Employee
837 Views

The [:] form works only for complete array types (i.e. array dimension known), not pointers or incomplete array types.  The example below shows some correct and incorrect uses of the [:] form.

int a[4][4];
int (*b)[4];
int *c[4];

void foo() {
    a[:][:] = 0; // Okay
    b[:][0] = 0; // Wrong
    b[0][:] = 0; // Okay
    c[:][0] = 0; // Okay
    c[0][:] = 0; // Wrong
}
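The distinction between a complete array type and a pointer can also be seen in plain C, without any Cilk Plus syntax. In the sketch below, the extents of `a` are recoverable with `sizeof`, while for a pointer to rows the caller has to pass the row count. The function names are hypothetical, and the array-notation spelling in the comment is an assumption based on the explicit start:length form discussed in this thread.

```c
#include <stddef.h>

int a[4][4];   /* complete array type: both extents are part of the type */

/* The extents of a complete array are available at compile time. */
size_t rows_of_a(void) { return sizeof a / sizeof a[0]; }
size_t cols_of_a(void) { return sizeof a[0] / sizeof a[0][0]; }

/* For a pointer to rows of 4 ints, the number of rows is not part of the
   type, so the caller supplies it. In array notation this would presumably
   be written b[0:rows][0] = 0 rather than b[:][0] = 0. */
void zero_first_column(int (*b)[4], size_t rows)
{
    for (size_t r = 0; r < rows; ++r)
        b[r][0] = 0;
}
```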

0 Kudos
Marek_C_
Beginner
837 Views

My idea in using array notation was to fully vectorize my matrix/vector operations. I really need to boost the speed of my code, since its computational cost is significant. The matrices are row-major, so I was hoping to use a single expression such as A->pMatrix[:], but now I understand that the dimensions must be known.
If by any chance A->pMatrix[0:A->height] does not work, how can I change my matrix struct to make this possible? All of my matrices and vectors are dynamically allocated, so I need to keep the height and width values for each of them, since my for loops iterate up to those values.

I am very grateful for how much useful information I have received from you.

Regards,

Marek

0 Kudos
Jim_S_Intel
Employee
837 Views

The notation "A->pMatrix[0:A->height]" should work.

But even if array notation were not available, it is still possible to write the equivalent code using a normal for loop and rely on the compiler to vectorize the loop.   Array notation is convenient syntax, but it is not a requirement for vectorization.

You can use the -vec-report flag (/Qvec-report on Windows) to see which loops the compiler is vectorizing.

http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-win/index.htm#GUID-3D61D83A-857D-49C3-A6C9-A1037BFA63CD.htm
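As a sketch of the plain-loop alternative, here is `matrix_add_scalar` rewritten over the contiguous buffer of the `Matrix` struct posted earlier. It assumes `pContinous` points at `width * height` contiguous doubles; whether a given compiler actually vectorizes the loop is something to confirm with the vectorization report.

```c
typedef struct Matrix {
    int width;
    int height;
    double **pMatrix;
    double *pContinous;   /* spelling kept from the original post */
} Matrix;

/* Adds scalar to every element using one flat loop over the contiguous
   storage, which is a simple shape for an auto-vectorizer to handle. */
int matrix_add_scalar(Matrix *pInput, double scalar)
{
    int count = pInput->width * pInput->height;
    double *data = pInput->pContinous;
    for (int i = 0; i < count; ++i)
        data[i] += scalar;
    return 0;
}
```

Copying the bound and the pointer into locals keeps the loop header free of repeated struct member loads.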

Jim

0 Kudos
ARCH_R_Intel
Employee
837 Views

An alternative to array notation is a for-loop marked with #pragma simd.  I generally prefer #pragma simd since it lets me write multiple assignments in the same loop.
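A minimal sketch of a loop with several assignments under `#pragma simd` follows. The pragma is guarded by `__INTEL_COMPILER` so that other compilers simply see an ordinary loop; the function name `sum_and_diff` is hypothetical, and the sketch assumes the four arrays do not overlap.

```c
/* Computes element-wise sum and difference in a single pass.
   Assumption: x, y, sum and diff do not overlap. */
void sum_and_diff(const double *x, const double *y,
                  double *sum, double *diff, int n)
{
#if defined(__INTEL_COMPILER)
#pragma simd
#endif
    for (int i = 0; i < n; ++i) {
        sum[i]  = x[i] + y[i];   /* first assignment */
        diff[i] = x[i] - y[i];   /* second assignment, same loop */
    }
}
```

With array notation the two assignments would be two separate statements, and it would be up to the compiler to fuse them into one loop.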

0 Kudos
Marek_C_
Beginner
837 Views

So far:

Error #10298 pops up when there are problems like the aforementioned situation where the length must be specified for an incomplete array.

A.pVector[0:A.height] does not work, but if I create a variable like:
int size = A.height;
A.pVector[0:size] shows no error.

Cilk_for is still not recognized, despite #include <cilk/cilk.h>. However, when I use this cilk_for and my loop is not single entry, single exit, the compiler shows an error about it. So the Intel compiler knows how to behave when cilk_for is used, but doesn't recognize cilk_for.

http://cilkplus.org/tutorial-array-notation

On this website I noticed some very interesting builtin functions for array sections, like __sec_reduce_mul (A[:])
I want to use those functions, since sum += A.pVector[0:size]; causes an error: rank mismatch in array section expression

I guess they should come together with cilk_for, but since that one is not working, neither will the others.

0 Kudos
ARCH_R_Intel
Employee
837 Views

Can you post a small example that demonstrates that A.pVector[0:A.height] will not work?  If so, I can send a bug report to the compiler team.  

The spelling of the keyword is "_Cilk_for", not Cilk_for.  <cilk/cilk.h> has a #define cilk_for _Cilk_for.

A.pVector[0:size] has rank 1, sum has rank 0 since it is a scalar, hence the mismatch.  __sec_reduce_add(A[0:size]) computes the sum of A[0:size] and returns a rank 0 result.  You do not need any include files to use __sec_reduce_add.  A dot product of two vectors A and B would be expressed as __sec_reduce_add(A[0:size]*B[0:size]).
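The dot product mentioned above can be sketched as follows. The array-notation branch uses the expression given in this reply and assumes that a Cilk Plus compiler predefines `__cilk`; the fallback branch is the equivalent scalar loop for other compilers. The function name `dot` is hypothetical.

```c
/* Dot product of two vectors of the same length. */
double dot(const double *a, const double *b, int size)
{
#if defined(__cilk)
    /* The product a[0:size] * b[0:size] has rank 1; the reduction brings
       it down to a rank 0 (scalar) result. */
    return __sec_reduce_add(a[0:size] * b[0:size]);
#else
    /* Equivalent scalar loop for compilers without array notation. */
    double sum = 0.0;
    for (int i = 0; i < size; ++i)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

For example, for the vectors (1, 2, 3) and (4, 5, 6) the result is 4 + 10 + 18 = 32.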

0 Kudos
TimP
Honored Contributor III
837 Views

Arch D. Robison (Intel) wrote:

An alternative to array notation is a for-loop marked with #pragma simd.  I generally prefer #pragma simd since it lets me write multiple assignments in the same loop.

With increasingly aggressive fusion in icc, array notation shouldn't inhibit optimization.  I did find it necessary to check opt-report for desired and bad fusion.  The latter can be prevented by #pragma nofusion.

I'd wonder about best policy for reductions, whether they need to be expressed as separate reducers.  We've been warned against using #pragma simd where there are unsupported reductions, and confused by the partial introduction of #pragma omp simd.

0 Kudos
ARCH_R_Intel
Employee
837 Views

#pragma simd is a work in progress, and like any other evolving language feature has different versions.   There are essentially three versions of it:

  1. The public specification, which is what is driving current work on the gcc and LLVM versions.  We continue to grind burs off this spec.
  2. The Intel compiler version, which has extensions such as "assert".
  3. The OpenMP 4.0 release candidate, which changes spellings and lifts some restrictions on control flow (e.g. some gotos allowed).

I personally try to stick to using (1) for now, figuring that eventually I'll just have to respell the pragmas when (3) is widely supported.  I believe the set of supported reductions is the same for all 3, since the supported reductions are the supported reductions in OpenMP (though through an oversight of ours, we didn't say which version of OpenMP.:-(  That's one of the burs to grind off.)  My impression is that the reductions currently supported are the ones for built-in types.  I have not tried pragma simd reduction for a user-defined type.
