Intel® oneAPI Threading Building Blocks

segfault at alloca() function in Strassen algorithm

makos999
Beginner
442 Views
Hello Folks,

The TBB group provides a ready-to-use implementation of the Strassen algorithm as a usage example. Here is the original code.
In my application I need floats instead of doubles, so I changed the types. That is not the main point of the problem, but the numbers below might give more insight.
A global variable, `int size` on line 19, sets the size of the matrix in the program.

When I modify it with doubles, values of `size` up to 384 work correctly, but any larger value makes the program stop with `Segmentation fault`. 384 * 8 bytes = 3072 bytes of memory.
The other case is floats, where `size` can be at most 575 and not even one more.
575 * 4 bytes = 2300 bytes of memory.

What could be the connection between using floats or doubles and the difference in the amount of memory used?
I can give you the stack trace from the core file and gdb. I hope this makes for a detailed report of the error!

I would like to work with much bigger matrices. Is that possible with this approach, and if so, how?
[plain]Core was generated by `./Strassen_debug'.
Program terminated with signal 11, Segmentation fault.
#0  0x0804aeca in StrassenMultiply::execute (this=0xb5097aa0) at ../src/Strassen.cpp:329
329                      ay, as, A, ax, ay, as, a_cum6, 0, 0, n_2);
(gdb) bt
#0  0x0804aeca in StrassenMultiply::execute (this=0xb5097aa0) at ../src/Strassen.cpp:329
#1  0xb6da37ab in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) ()
   from /home/makos/intel/tbb30_056oss/lib/ia32/cc4.1.0_libc2.4_kernel2.6.16.21/libtbb.so.2
#2  0xb6da18d5 in tbb::internal::generic_scheduler::local_spawn_root_and_wait(tbb::task&, tbb::task*&) ()
   from /home/makos/intel/tbb30_056oss/lib/ia32/cc4.1.0_libc2.4_kernel2.6.16.21/libtbb.so.2
#3  0xb6da17dc in tbb::internal::generic_scheduler::spawn_root_and_wait(tbb::task&, tbb::task*&) ()
   from /home/makos/intel/tbb30_056oss/lib/ia32/cc4.1.0_libc2.4_kernel2.6.16.21/libtbb.so.2
#4  0x0804a353 in tbb::task::spawn_root_and_wait (root=...) at /home/makos/intel/tbb30_056oss/include/tbb/task.h:629
#5  0x08049b50 in strassen_mult_par (n=512, A=0xb5938008, ax=0, ay=0, as=512, B=0xb5737008, bx=0, by=0, bs=512, C=0xb5134008, cx=0, cy=0, cs=512, d=1, s=32)
    at ../src/Strassen.cpp:399
#6  0x0804a016 in main (argc=1, argv=0xbfface24) at ../src/Strassen.cpp:456
(gdb) p size
$1 = 512
(gdb) 
[/plain]

The problem occurs in this part of the code:
[cpp] // p6 = (a21 - a11) x (b11 + b12)
 double* a_cum6 = (double *) alloca (sizeof(double) * n_2 * n_2);
 double* b_cum6 = (double *) alloca (sizeof(double) * n_2 * n_2);
 matrix_sub(n_2, n_2, A, ax+n_2,
            ay, as, A, ax, ay, as, a_cum6, 0, 0, n_2);[/cpp]
[plain]      /* report from gdb, during debug:
	 (gdb) p a_cum6
	 $8 = (float *) 0xbf7feb50
	 (gdb) p *a_cum6
	 Cannot access memory at address 0xbf7feb50
	 (gdb) p *b_cum6
	 Cannot access memory at address 0xbf77eb40
      */[/plain]
7 Replies
jimdempseyatthecove
Honored Contributor III
alloca allocates from the stack. The dump seems to indicate n_2 is 512. Therefore 8*512*512 = 2MB, and two of these make 4MB. It is likely that your stack size is too small. I suggest you consider using a heap allocation as opposed to alloca (or enlarging your stack).

Jim Dempsey
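
A minimal sketch of the heap-based alternative suggested above, not taken from the original Strassen.cpp. The helper names `scratch_bytes` and `make_scratch` are hypothetical, and the sketch assumes that n_2 is n/2, so that n = 512 from the backtrace would give n_2 = 256. Under that assumption one double buffer is 8*256*256 = 524288 bytes (512 KB), which is consistent with the gap of about 0x80000 between the two addresses gdb printed for a_cum6 and b_cum6.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original example): number of bytes
// one n_2 x n_2 scratch matrix with element type T occupies.
template <typename T>
std::size_t scratch_bytes(std::size_t n_2) {
    return sizeof(T) * n_2 * n_2;
}

// Hypothetical stand-in for the alloca() calls: the storage comes from the
// heap and is released automatically when the vector goes out of scope.
template <typename T>
std::vector<T> make_scratch(std::size_t n_2) {
    return std::vector<T>(n_2 * n_2);  // elements are zero-initialized
}
```

Inside execute(), something like `std::vector<double> a_buf = make_scratch<double>(n_2); double* a_cum6 = a_buf.data();` could stand in for the alloca line. The buffer then lives until the vector goes out of scope, much as the alloca memory lives until the function returns, but it no longer counts against the thread's stack.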
Alexey-Kukanov
Employee
I second Jim.
In case you need to enlarge the stack of TBB worker threads, specify the desired stack size as the second parameter to the constructor of tbb::task_scheduler_init. See the TBB Reference Guide for more details.
makos999
Beginner
Thanks!
I've already tried it, but it did not help. The only difference is that the segfault now happens a few lines earlier than in the first case.
Alexey-Kukanov
Employee
The "Here is the original code" link in the first post is broken, so it's impossible for now to see the code. Without the code it's hard to say what (else) can be wrong there.
makos999
Beginner
Strange, something has overwritten the link somehow... Now the link points to some Intel domain, but it didn't originally.
Now, here it is: http://pastebin.com/kpWWcEdS
Alexey-Kukanov
Employee
I think I got it, actually: you probably also need to increase the stack size of the main (i.e. default) thread created for the process. With Visual Studio on Windows, this is done with the /F compiler option or the /STACK linker option. On Linux, setrlimit is probably the way to go. Or you can check the guess without source modifications by using ulimit.
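
A hedged sketch of the setrlimit route on Linux, using the POSIX getrlimit/setrlimit calls with RLIMIT_STACK. The helper name `ensure_stack_limit` is hypothetical. Whether raising the limit from inside an already running process actually helps the main thread is an assumption to verify on the target system; changing the limit in the shell with `ulimit -s` before starting the program is the quicker check.

```cpp
#include <sys/resource.h>

// Hypothetical helper: make sure the soft stack limit is at least `wanted`
// bytes. Returns true if the limit is already large enough or was raised.
bool ensure_stack_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return false;  // could not query the current limit
    }
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted) {
        return true;   // nothing to do
    }
    // Without extra privileges the soft limit can only go up to the hard limit.
    if (lim.rlim_max != RLIM_INFINITY && lim.rlim_max < wanted) {
        return false;
    }
    lim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &lim) == 0;
}
```

A call such as `ensure_stack_limit(64 * 1024 * 1024)` early in main() would request a 64 MB soft limit; the figure is only an illustration, not a recommendation for this program.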
RafSchietekat
Valued Contributor III
Why have such big objects on the stack at all? Legacy code?

Also note that even with an increased stack size, the scheduler may take a thread out of the worker pool well before any actual overflow occurs, by prohibiting it from stealing other work once the stack is half full or so (please correct me if this fraction or the whole assumption is wrong). This may or may not be relevant in this algorithm (it is not if alloca only occurs in a leaf task's code).