As part of a 2D spline interpolation routine, I'm calling dgesv(). That routine is giving me a segmentation fault in MKL_get_N_Cores(). The debugger output is:
Dump of assembler code for function MKL_get_N_Cores:
0x0063f190 <+0>: push %ebx
0x0063f191 <+1>: push %esi
0x0063f192 <+2>: push %edi
0x0063f193 <+3>: push %ebp
0x0063f194 <+4>: sub $0x4ecc,%esp
0x0063f19a <+10>: call 0x63f19f
0x0063f19f <+15>: pop %edi
0x0063f1a0 <+16>: lea 0x4671c9(%edi),%edi
0x0063f1a6 <+22>: cmpl $0x1,0xa1f74(%edi)
0x0063f1ad <+29>: je 0x63f1c1
0x0063f1af <+31>: mov %edi,%ebx
0x0063f1b1 <+33>: call 0x63b820
0x0063f1b6 <+38>: mov %eax,%esi
0x0063f1b8 <+40>: cmpl $0xffffffff,0x3658(%edi)
0x0063f1bf <+47>: je 0x63f1cc
0x0063f1c1 <+49>: add $0x4ecc,%esp
0x0063f1c7 <+55>: pop %ebp
0x0063f1c8 <+56>: pop %edi
0x0063f1c9 <+57>: pop %esi
0x0063f1ca <+58>: pop %ebx
0x0063f1cb <+59>: ret
The crash occurs on a call instruction. This is running on Ubuntu, built with g++. The app was originally linked against the static MKL libraries; I switched to the dynamic libraries and got the same fault. The same 2D spline code is also used in a second app, and when I feed the same input file to that app it works fine. I verified with the debugger that the arguments passed to dgesv were identical in both apps. The app that crashes uses about 1.2 GB of RAM, while the app that doesn't crash uses about 100 MB.
Any idea what's causing this, or suggestions for a workaround?
Bruce
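The disassembly gives a useful hint. The instruction sub $0x4ecc,%esp lowers the stack pointer by 0x4ecc = 20172 bytes (about 19.7 KiB), and the very next instruction is a call, which stores its return address at that new, lower stack pointer. That call is therefore the first instruction to touch the freshly reserved region, so a fault there is consistent with the calling thread running out of stack rather than with bad dgesv arguments. A minimal diagnostic sketch, assuming Linux/glibc: call the hypothetical helper current_thread_stack_size() from the code path that calls dgesv and compare the result with the frame size. pthread_getattr_np() is a GNU extension, and the result is only meaningful if the calling task is backed by an ordinary pthread, which is an assumption here.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>

/* "sub $0x4ecc,%esp" in MKL_get_N_Cores reserves this many bytes. */
enum { MKL_GET_N_CORES_FRAME = 0x4ecc }; /* 20172 bytes, about 19.7 KiB */

/* Hypothetical helper: returns the total stack size of the calling thread
 * in bytes, or 0 if it cannot be determined.
 * Uses the GNU extension pthread_getattr_np() (Linux/glibc only). */
size_t current_thread_stack_size(void)
{
    pthread_attr_t attr;
    void *stack_addr = NULL;
    size_t stack_size = 0;

    if (pthread_getattr_np(pthread_self(), &attr) != 0)
        return 0;
    if (pthread_attr_getstack(&attr, &stack_addr, &stack_size) != 0)
        stack_size = 0;
    pthread_attr_destroy(&attr);
    return stack_size;
}
```

Note that this reports the total stack size of the thread, not how much of it is still free at the point where dgesv is called.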
Dear Bruce,
Could you please send me a test case so that I can reproduce the problem and dig into what is causing it?
Thanks,
Sridevi
I'll try. I thought of another difference between the two apps: the app that fails uses real-time extensions (Xenomai), and the app that works does not. I'll write a small test app. If it works, I'll add a few Xenomai calls and see whether it fails.
This may take a few days.
Bruce
Bruce,
Thanks for your time and effort in creating a small test case to reproduce the problem.
This is just a guess, but MKL_get_N_Cores tries to detect the CPU topology. If the Xenomai framework changes that topology by shadowing some CPU parameters (for example, CPU affinity), MKL might get confused somehow, but it should not crash in any case.
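To see the kind of discrepancy described above, a program can compare the number of CPUs the OS reports as online with the number of CPUs in the calling thread's affinity mask. How MKL_get_N_Cores detects the topology internally is not documented here, so this only illustrates two views a library could consult; online_cpus() and affinity_cpus() are hypothetical helper names, and the code assumes Linux/glibc.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: CPUs the operating system currently reports as online. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Hypothetical helper: CPUs in the calling thread's affinity mask
 * (the CPUs it is allowed to run on), or -1 on error. */
int affinity_cpus(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0) /* 0 = calling thread */
        return -1;
    return CPU_COUNT(&set);
}
```

Calling both helpers inside and outside the Xenomai task would show whether the task's affinity differs from what the OS reports.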
Attached are two test cases. The one built as an ordinary Linux program (mkltest) works. The Xenomai version (mklxentest) fails with the same fault.
I took the dgesv example and turned it into a function. In the Xenomai version, main() makes a couple of Xenomai calls to create a task and run the function in it.
Bruce
Bruce,
Nothing was attached to your previous post. Please try again.
It would also help to add a description of how to run your tests, e.g. how to create the Xenomai environment and run the second test.
Found the answer. When a Xenomai task is created, you tell it how much stack space to allocate. From the documentation: "The size of the stack (in bytes) for the new task. If zero is passed, a reasonable pre-defined size will be substituted." We were passing 0. When I increased the stack size to 1 MB, the MKL calls no longer crashed.
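The same fix can be sketched with plain POSIX threads, since no Xenomai environment is assumed here: create the worker with an explicit stack size instead of relying on a default. In Xenomai's native API the corresponding setting is the stack-size argument passed when the task is created (rt_task_create()); the pthread calls below are a stand-in, not the Xenomai API. run_with_stack() and stack_hungry_task() are hypothetical names, the 20172-byte local buffer only mimics the frame MKL_get_N_Cores reserves (sub $0x4ecc,%esp in the disassembly), and the 1 MB value is the one from the post.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Stack size used in the post's fix: 1 MB. */
enum { TASK_STACK_BYTES = 1024 * 1024 };

/* Hypothetical stand-in for the dgesv work: it uses a 0x4ecc-byte
 * (20172-byte) local buffer, the same frame size MKL_get_N_Cores reserves. */
static void *stack_hungry_task(void *arg)
{
    unsigned char frame[0x4ecc];
    memset(frame, 0xA5, sizeof frame);
    *(int *)arg = frame[sizeof frame - 1]; /* report that the buffer was written */
    return NULL;
}

/* Hypothetical helper: runs fn(arg) on a new thread created with an explicit
 * stack size, then waits for it. Returns 0 on success or an error number from
 * the pthread calls (e.g. EINVAL if stack_bytes < PTHREAD_STACK_MIN). */
int run_with_stack(void *(*fn)(void *), void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr); /* safe once the thread has been created */

    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

Passing a size below PTHREAD_STACK_MIN makes pthread_attr_setstacksize() fail with EINVAL, so run_with_stack() returns that error instead of starting the thread.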
As far as I can tell, the files are there. I'm not sure what I need to do so that you can access them.