Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

VTune Amplifier XE causes system crash/restart

cachecoherent
Beginner
1,571 Views
We have an SGI UV1000 with 8-core Xeon E7s and 64GB of RAM per processor. The operating system is Redhat Linux 6.0 SE.

We created a simple project to generate a Nehalem general hardware profiling report on the /bin/ls executable (a pretty basic test).

When the project is run, our system freezes and reboots. This is not a hung thread or a single unresponsive process; the entire system freezes. After the reboot, we checked the results of the VTune run and found none.

Is there some kernel configuration that we must modify in order to let the architecture-specific hardware profiling work?
10 Replies
Mark_D_Intel
Employee
Some background questions:

- What version of Amplifier XE are you using?
- How many total cores (including HT if on)?
- Are there any error messages in the /var/log/messages log starting with 'SEP3_1' or 'amplxe-runsa'?

(An FYI for the likely cause of the problem: The hardware sampling driver allocates memory and the size of the allocation is determined by the number of cores. It is these memory allocations that can fail on machines with extremely large core counts and cause the OS to freeze.)

Mark
cachecoherent
Beginner
Mark, thanks for the response:

- What version of Amplifier XE are you using?
We are using VTune Amplifier XE for Linux, version 2011 (Update 7) on active support.

- How many total cores (including HT if on)?
We have 32 physical CPUs, 8 cores per CPU, and hyperthreading is disabled.

- Are there any error messages in the /var/log/messages log starting with 'SEP3_1' or 'amplxe-runsa'?
We are checking on this now.
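The answers above fix the CPU count the sampling driver must size its buffers for. As a rough worked estimate, assuming (per Mark's FYI) that the total allocation grows linearly with the number of logical CPUs, where $B_{\text{core}}$ is a hypothetical per-core buffer size not stated anywhere in this thread:

```latex
\begin{aligned}
N_{\text{logical}} &= 32\ \text{CPUs} \times 8\ \text{cores/CPU} \times 1\ \text{thread/core (HT off)} = 256 \\
M_{\text{driver}} &\approx N_{\text{logical}} \cdot B_{\text{core}} = 256\, B_{\text{core}}
\end{aligned}
```

Whatever $B_{\text{core}}$ actually is, this machine needs 32 times the driver memory of a single 8-core socket.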

cachecoherent
Beginner
Mark,

There are no error messages with those terms in /var/log/messages.

However, the system console shows that just before the system froze, VTune tried to allocate more memory than was available.

Is there any way to (interactively or by configuration) restrict the amount of memory VTune allocates for each core?
SergeyKostrov
Valued Contributor II
>>...Operating system is Redhat Linux 6.0 SE

Is it a 32-bit or 64-bit edition? Is it a Server Edition?

>>...However, the system console shows that just before the system freezeup, VTune tried to allocate more
>>memory than was available...

How much memory was allocated before VTune crashed?
Did you try to increase the virtual memory (swap) file size?

I can only assume that an allocation failure is handled incorrectly somewhere in VTune's C/C++ code, for example:

...
char *p = (char *)malloc(size); // or new, calloc(), etc.

// malloc() returns NULL because it failed to allocate the requested memory,
// and processing continues because nothing checks whether p is NULL
p[0] = 0; // writes through a NULL pointer

// and of course VTune crashes...
...
cachecoherent
Beginner
64-bit. Redhat Linux SE stands for Security Enabled. The system is running in permissive mode, non-virtual.

I will check on memory allocation size before the crash.

I think you misunderstand the issue. The entire system is crashing, not just VTune.

I was hoping a simple configuration change to the memory allocation could be applied as a temporary fix.
Rob5
New Contributor II

This issue is also being worked via case 657396.

- Rob

SergeyKostrov
Valued Contributor II
>>64-bit. Redhat Linux SE stands for Security Enabled. The system is running in permissive mode, non-virtual.

>>I will check on memory allocation size before the crash.

[SergeyK] Any details?

>>I think you misunderstand the issue. The entire system is crashing, not just VTune.

[SergeyK] I understood the problem completely, and I explained why it happens. Another possible reason is that VTune corrupts the operating system stack after its memory request fails, and then the OS crashes.

>>I was hoping a simple configuration change to the memory allocation could be applied as a temporary fix.

[SergeyK] Installing more RAM could help.


Best regards,
Sergey

cachecoherent
Beginner
The memory allocation logged before the crash was small, around 2GB, much smaller than system RAM. However, that is only the last message in the log; it is probably not the last operation that occurred.

We have run the latest VTune patch (Dec 20, 2011), and the crash still occurs.

Now, when the crash occurs, the system does not log a memory allocation; it goes straight into a kernel panic.

The system has 2TB of RAM in total (64GB per processor, 32 processors). We will not be installing any more RAM. We would expect 2TB of RAM to be sufficient for running VTune Amplifier on /bin/ls.
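For scale, using only the figures reported in this thread (64GB per processor, 32 processors, and a last logged allocation of about 2GB):

```latex
\begin{aligned}
M_{\text{RAM}} &= 32 \times 64\ \text{GB} = 2048\ \text{GB} = 2\ \text{TB} \\
\frac{2\ \text{GB}}{2048\ \text{GB}} &= \frac{1}{1024} \approx 0.098\%
\end{aligned}
```

So the last logged request was roughly a thousandth of installed memory.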

As Intel recommended, we ran the same Nehalem General Exploration analysis on the Tachyon sample application that ships with VTune Amplifier XE 2011 and got the same result.

I have attached the uvcon log from our machine showing the contents of the kernel panic. Hope that helps.

LM11
Beginner
Attached is the result from the Amplifier Feedback reporting tool:

amplxe-feedback.exe --create-bug-report=report.txt

Aitcomputing_G_
Beginner

Hi Mark,

Is the following statement still valid for VTune Amplifier XE 2018u2?

We are having this issue on VTune Amplifier XE 2016, and Intel support's suggestion was to try 2018u2.

We are seeing "SEP3_1" in the /var/log/messages log.

mark-dewing (Intel) wrote:

(An FYI for the likely cause of the problem: The hardware sampling driver allocates memory and the size of the allocation is determined by the number of cores. It is these memory allocations that can fail on machines with extremely large core counts and cause the OS to freeze.)

Julia

 
