HW Error i9-10850K - Is it CPU or RAM?

erstrauss
Beginner

Hi All,

My system:
CPU: i9-10850K
MB: ROG STRIX Z490-E GAMING
RAM: Corsair 32GB, 2 X 16GB
OS: Fedora Linux 33

I see the following hardware machine check events:

Apr 24 21:59:27 localhost.localdomain mcelog[941]: Hardware event. This is not a software error.
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MCE 0
Apr 24 21:59:27 localhost.localdomain mcelog[941]: CPU 1 BANK 0 TSC 1307f2e3d5ac6
Apr 24 21:59:27 localhost.localdomain mcelog[941]: TIME 1619315967 Sat Apr 24 21:59:27 2021
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MCG status:
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MCi status:
Apr 24 21:59:27 localhost.localdomain mcelog[941]: Error overflow
Apr 24 21:59:27 localhost.localdomain mcelog[941]: Corrected error
Apr 24 21:59:27 localhost.localdomain mcelog[941]: Error enabled
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MCA: Internal parity error
Apr 24 21:59:27 localhost.localdomain mcelog[941]: STATUS d000004000010005 MCGSTATUS 0
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MCGCAP c10 APICID 2 SOCKETID 0
Apr 24 21:59:27 localhost.localdomain mcelog[941]: MICROCODE e2
Apr 24 21:59:27 localhost.localdomain mcelog[941]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 24 21:59:27 localhost.localdomain mcelog[941]: mcelog: warning: 8 bytes ignored in each record
Apr 24 21:59:27 localhost.localdomain mcelog[941]: mcelog: consider an update
...
Apr 24 21:59:58 localhost.localdomain mcelog[941]: Hardware event. This is not a software error.
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MCE 0
Apr 24 21:59:58 localhost.localdomain mcelog[941]: CPU 5 BANK 0 TSC 13098a1328846
Apr 24 21:59:58 localhost.localdomain mcelog[941]: TIME 1619315998 Sat Apr 24 21:59:58 2021
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MCG status:
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MCi status:
Apr 24 21:59:58 localhost.localdomain mcelog[941]: Corrected error
Apr 24 21:59:58 localhost.localdomain mcelog[941]: Error enabled
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MCA: Internal parity error
Apr 24 21:59:58 localhost.localdomain mcelog[941]: STATUS 9000004000010005 MCGSTATUS 0
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MCGCAP c10 APICID a SOCKETID 0
Apr 24 21:59:58 localhost.localdomain mcelog[941]: MICROCODE e2
Apr 24 21:59:58 localhost.localdomain mcelog[941]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 24 21:59:58 localhost.localdomain mcelog[941]: mcelog: warning: 8 bytes ignored in each record
Apr 24 21:59:58 localhost.localdomain mcelog[941]: mcelog: consider an update
Apr 24 22:00:27 localhost.localdomain mcelog[941]: Hardware event. This is not a software error.
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MCE 0
Apr 24 22:00:27 localhost.localdomain mcelog[941]: CPU 7 BANK 0 TSC 130b13c272bb0
Apr 24 22:00:27 localhost.localdomain mcelog[941]: TIME 1619316027 Sat Apr 24 22:00:27 2021
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MCG status:
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MCi status:
Apr 24 22:00:27 localhost.localdomain mcelog[941]: Corrected error
Apr 24 22:00:27 localhost.localdomain mcelog[941]: Error enabled
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MCA: Internal parity error
Apr 24 22:00:27 localhost.localdomain mcelog[941]: STATUS 9000004000010005 MCGSTATUS 0
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MCGCAP c10 APICID e SOCKETID 0
Apr 24 22:00:27 localhost.localdomain mcelog[941]: MICROCODE e2
Apr 24 22:00:27 localhost.localdomain mcelog[941]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 24 22:00:27 localhost.localdomain mcelog[941]: mcelog: warning: 8 bytes ignored in each record
Apr 24 22:00:27 localhost.localdomain mcelog[941]: mcelog: consider an update
Apr 24 22:15:25 localhost.localdomain mcelog[941]: Hardware event. This is not a software error.
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MCE 0
Apr 24 22:15:25 localhost.localdomain mcelog[941]: CPU 1 BANK 0 TSC 133a192c3c238
Apr 24 22:15:25 localhost.localdomain mcelog[941]: TIME 1619316925 Sat Apr 24 22:15:25 2021
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MCG status:
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MCi status:
Apr 24 22:15:25 localhost.localdomain mcelog[941]: Corrected error
Apr 24 22:15:25 localhost.localdomain mcelog[941]: Error enabled
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MCA: Internal parity error
Apr 24 22:15:25 localhost.localdomain mcelog[941]: STATUS 9000004000010005 MCGSTATUS 0
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MCGCAP c10 APICID 2 SOCKETID 0
Apr 24 22:15:25 localhost.localdomain mcelog[941]: MICROCODE e2
Apr 24 22:15:25 localhost.localdomain mcelog[941]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 24 22:15:25 localhost.localdomain mcelog[941]: mcelog: warning: 8 bytes ignored in each record
Apr 24 22:15:25 localhost.localdomain mcelog[941]: mcelog: consider an update
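
For reference, the STATUS words in these records can be decoded by hand. The sketch below is a minimal illustration, assuming the architectural IA32_MCi_STATUS bit layout described in the machine-check chapter of the Intel SDM. The function name decode_mci_status and the dictionary keys are made up for this example and are not part of mcelog. The corrected-error count field is only meaningful when bit 10 of MCGCAP is set, which is the case for the c10 value logged here.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Bit positions are assumed from the Intel SDM machine-check chapter;
# decode_mci_status and its keys are illustrative names, not an mcelog API.

def decode_mci_status(status, cmci_supported=True):
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL: record holds valid data
        "overflow": bool((status >> 62) & 1),     # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),  # UC: 0 means the error was corrected
        "enabled": bool((status >> 60) & 1),      # EN: reporting enabled for this error
        "misc_valid": bool((status >> 59) & 1),   # MISCV
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        "context_corrupt": bool((status >> 57) & 1),  # PCC
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }
    if cmci_supported:
        # Bits 52:38 hold the corrected error count when MCG_CAP[10] is set.
        fields["corrected_count"] = (status >> 38) & 0x7FFF
    return fields


mcg_cap = 0xC10                      # MCGCAP value from the log
cmci = bool((mcg_cap >> 10) & 1)     # bit 10: corrected-error count field present

for raw in (0xD000004000010005, 0x9000004000010005):  # STATUS values from the log
    decoded = decode_mci_status(raw, cmci)
    print(hex(raw))
    for name, value in decoded.items():
        print(f"  {name}: {value}")
```

Under these assumptions both values decode to a valid, enabled, corrected error with MCA error code 0x0005, which mcelog labels as an internal parity error. The first record additionally has the overflow bit set.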

1. I'm running with default settings.
2. The CPU is not running hot (up to 55C).
3. The dmesg command output includes:

[93947.269384] mce_notify_irq: 2 callbacks suppressed
[93947.269387] mce: [Hardware Error]: Machine check events logged

My questions:

1. Do the above errors indicate a CPU issue or a RAM issue?
2. What are the next steps to isolate and fix the issue?
3. What diagnostic tools are available to isolate the root cause of the issue?

I'd appreciate your help.

Thank you.

 

29 Replies
DeividA_Intel
Moderator

Hello erstrauss,

Thank you for posting on the Intel® communities.

In order to better assist you, please provide the following:

1. Run the Intel® System Support Utility (Intel® SSU) to gather more details about the system.

· Download the Intel® SSU and save the application on your computer: https://downloadcenter.intel.com/download/26735/Intel-System-Support-Utility-for-the-Linux-Operating-System

· Open the application, check the "Everything" checkbox, and click "Scan" to see the system and device information. The Intel® SSU defaults to the "Summary View" on the output screen following the scan. Click the menu where it says "Summary" to change to "Detailed View".

· To save your scan, click Next and click Save.

2. Do you receive the same or similar errors on Windows?

3. Have you tried reinstalling the operating system?

4. Have you performed any troubleshooting steps?

5. Is this issue recent or old?

6. Are you experiencing any issues with your computer? E.g., performance problems, low FPS, crashes, etc.

Regards,

Deivid A.

Intel Customer Support Technician


erstrauss
Beginner

Hi Deivid A,

Thank you for your reply.

1. Please see the attached SSU output from the system. The files are a bit different because I installed the additional packages mentioned in the script.

2. This box doesn't run Windows.

3. I installed the OS and keep the kernel updated. I can run any other Linux distribution from a live USB stick if needed; please suggest one.

4. Yes, I looked in the IA-32 architecture manuals and tried to decode the STATUS value, which led me to post this question.

I can swap the DIMMs, but I suspect it is not a RAM issue.

5. The box was built in early January. It was tested with a memory tester and passed a 24-hour run.

Checking the system logs, I see the machine check event as early as January 19.

So it seems the system has never run without this event.

6. The system did crash a few times, and some applications get segmentation faults on instructions that do not access memory, such as xor %rdi, %rdi or lea.

Other than that, it works fine with no other issues.

I'll be happy to run any stress application to test it.

Please let me know if more information is needed.

Thank you.

Erstrauss

erstrauss
Beginner

Hi Deivid A,

To verify that the issue is not OS-specific, I booted the system from an Intel Clear Linux USB stick.

I ran the following commands to make the system generate these errors:

git clone http://github.com/erez-strauss/lockfree_mpmc_queue
cd lockfree_mpmc_queue/
make -j
make report

While running the 'make report' command, I got the following event again:

Apr 27 02:57:17 clr-live mcelog[497]: Hardware event. This is not a software error.
Apr 27 02:57:17 clr-live mcelog[497]: MCE 0
Apr 27 02:57:17 clr-live mcelog[497]: CPU 5 BANK 0 TSC 3e4875aa7f4
Apr 27 02:57:17 clr-live mcelog[497]: TIME 1619492237 Tue Apr 27 02:57:17 2021
Apr 27 02:57:17 clr-live mcelog[497]: MCG status:
Apr 27 02:57:17 clr-live mcelog[497]: MCi status:
Apr 27 02:57:17 clr-live mcelog[497]: Corrected error
Apr 27 02:57:17 clr-live mcelog[497]: Error enabled
Apr 27 02:57:17 clr-live mcelog[497]: MCA: Internal parity error
Apr 27 02:57:17 clr-live mcelog[497]: STATUS 9000004000010005 MCGSTATUS 0
Apr 27 02:57:17 clr-live mcelog[497]: MCGCAP c10 APICID a SOCKETID 0
Apr 27 02:57:17 clr-live mcelog[497]: MICROCODE e2
Apr 27 02:57:17 clr-live mcelog[497]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 27 02:57:17 clr-live kernel: mce: [Hardware Error]: Machine check events logged
Apr 27 02:57:17 clr-live mcelog[497]: mcelog: Cannot send telemetry record in mcelog: No such file or directory
Apr 27 02:57:17 clr-live mcelog[497]: mcelog: Error sending telemetry record: No such file or directory

...

Apr 27 03:13:53 clr-live mcelog[497]: Hardware event. This is not a software error.
Apr 27 03:13:53 clr-live mcelog[497]: MCE 0
Apr 27 03:13:53 clr-live mcelog[497]: CPU 7 BANK 0 TSC 7273d8469d4
Apr 27 03:13:53 clr-live mcelog[497]: TIME 1619493233 Tue Apr 27 03:13:53 2021
Apr 27 03:13:53 clr-live mcelog[497]: MCG status:
Apr 27 03:13:53 clr-live mcelog[497]: MCi status:
Apr 27 03:13:53 clr-live mcelog[497]: Corrected error
Apr 27 03:13:53 clr-live mcelog[497]: Error enabled
Apr 27 03:13:53 clr-live mcelog[497]: MCA: Internal parity error
Apr 27 03:13:53 clr-live mcelog[497]: STATUS 9000004000010005 MCGSTATUS 0
Apr 27 03:13:53 clr-live mcelog[497]: MCGCAP c10 APICID e SOCKETID 0
Apr 27 03:13:53 clr-live mcelog[497]: MICROCODE e2
Apr 27 03:13:53 clr-live mcelog[497]: CPUID Vendor Intel Family 6 Model 165 Step 5
Apr 27 03:13:53 clr-live kernel: mce: [Hardware Error]: Machine check events logged
Apr 27 03:13:53 clr-live mcelog[497]: mcelog: Cannot send telemetry record in mcelog: No such file or directory
Apr 27 03:13:53 clr-live mcelog[497]: mcelog: Error sending telemetry record: No such file or directory
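
To see at a glance which logical CPUs and banks report these events, the mcelog lines can be tallied with a short script. This is a minimal sketch, assuming the journal text uses the same "CPU n BANK m" line format as the excerpts above; count_events is a hypothetical helper name and the sample text is copied from the two records in this post.

```python
import re
from collections import Counter

# Matches mcelog lines such as:
#   "... mcelog[497]: CPU 5 BANK 0 TSC 3e4875aa7f4"
CPU_LINE = re.compile(r"mcelog\[\d+\]: CPU (\d+) BANK (\d+)")


def count_events(log_text):
    """Count machine check records per (cpu, bank) pair in saved log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            cpu, bank = int(match.group(1)), int(match.group(2))
            counts[(cpu, bank)] += 1
    return counts


sample = """\
Apr 27 02:57:17 clr-live mcelog[497]: CPU 5 BANK 0 TSC 3e4875aa7f4
Apr 27 02:57:17 clr-live mcelog[497]: MCA: Internal parity error
Apr 27 03:13:53 clr-live mcelog[497]: CPU 7 BANK 0 TSC 7273d8469d4
Apr 27 03:13:53 clr-live mcelog[497]: MCA: Internal parity error
"""

for (cpu, bank), n in sorted(count_events(sample).items()):
    print(f"CPU {cpu} BANK {bank}: {n} event(s)")
```

Running the same function over a full saved journal would show whether the events cluster on a few logical CPUs or are spread across all of them.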

 

I hope this experiment rules out the OS.

Thank you very much.

Erstrauss

 

DeividA_Intel
Moderator

Hello erstrauss,

In order to help you further with the issue, I would like you to provide/try the following:

1. Have you overclocked the processor, or run it above the base frequency? (Processor Base Frequency: 3.60 GHz)

2. Have you enabled Intel® Extreme Memory Profile (Intel® XMP)? If so, try disabling it or choosing the frequency supported by your CPU (2933 MHz).

3. Have you checked with Asus for any possible compatibility issues with Linux?

Troubleshooting steps:

1. Try a minimal configuration: CPU, 1 stick of RAM, no video card.

2. If possible, try a different CPU or RAM.

3. Try clearing the CMOS or a BIOS recovery. You may need to check with Asus for the steps.

4. If possible, take the computer to a repair store.

Best regards,

Deivid A.

Intel Customer Support Technician


DeividA_Intel
Moderator

Hello erstrauss,

Were you able to check the previous post and try the steps provided? Please let me know if you need more assistance.

Regards,

Deivid A.

Intel Customer Support Technician


erstrauss
Beginner

Hi Deivid,

Thank you for your follow-up. Sorry, I had a few busy days.

The current system runs without a graphics card and uses the on-chip graphics.

Yes, I tried removing one of the 16GB DIMMs, moving the DIMM to different slots, and retesting; I still see the same machine check events.

Next I'll try resetting the BIOS, but that doesn't seem to be the issue.

1. I did not overclock the CPU or memory; should I play with the CPU/memory voltage or clock?

2. The computer was built from its components (CPU, RAM, MB, Fan, NVMe, PSU, Case). If I take it to a store, I would like to know what I should tell them to replace, which brings me back to the question: is it the CPU or the RAM?

3. Is it possible that some of the MB's default settings place the CPU/memory outside the recommended range?

4. Is there any stress/diagnostic program I should try to reproduce the issue and get more detailed information?

5. If there are more diagnostic tools on Windows, I could boot from a Windows USB and test with them.

Thank you,

ErStrauss

 

n_scott_pearson
Super User
What is the exact model number of your DIMMs? Did you try disabling XMP?
...S
erstrauss
Beginner

The DIMMs:


Corsair Vengeance LPX DDR4
CMK32GX4M2E3200C16
DDR4 32GB(2X16GB) 205100197520892
3200MHz 16-20-20-38 1.35V ver 3.44

 

erstrauss
Beginner

Hi Deivid and Scott,

I cleared the CMOS and chose the Intel default settings.

I tested with each of the DIMMs and I get the same error.

Which leads me to the following options:

1. The MB BIOS chooses a wrong memory or CPU timing configuration.

2. CPU issue.

3. Both DIMMs have the same issue?

My next step:

A BIOS upgrade; a new version was released just a few days ago.

Are there any other directions / tests I should try?

How would you recommend getting the system to a stable state?

Please advise.

Thank you for your help,

ErStrauss

erstrauss
Beginner

I upgraded the BIOS; no change, still the same errors.

DeividA_Intel
Moderator

Hello erstrauss,

Thanks for the information provided. There is no need to adjust the CPU/memory voltages, since this could damage the CPU and/or the motherboard over time.

Also, I would like you to try the following:

1. If you are willing to use Windows, check whether you see the same issue on Windows.

2. On Windows, use the Intel® Processor Diagnostic Tool to check if the CPU is defective:

- https://downloadcenter.intel.com/download/19792

3. You can explain the situation at the repair store and ask them to run diagnostics focusing on the CPU and RAM.

4. Make sure that the RAM is running at 2933 MHz and the CPU at 3.60 GHz.

If you notice any errors with the Intel® Processor Diagnostic Tool, save the results and send them in your next post.

Best regards,

Deivid A.

Intel Customer Support Technician


erstrauss
Beginner

Hi Deivid,

I will give it a try, but that will take a few more days.

Thanks,

ErStrauss

DeividA_Intel
Moderator

Hello erstrauss,

Thanks for the update. Take your time trying the recommendations; I will be waiting for the outcome.

Best regards,

Deivid A.

Intel Customer Support Technician


DeividA_Intel
Moderator

Hello erstrauss,

Were you able to try the steps recommended? Please let me know if you need more assistance.

Regards,

Deivid A.

Intel Customer Support Technician


erstrauss
Beginner

Hi Deivid,

Yes, I installed Windows 10 on a separate HDD and ran the tool; it passed all tests.

-------------------------


--- Genuine Intel Test ---
...
Version 1.0.19.64b.W
...

Expected -- GenuineIntel
Detected -- GenuineIntel

Genuine Intel CPU Module Success

--- Brand String Test ---
...
Version 1.0.23.64b.W
...

IntelR CoreTM i9-10850K CPU 3.60GHz

Brand String Module Success....

--- Cache Test ---
...
Version 1.0.18.64b.W
...

--- Reading Cache Size ---

- Detected L1 Data Cache Size -- 32
- Detected L1 Inst Cache Size -- 32
- Detected L2 Cache Size -- 256
- Detected L3 Cache Size -- 20480

Cache Module Success

--- MMXSSE Test ---
...
Version 1.0.25.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- Determining MMX - SSE capabilities ---
..MMX is supported on this CPU..
..SSE is supported on this CPU..
..SSE2 is supported on this CPU..
..SSE3 is supported on this CPU..
..SSSE3 is supported on this CPU..
..SSE4.1 is supported on this CPU..
..SSE4.2 is supported on this CPU..

Testing MMX
Dot Product computed using C code 506
Dot Product computed using MMX intrinsics 506
MMX Dot Product Computation Test Passed
Passed MMX Test

Testing SSE
Dot Product computed using C code 506
Dot Product computed using SSE intrinsics 506
SSE Dot Product Computation Test Passed
Passed SSE Test

Testing SSE2
Complex Product computed using C code 23.00 -2.00i
Complex Product computed using SSE2 code 23.00 -2.00i
SSE2 Complex Product Computation Test Passed
Passed SSE2 Test

Testing SSE3
Complex Product using C code 23.00 -2.00i
Complex Product using SSE3 code 23.00 -2.00i
SSE3 Complex Product Computation Test Passed
Passed SSE3 Test

Testing SSSE3
SSSE3 Absolute Value Tests Passed
SSSE3 Arithmetic Tests Passed
SSSE3 Dot Product Test Passed
Passed SSSE3 Test

Testing SSE4.1
SSE4.1 Blend Tests Passed
SSE4.1 Min Max Tests Passed
SSE4.1 Insert Bit Tests Passed
SSE4.1 Extract Bit Tests Passed
SSE4.1 Bitwise Comparison Tests Passed
SSE4.1 Dot Product Test Passed
SSE4.1 Arithmetic Tests Passed
SSE4.1 Bit Conversion Tests Passed
SSE4.1 Bit Compare Test Passed
Passed SSE4.1 Test

Testing SSE4.2
SSE4.2 Bit Compare Test Passed
SSE4.2 Calculate Bit Set to 1 Test Passed
SSE4.2 CRC Test Passed
Passed SSE4.2 Test


MMXSSE Module Success

--- Integrated Memory Controller Test ---
...
Version 1.0.20.64b.W
...

--- Reading Memory Size ---

Detected Memory Size is -- 32.00GB


--- Subtest - Memory Size Test Passed ---


--- Integrated Memory Controller Stress Test ---

Memory to be allocated 1048576 bytes

Memory Allocated.

Test 1 Ones and Zeros Moving Inversions write operations - Passed

Test 1 Ones and Zeros Moving Inversions verification operations - Passed

Test 2 32Bits Sliding Ones write operations - Passed

Test 2 32Bits Sliding Ones verification operations - Passed

Test 3 32Bits Sliding Zero write operations - Passed

Test 3 32Bits Sliding Zero verification operations - Passed

Memory Deallocated.

--- Subtest - Memory Stress Test Passed ---


--- Integrated Memory Controller Test Passed ---
Parallel_PrimeNum
Version - 1.0.0.10

Parsing Parallel_PrimeNum.xml
Running Module GraphicsW.exe -s 45 -resultName GraphicsW_Parallel_PrimeNum_1_Results.txt
Running Module Math_PrimeNum.exe -s 45 -resultName Math_PrimeNum_Parallel_PrimeNum_1_Results.txt

--- Prime Number Generation Test ---
...
Version 1.0.23.64b.W
...

..DetectUtils64 DLL Version - 1.1.3
AVX is supported in your OS
Max AVX supported AVX2

Ops Per Sec CycleRun Error Timesec

5787 3 0 1
5412 6 0 2
5074 9 0 3
3448 11 0 4
5089 14 0 5
1983 16 0 6
4011 20 0 7
2984 23 0 8
1983 25 0 9
2918 28 0 10
1971 30 0 11
2900 33 0 12
1916 35 0 13
1941 37 0 14
1944 39 0 15
35823 43 0 16
27197 46 0 17
34954 50 0 18
27095 53 0 19
27306 56 0 20
36861 60 0 21
25619 63 0 22
36701 67 0 23
27032 70 0 24
35734 74 0 25
27041 77 0 26
36773 81 0 27
27071 84 0 28
35984 88 0 29
27075 91 0 30
36372 95 0 31
27403 98 0 32
35795 102 0 33
26266 105 0 34
27649 108 0 35
35134 112 0 36
27709 115 0 37
27138 118 0 38
35240 122 0 39
27276 125 0 40
131733 145 0 41
147011 167 0 42
147609 189 0 43
148077 211 0 44
157416 234 0 45

Operation Per Second -- 157416
Error -- 0

Prime Number Generation Test Passed

Module Math_PrimeNum.exe Completed - Pass

...
Version 1.0.4.64b.W
...
GL_VERSION 4.6.0 - Build 27.20.100.8681

Module GraphicsW.exe Completed - Pass

Result - Pass
Parallel_FP
Version - 1.0.0.10

Parsing Parallel_FP.xml
Running Module GraphicsW.exe -s 45 -resultName GraphicsW_Parallel_FP_1_Results.txt
Running Module AVX.exe -s 45 -resultName AVX_Parallel_FP_1_Results.txt
Running Module Math_FP.exe -s 45 -resultName Math_FP_Parallel_FP_1_Results.txt

--- Floating Point Test ---
...
Version 1.0.22.64b.W
...

..DetectUtils64 DLL Version - 1.1.3
AVX is supported in your OS
Max AVX supported AVX2
FMA3 supported
MFLOPS CycleRun Error Timesec

0.195 3 0 1
0.325 8 0 2
0.65 18 0 3
0.195 21 0 4
0.26 25 0 5
0.52 33 0 6
0.195 36 0 7
0.65 46 0 8
0.39 52 0 9
0.52 60 0 10
0.39 66 0 11
0.78 78 0 12
0.195 81 0 13
0.52 89 0 14
0.26 93 0 15
0.39 99 0 16
0.455 106 0 17
0.26 110 0 18
0.39 116 0 19
0.26 120 0 20
0.455 127 0 21
0.26 131 0 22
0.455 138 0 23
0.325 143 0 24
0.455 150 0 25
0.195 153 0 26
0.26 157 0 27
0.65 167 0 28
0.39 173 0 29
0.715 184 0 30
0.325 189 0 31
0.455 196 0 32
0.39 202 0 33
0.39 208 0 34
0.325 213 0 35
0.39 219 0 36
0.585 228 0 37
0.26 232 0 38
0.39 238 0 39
0.195 241 0 40
0.455 248 0 41
0.26 252 0 42
0.39 258 0 43
0.26 262 0 44
0.39 268 0 45

Million Floating Points per Second MFLOPS -- 0.39
Error -- 0

Floating Point Test Passed

Module Math_FP.exe Completed - Pass

...
Version 1.0.4.64b.W
...
GL_VERSION 4.6.0 - Build 27.20.100.8681

Module GraphicsW.exe Completed - Pass

--- AVX Test ---
...
Version 2.0.25.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- CPU Features Detection ---
..AVX is supported by this CPU..
..AVX2 is supported by this CPU..
..AVX512BW is NOT supported by this CPU..
..AVX512CD is NOT supported by this CPU..
..AVX512DQ is NOT supported by this CPU..
..AVX512ER is NOT supported by this CPU..
..AVX512F is NOT supported by this CPU..
..AVX512IFMA52 is NOT supported by this CPU..
..AVX512PF is NOT supported by this CPU..
..AVX512VBMI is NOT supported by this CPU..
..AVX512VL is NOT supported by this CPU..
..AES is supported by this CPU..
..PCLMULQDQ is supported by this CPU..

..AVX is supported by this Operating System..

Most Advanced AVX Feature Detected.. AVX2

Testing Most Advanced AVX Feature - AVX2....
AVX2 Test Result --- PASS

Testing AES
AES Test Result --- PASS

Testing PCLMULQDQ
PCLMULQDQ Test Result --- PASS


AVX Module Success

Module AVX.exe Completed - Pass

Result - Pass
Parallel_Math
Version - 1.0.0.10

Parsing Parallel_Math.xml
Running Module GraphicsW.exe -s 45 -resultName GraphicsW_Parallel_Math_1_Results.txt
Running Module FMA3.exe -s 45 -resultName FMA3_Parallel_Math_1_Results.txt
Running Module Math_PrimeNum.exe -s 45 -resultName Math_PrimeNum_Parallel_Math_1_Results.txt

--- FMA3 Test ---
...
Version 1.0.23.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- CPU Features Detection ---
..FMA3 is supported by this CPU..
..FMA3 is supported by this Operating System..

Testing FMA3....
FMA3 Test Result --- PASS

FMA3 Module Success


--- Prime Number Generation Test ---
...
Version 1.0.23.64b.W
...

..DetectUtils64 DLL Version - 1.1.3
AVX is supported in your OS
Max AVX supported AVX2

Ops Per Sec CycleRun Error Timesec

3281 2 0 1
3984 4 0 2
4242 6 0 3
3666 8 0 4
2083 9 0 5
1822 10 0 6
3518 12 0 7
1597 13 0 8
1743 14 0 9
652 15 0 10
2351 17 0 11
2133 19 0 12
916 20 0 13
1990 22 0 14
1028 23 0 15
20735 25 0 16
19414 27 0 17
9453 28 0 18
13754 30 0 19
18154 32 0 20
22154 34 0 21
7564 35 0 22
22302 37 0 23
19723 39 0 24
19526 41 0 25
5430 42 0 26
17600 44 0 27
8538 45 0 28
19249 47 0 29
19167 49 0 30
20999 51 0 31
10438 52 0 32
19735 54 0 33
19407 56 0 34
7664 57 0 35
14638 59 0 36
20084 61 0 37
8525 62 0 38
16377 64 0 39
16220 66 0 40
37881 70 0 41
74865 76 0 42
67018 82 0 43
68667 87 0 44
79526 93 0 45

Operation Per Second -- 79526
Error -- 0

Prime Number Generation Test Passed

Module FMA3.exe Completed - Pass
Module Math_PrimeNum.exe Completed - Pass

...
Version 1.0.4.64b.W
...
GL_VERSION 4.6.0 - Build 27.20.100.8681

Module GraphicsW.exe Completed - Pass

Result - Pass
Parallel_GPUStressW
Version - 1.0.0.10

Parsing Parallel_GPUStressW.xml
Running Module GPUStressW.exe -s 30 -resultName GPUStressW_Parallel_GPUStressW_1_Results.txt
Running Module AVX.exe -s 30 -resultName AVX_Parallel_GPUStressW_1_Results.txt
Running Module FMA3.exe -s 30 -resultName FMA3_Parallel_GPUStressW_1_Results.txt

--- AVX Test ---
...
Version 2.0.25.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- CPU Features Detection ---
..AVX is supported by this CPU..
..AVX2 is supported by this CPU..
..AVX512BW is NOT supported by this CPU..
..AVX512CD is NOT supported by this CPU..
..AVX512DQ is NOT supported by this CPU..
..AVX512ER is NOT supported by this CPU..
..AVX512F is NOT supported by this CPU..
..AVX512IFMA52 is NOT supported by this CPU..
..AVX512PF is NOT supported by this CPU..
..AVX512VBMI is NOT supported by this CPU..
..AVX512VL is NOT supported by this CPU..
..AES is supported by this CPU..
..PCLMULQDQ is supported by this CPU..

..AVX is supported by this Operating System..

Most Advanced AVX Feature Detected.. AVX2

Testing Most Advanced AVX Feature - AVX2....
AVX2 Test Result --- PASS

Testing AES
AES Test Result --- PASS

Testing PCLMULQDQ
PCLMULQDQ Test Result --- PASS


AVX Module Success

Module AVX.exe Completed - Pass

--- FMA3 Test ---
...
Version 1.0.23.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- CPU Features Detection ---
..FMA3 is supported by this CPU..
..FMA3 is supported by this Operating System..

Testing FMA3....
FMA3 Test Result --- PASS

FMA3 Module Success

Module FMA3.exe Completed - Pass

...
Version 1.0.15.64b.W
...

IntelR CoreTM i9-10850K CPU 3.60GHz

..Found Brand ID - i9

..Found Gen ID 10
... Loading GPU ...
Platforms 1
0 IntelR OpenCL HD Graphics Selected
Devices 1 filtered by type gpu
0 IntelR UHD Graphics 630 Selected
SUCCESS The process GEMM.exe with PID 8896 has been terminated.

GPUStressW Module Success

Module GPUStressW.exe Completed - Pass

Result - Pass

--- DGEMM Stress Test ---
...
Version 1.0.11.64b.W
...
..DetectUtils64 DLL Version - 1.1.3

--- CPU Features Detection ---
..AVX is supported by this Operating System..

Most Advanced AVX Feature Detected.. AVX2

maxMatrixSize 1024

minMatrixSize 512

Testing Most Advanced AVX Feature - AVX2....
DGEMM AVX2 Test Result --- PASS


DGEMM Module Success

--- Frequency Check ---
...
Version 1.0.3.64b.W
...
..........
..Expected Frequency -- 3.60
..Measured frequency -- 3.59876
..

FrequencyCheck Passed....

-------------------------

 

Interesting note:

When I set the DIMM frequency to 2933MHz and ran with a single DIMM, the benchmark application did not trigger the hardware event.

When I ran it with two DIMMs at the same frequency, I got the hardware event as before.

I'll continue testing with Linux and/or Windows.

I need to get to a stable system.

Please suggest next steps.

Thank you,

ErStrauss

DeividA_Intel
Moderator

Hello erstrauss,

Thanks for the information. Based on the previous test, this looks like an issue related to the motherboard (RAM slots) or the CPU (memory controller hub), but to be completely sure I recommend you try different hardware:

1. Use new RAM with your current system, or use the same RAM in a different setup.

2. Try a different CPU in your current system, or try the same CPU in a known working setup.

3. Try a different motherboard with the same RAM and CPU that you have.

Also, you can take the computer to a repair store for them to run the previous tests; however, this may not be free.

Best regards,

Deivid A.

Intel Customer Support Technician


DeividA_Intel
Moderator

Hello erstrauss,

Were you able to check the previous post? Please let me know if you need more assistance.

Regards,

Deivid A.

Intel Customer Support Technician


erstrauss
Beginner

Hi Deivid,

 

I replaced the CPU with another i9-10850K and the DIMMs with Crucial 32GB, and got the same errors.

 

dmidecode differences:

--- desktop-dmidecode-20210510-201442.txt 2021-05-10 20:14:42.984843700 -0400
+++ desktop-dmidecode-20210515-214123.txt 2021-05-15 21:41:23.913146200 -0400
@@ -750,24 +750,24 @@
Set: None
Locator: ChannelA-DIMM2
Bank Locator: BANK 1
Type: DDR4
Type Detail: Synchronous
- Speed: 2933 MT/s
- Manufacturer: Corsair
- Serial Number: 00000000
+ Speed: 2666 MT/s
+ Manufacturer: CRUCIAL
+ Serial Number: E3C2A476
Asset Tag: 9876543210
- Part Number: CMK32GX4M2E3200C16
- Rank: 1
- Configured Memory Speed: 2933 MT/s
+ Part Number: BL16G32C16U4B.M16FE
+ Rank: 2
+ Configured Memory Speed: 2666 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Not Specified
- Module Manufacturer ID: Bank 3, Hex 0x9E
+ Module Manufacturer ID: Bank 6, Hex 0x9B
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 16 GB
@@ -820,24 +820,24 @@
Set: None
Locator: ChannelB-DIMM2
Bank Locator: BANK 3
Type: DDR4
Type Detail: Synchronous
- Speed: 2933 MT/s
- Manufacturer: Corsair
- Serial Number: 00000000
+ Speed: 2666 MT/s
+ Manufacturer: CRUCIAL
+ Serial Number: E3C296AF
Asset Tag: 9876543210
- Part Number: CMK32GX4M2E3200C16
- Rank: 1
- Configured Memory Speed: 2933 MT/s
+ Part Number: BL16G32C16U4B.M16FE
+ Rank: 2
+ Configured Memory Speed: 2666 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
Memory Technology: DRAM
Memory Operating Mode Capability: Volatile memory
Firmware Version: Not Specified
- Module Manufacturer ID: Bank 3, Hex 0x9E
+ Module Manufacturer ID: Bank 6, Hex 0x9B
Module Product ID: Unknown
Memory Subsystem Controller Manufacturer ID: Unknown
Memory Subsystem Controller Product ID: Unknown
Non-Volatile Size: None
Volatile Size: 16 GB
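
To list only the fields that differ between two dmidecode captures without reading the whole diff, a small script can help. This is a minimal sketch, assuming each memory device section consists of "Key: Value" lines as in the excerpt above; parse_fields and changed_fields are hypothetical helper names, and the sample strings are shortened from the first hunk of the diff.

```python
def parse_fields(block):
    """Turn 'Key: Value' lines of one dmidecode section into a dict."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines that are not key/value pairs
            fields[key] = value
    return fields


def changed_fields(old, new):
    """Return {key: (old_value, new_value)} for every field that differs."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Shortened from the ChannelA-DIMM2 hunk above.
before = parse_fields("""\
Speed: 2933 MT/s
Manufacturer: Corsair
Part Number: CMK32GX4M2E3200C16
Rank: 1
Configured Memory Speed: 2933 MT/s
Configured Voltage: 1.2 V
""")
after = parse_fields("""\
Speed: 2666 MT/s
Manufacturer: CRUCIAL
Part Number: BL16G32C16U4B.M16FE
Rank: 2
Configured Memory Speed: 2666 MT/s
Configured Voltage: 1.2 V
""")

for key, (old, new) in changed_fields(before, after).items():
    print(f"{key}: {old} -> {new}")
```

On this excerpt the configured voltage stays at 1.2 V while the configured speed drops from 2933 MT/s to 2666 MT/s with the replacement modules.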

 

 

--- The errors:

May 15 21:51:43 localhost.localdomain mcelog[1007]: Hardware event. This is not a software error.
May 15 21:51:43 localhost.localdomain mcelog[1007]: MCE 0
May 15 21:51:43 localhost.localdomain mcelog[1007]: CPU 9 BANK 0 TSC 25ac7dc38e8
May 15 21:51:43 localhost.localdomain mcelog[1007]: TIME 1621129903 Sat May 15 21:51:43 2021
May 15 21:51:43 localhost.localdomain mcelog[1007]: MCG status:
May 15 21:51:43 localhost.localdomain mcelog[1007]: MCi status:
May 15 21:51:43 localhost.localdomain mcelog[1007]: Corrected error
May 15 21:51:43 localhost.localdomain mcelog[1007]: Error enabled
May 15 21:51:43 localhost.localdomain mcelog[1007]: MCA: Internal parity error
May 15 21:51:43 localhost.localdomain mcelog[1007]: STATUS 9000004000010005 MCGSTATUS 0
May 15 21:51:43 localhost.localdomain mcelog[1007]: MCGCAP c10 APICID 12 SOCKETID 0
May 15 21:51:43 localhost.localdomain mcelog[1007]: MICROCODE e2
May 15 21:51:43 localhost.localdomain mcelog[1007]: CPUID Vendor Intel Family 6 Model 165 Step 5
May 15 21:51:45 localhost.localdomain mcelog[1007]: Hardware event. This is not a software error.
May 15 21:51:45 localhost.localdomain mcelog[1007]: MCE 0
May 15 21:51:45 localhost.localdomain mcelog[1007]: CPU 1 BANK 0 TSC 25c64bda10a
May 15 21:51:45 localhost.localdomain mcelog[1007]: TIME 1621129905 Sat May 15 21:51:45 2021
May 15 21:51:45 localhost.localdomain mcelog[1007]: MCG status:
May 15 21:51:45 localhost.localdomain mcelog[1007]: MCi status:
May 15 21:51:45 localhost.localdomain mcelog[1007]: Corrected error
May 15 21:51:45 localhost.localdomain mcelog[1007]: Error enabled
May 15 21:51:45 localhost.localdomain mcelog[1007]: MCA: Internal parity error
May 15 21:51:45 localhost.localdomain mcelog[1007]: STATUS 9000004000010005 MCGSTATUS 0
May 15 21:51:45 localhost.localdomain mcelog[1007]: MCGCAP c10 APICID 2 SOCKETID 0
May 15 21:51:45 localhost.localdomain mcelog[1007]: MICROCODE e2
May 15 21:51:45 localhost.localdomain mcelog[1007]: CPUID Vendor Intel Family 6 Model 165 Step 5

 

[ 298.395773] mce: [Hardware Error]: Machine check events logged
[ 308.517496] mce: [Hardware Error]: Machine check events logged
[ 382.316402] mce_notify_irq: 2 callbacks suppressed
[ 382.316403] mce: [Hardware Error]: Machine check events logged
[ 543.292837] mce: [Hardware Error]: Machine check events logged
[ 598.703705] mce: [Hardware Error]: Machine check events logged
[ 610.774985] mce: [Hardware Error]: Machine check events logged
[ 627.041481] mce: [Hardware Error]: Machine check events logged
[ 627.231234] show_signal_msg: 118 callbacks suppressed
[ 627.231235] q_bandwidth[15604]: segfault at 10e000 ip 000000000052e000 sp 00007f4ba0a23e30 error 14 in q_bandwidth[402000+14e000]
[ 627.231241] Code: 1f 84 00 00 00 00 00 0f 1f 00 53 4c 8b 4f 08 48 8b 47 10 f0 ff 40 74 0f 1f 00 48 8b 47 10 8b 40 78 85 c0 74 f5 45 31 c0 66 90 <48> 8b 47 10 8b 40 78 83 f8 02 76 24 48 8b 47 10 8b 40 70 85 c0 75
[ 696.587665] mce_notify_irq: 1 callbacks suppressed
[ 696.587667] mce: [Hardware Error]: Machine check events logged
[ 698.511829] mce: [Hardware Error]: Machine check events logged
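
The "error 14" value in the segfault line is the x86 page-fault error code. The sketch below decodes it, assuming the kernel prints the code in hexadecimal and that the low five bits have their usual x86 meanings (present, write, user, reserved bit, instruction fetch). decode_pf_error and the wording of the descriptions are made up for this example.

```python
# Sketch: decode the low bits of an x86 page-fault error code.
# Each entry is (meaning if bit set, meaning if bit clear or None).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "not a write access"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set in a paging entry", None),
    4: ("instruction fetch", None),
}


def decode_pf_error(code):
    """Return a list of human-readable descriptions for the error code."""
    descriptions = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        text = if_set if (code >> bit) & 1 else if_clear
        if text:
            descriptions.append(text)
    return descriptions


# "error 14" from the dmesg line, interpreted as hexadecimal (assumption).
code = int("14", 16)
for description in decode_pf_error(code):
    print(description)
```

Read this way, the fault was raised in user mode during an instruction fetch from a page that was not present.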

 

The remaining parts to replace are the MB, PSU, and NVMe.

Please advise.

 

Thank you.

ErStrauss


 

erstrauss
Beginner

Hi Deivid,

 

Attached are the logs from the system with a different CPU and different RAM, running Clear Linux.

The same error is reported with and without the NVMe storage (I booted Linux from a USB drive).

 

Thanks

ErStrauss

DeividA_Intel
Moderator

Hello erstrauss,

Thank you for the information provided.

I will proceed to check the issue internally and post back soon with more details. For the moment, I recommend you get in contact with the motherboard manufacturer to check the status of the motherboard.

Best regards,

Deivid A.

Intel Customer Support Technician

