Dear Intel community!
Multiple clients have reported boot issues with NUC 11TNKI5000 devices, which stall during boot and drop to the GRUB shell.
Roughly 1 in 1000 boot attempts ends at the GRUB prompt.
That may sound acceptable, but these computers run unattended kiosks, so booting must be reliable.
Sometimes the failure occurs after 3 boot attempts, sometimes after 800.
The computers are updated to the latest BIOS firmware, v67, released recently; the issue was present before that update as well.
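Assume each boot fails independently with a fixed probability p. This is an assumption, since the real failure mechanism is unknown. Under it, a spread from 3 to 800 attempts is what a geometric distribution predicts. The same arithmetic shows how many clean boots a test rig needs before a fix can be trusted. A small Python sketch:

```python
import math


def p_at_least_one_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent boots fails."""
    return 1.0 - (1.0 - p) ** n


def boots_for_confidence(p: float, confidence: float) -> int:
    """Smallest n such that a failure rate p shows up at least once
    in n boots with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 1 / 1000  # failure rate reported above
    print(f"mean boots to first failure: {1 / p:.0f}")
    print(f"P(fail within 1000 boots): {p_at_least_one_failure(p, 1000):.3f}")
    print(f"boots needed for 95% detection: {boots_for_confidence(p, 0.95)}")
```

With p = 1/1000, a run of 1000 boots has only about a 63% chance of hitting the failure. Roughly 3000 consecutive clean boots are needed before one can say, with 95% confidence, that the rate is no longer 1 in 1000.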
The bootstrap uses the Ubuntu GRUB file from here:
I have investigated the network traffic with Wireshark and managed to record a failed boot attempt. The grubnetx64 file is fetched and all blocks transfer correctly, but the computer never requests the grub.cfg file from the TFTP server.
This is hard to troubleshoot because I have little insight into what is going on and how GRUB handles the requests. I assume the first request is made by the UEFI firmware and the second should be made by a TFTP client inside GRUB?
I use a test rig that powers the computer on via Wake-on-LAN. As soon as the OS has booted, an external control system reboots it automatically and records the boot time and number of attempts.
As a reference we use a NUC8i5BEK2, and that computer never fails.
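A capture can be checked for the missing request mechanically. A TFTP read request (RRQ, opcode 1 in RFC 1350) carries the filename as a NUL-terminated string right after the 2-byte opcode. The sketch below assumes the UDP payloads of the TFTP packets have already been extracted from the capture, for example exported from Wireshark. The function names and the "grubnetx64"/"grub.cfg" substring defaults are illustrative choices, not part of any tool. Substring matching is used because GRUB may request several config file variants.

```python
import struct
from typing import Iterable, Optional

TFTP_RRQ = 1  # read-request opcode (RFC 1350)


def rrq_filename(payload: bytes) -> Optional[str]:
    """Return the requested filename if payload is a TFTP RRQ, else None."""
    if len(payload) < 4:
        return None
    (opcode,) = struct.unpack("!H", payload[:2])
    if opcode != TFTP_RRQ:
        return None
    # Layout after the opcode: filename NUL mode NUL [option NUL value NUL ...]
    fields = payload[2:].split(b"\x00")
    if len(fields) < 2 or not fields[0]:
        return None
    return fields[0].decode("ascii", errors="replace")


def reached_config(payloads: Iterable[bytes],
                   loader: str = "grubnetx64",
                   config: str = "grub.cfg") -> bool:
    """True if a request for the GRUB config follows the request for the loader."""
    loader_seen = False
    for payload in payloads:
        name = rrq_filename(payload)
        if name is None:
            continue  # not a read request (DATA, ACK, OACK, ...)
        if loader in name:
            loader_seen = True
        elif loader_seen and config in name:
            return True
    return False
```

A failed attempt, as described above, would make reached_config return False, because only the loader request appears.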
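For anyone rebuilding a similar rig, the Wake-on-LAN step is simple. The standard magic packet is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. Below is a minimal Python sketch. The broadcast address and port 9 are common defaults, not details of the rig described here.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like 'aa:bb:cc:dd:ee:ff'."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # bytes.fromhex also rejects non-hex characters with ValueError
    return b"\xff" * 6 + bytes.fromhex(digits) * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```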
Thank you for posting on the Intel® communities.
In order to review this further, could you please confirm the following details?
1- How many Intel® NUCs do you have/manage, and how many are experiencing this behavior?
2- Are you using IPv4 or IPv6?
3- Did the Intel® NUCs work before without experiencing this behavior? Or has this been happening since the first day the Intel® NUCs were deployed in their locations?
4- When you said "the devices stalls at boot and displays the Grub shell", what happens after that? Do they resume the boot process after a while, or do they freeze/hang? Do they need to be rebooted manually?
Intel Customer Support Technician
1.) We sell a type of visitor experience solution (think digital signage with stronger interactive support) <removed>
In general we recommend the Intel NUC series as the first choice, but the platform works on any x86 hardware. That means there are hundreds of devices out there, but it also means that if the issue cannot be addressed, we must change our recommendations and steer away from 11th-generation NUCs.
This issue was discovered because of the sheer number of units in the field that use PXE boot. It is very easy to replicate, but since it occurs at such a low level, it is probably very hard for anyone outside Intel to work out what is going wrong.
I currently have three devices at the office, all showing the same behaviour.
3.) The units in our lab are brand new out of the box, and they have shown this behaviour with at least the two latest firmware revisions.
There is a large-scale installation across multiple venues in the Netherlands where all the new devices show this behaviour, suggesting this may be a fairly general issue.
4.) I have attached an image showing the halted GRUB (the file ending in 075719.jpg). So we end up in the GRUB shell. The shell works just as it should, except that I cannot fetch grub.cfg from the TFTP server.
I have also attached the output of typing "set", which shows GRUB's current configuration (080309 and 080246).
If I type configfile $prefix/grub.cfg at this stage, a working system moves on and boots with the settings provided in grub.cfg. A failing device instead shows a black screen for about two seconds and then returns to the GRUB shell.
If I capture traffic at the TFTP server with Wireshark, I can confirm that a failing device makes no attempt to fetch grub.cfg, whereas a working device fetches grub.cfg.
My theory is that something in the network stack fails once GRUB is executed. We always see a successful fetch of the GRUB network bootstrap program (NBP), but after it is executed we seem to lose the ability to fetch over TFTP and HTTP.
I have also attached one video of manually fetching grub.cfg on a working NUC (video 083122) and one of trying the same after the NUC dropped to the shell (video 083145).
Let me know if you want me to test something in the GRUB shell, perhaps to get a lead on what is going wrong.
I appreciate that this may be a GRUB bug, but on the other hand it is very low level, and I would assume there is some sort of "standard" for how the boot procedure accesses the hardware under the hood.
To reproduce this on your side, you would need a PXE boot environment and an OS that can boot, ideally Ubuntu or similar, and then simply repeat the boot procedure. Sometimes it happens after 5 cold boots, sometimes after 1000.
If needed, I can arrange access to our mini kiosk OS and a cloud server. Together they provide a bootable OS, a boot counter, and a mechanism that reboots the computer automatically as soon as it is up. Just send me an email to set this up.
I can also report that I have seen the same issue on a NUC11PAHi5000 after roughly 20 reboots (I suppose the two models are very similar).
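The reproduction loop described above (boot, wait for the OS, reboot, count) can be sketched as follows. The three callables are hypothetical hooks. For example, power_on could send a Wake-on-LAN packet, booted_ok could poll the kiosk OS for a heartbeat with a timeout, and reboot could trigger a restart from the control system. A boot that never reports in within the timeout is counted as the failure, which matches a unit stuck at the GRUB prompt.

```python
from typing import Callable, Optional


def soak_test(power_on: Callable[[], None],
              booted_ok: Callable[[float], bool],
              reboot: Callable[[], None],
              max_attempts: int = 1000,
              timeout_s: float = 180.0) -> Optional[int]:
    """Repeat boot cycles; return the 1-based attempt that failed, or None.

    booted_ok(timeout_s) should block until the OS reports in or the timeout
    expires, returning False in the latter case (e.g. stuck at the GRUB prompt).
    """
    power_on()
    for attempt in range(1, max_attempts + 1):
        if not booted_ok(timeout_s):
            return attempt  # leave the unit in its failed state for inspection
        reboot()
    return None
```

Given a failure rate around 1 in 1000, set max_attempts well above 1000 for a run to be meaningful.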
Thank you very much for your response and for the details.
We understand that you wanted to share with us some images and videos to illustrate better the scenario. However, it seems we are missing them in the thread.
For clarification, the files need to be uploaded manually, directly in the thread. To upload and attach a file, use the "Drag and drop here or browse files to attach" option below the response textbox.
In case the files are not uploading or if there is a problem with the "Drag and drop" option, we have also sent you an email to the email address associated with your profile, so you can reply to our email and attach the images/videos or a "download share link".
Intel Customer Support Technician
We were able to download the images and videos from both the private email and here in the thread. Thank you for your efforts in this matter.
We will proceed to review this further and once we have more information available we will be posting back in the thread.
Additionally, we will send you another private email just to confirm a couple of details about your company.
Intel Customer Support Technician
We are currently looking into this issue, and I have a couple of observations, please see below:
- Due to the nature of this issue, and since it involves a large number of units, I would recommend that you open a support ticket directly with Intel Technical Support via phone, chat or web ticketing. Resolving the issue may require you to provide log files, screenshots and other material containing private information that shouldn't be shared openly on the Community Portal. Also, if a warranty replacement or in-house debugging is needed, a direct support ticket with Intel Technical Support is required to process either request. If you agree with this, please refer to the following URL for further details: https://www.intel.com/content/www/us/en/support/contact-intel.html#support-intel-products_98414:98414
- Please provide screenshots or log files from Wireshark* that may help clarify the issue that you described.
- Do you happen to know if any previous BIOS version also exhibited this issue?
Please let me know if you open a ticket directly with Intel Technical Support. I will then transfer all the information you provided in this post and close this one to avoid duplication.
Thanks. I have now contacted Intel by phone as you suggested, so there is a ticket for this issue. Since they now have the link to this thread, no additional transfer of data from here should be necessary at this point. I will try to keep this thread updated if there is any news.
We got your new support request and will be working on that directly with you.
This community post will not be monitored by Intel Technical Support.