An Intel 530 Series 240GB mSATA SSD is installed in an Intel DQ77MK motherboard mounted in an Antec Sonata II tower case. The SSD has FreeBSD 10.1-RELEASE (amd64) installed on it.
The SSD will randomly disconnect, resulting in a series of AHCI timeout messages and ultimately a kernel panic.
Usually, if not always, the SSD is no longer visible in the BIOS on reboot until the system is power-cycled. (One small variation: if the system is powered down before FreeBSD panics, the device boot order is retained in the BIOS - there are other non-SSD SATA disks attached - but if the system is allowed to panic and reboot, the boot order is lost and the SSD is no longer listed as the first boot device.)
Initially, in an attempt to mitigate this problem, the SATA channel for the SSD was configured for reduced-speed operation in FreeBSD (in /boot/loader.conf):
(kernel log, /var/log/messages):
kernel: ada2 at ahcich4 bus 0 scbus5 target 0 lun 0
kernel: ada2: ATA-9 SATA 3.x device
kernel: ada2: Serial Number CVDA414203E8240M
kernel: ada2: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
kernel: ada2: Command Queueing enabled
kernel: ada2: 228936MB (468862128 512 byte sectors: 16H 63S/T 16383C)
kernel: ada2: Previously was known as ad12
This is still the current setting, but the issue persists.
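One way to confirm from the kernel log that the reduced-speed setting took effect is to pull the negotiated transfer rate out of the "transfers" line. The following is a minimal sketch, assuming the log lines have the same shape as the ones quoted above; the function name and regular expression are illustrative and not part of any FreeBSD tool.

```python
import re

# Matches lines shaped like the one quoted in the post:
#   kernel: ada2: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
TRANSFER_RE = re.compile(
    r"(?P<dev>ada\d+): (?P<rate>\d+(?:\.\d+)?)MB/s transfers \(SATA (?P<rev>\d+)\.x"
)


def negotiated_links(log_text):
    """Return {device: (rate_mb_s, sata_revision)} for each 'transfers' line."""
    links = {}
    for line in log_text.splitlines():
        m = TRANSFER_RE.search(line)
        if m:
            links[m.group("dev")] = (float(m.group("rate")), int(m.group("rev")))
    return links


# Sample taken from the log excerpt in the post.
SAMPLE = """\
kernel: ada2 at ahcich4 bus 0 scbus5 target 0 lun 0
kernel: ada2: ATA-9 SATA 3.x device
kernel: ada2: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
"""

if __name__ == "__main__":
    for dev, (rate, rev) in negotiated_links(SAMPLE).items():
        print(f"{dev}: {rate:.0f} MB/s, SATA {rev}.x")
```

For the excerpt above this reports ada2 at 150 MB/s on a SATA 1.x link, even though the drive identifies itself as a SATA 3.x device.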
The system was also booted into Linux from a DVD. Creating an ext3 filesystem on the SSD and writing to it was similarly unsuccessful.
Since finding the other reports of problems with this device, I have taken a look at the temperatures using 'smartctl'. It seems this drive may indeed have the overheating issue that has been identified for this device, as evidenced by this excerpt:
# smartctl -l scttemp /dev/ada2
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: -20/67 Celsius
Lifetime Min/Max Temperature: -20/76 Celsius
Under/Over Temperature Limit Count: 0/0
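To compare these readings against a limit without reading them by eye, the output can be parsed. This is a rough sketch, assuming the output keeps the field labels shown above; `LIMIT_C` is a placeholder value chosen for illustration, not a figure from the drive's datasheet, and should be replaced with the rated maximum operating temperature.

```python
import re

LIMIT_C = 70  # placeholder threshold (assumption), not a datasheet value


def parse_scttemp(text):
    """Extract current and min/max temperatures (Celsius) from scttemp output."""
    result = {}
    m = re.search(r"Current Temperature:\s+(-?\d+) Celsius", text)
    if m:
        result["current"] = int(m.group(1))
    for key, label in (("power_cycle", "Power Cycle"), ("lifetime", "Lifetime")):
        m = re.search(label + r" Min/Max Temperature:\s+(-?\d+)/(-?\d+) Celsius", text)
        if m:
            result[key + "_min"] = int(m.group(1))
            result[key + "_max"] = int(m.group(2))
    return result


def over_limit(readings, limit_c=LIMIT_C):
    """Return the names of the readings that exceed limit_c."""
    return sorted(name for name, value in readings.items() if value > limit_c)


# Sample taken from the smartctl excerpt in the post.
SAMPLE = """\
Current Temperature: 34 Celsius
Power Cycle Min/Max Temperature: -20/67 Celsius
Lifetime Min/Max Temperature: -20/76 Celsius
Under/Over Temperature Limit Count: 0/0
"""

if __name__ == "__main__":
    readings = parse_scttemp(SAMPLE)
    print(readings)
    print("over limit:", over_limit(readings))
```

With the placeholder limit, only the lifetime maximum of 76 Celsius is flagged; the current and power-cycle readings fall below it.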
The SSD sits in the exhaust airflow from the graphics card heatsink/fan. A thermometer showed this exhaust air to be around 45C. The SSD probably does not benefit greatly from the existing case cooling (PSU exhaust fan and 120mm rear case exhaust fan) because of its position within the case and its orientation on the motherboard: it lies parallel and close to the board, air crossflow is low due to the large internal volume of the case, and the SATA connectors and other components largely block what airflow there is.
I have now installed an 80mm fan (sitting loosely in the case) to blow air from the bottom of the case directly onto the SSD. From occasional checks of the "Current" temperature using 'smartctl', it seems to be reducing the average temperature of the SSD effectively. (For some reason the temperature history log displayed by the 'smartctl -l scttemp' command no longer seems to update reliably; however, the last four temperatures shown in the attached ada2_smartctl_scttemp.log were recorded after the additional fan was installed and seem to reflect the success of the additional cooling.)
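Since the drive's own temperature history is not updating reliably, one alternative is to poll the "Current" reading at a fixed interval and keep a separate log. The sketch below assumes `smartctl` is on the PATH and that the command is run with sufficient privileges; the log path and interval in the usage note are hypothetical. The log file would best live on a non-SSD disk so that it survives a disconnect.

```python
import re
import subprocess
import time
from datetime import datetime

DEVICE = "/dev/ada2"  # device node as used in the post


def read_scttemp(device=DEVICE):
    """Run 'smartctl -l scttemp' and return its text output."""
    proc = subprocess.run(
        ["smartctl", "-l", "scttemp", device],
        capture_output=True, text=True, check=False,
    )
    return proc.stdout


def current_temperature(text):
    """Return the 'Current Temperature' value in Celsius, or None if absent."""
    m = re.search(r"Current Temperature:\s+(-?\d+) Celsius", text)
    return int(m.group(1)) if m else None


def log_temperatures(path, samples, interval_s, reader=read_scttemp, sleep=time.sleep):
    """Append `samples` timestamped readings to `path`, `interval_s` apart.

    A reading of None is written when the temperature could not be parsed,
    for example if the drive has already dropped off the bus.
    """
    with open(path, "a") as log:
        for i in range(samples):
            temp = current_temperature(reader())
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp} {temp}\n")
            log.flush()  # keep the file current in case the system panics
            if i < samples - 1:
                sleep(interval_s)
```

For example, `log_temperatures("/mnt/hdd/ada2_temp.log", samples=1440, interval_s=60)` would record one reading per minute for a day; the path here is a made-up location on a non-SSD disk.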
Despite the additional cooling, this problem persists - it recurred about 24 hours after installing the fan, and there was a succession of faults today triggered repeatedly by the same action. That action was using the FreeBSD 'pkg' command to update/install a particular port, which required a ~38MB download. Each attempt to run the pkg command to install that port resulted in the SSD disconnecting before the 'fetch' of the pkg file completed - perhaps at between 50% and 80% completion. After three consecutive failures, the 'pkg fetch' command was used to write the downloaded pkg file to a different partition (to the /usr partition rather than the default /var partition), but the SSD again disconnected. The port was ultimately installed successfully by downloading the pkg file to a non-SSD drive and completing the installation from there.
The most curious aspect of this scenario is that each download attempt of the ~38MB file occurred at ~150kB/s, with a correspondingly low average write rate to the SSD. In the final, successful install of the pkg file, over 200MB of files were written to the SSD very rapidly, but the drive did not fault. Besides, many dozens of other port installs have been completed without incident.
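To put the figures above in terms of elapsed time, the short calculation below estimates how long into each download the disconnect occurred. It is only a back-of-the-envelope sketch: it assumes a constant 150 kB/s rate and decimal units (1 MB = 1000 kB), both of which are approximations of the "~" values quoted.

```python
FILE_SIZE_KB = 38_000  # ~38 MB download, decimal units assumed
RATE_KB_S = 150        # ~150 kB/s observed download rate


def seconds_to_reach(fraction, size_kb=FILE_SIZE_KB, rate_kb_s=RATE_KB_S):
    """Seconds needed to download `fraction` of the file at a constant rate."""
    return fraction * size_kb / rate_kb_s


if __name__ == "__main__":
    for label, fraction in (("50%", 0.5), ("80%", 0.8), ("100%", 1.0)):
        print(f"{label:>4}: {seconds_to_reach(fraction):.0f} s")
```

Under these assumptions the full download would take a little over four minutes, and the failures at 50-80% completion fall roughly two to three and a half minutes into a slow, steady trickle of writes.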
One final point of interest is that prior to the acquisition of the Intel SSD, a KingSpec 32GB device was installed and exhibited essentially the same symptoms, although the faults probably appeared much sooner. At the time I put this down to it being a poor-quality product, but the experience with the Intel device perhaps suggests that something else is at play?
I have a SATA->mSATA adapter on order and will try the SSD with that to see if eliminating the mSATA port provides any improvement. In the meantime, is there anything else I can do to validate the condition of the SSD or resolve this problem?
We are going to try to recreate this behavior. Please answer the questions below.
1 - When the drive is working properly (still seen by the OS), can you put the system to sleep or restart it and still have the drive detected once the wake or restart finishes?
2 - What BIOS version do you have installed?
1. "Sleep" mode isn't used for this system. So long as the SSD has not "disconnected" then on any kind of restart (O/S reboot, hard reset, power removed) the drive remains visible after restart finishes.
2. BIOS version is 0067, see below excerpt from FreeBSD "kenv" command.
Another incident yesterday.
Approx 206 hours of continuous uptime had elapsed since the previous incident. This one occurred between approx 13:30 and 14:30 local time. The PC was not being actively used at the time. The desktop (gnome3) applications running were the Chromium browser and Evolution email. It is not clear that there should have been any particularly large writes to the SSD; however, this is difficult to assess due to potential background activity, particularly from gnome3.
Also, not surprisingly, the "pkg" command that I originally reported as triggering the SSD disconnects has since been tried again twice and did not trigger the disconnect. In all probability that was simply a fortuitous coincidence.
Did you ever get this solved?
I have a similar issue, but mine is with a regular 2.5" SSD installed in the system. It appears that only the system disk loses connectivity; the system then restarts and displays "non-system disk or disk error". If I power cycle the system it works fine. In my case I'm running Windows Server 2012 R2, and I'm booting in BIOS mode (not UEFI).
Was wondering if you were able to get this resolved... I know this was from a while ago but I've been struggling with this issue for a very long time.