I left my Acer Aspire laptop on overnight, running Windows 7 x64. In the morning, all my windows were closed, and I had this message:
Windows has recovered from an unexpected shutdown
Windows can check online for a solution to the problem.
Skip to the end for the solution that worked in my case. Read the following discussion for possibilities that may help in other cases, and also for alerts for red herrings that may just distract from the main issue(s).
Apparently the system had crashed and rebooted. It wasn’t the first time I’d had this problem. Clicking the “Check for solution” button closed the option to view problem details, and then the “checking” window closed without further ado. I ran BlueScreenView (named in honor of the so-called Blue Screen of Death (BSOD) that often accompanied Windows crashes) to see the minidump that, I believed, the Problem Details option would have named. (I was up to date on Control Panel > Windows Update.)
I could have done a System Restore (Control Panel > System > System Protection tab), to roll back to a time before the problem came into existence. That would have been a realistic option if I had jumped on the problem right away. At this point, unfortunately, it had been going on for a while. I could alternately have restored the system to the last good state saved in an Acronis True Image or Macrium Reflect image backup of my drive C, and that continued to be a fail-safe option, but I would still want to know what specifically was causing the problem, so as not to wind up in the same situation all over again; besides, the image restore would wipe out the various installations and fixes I had applied since the date of my last drive C image, and I would have to go through those issues and improvements all over again.
Previous struggles had shown me how to set up the computer so that it would save minidumps. A review of my notes from those struggles now led me to sort the minidumps, in BlueScreenView, according to crash time. This gave me the entry for the one that had just occurred. But apparently that didn’t matter: all five of the minidumps shown agreed that the crashes were caused by a driver known as iaStorA.sys, with a Bug Check Code of 0x000000d1. (I knew that Bug Check and STOP Codes (a/k/a Stop Error Codes) were synonymous. A search suggested that this Bug Check Code could also be written as 0xd1.)
Microsoft’s Bug Check Code Reference said that Bug Check 0x000000d1 referred to an error called DRIVER_IRQL_NOT_LESS_OR_EQUAL. The detailed description of that error said, “This bug check is usually caused by drivers that have used improper addresses.”
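The two spellings of the code (0x000000d1 and 0xd1) are the same hexadecimal number, which is easy to confirm. The following is only a minimal Python sketch: the function names and the one-entry lookup table are my own illustration, populated solely with the code and name mentioned above, not a complete reference.

```python
# Illustrative lookup table: contains only the one bug check discussed here.
KNOWN_BUG_CHECKS = {
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

def normalize_bug_check(code: str) -> int:
    """Parse a bug check code written as hex text, e.g. '0x000000d1' or '0xd1'."""
    return int(code, 16)

def describe_bug_check(code: str) -> str:
    """Return the symbolic name for a code, or a placeholder if it is not in the table."""
    return KNOWN_BUG_CHECKS.get(normalize_bug_check(code), "unknown (not in this table)")

if __name__ == "__main__":
    for text in ("0x000000d1", "0xd1"):
        print(text, "->", hex(normalize_bug_check(text)), describe_bug_check(text))
```

Both spellings resolve to the same entry, so searching on either form should turn up the same error.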
A search of my system indicated that I had two copies of iaStorA.sys. One was in C:\Windows\System32\drivers; the other was in a subdirectory under C:\Windows\System32\DriverStore. I browsed to the former in Windows Explorer > right-click > Properties > Details tab. This gave me a File Description: “Intel Rapid Storage Technology driver,” version 126.96.36.1996. I went to Control Panel > Device Manager > IDE ATA/ATAPI controllers > Intel(R) 8 Series Chipset Family SATA AHCI Controller > right-click > Properties > Driver tab > Driver details. Sure enough, it listed iaStorA.sys and stated a File Version of 188.8.131.526. I clicked OK and chose Update Driver > Search Automatically. It said, “The best driver software for your device is already installed.”
Back in C:\Windows\System32\drivers, I right-clicked on iaStorA.sys and chose “Scan with AVG.” (AVG was my installed antivirus software.) It found no virus. A search led to several indications that the problem was (or at least could be) caused by SpeedFan, and that a workaround would be to add /NOSCSISCAN to the command line. This information was of uncertain value to me, since (a) I did not seem to have a program called SpeedFan installed (and a search for SpeedFan found no files of that name on my system) and (b) I was not sure which command line they were referring to.
Someone suggested, contrary to Device Manager’s assurances, that a newer version of the driver was indeed available. Using a newer version seemed better than other suggestions that I might try rolling back to an older version of the driver. I went to the suggested Intel webpage and downloaded and ran SetupRST.exe version 184.108.40.2061. This was apparently going to be updating a driver that, in the form I had downloaded, was called iRST_Intel_220.127.116.116_W8.1x64. It appeared, in fact, that the version 12.9 file (16.7MB) would actually supersede the similarly sized (16.6MB) version 12.8 file, such that I might have uninstalled version 12.8 (via Control Panel > Programs and Features) before installing version 12.9. I had neglected to check the date and time of iaStorA.sys, presumably installed or updated by version 12.8, but now I saw that it was dated 11/21/2013.
Unfortunately, this did not resolve the problem. The next morning, I had a crash again. BlueScreenView gave the same code. This time, I looked at the Problem Details in that Windows dialog telling me of the “unexpected shutdown.” In addition to naming the .dmp file shown by BlueScreenView, it said BCCode was d1, and it named an .xml file that, it said, would help describe the problem. I opened that file, C:\Users\[username]\AppData\Local\Temp\WER-7536517-0.sysdata.xml. It seemed to contain a list of devices and drivers. It did list iaStorA.sys, dated 11/21/2013, but provided no obvious information about the crash.
I briefly wondered if the crashes were related to attempts at shutdown or hibernation, since they were occurring late at night. But I decided this was probably not the case, since (a) I had seen one while working at the keyboard and (b) upon checking Control Panel > Power Options, I verified that the laptop was set to stay always on when connected to AC power. If BlueScreenView had not been giving me what seemed to be the needed information, a previous post reminded me that I could go to Start > Run > SystemPropertiesAdvanced.exe (or Control Panel > System > Advanced tab) > Startup and Recovery Settings > uncheck Automatically restart. This would hopefully leave the BSOD onscreen, rather than restarting the system, so that I could see whether it presented any other useful information. That post also reminded me that, instead of the smallish minidump files saved in %SystemRoot%\Minidump (i.e., C:\Windows\Minidump), I could have changed that dialog to write debugging information to a much larger Kernel Memory Dump (MEMORY.DMP), perhaps about 1GB, but from my previous post it did not appear that this would be especially helpful. A search confirmed that, with my present settings, I was not getting any MEMORY.DMP file creation.
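Sorting the minidumps by time, as I did in BlueScreenView, could also be approximated with a short script. This is a rough Python sketch, not a substitute for BlueScreenView: it only lists the .dmp files newest first and does not read their contents. The function name is my own, the default folder is the %SystemRoot%\Minidump location mentioned above, and it assumes the account running it is allowed to read that folder.

```python
import os
from pathlib import Path

def list_minidumps(folder=None):
    """Return the .dmp files in the minidump folder, newest first by modification time."""
    if folder is None:
        # Default location named above: %SystemRoot%\Minidump
        folder = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"
    folder = Path(folder)
    if not folder.is_dir():
        return []  # no folder, so no dumps to report
    dumps = [p for p in folder.glob("*.dmp") if p.is_file()]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)

if __name__ == "__main__":
    for dump in list_minidumps():
        print(dump.name)
```

An empty result would be consistent with either no crashes or minidump saving not being enabled.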
I went back to the last search (above) that I had tried. One webpage listed there suggested that the problem I was having was “generally a memory error, either physical or virtual, [or] some problem addressing a page file.” The moderator suggesting this, Ulrich, had nearly 90,000 posts to his credit, so I guessed he might be on target. His suggestion gained plausibility when I tried to close the dialog discussed in the previous paragraph, where I could choose minidumps or a Kernel Memory Dump. When I clicked OK to close that dialog, I got this message:
Windows might not be able to record details that could help identify system errors because your current paging file is disabled or less than 1 megabytes. Click OK to return to the Virtual Memory settings window, enable the paging file, and set the size to a value over 1 megabytes, or click Cancel to change your memory dump selection.
My memory dump selection at this point was unchanged from before: small memory dump. All that I had changed was to uncheck the Automatically Restart option. So I clicked OK on this System Properties dialog and went into Control Panel > System > Advanced tab > Performance Settings > Advanced tab > Virtual memory Change. It was set to allow System Managed paging files on drives C and X. It showed 23888MB currently allocated to paging files. These settings had remained unchanged for months. This did not seem to be a problem area, nor did it seem relevant to my current difficulties.
My previous post also suggested trying NirSoft’s MyEventViewer. I checked BlueScreenView, saw that the last minidump had occurred at 2:51:08 AM, and consulted MyEventViewer for events occurring at that time. The only such event was labeled, in MyEventViewer, as a System Information type of event, with an Event Description of “ACPI thermal zone ACPI\ThermalZone\THRM has been enumerated.” A search on that description led to a suggestion that this meant my system had experienced a “thermal trip” (a shutdown?) associated with the CPU, and that perhaps I should install SpeedFan (!). It was possible that the system was being especially taxed in the middle of the night, when I wasn’t even using it: conceivably this would be due to hours of file indexing by my newly updated Copernic Desktop Search, version 4. But it didn’t seem likely that Copernic, by itself, would yield more intensive use than I would be giving the system in midday, with a thousand tabs open in Firefox and so forth. (Exaggerating.)
Unlike me, another user had an informative item appearing in MyEventViewer, immediately before his/her ACPI Thermal Zone entry. That item said this:
Critical: Kernel Power: The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Then again, I did have a number of entries in MyEventViewer, in the ten seconds before that 2:51:08 AM ACPI entry, and most of these did have to do with the CPU. My laptop’s Intel Core i7 CPU had only four cores but, according to Speccy, it had eight threads. This could explain why MyEventViewer was listing eight Microsoft-Windows-Kernel-Processor-Power entries in those preceding ten seconds. MyEventViewer had not listed another similar set of eight entries since 5:02:35 AM the previous day. There was no minidump listed at that time in BlueScreenView. But the next set of eight entries before that had occurred five or six hours earlier, at 11:38:53 PM on the next preceding day, and that was, again, nine seconds before the 11:39:02 PM time listed for the second most recent minidump shown in BlueScreenView. In short, it appeared that these BSODs would be immediately preceded by a Microsoft-Windows-Kernel-Processor-Power entry for each of the CPU’s eight threads. So I was wondering if there was a CPU heat or power issue. The only thing that came to mind at this point was that possibly I would not be having this problem if I were running the laptop with the battery installed. I had removed it because they said it was bad for the battery to be constantly maintained at peak charge.
The descriptions of the eight Microsoft-Windows-Kernel-Processor-Power entries all said this (varying only in the processor number):
Processor 0 in group 0 exposes the following:
2 idle state(s)
16 performance state(s)
8 throttle state(s)
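Since the eight descriptions differed only in the processor number, they lend themselves to mechanical comparison. Here is a minimal Python sketch that pulls the numbers out of one such description; the function name and the returned dictionary layout are my own assumptions, and the pattern is based only on the wording quoted above.

```python
import re

def parse_processor_power(text):
    """Extract processor/group numbers and state counts from an event description."""
    header = re.search(r"Processor (\d+) in group (\d+)", text)
    if header is None:
        raise ValueError("not a Kernel-Processor-Power description")
    counts = {
        kind: int(number)
        for number, kind in re.findall(r"(\d+) (idle|performance|throttle) state\(s\)", text)
    }
    return {"processor": int(header.group(1)), "group": int(header.group(2)), **counts}

if __name__ == "__main__":
    sample = (
        "Processor 0 in group 0 exposes the following:\n"
        "2 idle state(s)\n"
        "16 performance state(s)\n"
        "8 throttle state(s)"
    )
    print(parse_processor_power(sample))
```

Running this over all eight entries and comparing the dictionaries would show quickly whether any thread reported different state counts.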
A search led to the statement that “This event is typically recorded after a bugcheck crash to indicate the state of the processor(s) when the crashed happened. The bugcheck crash is initiated by the operating system when it detects an anomaly in the behavior of various system-levels drivers.” I was getting precious little from Google searches (Web and Scholar) for “throttle state,” but a Dogpile search led to an Advanced Configuration & Power Interface (ACPI) specification document leading me to other terminology, especially “throttling states” and “T-states” and “dynamic frequency scaling.” A revised search then led to a Wikipedia page indicating that throttling was used (as one might have guessed) to reduce heat or conserve power, the former being consistent with those earlier references (above) to SpeedFan. It seemed that power to the CPU was being scaled back, seconds before a BSOD, in response to overheating — which was, again, an odd thing to happen at 3 AM, unless it was a by-product of some kind of driver malfunction.
According to MyEventViewer (MEV), the ten seconds before the 2:51:08 AM crash also included several other items with the following Source names: BTHUSB, k57nd60a (two occurrences), and MEIx64. Not all of these items included Event Descriptions, but those that did indicated that the Broadcom NetLink (TM) Gigabit Ethernet was now initiated and configured, and that the Intel(R) Management Engine Interface driver had started. These descriptions seemed to suggest that something had shut down the Ethernet and Intel MEI drivers, and that they were now restarting just before the crash. I wondered whether their restarting had somehow placed sudden demands on the CPU, leading to rising levels of throttling and then a crash.
The question in that case might be, what had caused the Ethernet and other drivers to shut down? MEV listed only four other events in the half-hour before the crash, and all four of those occurred at around 2:50:26 AM (i.e., roughly 40 seconds pre-crash). The Source indicated (in MEV) for three of these events was Microsoft-Windows-FilterManager. The fourth, occurring one second before those three, was Microsoft-Windows-Kernel-General. MEV showed the latter (referred to here as MWKG) as an “Information”-type event occurring once or twice a day during the past week or so; not occurring at all in the three weeks before that; and then occurring quite often as an “Error”-type event in the several days before that.
I checked back, again, to the previous crash shown in BlueScreenView, at 11:39:02 PM on the second preceding day, approximately 36 hours before I was writing these notes. A similar pattern appeared in MEV at that point: an ACPI thermal event at the time of the crash, immediately preceded by eight Microsoft-Windows-Kernel-Processor-Power events, two k57nd60a events, and one each of BTHUSB and MEIx64 — all of which were, again, preceded (just a few seconds earlier, in that case) by three FilterManager events preceded, in turn, by a Microsoft-Windows-Kernel-General event. So it did seem that, if I could get to the cause of those MWKG and/or FilterManager events, I might have a solution to the problem.
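The manual comparison I was doing between BlueScreenView and MEV (which events fall in the few seconds before a crash?) can be sketched in a few lines of Python. This is a hypothetical illustration only: the function name and the (timestamp, source) tuple layout are my own, the calendar date is an arbitrary placeholder, and "Example-Unrelated-Source" is a made-up entry. Only the clock times 11:38:53 PM and 11:39:02 PM come from my notes.

```python
from collections import Counter
from datetime import datetime, timedelta

def events_before(events, crash_time, window_seconds=10):
    """Return the (timestamp, source) events logged in the window ending at crash_time."""
    start = crash_time - timedelta(seconds=window_seconds)
    return [event for event in events if start <= event[0] <= crash_time]

# Placeholder date; only the times of day are taken from the notes above.
day = datetime(2000, 1, 1)
crash = day.replace(hour=23, minute=39, second=2)

# Eight Kernel-Processor-Power entries nine seconds before the crash,
# plus one made-up, unrelated entry well outside the window.
events = [(day.replace(hour=23, minute=38, second=53),
           "Microsoft-Windows-Kernel-Processor-Power")] * 8
events.append((day.replace(hour=23, minute=0, second=0), "Example-Unrelated-Source"))

summary = Counter(source for _, source in events_before(events, crash))
print(summary)
```

Widening window_seconds to about 60 would also take in events like the FilterManager and Kernel-General entries that appeared roughly 40 seconds before the later crash.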
A search suggested that the MWKG event might be related to system time; another search yielded a hint that the FilterManager event might arise where “Filter Manager failed to attach to volume” (i.e., to a hard drive partition). The fact that there were three such events (and the original implication of an Intel Rapid Storage Technology issue related to a driver named iaStorA.sys) raised the thought that, for some reason, in the middle of the night, the system might suddenly be having difficulty connecting with my three TrueCrypt-encrypted partitions on a hard drive sitting in an external USB-connected dock. The problem in that case could be due to the USB dock’s drivers and/or the laptop’s USB 3.0 drivers. I recalled that there had been some difficulty in locating USB 3.0 drivers for this Acer laptop. Maybe my drivers were not quite up to speed; maybe the MWKG event was triggering recognition of a problem with a USB device. I could give that a rough test by keeping the external USB dock turned off whenever possible. I wasn’t entirely sure of this option, though: I suddenly realized that I had also left a 32GB Patriot USB stick plugged into the laptop for several days, though I believed that the crashes had been occurring before that. Attempts to update drivers for these devices (via Control Panel > Device Manager > right-click on the drive or drive controller > Properties > Driver tab > Update Driver) were unsuccessful.
I was already in the habit of keeping my drives pretty well defragmented, and I ran frequent disk checks, so the hard drives per se did not seem likely to be an issue. Likewise on the antivirus front: I did not seem to be having problems with AVG, and I had just run a Malwarebytes scan. But there were some other possible solutions. I ran the built-in Windows Memory Diagnostic (Start > Run > mdsched.exe) and set it to check RAM next time the system rebooted. Also, I had been having some display issues. One post implicated display drivers in crashes involving FilterManager. Speccy said that the laptop was using NVIDIA graphics, so I used NVIDIA’s scanner (though I could just as well, and probably more safely, have used their manual search option) to find, download, and do a clean installation of the latest NVIDIA driver. (Note: the first time I tried this installation, it shut down all other programs and rebooted the system without warning.)
Another post suggested that a person having problems somewhat comparable to mine might be experiencing short spikes in temperatures that a BIOS upgrade would fix. This suggestion drew corroboration from an unexpected source. In response to the suggestion that I might be having a power supply unit (PSU) issue, I ran OCCT. Its PSU test evidently included a CPU test, with a default ceiling of 85°C. The PSU test terminated with a warning: “Core #0 over maximum value! Value Reached: 86, Max Value: 85.” Upon termination, OCCT opened a Windows Explorer session with nine PNG image files that it had created. I browsed those files. The CPU Usage graph indicated that the CPU had reached 100% usage in less than 1.5 minutes. At about the same time, it had maxed out its memory usage, somewhere above 10GB (on a 12GB system). Not only did Core #0 exceed the 85° mark at just over two minutes; it kept on climbing to a final value (at the three-minute mark) of just over 90°. The three other CPU cores were not far behind. Unfortunately, at this point I already had the latest Acer BIOS (v. 1.13), so I was not sure what to do with these values. I would have had to research the question of whether it was normal for the system to crap out under an OCCT stress test, or whether this was instead the sign of something bad.
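If I had wanted to log warnings like that one across repeated test runs, the message text could be picked apart mechanically. What follows is a small Python sketch under a big assumption: the pattern is inferred from the single warning quoted above, not from any OCCT documentation, and the function name and dictionary keys are my own.

```python
import re

# Pattern inferred from the one warning quoted above; other OCCT messages may differ.
WARNING = re.compile(
    r"Core #(\d+) over maximum value! Value Reached: (\d+), Max Value: (\d+)"
)

def parse_occt_warning(message):
    """Return core number, value reached, ceiling, and overshoot, or None if no match."""
    match = WARNING.search(message)
    if match is None:
        return None
    core, reached, ceiling = (int(group) for group in match.groups())
    return {"core": core, "reached": reached, "max": ceiling, "overshoot": reached - ceiling}

if __name__ == "__main__":
    text = "Core #0 over maximum value! Value Reached: 86, Max Value: 85."
    print(parse_occt_warning(text))
```

The overshoot at the moment of the warning was only one degree; the graphs, not the warning, were what showed the climb to just over 90°.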
I decided to treat it as something bad, without any research, because — after all of the foregoing precautions — I had another BSOD within the next few hours, middle of the day. As almost always happened, I was away from the machine, suggesting that there was something about the idle state that was causing this. My computer was like a marginal employee, troublesome if not kept busy. Like my kind of employee, but that’s another matter.
I had just two more arrows in my quiver: either roll back the Intel Rapid Storage Technology (RST) driver to something earlier than version 18.104.22.1686 (above), or restore a drive image from a month earlier and see if I could install my new programs and tweaks and updates one at a time until I identified the source of the problem — if, indeed, the RST driver wasn’t the issue. There was another possibility, that some of the Windows 8.1 drivers Acer had made available for my machine were ultimately just useless for Windows 7. That was conceivable, but we did not want to go there. Another possibility emerged around this time: someone said that I should also go into Control Panel > Power Options > Change plan settings > Change advanced power settings > Hard disk > change to Never turn off when plugged in. I did that. (The “Never” option was below the 5-minute option.) It didn’t seem likely that this was the issue — the previous setting called for a hard drive shutdown after only a half-hour of inactivity, and that happened quite frequently, whereas these BSODs were only happening once or sometimes twice a day — but it was a possibility.
One post said there were shutdown problems, not obviously related to my situation, in RST 22.214.171.1246, and recommended using 126.96.36.1993 instead. Seemed like a reliable rumor to me. A search led to an Intel RST downloads page, on which the version immediately preceding 188.8.131.526 was 184.108.40.2066. It wasn’t ancient — it was only 10 months old at this point — but I wondered whether this meant that Intel had removed versions 12.6 and 12.7 because they were buggy, or whether maybe my rumor source was unreliable after all. I downloaded 220.127.116.116 (16.3MB) and ran a quick search to double-check. Sure enough, 18.104.22.1683 was available from a number of sites. I downloaded copies from Lenovo (11.3MB), Dell (16.0MB), and ASRock (15.9MB). Their divergent sizes suggested that I’d be taking my chances with anything other than the official Intel 22.214.171.1246 download, so I uninstalled 126.96.36.1991 (in Control Panel > Programs and Features) and installed 188.8.131.526. After the necessary reboots, I went into Control Panel > Windows Update. It found no updates. The version of iaStorA.sys in C:\Windows\System32\drivers was now dated 3/22/2013.
So now it was just a matter of waiting, to see if I would still get a BSOD. For purposes of this test, I left the external USB drive turned on and connected after all, so as to focus on the change of drivers. A week later, there had been no more crashes.
It seemed, at this point, that I had found a solution. Precisely what that solution was, I could not be sure. But it appeared to be either (a) roll back the Intel Rapid Storage Technology driver to something in the vicinity of 184.108.40.2066 or (b) change the hard drive power settings to never turn off when the laptop was plugged in.