Recovering Data from a Messed-Up Hard Drive

There are many different scenarios for losing data stored on a computer drive. This post explores one set of possibilities.

Creating the Crisis

I had an old hard drive, containing both Windows program files and data in one partition. I decided to divide this drive into two partitions, called PROGRAMS and DATA. The idea was that the partitioning program would move all of the existing material over to one side of the drive, in the new PROGRAMS partition, and there would be this whole new empty DATA partition. Then I would move the data files from the PROGRAMS partition to the new DATA partition.

It wasn’t a bad plan, except (1) I should have backed up the drive before attempting the partition and (2) I used GParted, a Linux tool, for the partitioning, and (3) I attempted to make the PROGRAMS partition as small as possible, reasoning that I would be moving data out of there and leaving a lot of empty space. To review those problems in slightly more detail, the backup wasn’t essential — this was old data — but then again I did not actually intend to lose it entirely. GParted was normally a decent tool, but in this case it messed up, and that dimly reminded me that I might have had a problem with GParted once before. Ordinarily I preferred to use something like MiniTool Partition Wizard, but in this case I was being lazy — I did not have a bootable USB drive handy that would run MiniTool on this machine.

I wasn’t sure whether it was GParted, or the decision to set the resulting PROGRAMS partition to the minimum size, or perhaps something else. Whatever it was, it completely screwed up the drive. I flailed around for a while, but it appeared the drive was trashed. The Testdisk program (in Ubuntu, run from a customized bootable USB) said, “The following partitions can’t be recovered,” and I got something similar from other tools. If anything, these random efforts only made things worse: I believe that, at one point, I told Testdisk to try rewriting the boot record, or something like that. I flailed around for a while longer, and then finally decided to start over and proceed one step at a time. That’s when I began writing this post.

Setting Up an Image

I decided that the first thing I should do was to make an image of the messed-up drive. That would accomplish two things: it would give me a backup, however belated, and it would let me experiment on the backup, or a copy of the backup, without any more tinkering with the original drive. To make the image, I used my trusty old copy of Acronis True Image Home 2011. I ran it from a bootable CD, so I didn’t need the imaged drive to be bootable. There were other imaging programs that might have permitted the same steps; I just wasn’t familiar with most of them. (Note: there were reports that newer versions of Acronis did not display the quality that I had found during these years of using the 2011 version.)

There were two features of Acronis that I thought might be useful for this effort. First, I could make a sector-by-sector image. The idea here was that Acronis would ordinarily just copy and compress the files (including hidden files) that a user would be able to see, if s/he were browsing in something like Windows Explorer; but if I opted for the sector-by-sector option, Acronis would copy everything on the drive, including disk sectors that would seem to contain nothing but empty space. I had not used this feature often, and wasn’t entirely sure what the image might include; but when it was done, I saw that the 256GB drive was compressed into a 112GB image. It seemed the image must contain more than the 65GB that GParted had identified, before commencing the process that screwed everything up. At any rate, it seemed that the Acronis image had probably captured everything that I would care about, and that I was probably safe in working with the image in lieu of the original hard drive.
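To make the sector-by-sector idea concrete, here is a minimal Python sketch. It is not how Acronis works internally (I don't know that); it only illustrates copying every fixed-size block of a source, including blocks that contain nothing but zeros. The 512-byte sector size and the function name are assumptions for illustration, and the "drive" here is just an in-memory buffer.

```python
import io

SECTOR_SIZE = 512  # assumed sector size; many newer drives use 4096


def copy_sector_by_sector(src, dst, sector_size=SECTOR_SIZE):
    """Copy every sector from src to dst, including ones that look empty.

    src and dst are binary file objects. Returns (total, blank), where
    blank counts the sectors that contained only zero bytes.
    """
    total = blank = 0
    while True:
        sector = src.read(sector_size)
        if not sector:
            break
        dst.write(sector)  # written even if it is all zeros
        total += 1
        if sector.count(0) == len(sector):
            blank += 1
    return total, blank


# Demonstration on an in-memory "drive": one data sector, two blank ones.
fake_drive = io.BytesIO(b"A" * 512 + b"\x00" * 1024)
image = io.BytesIO()
print(copy_sector_by_sector(fake_drive, image))  # (3, 2)
```

A file-level backup would skip whatever the file system reports as free space, which is exactly where the remains of lost files may still be sitting.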

The other Acronis feature that seemed potentially useful was the ability to mount an image as a sort of virtual drive. To use this, I brought the resulting drive image over to a Windows computer where I had installed Acronis. I figured I could tell Acronis to mount a copy of the 112GB image on that Windows machine, and it would set it up as though it were a separate drive. Since I was mounting a copy, I could make changes, see what happened, and if necessary trash that copy and try again.

When making the image, I had created a single image containing both the PROGRAMS partition and the DATA partition. But now, on the Windows machine, when I told Acronis to look at them, it assigned them two separate drive letters. It put the PROGRAMS partition on drive K, and it made the DATA partition drive L. Since PROGRAMS was the screwed-up one, Windows popped up a message offering to format it, as soon as Acronis mounted it. I said no thank you, I didn’t want Windows complicating things. In Windows Explorer, clicking on drive L yielded no problems — there didn’t seem to be anything in the drive, other than a System Volume Information folder — but clicking on drive K produced an error message: “K:\ is not accessible. The file or directory is corrupted and unreadable.”

Recovering the Partition from a Copy of the Image

Now that I had a copy of the drive image mounted in Acronis, it seemed I should be able to run suitable Windows-based data recovery programs on the mounted virtual drives. This meant using something more cautious and less intimidating (not to mention more colorful) than powerful, one-command-wrecks-everything tools like Testdisk and CHKDSK.

I got a quick reminder of the risks inherent in those tools when I ran CHKDSK on the mounted image. Running “CHKDSK K: /r” resulted in indications that CHKDSK was deleting many “orphan file record” segments, followed by failed attempts to recover other orphaned files because there was insufficient disk space. (I suspected the problem of insufficient space was due to the fact that I had made the PROGRAMS drive as small as possible. To fix that, and to prevent CHKDSK from aborting altogether due to “Insufficient disk space to fix the attribute definition table,” it seemed I would have to use a partitioning program to resize the PROGRAMS partition, either before making the image or after restoring the image to an actual (not virtual) drive partition.)

After I saw CHKDSK make all those changes, I was going to unmount drive K (i.e., the virtual version of the PROGRAMS partition) by right-clicking on it in Windows Explorer and choosing the Unmount option that had been added to that context menu when I installed Acronis in Windows. But suddenly drive K wasn’t there anymore. I re-mounted it and re-ran the CHKDSK procedure, just to verify that it ran the same way the second time (i.e., that there had not been any changes to the Acronis image of the PROGRAMS partition). I had noticed that Acronis gave me an option to mount images in read-write mode, but I had not selected that. It seemed that selecting that option might have enabled CHKDSK to alter the actual Acronis image.
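A more direct way to confirm that a tool has not altered an image file is to hash the file before and after. The following is a minimal sketch of that check, assuming the image is a single ordinary file; the function names and the filename in the comments are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_unchanged(path, digest_before):
    """Return True if the file at path still hashes to digest_before."""
    return sha256_of_file(path) == digest_before


# Hypothetical usage (the filename is an assumption):
#   before = sha256_of_file("drive-image-copy.tib")
#   ... mount the image, run the tool, unmount ...
#   print(image_unchanged("drive-image-copy.tib", before))
```

Hashing a file of 100GB or more takes a while, but it replaces guesswork about read-only versus read-write mounting with a yes-or-no answer.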

So what I was looking for, from a recovery program, was something that would at least seem more cautious than CHKDSK. If possible, I would also want to start with a tool that could recover folders, not merely files: I would rather have the drive back the way it was, more or less, with each file in its proper folder. I didn’t want to have to sort manually among thousands of randomly recovered files, including everything from Windows program files to documents to pictures, and try to imagine where they would have belonged.

MiniTool Partition Recovery

While reading about these matters, I encountered recommendations for a number of free data recovery programs. TopTenReviews seemed to indicate that “partition recovery” (which was apparently what I wanted) was a common feature in many such programs. A search led to several indications that MiniTool products were among the best for this purpose. The MiniTool situation was a little confusing, though. On one hand, there was something called MiniTool Partition Recovery. It seemed this program had previously been limited to recovering a maximum of 2GB, but at this point it appeared to be simply free for personal use. One MiniTool page said this program was different from MiniTool Partition Wizard, but another MiniTool page said it was part of MiniTool Partition Manager. My copy of MiniTool Partition Wizard Free (version 9.1) did offer a Partition Recovery Wizard option. But when I ran Partition Wizard, I got a pop-up ad offering me a chance to download a “powerful data recovery tool” that would allegedly “recover all lost files from Windows.” This appeared to be a reference to MiniTool Power Data Recovery, offered for sale on a separate website for $69 to $89.

With the PROGRAMS and DATA partitions mounted in Acronis — this time, for some reason, as drives M and N — I started my installed copy of MiniTool Partition Wizard Free. It showed that drives K and L were still mounted, even though they no longer appeared in Windows Explorer, even after an F5 refresh. Apparently Acronis had not fully unmounted them. That seemed to explain why remounting had put those drives at M and N, instead of K and L.

Within Partition Wizard, I tried using the Partition Recovery Wizard. It told me to choose a disk that I would like to recover. I selected drive M, and told it to do a Quick Scan on the entire partition. It immediately found a lost NTFS partition. I checked the box next to that partition and clicked Finish. The Wizard disappeared without further ado. Now what? I went back to Windows Explorer. Nothing had changed there: clicking on drive M still gave me that “not accessible” error message.

It occurred to me that maybe the Partition Recovery Wizard would have made a visible change in the PROGRAMS partition, if I had enabled that read-write option in Acronis when I mounted it. I decided to try that. But first, it seemed I had better clean up the confusing situation where there seemed to be multiple mounted copies of that partition. In Windows Explorer, I unmounted drives M and N. I ran Windows Task Manager (Start > Run (or WinKey-R) > taskmgr.exe) > Processes tab. There, I sorted by clicking on the Image Name column heading, and also on the Description column heading. As far as I could see, Acronis was no longer running. And yet, after closing down and restarting Partition Wizard, it still showed that drives K and L were in use.

To get past that confusing state of affairs, I rebooted the system. Then I tried again. To clarify, I was using MiniTool Partition Recovery Wizard, found inside MiniTool Partition Wizard Free. I was using it on a copy of an Acronis sector-by-sector image of the corrupted hard drive. When I mounted that image in Acronis, running on Windows, I mounted its images of both partitions C (PROGRAMS) and D (DATA) from that corrupted drive. And when I told Acronis to mount them, this time, I also clicked the box that said, “Mount the partitions in the read-write mode.” This time, they were mounted as drives J and K. Windows Explorer recognized that drive K was named DATA, but (as before) it didn’t call the other one PROGRAMS: it just called it “Local Disk.” When Acronis mounted drive J (Local Disk), as before, it popped up a message saying, “You need to format the disk in drive J: before you can use it.” And as before, I clicked Cancel rather than Format Disk in response. This time, when I mounted the drives, Acronis created what looked like a partial copy of the image being mounted. That may have been because I checked the read-write box.

So, OK, I was in MiniTool Partition Recovery Wizard. I selected the disk corresponding to Local Disk J and clicked Next. As before, I told it to do a Quick Scan of the Full Disk. As before, it quickly found an NTFS partition. I selected that partition and clicked Finish. As before, Partition Recovery Wizard just disappeared. Back in Windows Explorer, nothing seemed to have changed: I still couldn’t view the contents of drive J. I tried again, this time telling the Recovery Wizard to do a Full Scan rather than a Quick Scan. That took maybe five minutes. It didn’t find anything else. I double-clicked on the identified NTFS partition. There was no data. This appeared to be what Testdisk meant when it said the partition could not be recovered. Partition Recovery Wizard was vanishing after I clicked Finish, apparently, because its work was completed: it had done nothing, and that was all it could do.

Verifying That the Files Exist

At this point, it occurred to me that I was assuming there were files to be found, somewhere in that backup image. But what if there weren’t — what if everything in that drive was completely fubar? I decided that I was not yet ready to recover a boatload of randomly assorted files, but at least I could make sure the files were there, just waiting to be restored in their original folder arrangement.

This called for a file recovery program. Various sources (e.g., Raymond, FOSSbytes, Lifehacker, GFI) named Recuva as one of their favorites. (Others included PC Inspector File Recovery, Puran File Recovery, LazeSoft Data Recovery, Wise Data Recovery, TestDisk, Undelete 360, PhotoRec, Pandora Recovery, SoftPerfect File Recovery, FreeUndelete, Glary Undelete, Restoration, and Undelete Plus.) I had used Recuva in the past, with good results, and decided to try it again.

I mounted the Acronis image, with its PROGRAMS and DATA partitions, checking the Acronis checkbox that said, “Mount the partitions in the read-write mode.” I did not expect to be making any changes in the image — if I recovered anything, I would put it somewhere else — but later I might. My system felt that drive letter I: was taken, at this point, so Acronis mounted them as drives H (Local Disk, formerly PROGRAMS) and J (DATA). I started Recuva, skipped its opening wizard, pointed it at drive H, and clicked its Scan button. It immediately indicated that it had found no files, and asked if I wanted to run its Deep Scan mode. I said Yes. Five minutes later, it was estimating that the job would take 20 minutes.

To speed up this and future explorations, I cancelled that Recuva process, unmounted the image, copied it over to a solid state drive (SSD), and tried again. Deep Scan on drive H took only … 23 minutes. Huh. It seemed that, for some reason, the SSD didn’t really make the process much faster than it was going to be on the hard disk drive (HDD).

Recuva indicated that it had found about 73,000 files on drive H. So the highly regarded Testdisk and MiniTool programs were wrong, when they found nothing to recover within the existing PROGRAMS partition. Files were still there, in that image of a corrupted partition. Moreover, it looked like the vast majority of those files were flagged with Recuva’s green dot, indicating recoverability.

When sorted on filename, it appeared that most of the files identified by Recuva had potentially recognizable names (e.g., permaculture.doc, as distinct from $BackupData$). I wasn’t familiar with the contents of the data before I messed up the drive, so for all I knew those (even $BackupData$) were the original names of those files.

Well, since I was here, I thought I might as well recover copies of the files that were recoverable; that might be all I was ultimately able to get from this drive or these images. I clicked the top left box to select everything, right-clicked on one of the listed files, and chose Recover Checked. I put them into a new folder on the SSD, reasoning that this would surely be faster than putting them on an HDD somewhere. But — what’s this? — an error message indicating that there was only 96GB available, and the process needed 118GB. OK, the HDD then. But I would want to see how the original ~60GB had nearly doubled itself.
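The space problem could have been caught before starting. Here is a small sketch of such a check using Python's standard shutil.disk_usage; the helper name is hypothetical, and treating the reported "GB" figures as binary gigabytes is an assumption.

```python
import shutil

GIB = 1024 ** 3  # assumption: the reported "GB" figures are binary gigabytes


def has_room_for(target_dir, bytes_needed):
    """Return True if the drive holding target_dir has enough free space."""
    return shutil.disk_usage(target_dir).free >= bytes_needed


# Figures from this case: the recovery needed 118GB, the SSD had 96GB free.
shortfall_gb = 118 - 96
print(shortfall_gb)  # 22

# Hypothetical usage before starting a recovery into the current folder:
print(has_room_for(".", 118 * GIB))
```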

Recuva proceeded only a few seconds before running into a problem. I had seen that many of the recoverable files had the same names. That hadn’t been a problem originally: they had been in different folders, so it was quite natural that there would be more than one file named readme.txt, somewhere on the disk. Now, though, with them all going into one folder, there would be name conflicts. Recuva presented the option of keeping both files, but just renaming one of them so there would be no conflict. I went with that, assuming that the files would be renamed as something like readme1.txt, readme2.txt, and so forth. With that taken care of, Recuva proceeded to recover the files. It originally estimated this would take several hours, but in fact it took about 50 minutes, and it recovered 68,075 files from this PROGRAMS partition. It was unable to recover another 4,000 or so. I looked at a few of the files and saw that their contents really were there: I was seeing pictures and documents.
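I don't know exactly what renaming scheme Recuva uses. The sketch below only illustrates the general idea I was assuming, where a number is appended to the name whenever a collision occurs. The function name and the case-insensitive comparison (as on a typical Windows file system) are my assumptions.

```python
import os


def unique_name(filename, taken):
    """Return filename, or a numbered variant if it is already in taken.

    taken is a set of lowercased names already used in the output
    folder; the chosen name is added to it.
    """
    stem, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while candidate.lower() in taken:
        candidate = f"{stem}{counter}{ext}"
        counter += 1
    taken.add(candidate.lower())
    return candidate


taken = set()
names = [unique_name(n, taken) for n in ["readme.txt", "readme.txt", "README.TXT"]]
print(names)  # ['readme.txt', 'readme1.txt', 'README2.TXT']
```

One consequence of any such scheme is that the recovered name is no longer a reliable guide to the original name, which matters later when comparing sets of recovered files.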

I repeated the Recuva process on the Acronis partition image mounted as drive J (DATA). By the time it was finished, a half-hour later, the Deep Scan had found a total of 363 files. I recovered copies of these to another folder on an HDD, and then unmounted the Acronis images mounted as drives H and J. Altogether, these efforts on the PROGRAMS and DATA partitions had recovered a total of 68,438 files.

Eliminating the Dud

Those Recuva results gave me a theory for what had happened, when I had tried to use GParted to put all of the material on the original PROGRAMS partition into the smallest possible space. It seemed that GParted might have been overly ambitious: maybe it had misunderstood the Windows file structure, and as a result had tried to make the PROGRAMS partition smaller than it could really be without losing data.

In that theory, it seemed that GParted had succeeded with most of the files, but some were left outside the excessively shrunk PROGRAMS partition, and maybe some others were left halfway in, halfway out. Then, when I created a new DATA partition to fill the supposedly empty space, those files or file parts that didn’t make it inside the PROGRAMS corral became property of the DATA partition instead. Maybe what CHKDSK was seeing as orphans were actually complete files, but CHKDSK could only see the parts of those files that lay inside the overly small PROGRAMS perimeter.

This theory suggested that CHKDSK might not find orphans, and Recuva might find more recoverable files, if I deleted the later-made DATA partition, or perhaps after I followed up by enlarging the PROGRAMS partition to fill the disk, covering the space formerly contained within the DATA partition. Then everything that could be recovered would be inside the PROGRAMS partition. Maybe then CHKDSK would not struggle due to insufficient disk space.

To explore this possibility, I rebooted the computer with an Acronis USB drive, and restored the PROGRAMS and DATA partitions to an unused HDD. So now I wouldn’t have to run my experiments on mounted Acronis images; I could fiddle with real HDD partitions. If Acronis had indeed made and restored a sector-by-sector image of the original corrupted hard drive, then this test drive should contain the same data arrangement as the original drive.

Then I rebooted into Windows. On this test HDD, I started by using MiniTool Partition Wizard Free to try to resize the PROGRAMS partition. But it wouldn’t do it. It said I had to run CHKDSK first. So now, the test. I used MiniTool to delete the DATA partition, and then tried again to make the PROGRAMS partition larger, to fill the disk. Sadly, still no luck: Partition Wizard said, once again,

Failed to execute the following command: ResizePartition . . . . NTFS file system error. Please use “Check File System” function to fix it first.

So, OK, the file system problem was baked into the PROGRAMS partition, and it had to be fixed within that partition. Well, at least there was now a lot of empty space lying next to the PROGRAMS partition; maybe that would help somehow. In Partition Wizard, I right-clicked on the PROGRAMS partition, mounted as drive H, and chose Check File System > Check & fix detected errors. The dialog said, CHKDSK is verifying files. Windows Task Manager (Start > taskmgr.exe > Processes tab) confirmed that CHKDSK was indeed running. CHKDSK proceeded to delete 451 orphan file record segments, along with various other items; it also fixed various items. The dialog didn’t disappear when it was done. I wasn’t sure what to do, so I clicked Start again. It fixed a few more problems. I ran it again. It seemed to be repeating itself. I clicked Cancel.

Back in the main Partition Wizard window, I tried resizing drive H again, to fill the disk. This time it worked. Now Windows Explorer was able to see the contents of the partition, and reported that there was nothing there except the System Volume Information folder and an empty found.000 folder. I renamed the partition to be TEST.

At this point, I realized that I might have made a mistake by restoring the original image to this particular TEST HDD, because it was a 1TB drive, and now I was planning to scan the whole drive with Recuva. This would be a mistake because (a) it could take Recuva a long time to scan that large partition and (b) I wasn’t entirely sure what had been on the drive previously. I thought it was previously wiped, so there shouldn’t be anything on it other than data from this present experiment — but I wasn’t sure.

So, OK, I ran Recuva again, this time examining the entire 1TB TEST drive. My worries about the large HDD seemed unfounded: as before, Recuva’s Deep Scan identified about 73,000 files, and it finished that scan in 25 minutes. Now, the question: would Recuva be able to recover more than 68,438 of those files, now that the DATA partition was gone?

It took a while to get the answer to that question. For some reason, Recuva began running into problems here. It kept crapping out, complaining that “The device is not ready.” It was restoring several thousand files, but arbitrarily skipping over the vast majority. I tried specifying a couple different target drives, but still got the same result. A search yielded many suggestions, but no real explanation for this odd behavior. Some of the solutions I saw involved changing the Windows power plan (e.g., Control Panel > Power Options > Change plan settings > Change advanced power settings > Hard disk), but that was not an issue here. Possible fixes included simply rebooting and trying again, or rebooting into Safe Mode and running Recuva there.

Before trying those measures, I killed Recuva, restarted it, re-ran the scan, and tried again. That seemed to work: this time, Recuva went all the way through the scan and restoration process. For some reason, its Deep Scan took more than 2.5 hours. During the process, it was reporting that it had found more than 80,000 files. But in the end, it reported that it had identified only 73,293 files. Its attempt to recover those took 36 minutes. The results: 70,856 were “fully recovered”; 2,420 were “partly recovered”; and only 17 were not recovered. The stated reason for failure to recover those 17 was, in each case, “Unable to find file data on the disk.”
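As a quick check on those numbers, the three categories do add up to the reported total. The comparison with the earlier 68,438 assumes that "partly recovered" files count as recovered, which may not match how the earlier runs were tallied.

```python
fully, partly, failed = 70_856, 2_420, 17

total = fully + partly + failed
print(total)  # 73293, matching the number of files Recuva identified

pct_fully = round(100 * fully / total, 1)
print(pct_fully)  # 96.7 percent fully recovered

# Files at least partly recovered now, versus the 68,438 recovered
# earlier from the separate PROGRAMS and DATA partitions.
extra = (fully + partly) - 68_438
print(extra)  # 4838
```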

Now, why were these results so different from the previous attempts on the separate PROGRAMS and DATA partitions: why were more files recovered, and why did the process take so much longer? I couldn’t remember the exact sequence of events, but one possibility was that I needed to restart Recuva, not because it was buggy, but because I had used MiniTool Partition Wizard to change the partitions being examined. Maybe Recuva took a look when it first started, and continued to be guided by that. So maybe it got confused when the partition changed, and a restart cleared its head. In this theory, the 2.5 hours to run the scan was to be expected, because now it really was scanning that whole 1TB TEST drive.

At the time when I was looking at these results, several hours after starting, it seemed there could be another explanation for the inconsistency in Recuva’s output. Around the time when I restarted Recuva, I went into its Options menu. I wasn’t sure, in retrospect, whether I had changed anything there. As far as I could tell, such changes should not have had any effect on the results. But that was nonetheless a possibility.

Yet another possibility was that some of the files recovered from the TEST drive were extraneous to this project: either they had been on the 1TB drive previously, or they had been temporarily copied to that drive while this project was underway. I believed that the disk had been wiped previously, and that I had not used it for anything else during this project, but now was the time to make sure. I would have to go through some additional steps (below) before I would have a better idea about these possibilities.

Individual File Recovery

Let us review. My real goal was to recover the original file structure of the corrupted drive. As described above, I temporarily departed from that goal in order to verify that there were files to be recovered. Recuva had demonstrated that there were indeed recoverable files, within those supposedly unrecoverable partitions. In the spirit of seizing the opportunity, I had decided to take advantage of what Recuva was showing me: in several efforts, I had recovered what files I could, putting copies of them into folders on other drives.

Now it occurred to me that I might be able to learn some things by comparing the sets of files resulting from those several Recuva sessions. For one thing, I thought that an analysis of the sets of files provided by Recuva might shed more light on what had happened in this process — on what the various partitions held, and on what Recuva had achieved. In addition, it seemed that those sets of files might provide at least a partial reconstruction of what had been on that corrupted drive. A complete recovery would require more time than I was going to devote. But I did have some ideas (below) for how I might make that kind of reconstruction more feasible, if I was unable to recover the file structure.

To learn about these things, it seemed I needed to arrange the recovered files in two sets. One set would contain the net total of all recovered files. That is, it would contain one copy of each unique file appearing in any of those folders. That would be the set that I would be parsing and trying to reduce, if I did have to develop a file-by-file reconstruction. The other set would contain only those files that did not also appear in one or more of the other folders; it would highlight any ways in which the results from one recovery effort differed from the results of other recovery efforts.

I started by creating the latter set. To do this, I made a copy of each of the three folders containing files recovered by Recuva: the folder for the PROGRAMS partition, the folder for the DATA partition, and the folder for the TEST partition where DATA had been deleted and PROGRAMS had been expanded to fill the entire 1TB HDD. I labeled the first one PROGCOPY, the second one DATACOPY, and the third one TESTCOPY.

Now I wanted to eliminate all files that appeared in more than one of those three folders. After that, the files left in any one of those three folders would be unique to that folder: Recuva would have recovered those files in that recovery effort, but not in any of the other recovery efforts. I wasn’t sure there would be any survivors, after duplicative files were deleted: that was the question.

To identify and delete duplicates across those three folders, I used DoubleKiller Pro, my long-time familiar duplicate detector. I set it to identify files that had exactly the same size, date, and content, byte for byte. (The names would not necessarily be the same, because Recuva might have added a number to some filenames, so as to enable identically named files to coexist in the same output folder.)

I ran DoubleKiller on all three of those folders. It detected many exact duplicates. I marked all of these duplicates and told DoubleKiller to delete them all. So now the files in any one of those three folders would be unique to the recovery effort that had produced them.

As I looked at the three folders after that deletion, I was surprised to see that DATACOPY was unchanged: it still had the same 363 files. Likewise, 538 items remained in the PROGCOPY folder. But then it occurred to me that DoubleKiller might be looking at the dates when Recuva had recreated the files, not the dates when those files were originally created. I re-ran DoubleKiller, this time looking for files that were of identical size and contents, but whose dates might differ. This time, after deleting files that appeared in more than one folder, DATACOPY was empty. Everything that I got from the Recuva run on the DATA partition seemed to have been re-detected in Recuva’s last run. So running CHKDSK on the PROGRAMS partition, and then deleting the DATA partition, apparently caused no harm to files that had been in the DATA partition; Recuva found them again after PROGRAMS was resized to cover the entire drive.
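For anyone without a duplicate finder at hand, the comparison that finally worked (same size and same content, ignoring names and dates) can be sketched in Python. This is not how DoubleKiller is implemented; it substitutes a SHA-256 comparison for a literal byte-for-byte one, and the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folders):
    """Group files across folders that have identical size and content.

    Names and timestamps are ignored. Returns a list of sorted lists of
    paths; each inner list holds two or more identical files.
    """
    by_size = defaultdict(list)
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same))
    return groups


# Hypothetical usage:
#   for group in find_duplicates(["PROGCOPY", "DATACOPY", "TESTCOPY"]):
#       print(group)
```

Grouping by size first means that only files sharing a size with some other file have to be read at all, which matters with tens of thousands of files.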

But while DATACOPY was empty, PROGCOPY was not. It still contained 147 items that were not exact size-and-content duplicates of anything in TESTCOPY. Well, maybe Recuva had recovered nearly the same files in both cases, but there were slight differences between them? I ran DoubleKiller again, this time looking for files that were of exactly the same size, and had exactly the same names, but did not pass the test of being byte-for-byte identical. Yes, DoubleKiller said, there were quite a few of those. Spot checks of various image (mostly .jpg and .png) files indicated that the ones in PROGCOPY were viewable, while their nearly identical counterparts in TESTCOPY were corrupted. (Apparently they still took the same amount of space on the disk, and thus passed the test of identical size.) This outcome suggested that resizing the PROGRAMS partition — or, more likely, running CHKDSK — had corrupted some of the recoverable files in the PROGRAMS partition. So it was probably a good idea to run Recuva, or something like it, both before and after attempts to fix things with CHKDSK or a partitioning program.
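The near-duplicate test described here (same name, same size, different bytes) is easy to express with Python's standard filecmp module. This is a sketch, not DoubleKiller's method; it assumes both folders are flat and that the files being compared kept identical names in both.

```python
import filecmp
import os


def same_name_different_content(folder_a, folder_b):
    """List names present in both folders with equal size but unequal bytes."""
    suspects = []
    for name in sorted(os.listdir(folder_a)):
        path_a = os.path.join(folder_a, name)
        path_b = os.path.join(folder_b, name)
        if not (os.path.isfile(path_a) and os.path.isfile(path_b)):
            continue  # not present as a file in both folders
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            continue  # different sizes are a different kind of mismatch
        # shallow=False forces a comparison of the actual contents
        if not filecmp.cmp(path_a, path_b, shallow=False):
            suspects.append(name)
    return suspects


# Hypothetical usage:
#   print(same_name_different_content("PROGCOPY", "TESTCOPY"))
```

A file on this list still has to be opened to learn which of the two copies is the damaged one; the comparison only says that they differ.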

In DoubleKiller, I deleted those near-duplicates, and then I looked again at what remained in these folders. PROGCOPY was now down to just 15 items. I ran another DoubleKiller search, this time jettisoning the requirement of identical size, and looking for files that had nothing in common except having exactly the same name. That search found no matches. Spot checks confirmed that at least a few of these files did not exist in the full set of files recovered in Recuva’s last run. (Others of identical filename did exist. But perhaps they had been removed from TESTCOPY in DoubleKiller’s previous eliminations of all files that had identical counterparts in PROGCOPY or DATACOPY.) For example, one of the files remaining in PROGCOPY was a viewable .jpg image that did not exist in that final full set. This suggested that, in potentially rare instances (in this case, involving only a few out of tens of thousands of files), CHKDSK or Recuva had completely lost or destroyed files that were recoverable previously.

After these several rounds of eliminating duplicate and near-duplicate files, TESTCOPY retained more than 4,700 files that had not been recovered by either of the two other efforts. As I saw when I sorted files by type in Windows Explorer, the vast majority of these were program-related: they included files (with extensions like .js, .exe, and .dll) that appeared to be involved with the operation of various Windows programs. But I was able to identify a few viewable or playable media files (e.g., .jpg, .mp3), and these were clearly not mine. To the extent that I was able to make anything of either the names or the contents of the files remaining in TESTCOPY, I was confident that they arrived in that folder, not because I had somehow mixed up my data with the corrupted hard drive, but rather because the last run of Recuva did detect files that were not (and perhaps could not be) recovered until after I had used CHKDSK and revised the partitions.
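Sorting by type in Windows Explorer works, but a count per extension gives a faster overview of several thousand files. A minimal sketch, with a hypothetical function name:

```python
import os
from collections import Counter


def count_by_extension(folder):
    """Count files under folder by lowercased extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


# Hypothetical usage: show the ten most common file types.
#   for ext, n in count_by_extension("TESTCOPY").most_common(10):
#       print(ext, n)
```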

That completed the investigation of files that were recovered in only one of the three Recuva attempts. Now I wanted to assemble a combined set of unique files from these three Recuva efforts. These would be the files I would work through if I did attempt a manual reconstruction of the drive’s contents — if, that is, I was unsuccessful in my attempt (below) to recover the file structure. I decided to assemble this complete set of files now, while I still had the matter fresh in my mind.

The foregoing procedures seemed to suggest a way of assembling this full set of files. First, I ran DoubleKiller to identify exact byte-for-byte duplicates among the three sets of files recovered by Recuva. This time, I did not make the mistake of insisting that they have the same date; a byte-for-byte comparison (entailing also an exact-size comparison) would be sufficient. Where DoubleKiller identified duplicates, I sorted them by path, and deleted only those retrieved from the DATA partition, and then those retrieved from the original PROGRAMS partition. There were still byte-for-byte duplicates remaining in the TEST partition after I had eliminated duplicates appearing in the PROGRAMS and DATA partitions, so I sorted duplicate sets by filename and marked the last files of each set for deletion.
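The same elimination could be sketched in Python along the following lines. Note the assumptions: I am substituting a SHA-256 hash (plus file size) for DoubleKiller's direct byte-for-byte comparison, the function names are invented for illustration, and the script only produces a plan. It deletes nothing.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_deduplication(folders_in_keep_order):
    """Return (keep, redundant) path lists.

    Folders listed earlier win: when identical content appears in several
    places, the copy from the earliest-listed folder is kept. Within one
    folder, the alphabetically first path is kept.
    """
    groups = defaultdict(list)
    for rank, folder in enumerate(folders_in_keep_order):
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                # Size is part of the key, so a match means same size and same hash.
                key = (os.path.getsize(path), sha256_of(path))
                groups[key].append((rank, path))
    keep, redundant = [], []
    for copies in groups.values():
        copies.sort()
        keep.append(copies[0][1])
        redundant.extend(path for _rank, path in copies[1:])
    return keep, redundant
```

Calling plan_deduplication with the TEST folder first, then PROGRAMS, then DATA would mirror the order used above: copies in DATA and PROGRAMS are listed as redundant whenever TEST holds the same bytes. Because one copy of every set is always kept, the "all copies marked" problem described in the note below cannot arise.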

Note: where DoubleKiller said that I had marked all copies of a duplicate pair, I had to unmark one, and repeat doing so until DoubleKiller didn’t complain anymore. The situation, there, was that the duplicate of a PROGRAMS file was not in the TEST folder, but rather was under a different name in the PROGRAMS folder. The fact that duplicates were detected did not guarantee that they would always be in different folders. The goal was not simply to erase everything in PROGRAMS; it was to eliminate duplicates. DoubleKiller would usually move the display to highlight the pair in question, so that I could see that I had to uncheck one of the two.

This process wiped out the files recovered from the DATA partition. If it hadn’t, I would have run another DoubleKiller session, this time looking for files in PROGRAMS or DATA that had identical filenames. If I had found any, I would have changed the name of one of the duplicates slightly, by adding a letter or number to the end. Either way, the goal was to combine the files recovered from the PROGRAMS and DATA partitions into a single PRE-CHKDSK folder. I renamed the TEST output folder to be POST-CHKDSK. These names would remind me why I had these two separate folders of files recovered from the original drive, if some time passed before I got back to this project.

Partition Recovery: Trying Again

The foregoing look into file recovery was interesting, but I hoped it was a mere sideshow. I still wanted to recover the original file structure, with files nicely organized as they originally had been. Some files were not going to make it, but it looked like the vast majority were recoverable, so it seemed like there should be some way to resurrect the original arrangement.

MiniTool Partition Recovery Wizard had not been helpful on my previous try (above). But now that I had run CHKDSK and, moreover, had expanded PROGRAMS to fill the 1TB drive, I hoped it would be able to detect the original PROGRAMS file structure. Also, while fiddling with Recuva, I had noticed that it offered an option (in its Options > Actions tab) to restore the drive’s folder structure while recovering its files. I figured I would start with these two tools, and then resort to other tools if necessary.

Since I still had Recuva open, with the results of its Deep Scan of the TEST drive still visible onscreen, I decided to take the opportunity to see how it would do at this new task. So I clicked on its “Restore folder structure” option and re-ran its Recover operation, this time steering it toward a folder called TESTRUCT, where it could go ahead and restore the whole file and folder structure from the TEST drive.

Recuva took 58 minutes and recovered almost but not exactly the same numbers of files as before: 73,274 total, of which 70,854 were fully recovered and 2,420 were partially recovered. Nineteen were not recovered, mostly because Recuva was “Unable to find file data on the disk” but in a few cases because “The system cannot find the path specified.” I looked in the TESTRUCT folder, to see what had been achieved. There were six top-level directories, of which five were inconsequential (e.g., $RECYCLE.BIN; a “drivers” folder containing a dozen files that were clearly not drivers). By far the main folder was something called “Unknown folder.” TreeSize indicated that this Unknown folder contained more than 99% of all of the material that Recuva had found. Plainly, we did not have a successful reconstruction of a Windows XP program drive.

Within that Unknown folder, Recuva reconstructed numerous folders whose names did not always seem consistent with their contents. For example, in a folder called Bind Logs, I found no logs, only a .dll file and an .ini file. In a folder called KB974392 (one of a series of similarly named folders, all evoking Microsoft KnowledgeBase updates), I did not find anything related to any such update, but rather found a single Microsoft Word document named Lecture #1.doc. There were many other examples. The situation appeared to be that Recuva had found some folder names, and had found some files, and made an educated or perhaps wild guess as to which belonged with which. While there seemed to be a few instances where the folder names might lead directly to relevant material, my overall conclusion was that, if this was the best that a folder recovery program could do, it would be more effective just to work through the raw collection of unsorted files assembled in the previous section of this post.

I decided to delete the contents of TESTRUCT and try again, this time using MiniTool Partition Recovery Wizard. I began, as instructed, with its Quick Scan. As in my previous use of this tool, it found only one NTFS partition, containing only 115MB of data. I tried again, this time with the Full Scan. That took more than two hours, for this 1TB drive, but it did find three “Lost/Deleted” partitions in addition to the one Existing partition. For all of these, the reported “Used Size” was quite small — even smaller than the 115MB reported for the existing TEST partition. I concluded that the Used Size information was confused. Looking at the Starting and Ending LBA columns, I found that one of the Lost/Deleted partitions appeared to span a larger space than either of the others. (The existing TEST partition was of course the largest, but that was to be expected: it spanned a full 1TB, which had not been the case with the single partition on the original hard drive, the one that existed before I repartitioned that drive with GParted.)

The Partition Wizard tutorial advised me to select “all needed partitions.” But having seen how Recuva had made a hash of things, I decided this was not to be interpreted as advice to select all partitions every time. In this case, I didn’t want these multiple partitions to be coming back from the dead, all entangled with each other. I thought it might be wiser to select that one larger (perhaps original) Lost/Deleted partition. If that didn’t work, I wouldn’t run the Full Scan again; I would just come back and try with the TEST partition found on the Quick Scan.

So I selected that one Lost/Deleted partition. Partition Recovery Wizard (PRW) grayed out two of the others, indicating that they “overlapped” with this one. I found I couldn’t select them even if I tried. But the last of the four was not grayed out. So, OK, I selected it too. I clicked Finish. PRW said, “At least one of the existing partition will be deleted.” This meant that it was going to replace the existing TEST partition with those prior partitions. I didn’t want that. PRW hadn’t been able to do anything with those prior partitions. I thought it was going to just reach through TEST to fondle whatever lay beneath. I’d rather just have it work with TEST. So I unchecked those and checked TEST, the one Existing partition. Then I clicked Finish again. I guess that was stupid, because PRW just returned me to the previous screen, where I was before I clicked the PRW link. Evidently it wasn’t going to dig around in the soil beneath TEST. TEST existed, it had no files, and we were done.

While PRW was doing its two-hour exploration, I reviewed sources discussing some of the other recovery programs mentioned above. They would all recover individual files, but now I was looking for good alternatives to recover the drive’s previous folder structure. TechRadar said that FreeUndelete and Glary Undelete would do that. I would have tried Paragon Rescue Kit, but its website gave me the download but not the key that I needed to use its free version. Also, I had seen many endorsements of TestDisk, which (like MiniTool) would perhaps be more able to do something, now that I had run CHKDSK.

I decided to try TestDisk again, this time using it in Windows instead of booting it from a USB drive. Its download page had some confusing advice about the 64-bit version, but my system was able to run it, so I went with that. It ran as a portable. It asked me to Select a Media. I waited until MiniTool (above) was done, and then proceeded. Using arrow keys, I chose the 1TB TEST HDD, and then I tabbed down to the bottom line, arrowed over to Next, and hit Enter. That just moved the selection to the next drive in the list. Not what I wanted. The mouse didn’t work in the TestDisk box. Well, maybe there wasn’t any Next. The tutorial said I should choose Proceed. I arrowed over to that and tried Enter again. That worked. Next, I was to choose the partition table type. I wanted to help TestDisk not be confused, so I chose Intel and then Analyze. In its Current Partition Structure report, it found three partitions. All were marked as having “Bad relative sector.” My only choices were Quick Search or Backup. I didn’t need a backup, so I selected Quick Search and hit Enter.

TestDisk then proceeded to analyze 121,599 cylinders. This took about 40 minutes. When it was done, it identified two adjacent partitions. These were presumably the PROGRAMS and DATA partitions that I had created with GParted. I selected the first one and hit P. It showed me a list of files and folders in that partition. There were basically none, other than System Volume Information and the two designated by one dot (“.”) and two dots (“..”). I hit colon (“:”) next to each. Only the third one, System Volume Information, turned green. I didn’t care about that folder, so I hit Q to quit. Same thing with the other of the two partitions. I hit Enter. Now I had the opportunity for a Deeper Search. This seemed to be just analyzing the same 121,599 cylinders again. It ran for a while (it didn’t say how long, and I wasn’t present when it finished), and then it reported four partitions. Hitting P to list files in each yielded nothing more than the one-dot and two-dot entries and the System Volume Information folder. For one partition, I didn’t even get that: it just said, “Can’t open filesystem. Filesystem seems damaged.” Only one of the four was green. I selected that one (if it made any difference) and hit Enter. Overall, I wasn’t clear what was happening, even with the aid of the tutorial, so I bailed out, and tried some others:

  • I started up Glary Undelete, pointed it toward drive H, the 1TB drive, and clicked Search. It found nothing. Was this indeed the same drive that I had examined with Recuva? I started Recuva again to be sure. I only had to run it for a few seconds to confirm: yeah, it saw hundreds of files. Well, alrighty then.
  • Moving right along, I tried FreeUndelete. Same thing. These seemed to be lacking a counterpart to Recuva’s Deep Scan.
  • Puran File Recovery. It offered Level 1 (slow but recover more data) and also Extreme Level 1 (Faster). Beyond those included parenthetical comments, there did not seem to be any explanation of the difference, either in the built-in help file or online. I went with Level 1. I clicked the button labeled “Recover Data and Copy To,” and designated an output folder. It was done almost instantly: it had found six files.
  • Undelete 360 found zero files.
  • Restoration found nothing.
  • Likewise Wise Data Recovery.

We seemed to have a trend. Really, I could not tell whether any of these programs was doing anything differently than any of the others were doing, with the exception of Recuva, which definitely was producing results unlike those produced by the others.

Manually Reducing Unsorted Files

It appeared that no useful folder structure could be recovered from the restored image of the original HDD. I was left with the relatively complete but unsorted sets of files that Recuva had given me in its several passes through the 1TB TEST partition. Now I wanted to boil down those sets into more manageable piles.

My first step was, as above, to separate out the program files, which were useless to me in their present condition, and which would probably be recovered anyway by reinstalling Windows and various programs. To help with this sorting process, I used a spreadsheet to develop commands that would move user data files to a different folder. This was strictly optional; it was just intended to save a bit of time.

In summary terms, here is how I proceeded with that spreadsheet. First, I opened a command window located in the folder where my PRE-CHKDSK and POST-CHKDSK file sets were stored, and ran this command:

dir /a-d /s /b > d:\dirlist.txt

I opened dirlist.txt in Notepad, made sure that Notepad’s Word Wrap feature was turned off, and copied and pasted that list into a Microsoft Excel spreadsheet (LibreOffice would also suffice). In that spreadsheet, for each file listed, I used the RIGHT function to single out the four rightmost characters in that file’s name (i.e., to identify its extension, among the many possible file extensions); did a unique filter to produce a list containing just one instance of each such extension; sorted that list; put an X next to those extensions that were typically associated with user data files (inserting a COUNTIF to identify the most frequently used extensions, so that I wouldn’t be overlooking any major data categories); sorted again by the X column; got rid of the unique list entries without an X; re-sorted the unique list alphabetically; did a VLOOKUP next to the original list, to mark each file whose extension appeared in that select list; and wrote an Excel formula to produce a file-moving command, like this:

="move /-y "&CHAR(34)&A2&CHAR(34)&" D:\Marked\"

From the spreadsheet, I copied all of those commands (one per file to be moved) into Notepad, saved that file as mover.bat, and ran it. For every file that had the same name in the PRE-CHKDSK and POST-CHKDSK folders, that batch file stopped me, to ask if I wanted to overwrite the other file having the same name. To avoid those interruptions, I should have done earlier what I had to do later: use Bulk Rename Utility to add a letter to the end of each of those duplicate filenames, so that they would merge without overwriting files in the target directory.
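The extension tally that I did with RIGHT, a unique filter, and COUNTIF could also be sketched in a few lines of Python. This is only an illustration, not what I actually ran: the function name is mine, and it takes the lines of a listing like dirlist.txt as input. It uses the ntpath module so that Windows-style paths are parsed the same way on any system, and it picks up the whole extension rather than the last four characters.

```python
import ntpath
from collections import Counter


def tally_extensions(lines):
    """Count files per lowercased extension in a `dir /s /b` style listing."""
    counts = Counter()
    for line in lines:
        path = line.strip()
        if path:
            # splitext returns ('name', '.ext'); files with no extension count under ''
            counts[ntpath.splitext(path)[1].lower()] += 1
    return counts


if __name__ == "__main__":
    # A made-up sample standing in for the contents of dirlist.txt.
    sample = [
        r"D:\PRE-CHKDSK\holiday.JPG",
        r"D:\PRE-CHKDSK\notes.docx",
        r"D:\POST-CHKDSK\holiday2.jpg",
        r"D:\POST-CHKDSK\kernel32.dll",
    ]
    for extension, count in tally_extensions(sample).most_common():
        print(extension, count)
```

Sorting by count, as most_common does, serves the same purpose as the COUNTIF column: it puts the biggest categories at the top, so that no major data type gets overlooked.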

The extensions I marked, from the unique list, were these: .avi, .bmp, .doc, .flv, .gif, .htm, .jpg, .m4a, .m4v, .MOV, .mp3, .mp4, .PDF, .png, .ppt, .rtf, .swf, .tif, .txt, .wav, .wma, .WMF, .wmv, .wpd, .xls, .zip, .docx, .html, .jpeg, .tiff, .xlsm, and .xlsx. Those were the evident data file extensions I noticed in this list.
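As an alternative to generating one move command per file in a spreadsheet, the selection and the renaming of same-named files could be handled together in a short Python sketch. The assumptions here are mine: the function names are invented, the D:\Marked target comes from the command above, and colliding names get a numeric suffix (_2, _3) rather than the letter I added with Bulk Rename Utility. The sketch builds a plan first, so the list can be reviewed before anything is moved.

```python
import ntpath
import shutil

# The extensions marked in the list above, lowercased for comparison.
DATA_EXTENSIONS = {
    ".avi", ".bmp", ".doc", ".flv", ".gif", ".htm", ".jpg", ".m4a", ".m4v",
    ".mov", ".mp3", ".mp4", ".pdf", ".png", ".ppt", ".rtf", ".swf", ".tif",
    ".txt", ".wav", ".wma", ".wmf", ".wmv", ".wpd", ".xls", ".zip", ".docx",
    ".html", ".jpeg", ".tiff", ".xlsm", ".xlsx",
}


def plan_moves(paths, target_dir, extensions=DATA_EXTENSIONS):
    """Return (source, destination) pairs for data files, with unique target names."""
    used = set()
    plan = []
    for path in paths:
        stem, ext = ntpath.splitext(ntpath.basename(path))
        if ext.lower() not in extensions:
            continue  # treat everything else as a program file and leave it alone
        candidate = stem + ext
        counter = 2
        # Windows filenames are case-insensitive, so compare in lowercase.
        while candidate.lower() in used:
            candidate = f"{stem}_{counter}{ext}"
            counter += 1
        used.add(candidate.lower())
        plan.append((path, ntpath.join(target_dir, candidate)))
    return plan


def run_plan(plan):
    """Carry out the moves; call this only after reviewing the plan."""
    for source, destination in plan:
        shutil.move(source, destination)
```

One limitation to keep in mind: the sketch only avoids collisions among the files in the plan itself. It does not check for files that already exist in the target folder.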

Again, instead of using the spreadsheet and the batch command to work with extensions, I could have just sorted the files by Type, in Windows Explorer, and manually selected and moved files of these types.

Either way, I wound up with a moderately accurate division of the recovered files. As above, I roughly categorized the one group as containing data files and the other group as containing program files. I used WinRAR (7-Zip would also have been fine) to zip the program files into a single large file, to get them out of the way; I was not quite sure I was ready to delete them. This is not to say there was no conceivably useful data in those program files, nor was I going over this with a fine-toothed comb. I was just triaging a large number of files into two roughly distinct groups.

Now I turned my attention to the data files. As described above, I had already deleted byte-for-byte duplicate files from the PRE-CHKDSK and POST-CHKDSK folders into which I had placed the files recovered by Recuva. But I had not yet repeated the previously used approach of comparing files that had the same name and the same size, but that were not byte-for-byte identical because CHKDSK had corrupted the copies found in the POST-CHKDSK folder. I should have done that before I added a letter to their names, in order to combine them into the same folder: now they no longer had the same names.

But corrupted files could also be detected in other ways. One approach would be to use batch files to try to open or print large numbers of files, to see whether they were corrupt. In previous posts, I had described that approach for specific file types: .wav, .txt, .ws, .jpg, .doc, and .pdf. I had not looked recently; there might also be tools designed to test such file types far more easily.
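One cheap first pass, short of actually opening or printing each file, would be to check whether a file still begins with the signature bytes that its extension implies. The following Python sketch is my own illustration of that idea, with invented function names. It catches only gross corruption, such as a .jpg whose first bytes have been overwritten. A file can pass this test and still be damaged further in, and a legitimate file can fail it (for instance, an old .doc that was really saved as RTF).

```python
import os

# Well-known leading bytes for a few common formats.
OLE2 = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy Office container
SIGNATURES = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".gif": b"GIF8",
    ".pdf": b"%PDF-",
    ".zip": b"PK",
    ".docx": b"PK",  # the newer Office formats are ZIP containers
    ".xlsx": b"PK",
    ".xlsm": b"PK",
    ".doc": OLE2,
    ".xls": OLE2,
    ".ppt": OLE2,
}


def check_signature(path):
    """Return True/False for a known extension, or None if the type is not covered."""
    expected = SIGNATURES.get(os.path.splitext(path)[1].lower())
    if expected is None:
        return None
    with open(path, "rb") as handle:
        return handle.read(16).startswith(expected)


def suspicious_files(folder):
    """List files under folder whose leading bytes contradict their extension."""
    flagged = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if check_signature(path) is False:
                flagged.append(path)
    return flagged
```

Files flagged by suspicious_files would be the first candidates for the open-or-print tests mentioned above.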

Sorting data files into categories (e.g., .pdf, .wav, .doc) could identify large groups of files that would be of minor significance. For example, on this drive, it appeared that many .wav files were there simply as sound effects within the Windows operating system (e.g., mouse click), and many .txt files appeared to consist of notes or licenses relating to various pieces of software. It might also be possible to identify and zip, or delete, groups of files that were no longer playable or useful.

As I viewed the various file types, I was reminded that Recuva had retrieved files having the same name, originally located in different folders, and had put them all into one folder. To do that, Recuva had modified the names of some slightly. It appeared Recuva’s technique was to add the number 2 (or 3, if there were three) to the names of duplicate (or triplicate) files. DoubleKiller could be used to identify likely duplicates by using combinations of relaxed comparison criteria (e.g., search for files whose names have the same first six letters, whose size is within 4KB of each other, and whose content is identical up through the first 25KB). The spreadsheet technique summarized above could also be used with a directory listing to identify files having almost the same names and the same sizes. Right-clicking on DoubleKiller comparison results could also produce data for comparative spreadsheets.
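To make those relaxed criteria concrete, here is a hedged Python sketch of the example just given: same first six characters of the filename, sizes within 4KB of each other, and identical content through the first 25KB. The function names and defaults are mine, chosen to match that example. This is not a description of DoubleKiller’s internals.

```python
import itertools
import os
from collections import defaultdict


def read_head(path, head_bytes):
    """Return up to head_bytes from the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(head_bytes)


def likely_duplicates(folder, prefix_len=6, size_tolerance=4096, head_bytes=25 * 1024):
    """Return pairs of files that pass all three relaxed tests."""
    by_prefix = defaultdict(list)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            by_prefix[name[:prefix_len].lower()].append(os.path.join(root, name))
    pairs = []
    for paths in by_prefix.values():
        # Compare every pair within a name-prefix group (quadratic per group).
        for a, b in itertools.combinations(sorted(paths), 2):
            if abs(os.path.getsize(a) - os.path.getsize(b)) > size_tolerance:
                continue
            if read_head(a, head_bytes) == read_head(b, head_bytes):
                pairs.append((a, b))
    return pairs
```

Loosening or tightening any of the three parameters changes how aggressive the search is. The results are only candidates: each pair would still need a look before anything is deleted.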

This sort of endeavor would tend to transition from a general data recovery project into a file-by-file analysis. If it did get down to the level of examining individual files, IrfanView would be a useful tool. It was able to view a variety of files, and to move quickly from one to the next with a touch of the right arrow key, and its batch processing mode (File > Batch Conversion/Rename > check Output Format, uncheck Use Advanced Options, designate Output Directory) could quickly detect those image files that were not corrupted. Photo comparison software (notably VisiPics) would assist in identifying and deleting duplicative images. As another example of tools that might be useful on a file-by-file level, Microsoft Word had the ability to compare multiple versions of documents, to see whether there were any differences between them. At a certain point, this sort of analysis would require, and would benefit from, the commitment of focused effort by a motivated individual over what could be a very long time.
