I had some files I wanted to archive on offline plastic discs. This post discusses why and how I proceeded in that task.
The “why” is simple enough. An offline backup in an unwriteable format (i.e., a burn-once ROM disc) offers protection against viruses and mistakes. You cannot wipe out your backup or archive disc by getting it too near a magnet, because it is not magnetic, or by accidentally reformatting it, because it won’t reformat.
Of course, a plastic disc is subject to physical damage. Forms of potential physical damage include heat, scratching, sunlight, excessive humidity, chemicals, microwaves, use of non-water-based or ballpoint markers on the disc label, flexing, and adhesive labels. Discs also vary in quality; some may endure what others cannot. That is another way of saying that every storage medium has its weak points. The best archival strategy includes storage in more than one medium. Paper (i.e., printing) counts as a medium: it may be impractical for many purposes, but has its defenders in photography at least.
As for the “how,” the following discussion provides the details. But, basically, I used WinRAR (or could have used 7-Zip) to create compressed archives; I used a VeraCrypt container to provide security for those archives; I filled that container with those archives and then burned that container to BD-R disc, and then emptied out the container and repeated the process as needed.
Disc Type and Data Rot
Optical discs (usually spelled with a “c”) have much less capacity per cubic inch than do contemporary hard disk drives (HDDs). At this writing, Amazon offers a popular 3.5″ 3TB HGST HDD, with 2.7TB of usable space, for $120. By comparison, Amazon offers a stack of 25 100GB Verbatim M-Disc BD-R (i.e., Blu-ray) discs (having about 10% less total capacity than the 3TB drive) for $466, plus the cost of the M-Disc drive. Conventional (as distinct from those M-Disc) Verbatim 100GB BD-R XL Blu-ray discs currently cost about $11 each, so 27 of them (i.e., enough to match the capacity of the 3TB HDD) would total about $300. A 50-pack of the older 25GB Verbatim Blu-ray discs (i.e., 1.25TB altogether) costs only $40, making them price-competitive with the HDD, but much slower and bulkier. Probably the better bet would be to start with a 50-pack of 50GB (also known as DL, for double-layer) Verbatim Blu-ray discs, currently $113. Finally, it would take somewhere around 600 ordinary DVDs to back up that 3TB HDD.
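For readers who want to redo the price comparison with current figures, here is a minimal Python sketch. The prices and capacities are the ones quoted above; the 4.7GB DVD capacity is my own assumption (a single-layer disc), and the function names are just illustrative.

```python
import math

# Prices (USD) and capacities (GB) as quoted in the paragraph above.
OPTIONS = {
    "3TB HGST HDD (2.7TB usable)": (120.00, 2700),
    "25 x 100GB M-Disc BD-R": (466.00, 25 * 100),
    "27 x 100GB BD-R XL at $11 each": (27 * 11.00, 27 * 100),
    "50 x 25GB BD-R": (40.00, 50 * 25),
    "50 x 50GB BD-R DL": (113.00, 50 * 50),
}

DVD_CAPACITY_GB = 4.7  # assumption: single-layer DVD, not stated in the post


def cost_per_tb(price_usd: float, capacity_gb: float) -> float:
    """Return the price per decimal terabyte (1TB = 1000GB)."""
    return price_usd / (capacity_gb / 1000)


def discs_needed(target_gb: float, disc_gb: float) -> int:
    """Return how many whole discs are needed to hold target_gb."""
    return math.ceil(target_gb / disc_gb)


if __name__ == "__main__":
    for name, (price, capacity_gb) in OPTIONS.items():
        print(f"{name}: ${cost_per_tb(price, capacity_gb):.2f} per TB")
    print("DVDs for 2.7TB:", discs_needed(2700, DVD_CAPACITY_GB))
```

Under that DVD assumption, the count lands between roughly 575 discs (for 2.7TB of usable space) and 640 discs (for a nominal 3TB), which is in line with the "around 600" estimate.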
Those calculations use Verbatim as a commonly mentioned quality brand. Accounting Weekly (Naicker, 2015) conveys the widely shared view that, except in the case of damage (above), standard optical media have a 50-year lifespan, archival-quality optical media have a 100-year lifespan, and M-Disc technology has an expected lifespan of more than 1,000 years. MakeUseOf puts longevity at 20 to 100 years for recorded DVDs and up to 200 years for recorded BD-Rs. But Wikipedia reports tests suggesting no greater longevity for M-Disc than for conventional media, and Memorex estimates “theoretical readability” of data recorded on a CD, DVD, or Blu-ray disc at “up to 30 years,” with this caveat: “Considering practical uses and everyday conditions, much earlier migration onto alternative technologies may be appropriate.” Memorex also notes that new, original-packed discs may be writeable for a shelf life of between five and ten years. Of course, the larger the disc, the more data lost if a disc fails.
Issues of optical disc longevity may be summarized in the term “disc rot” as distinct from “bit rot.” Wikipedia characterizes the former as “decay of storage media,” and advises that it can be reduced by storing discs in a dark, cool location with low humidity. By contrast, The Guardian (Gibbs, 2015) says that bit rot is the loss of file access due to the disappearance of the necessary software. An example would arise in an attempt to read a Multimate file, if antique copies of the old Multimate word processing program could no longer be found. But there can be hardware examples too. In the early 1980s, Columbia’s business school removed one of the two eight-inch floppy disk drives in its computer lab, and I discovered that the other one was unable to read disks I had written on the now-vanished drive. But maybe that should be called drive rot.
And then there is the separate issue of file rot, though at present it doesn’t seem to be commonly called that. The concern here is that random errors can creep into files, due perhaps to user errors, disk read errors, or file copying errors. To address this, I had recently encountered a program called Bitrot Detector, designed to notify the user if a file has changed since the last check. After brief tinkering, I decided I preferred my own backup system as a way of keeping an eye on possible file rot.
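The general idea of detecting file rot can be sketched in a few lines of Python: record a hash of every file, then compare against a later run. This is only an illustration of the concept, not a description of how Bitrot Detector or my own backup system works; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files present in both manifests whose contents differ."""
    return sorted(name for name in old if name in new and old[name] != new[name])
```

A changed hash does not say whether the change was intended (an edit) or unwanted (corruption); that still has to be judged file by file.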
The point of all this rot is that, before making an archive on disc, ideally one would verify the quality of the files being backed up, and would replace any defective files from current sources (e.g., friends, currently available backups, replacement audio or video recordings of people or places) while they are still available — which they may not be, if years pass before the defect is detected. On that issue, I had experimented with various techniques to test PDFs, MP3s, and other file types.
Once I had my files together, and had tested them as much as I could manage at present, the question was, how should I arrange them for packing onto a disc? They would be most accessible, and would consume the most space, if I didn’t compress them; and in that case, unless I had passworded them individually, they would also not be secure. The better approach was, surely, to zip an entire folder, or multiple folders, into a single compressed, encrypted file.
I had bought a license and had been using WinRAR as my file compression program. Now that the popular open-source 7-Zip program had been updated to support the RAR5 archive format (joining the lesser-known but apparently worthy Bandizip) and to eliminate known security vulnerabilities, I was no longer concerned that using RAR5 could leave my archives vulnerable to their own bit rot, years later, if the WinRAR people packed up and left town.
WinRAR and 7-Zip were two of the most commonly mentioned file compression tools. Gizmo favored 7-Zip, choosing as alternatives PeaZip, B1, and IZArc, along with Bandizip and others. (See also Raymond.) WinZip was possibly the best known, but it cost $30 unless I wanted to get it free through TrialPay, which would require me to buy or at least try some other product. I dimly recalled wasting some time on TrialPay at one point, and I saw now that it had its share of complaints from others. So I didn’t pursue WinZip further.
TechRadar noted that Windows came with its own built-in compression capability, but that it was limited and problematic. TechRadar named 7-Zip (4.7 stars at Softpedia and the leader at AlternativeTo), along with PeaZip, Zipware, Ashampoo, and WinRAR (4.5 stars at Softpedia). In comparisons of Windows 7 built-in zip, 7-Zip, WinRAR, and WinZip, there was agreement, between MakeUseOf and OnlineTechTips, that 7-Zip achieved the most extreme compression. MakeUseOf also compared elapsed time and found that WinRAR achieved the fastest compression. The PeaZip website agreed that the 7z compression format was best for compression as distinct from speed (likewise LifeHacker and others), but also said that RAR had the advantage of incorporating error recovery information.
To add to the tests conducted by MakeUseOf and OnlineTechTips, I ran a set of tests that attempted to detect optimal settings. Another post describes those tests. For various reasons described there, I decided to stick with WinRAR. I did not explore the various WinRAR settings thoroughly. Pending further insight, for general day-to-day use I chose WinRAR’s Good compression method and a 32MB dictionary size.
Based on the results of my tests, for ordinary day-to-day usage, I set WinRAR to exclude several filetypes from compression: JPG, PDF, MP3, MP4, ISO, RAR, and GZ. I did that based on the decision that the several percentage points of compression offered by 7-Zip or WinRAR for these filetypes were not sufficient to justify the time required to compress these filetypes, especially since the addition of a recovery record could actually produce an archive larger than the original file set. Excluding these filetypes from compression would not prevent WinRAR from combining them into a single archive file. As noted in the other post, there did appear to be ways to squeeze a bit more compression out of some such files, perhaps with the aid of tools like Precomp. For my purposes, the project at hand did not justify the time required to investigate and understand such refined compression techniques — and I would want to understand them well, lest I be unable to extract the compressed files later.
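The store-versus-compress rule described above amounts to a simple lookup by file extension. A minimal Python sketch, using the filetype list from this post (the function name is hypothetical, and this is not how WinRAR itself is configured):

```python
from pathlib import Path

# Filetypes the post excludes from compression for day-to-day use.
STORE_ONLY = {".jpg", ".pdf", ".mp3", ".mp4", ".iso", ".rar", ".gz"}


def should_compress(filename: str) -> bool:
    """Return False for filetypes that are already compressed."""
    return Path(filename).suffix.lower() not in STORE_ONLY
```

Files for which this returns False would still go into the archive; they would just be stored as-is rather than compressed.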
As I continued with the following discussion, I found that, for this project, I would be altering that list of filetypes that I would ordinarily want to exclude from compression. I decided to change other general-purpose WinRAR settings for this project too, as detailed below.
To reduce the potential damage from a defective archive file, WinRAR offered two data recovery options. First, there was the option of including a recovery record within an archive. The suggestion was to set this record between 3% and 10% of the archive size: the larger the recovery record, the more damage the archive would be able to sustain without data loss. I had previously looked into this and, for ordinary purposes, I had decided on a 5% setting.
Second, there was the option of using recovery volumes. These would only work with archives consisting of more than one volume. For example, in a ten-volume archive, a recovery volume (the eleventh volume in the set) would be able to replace any one of those ten. As explained in a SuperUser discussion, this was possible because the recovery volume would not save a copy of the data contained in any archive volume. Rather, it would save the sums of numerical values calculated from those ten volumes. If one volume was defective, the sums would not total up correctly, and the recovery volume would know what the missing value was. It was possible to create more than one recovery volume. The maximum was one less than the number of volumes in the archive (e.g., up to nine recovery volumes could be created for a ten-volume archive), though of course this would nearly double the size of the archive set.
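The "sums" idea can be illustrated with a simple XOR parity scheme in Python. To be clear, this is a simplified stand-in that I am using for illustration: it handles only one recovery volume, and it is not WinRAR's actual algorithm. The function names are hypothetical.

```python
from functools import reduce


def make_parity(volumes: list[bytes]) -> bytes:
    """XOR equal-length volumes together, byte by byte."""
    length = len(volumes[0])
    if any(len(v) != length for v in volumes):
        raise ValueError("all volumes must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*volumes))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing volume from the survivors plus the parity."""
    return make_parity(surviving + [parity])


if __name__ == "__main__":
    volumes = [bytes([n] * 8) for n in range(1, 11)]  # ten toy volumes
    parity = make_parity(volumes)                     # the "eleventh volume"
    survivors = volumes[:3] + volumes[4:]             # volume 4 is lost
    print(rebuild_missing(survivors, parity) == volumes[3])
```

The parity volume is the same size as one data volume, yet it can stand in for whichever single volume goes missing, because XORing the survivors with the parity cancels out everything except the lost data.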
Taken together, the recovery record and recovery volume options offered a sort of backup within the backup: not only would I have these discs, but the discs would have built-in self-protections. That could be important in some circumstances. For this project, it wasn’t. I would rarely if ever be using these discs. They were just a secondary backup with the virtue of permanence. Viruses and human errors would tend not to bother them, safely stacked inside their plastic cakebox. The kinds of damage that could befall them — fire, flood, theft — would tend to be the kind that would wipe out the entire set, no matter how many recovery records they contained. If I wanted a fail-safe solution, I would be further ahead using extra discs, not for recovery volumes, but rather for a second archival set that I would store somewhere else.
These thoughts suggested that I might not even bother using WinRAR on filetypes that would not be efficiently compressed (e.g., MP3, above). “Compressing” those largely uncompressible files would achieve little beyond putting them into a single archive, where they would actually be somewhat less accessible. The use of a compression program would add one more way in which something could go wrong. And instead of damage (e.g., a scratch) affecting a subset of the files on a disc, damage could leave the entire volume unreadable unless I added a recovery record or volume that could make the resulting archive larger than the files stored in it. It seemed that it might make more sense to burn those filetypes onto the disc directly, without compression. But for purposes of convenience, I decided to go ahead and use WinRAR to create archives for the files I was burning onto BD-R, with certain exceptions described below.
I wanted the contents of the archival discs to be unreadable by others. This was not necessarily as easy as one might expect.
The first thing I learned was that I would probably not be able to achieve perfect security: discs that nobody else would ever be able to read. Even with the best technological solution available, certain vulnerabilities would remain:
- Quantum computing. According to Wikipedia, quantum computing involves computers that use an entirely new kind of computing hardware, built around principles from quantum-mechanical physics. Quantum computing can, in principle, perform certain calculations far more quickly than is possible on today’s computers. Unfortunately, as MIT Technology Review (Simonite, 2016) explained, at this writing, it was not clear what quantum computing might eventually achieve. The timeframe was unclear too: some experts believed that quantum computing would be dominant by 2025, while others felt that it was more like “15 to 30 years away” and that there would meanwhile be progress on quantum-proof security technology (IEEE Spectrum, Nordrum, 2016). For present purposes, the point seemed to be that an archival disc in hostile hands, secure today, might become insecure at some point in the future. Then there would be a question of whether those hostile hands could make use of it at that point. If the data consist of passwords, hopefully those would have been changed by then; if the data contain illegal materials, presumably there would be a question of whether the statute of limitations on the relevant crimes has expired; and so forth. These concerns would be reduced as long as the user retained physical control of the archival discs: the ones using older technology could be destroyed and replaced with newer ones, using superior technology, as the technology developed.
- Law enforcement. The rubber-hose cryptanalysis cartoon from xkcd (above) seemed to describe approximately what would typically happen, in a standoff against police who wanted access to the user’s encrypted data. Law enforcement personnel have ways of achieving that access. For example, Wired (Greenberg, 2015) notes that there are multiple routes into an iPhone, even without the phone’s password (see also Selyukh, 2016). There are also certain legal realities. Consider the case of a veteran of the Philadelphia Police Department whom a judge ordered to be jailed “indefinitely,” on grounds of contempt of court, for his refusal to decrypt his hard drive (Vaas, 2016). That cop was in that position even though he had not actually been charged with a crime and could afford a lawyer (who was arguing that disclosure of the password would violate the Fifth Amendment rule against self-incrimination). As of September 2016, the officer had remained in jail for nearly a year (Philly.com). The rarity of this sort of case may suggest that prosecutors and judges do not usually push the point to this extreme or, more frequently, that most people lack the resources to stand up against prosecutors, even if their demands might someday be held unconstitutional.
- Backdoors. There is the issue of whether tech companies should be required to give law enforcement agencies a way of accessing the contents of secured devices. The claim here is that criminals are “going dark.” That is the claim made in, for example, an FBI page that has now been removed, without any forwarding link, from the location where The Verge (Brandom, 2016) found it, but that fortunately remains available in the Internet Archive. A search led to numerous critiques of this “going dark” claim, including informative articles by RedState and by PC Magazine, as well as a Harvard report saying this: “Are we really headed to a future in which our ability to effectively surveil criminals and bad actors is impossible? We think not.” The Verge further noted that cloud companies (e.g., Facebook, Google, Apple) already fulfill “the vast majority of government requests” for cloud-based user account data. Motherboard (2016) reports that Microsoft retains encryption keys for customer data, and that Microsoft claims it has never handed over any such key to law enforcement — but also that, even if that is true, Microsoft might be compelled to do so in the future. Regarding device encryption, the American Civil Liberties Union (ACLU, 2016) offers a number of reasons why government backdoors (to the iPhone, but also to other devices) would be catastrophic. In a separate development, it appears that, in its zeal to capture terrorists, the U.S. National Security Agency (NSA) “purposefully spread a bad algorithm” used in encryption, potentially jeopardizing many products that protect sensitive government information (The Verge, 2013). There were also concerns that law enforcement pursuit of a backdoor may have been behind the collapse of the TrueCrypt encryption tool. It was also conceivable that a backdoor exploited by law enforcement could be discovered and exploited by other governmental or private actors anywhere. 
Generally, it appeared that there were steps that users could take to reduce their security risks, but that there could be unpleasant surprises regarding the actual extent of backdoor exposure.
Given those concerns, it seemed that the most practical route, for the ordinary user, would be to try to make sure that s/he did not have data that could get him/her in trouble with the law, and to secure his/her data as well as possible with the available technologies.
On a hard disk drive (HDD) or solid state drive (SSD), there would generally be two ways of securing data: encrypt individual files and/or encrypt the entire drive. My previous explorations had yielded the conclusion that VeraCrypt was the best tool for drive encryption. In a recent comparison against BitLocker, VeraCrypt’s chief competitor, LifeHacker (Henry, 2016) reaffirmed that earlier conclusion. Wikipedia indicated that VeraCrypt would be most secure when (1) power to the computer’s RAM had been completely cut for at least several seconds after the most recent entry of the VeraCrypt password, by which point residual RAM contents would disappear; (2) attackers had not obtained physical access to the computer, during which they might install keyloggers or other data-grabbing hardware or software; and (3) the computer was free of malware that, again, might log keystrokes, including the VeraCrypt password.
Of course, burning data onto a write-once (DVD-R or BD-R) disc would be different from saving it on a HDD or SSD: the latter could be formatted with VeraCrypt, and then files could be added to or removed from it. A write-once disc would not be amenable to such repeated passes. It appeared that VeraCrypt would be useful with a write-once disc only through the approach of creating an encrypted container, and then burning that container to the disc, just as one would burn any other file to an optical disc. The user seeking access to the contents of that disc would start VeraCrypt, enter the password, and mount that disc’s VeraCrypt container to a drive letter in Windows.
So then there could be a question of whether a single VeraCrypt container file would be more secure than a single WinRAR (or other) compressed and encrypted archive. PC World (Spector, 2015) pointed out problems with the ZipCrypto encryption often used with ZIP files. Instead of ZIP and ZipCrypto, How-To Geek (2014) recommended using AES-256 encryption in WinRAR or 7-Zip. To do that, in WinRAR, using Windows Explorer, I selected the files to be compressed, selected WinRAR > Add to archive > General tab > Set password > Enter password and select “Encrypt file names” > OK. This produced a RAR archive. When I tried to open it, it said, “Enter password for the encrypted archive.” The process was similar but a bit simpler in 7-Zip. It offered a drop-down box for choice of encryption method, but for the 7z archive format, AES-256 was the only option.
But was AES-256 secure, as implemented by WinRAR or 7-Zip? A StackExchange Security thread raised several concerns. First, as of autumn 2015, when 7-Zip decrypted an encrypted file, it would save that file’s contents in the Windows Temp folder, where they might remain until the next general Temp file cleaning if the file and 7-Zip were not closed in the proper sequence; and even after deletion, of course, the data could be recoverable until those temp files were shredded or the temp space was wiped. A SuperUser thread indicated that much the same was true for WinRAR, but that its temp files location could be set elsewhere: to, say, an encrypted drive (for security) or an internal SSD (for speed and security via TRIM). A Wilders Security thread contained the remark that, for purposes of security when using WinRAR or 7-Zip, “it’s not AES that’s the limiting factor. It’s the user.” Interpreting that statement, I found references to such things as “operating system leaks due to an inexperienced user” as well as the view that “most people will not generate a truly random password.” But another thread offered the views that there are no flawless AES implementations and that it can be difficult and expensive to test encryption software. There had been an audit of TrueCrypt; an audit of VeraCrypt was now underway; but there tended to be no comparable attention to AES implementations in tools like WinRAR and 7-Zip.
Regardless of whether one used VeraCrypt, WinRAR, or some other tool to encrypt the files being burned to optical disc, there would always be the question of whether an adequate password was used for encryption. This excerpt from my previous post provided useful links:
People have also been known to do foolish things with passwords. And for weak passwords, brute-force attacks (i.e., guessing millions of passwords per second, narrowed down with the aid of various rules and databases) provide an option . . . .
How-To Geek (2015) said that a strong password would include at least 12 characters, made up of a mix of upper- and lower-case letters, numbers, and symbols; would not use ordinary dictionary words alone or in obvious combination; and wouldn’t use obvious substitutions (e.g., replacing the letter “o” with the numeral zero in H0use). Another post provides suggestions for remembering such passwords.
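The mechanical part of that advice (length and character mix) is easy to check in code. Here is a minimal Python sketch with a hypothetical function name. It deliberately does not attempt the harder checks (dictionary words, obvious substitutions), so passing it does not make a password strong.

```python
import string


def meets_basic_rules(password: str, minimum_length: int = 12) -> bool:
    """Check length plus a mix of lower-case, upper-case, digits, and symbols."""
    return (
        len(password) >= minimum_length
        and any(ch.islower() for ch in password)
        and any(ch.isupper() for ch in password)
        and any(ch.isdigit() for ch in password)
        and any(ch in string.punctuation for ch in password)
    )


if __name__ == "__main__":
    print(meets_basic_rules("H0use"))              # too short, no symbol
    print(meets_basic_rules("tr4il-Mix&0ranges"))  # passes the basic checks
```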
How-To Geek recommended Dashlane as a password manager, for situations where the user would need assistance in remembering passwords, and Diceware as a passphrase generator. PC Magazine (2016) and others (e.g., ASecureLife; TechNorms) agreed that Dashlane and LastPass were the best password managers. It appeared that people who did not want or need to store their passwords in the cloud (as tools like LastPass would do) often preferred KeePass. A search led to pages contending that some if not all password managers had been hacked and/or were vulnerable to hacking.
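For the curious, the Diceware method mentioned above works by rolling five dice per word and looking up the resulting five-digit key in a list of 7,776 words. A minimal Python sketch of that procedure follows. The word list here is a placeholder that I generate for demonstration only; a real passphrase would need the actual published Diceware list, and the function names are hypothetical.

```python
import math
import secrets
from itertools import product


def roll_key(dice: int = 5) -> str:
    """Simulate rolling dice with a cryptographically secure generator."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))


def passphrase(wordlist: dict[str, str], words: int = 6) -> str:
    """Pick one word per five-dice roll and join them with spaces."""
    return " ".join(wordlist[roll_key()] for _ in range(words))


def entropy_bits(words: int, list_size: int = 6 ** 5) -> float:
    """Entropy of a passphrase whose words are chosen uniformly at random."""
    return words * math.log2(list_size)


# Placeholder list (NOT the real Diceware list): keys "11111" to "66666".
PLACEHOLDER_LIST = {
    "".join(key): f"word{index}"
    for index, key in enumerate(product("123456", repeat=5))
}

if __name__ == "__main__":
    print(passphrase(PLACEHOLDER_LIST))
    print(f"{entropy_bits(6):.1f} bits for six words")
```

The entropy figure assumes the words really are chosen at random; picking words by hand, or re-rolling until the phrase "sounds good," lowers it.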
Others echoed and added to the tips provided by How-To Geek (above). LifeHacker (Lee, 2014) pointed toward an article contending that hackers have already found ways to defeat How-To Geek’s multiword passphrase recommendation. LifeHacker also suggested the Person-Action-Object (PAO) method: think of a person performing an action involving an object (e.g., Beyonce driving a Jello mold at Mount Rushmore), use some of the resulting letters (e.g., driJel), and combine that output with the results of two other similar stories to make an 18-character password. LifeHacker also recommended a “phonetic muscle memory” technique that did not seem, to me, to create memorable passwords. MakeUseOf suggested carrying part of the password with you, to jog your memory for the rest, or using nursery rhymes or other familiar sets of words (e.g., favorite song or movie lines; industry-specific lingo) to create character sequences (e.g., use 7bb for Little Boy Blue, using the 7 as an upside-down L). WikiHow offered the ideas of simply repeating the same password twice, and of alternately combining the letters of two or more words (e.g., house + plane = hpoluasnee), perhaps with one backwards; or removing letters such as vowels (e.g., “miss you very much” > mssyvrymch). Several sites suggested just moving fingers a bit on the keyboard (e.g., to the left one key) and then typing a memorable phrase in that position (e.g., hitting “s” for each occurrence of “d” in the phrase).
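Three of those transformations (interleaving two words, dropping vowels, and shifting one key to the left) are mechanical enough to express in code. The following Python sketch is my own illustration, with hypothetical function names and a lower-case QWERTY layout assumed for the shift. Anything this easy to script is also easy for a cracking tool to script, which is worth bearing in mind.

```python
from itertools import zip_longest

# Assumption: a standard lower-case QWERTY letter layout.
QWERTY_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def interleave(first: str, second: str) -> str:
    """Alternate the letters of two words: house + plane -> hpoluasnee."""
    return "".join(a + b for a, b in zip_longest(first, second, fillvalue=""))


def drop_vowels(phrase: str) -> str:
    """Remove vowels (a, e, i, o, u) and whitespace from a phrase."""
    return "".join(
        ch for ch in phrase if ch.lower() not in "aeiou" and not ch.isspace()
    )


def shift_left(phrase: str) -> str:
    """Replace each letter with the key to its left; leftmost keys stay put."""
    shifted = []
    for ch in phrase:
        replacement = ch
        for row in QWERTY_ROWS:
            position = row.find(ch)
            if position > 0:
                replacement = row[position - 1]
                break
        shifted.append(replacement)
    return "".join(shifted)


if __name__ == "__main__":
    print(interleave("house", "plane"))
    print(drop_vowels("miss you very much"))
    print(shift_left("dad"))
```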
These and other sites offered other suggestions as well. It was not clear that all of these suggestions would produce passwords that a user would remember, for a rarely used optical disc archive. Here, again, it also seemed advisable to have some awareness of vulnerability. Ars Technica (Goodin, 2013) told of a test in which a password expert cracked 90% of a list of over 16,000 password hashes within 20 hours, and of an anonymous cracker who deciphered 62% of that list in about an hour. OnlineDomainTools reportedly provided a good way to test passwords — or the user could try tools used for password cracking, such as HashCat, RAR Password Genius, and John the Ripper.
A VeraCrypt Container
I considered the approach of using WinRAR to compress my files, perhaps with a password; then putting that .rar archive into a VeraCrypt container; and then burning that container onto my optical (e.g., BD-R) disc. (See Raymond and Ghacks for related screenshots.) It now seemed that this approach in particular, and the use of VeraCrypt in general, would have certain advantages and drawbacks:
- VeraCrypt would encrypt files, but was not designed to maximize use of space. WinRAR would make the most of each disc, when used to compress those files that could be effectively compressed.
- If I passworded the WinRAR archive (with, presumably, its own distinct password), I would have to remember two different (presumably long and complicated) passwords, over the period of months or years that might pass before I might next need to access these archives. If these passwords were passwords that I also used elsewhere, they would be easier to remember; but if they were discovered in those other settings, they would compromise this archive as well, and vice versa. Of course, writing them down would create a security vulnerability in itself.
- WinRAR could be set to remember full paths. Hopefully that meant that, when WinRAR restored files from its archives to my hard disk, it would recreate the original directory structure needed to put these files back in their original place (e.g., \Top Folder\Subfolder 3\Sub-subfolder B\Filename.ext). VeraCrypt did not seem to offer any similar capability. So if I put files into the VeraCrypt container without using WinRAR (or some other tool or method to reconstruct the directory tree), those files would not be restored in an organized manner to their original folders. With that in mind, it would probably be best to use WinRAR to archive files from their desired locations (or from a copy of that location, on another drive). I would probably not want to move them into a staging area, somewhere else on the drive, before archiving them, unless that staging area was actually the location where I would want those files to be restored.
- VeraCrypt would apparently not decrypt its files into the Windows Temp folder or anywhere else. There would be no need of that: its files would already be ready to use from the disc. But if the files stored in the VeraCrypt container were archives readable with WinRAR or 7-Zip, presumably those files would be unpacked as usual into the default temporary directory (e.g., C:\Windows\Temp), with the security drawbacks mentioned above.
- VeraCrypt supported plausible deniability, in the sense that it would be possible to create a hidden volume within another volume. An intruder, or an extortioner or law enforcement official given the password to the outer volume, would theoretically fail to detect the existence of the inner volume. But a review of VeraCrypt’s documentation suggested that denial might not be very plausible in the case of a multidisc archive set. For example: we have four 25GB discs, suggesting between 75GB and 100GB of data; and yet, when the contents of those discs are copied to another drive, we find only 35GB of data. Why would someone use four BD-R discs to contain 35GB? The user in that situation, standing before a judge, might quickly recalculate the question of whom the judge was likely to believe. There was also the question of whether the user, other knowledgeable people, related circumstances, and the willingness or refusal to take a lie detector test would enhance or impair the user’s credibility and his/her vulnerability to a perjury conviction.
Such considerations persuaded me that, if I wanted to encrypt my files, VeraCrypt offered the best solution, albeit imperfect. As explained in a StackExchange Security thread, I’d prefer a tool that was designed for the specific purpose of maintaining security, was presently being audited with that purpose in mind, and was descended from another program (TrueCrypt) that had already passed such an audit — rather than a tool (e.g., WinRAR) that was designed for another purpose (i.e., compression) and just had the security aspect as an additional feature.
That said, I would have to use something like WinRAR if I wanted to compress, and retain the directory structure, of the files stored in that VeraCrypt container. I had set WinRAR’s temporary folder so that it would be on an encrypted drive, so any temporary data generated during file decompression would remain secure. Note that the files would need to be compressed first, before they were put into the VeraCrypt container. The container itself would not be efficiently compressible, because VeraCrypt would fill empty space in the container with junk data, to prevent attackers from figuring out the size or configuration of the actual data files in the container.
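The point that random-looking data does not compress can be demonstrated with Python's standard zlib module. This is only an analogy: zlib is not the compressor that WinRAR uses, and os.urandom output merely stands in for the junk data in an encrypted container.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level=9)) / len(data)


# One million random bytes versus one million bytes of repetitive text.
random_ratio = compression_ratio(os.urandom(1_000_000))
text_ratio = compression_ratio(b"archive on plastic discs " * 40_000)

if __name__ == "__main__":
    print(f"random data:     {random_ratio:.3f}")
    print(f"repetitive text: {text_ratio:.3f}")
```

The random data comes out at essentially its original size, while the repetitive text shrinks to a tiny fraction. The same reasoning applies to already-compressed filetypes such as MP3 and JPG.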
So I would create a VeraCrypt container that would be as large as possible, within the size constraints of the BD-R disc; I would mount that container, using VeraCrypt; I would use WinRAR to compress my files in an organized manner and in a size that would fit within that container; I would move one or more WinRAR archives into the container, to fill it; I would dismount the container in VeraCrypt; I would have a container ready to burn onto BD-R, in the form of a single file of the right size for the BD-R disc; I would burn the container onto the disc; and then I would remount the container, empty out its contents, and start the cycle again.
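Deciding which archives go into which container is a small packing problem. One common heuristic (first-fit decreasing) is sketched below in Python. This is my own illustration, not a feature of WinRAR or VeraCrypt; the function name is hypothetical, and the capacity passed in would need to allow for the file system overhead inside the container.

```python
def plan_containers(archive_sizes: dict[str, int], container_bytes: int) -> list[list[str]]:
    """Group archives into containers, placing the largest archives first."""
    containers: list[list] = []  # each entry: [remaining_bytes, [names]]
    ordered = sorted(archive_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        if size > container_bytes:
            raise ValueError(f"{name} is larger than one container")
        for slot in containers:
            if slot[0] >= size:          # first container with enough room
                slot[0] -= size
                slot[1].append(name)
                break
        else:                            # no existing container had room
            containers.append([container_bytes - size, [name]])
    return [names for _, names in containers]


if __name__ == "__main__":
    sizes_gb = {"photos.rar": 15, "docs.rar": 10, "audio.rar": 9, "misc.rar": 6}
    print(plan_containers(sizes_gb, 25))
```

Each inner list in the result corresponds to one pass through the mount, fill, dismount, and burn cycle.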
I felt that encryption with VeraCrypt — and, optionally, with WinRAR — would provide good security for the files I planned to burn to Blu-ray discs. I was not eager to have to recall yet another password. Hence, I did not pursue the option of buying a program (e.g., Nero, Ashampoo) that would allow me to password-protect the contents of those discs. I planned to use a free program, like ImgBurn, to burn the VeraCrypt containers onto the BD-R discs. So I did not have to explore the question of whether certain tools (e.g., Slysoft’s CloneBD; IsoBuster) could be used to crack Blu-ray disc passwords.
Calculating Disc Capacity
Now I had to figure out how big my VeraCrypt container should be. During the process of researching this post, I came to regret buying cheap blank BD-R discs (PlexDisc 6x White Inkjet, UPC code 842378008789) instead of Verbatim BD-Rs. But I had them now, and I consoled myself with the thought that, with quantum computing and all, I would probably be revisiting the whole project just a few years down the line. It seemed these cheap discs would probably hold up OK for that timeframe.
My search led to several webpages blithely assuring me that a BD-R would hold 25GB of data — but also to a discussion thread where someone said s/he was getting only 24,220,000,000 bytes onto his/her blank BD-Rs. Hugh’s News pointed out the potential for confusion, between the 25GB that manufacturers quoted (using base 10 notation, i.e., 1000 bytes per KB) and the 23.3GB (actually GiB) that a typical operating system would quote (using base 2 notation, i.e., 1024 bytes per KB). Either way, though, Hugh affirmed that there should be 25,025,315,816 bytes of gross capacity on a BD-R. CDROM2GO said that the true recordable capacity would be 23.28 GiB, which I interpreted as meaning that my disc-burning program (e.g., ImgBurn) would tell me that an empty BD-R offered only 23.28GB of free space.
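The base-10 versus base-2 confusion is easy to check with a few lines of Python. The byte counts below are the ones quoted in the paragraph above; the function names are just illustrative.

```python
def to_decimal_gb(size_bytes: int) -> float:
    """Gigabytes as manufacturers count them (1GB = 1000^3 bytes)."""
    return size_bytes / 1000 ** 3


def to_gib(size_bytes: int) -> float:
    """Gibibytes as a typical operating system counts them (1GiB = 1024^3 bytes)."""
    return size_bytes / 1024 ** 3


if __name__ == "__main__":
    for label, size in [
        ("nominal 25GB", 25_000_000_000),
        ("gross capacity per Hugh's News", 25_025_315_816),
    ]:
        print(f"{label}: {to_decimal_gb(size):.2f} GB = {to_gib(size):.2f} GiB")
```

The nominal 25GB works out to about 23.28 GiB, matching the CDROM2GO figure, and the gross capacity figure works out to about 23.31 GiB, matching the 23.3 that an operating system would show.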
I decided to explore that for myself. I put a blank BD-R in the drive, started ImgBurn, and clicked on its “Write files/folders to disc” option. I clicked Test Mode and went to the Device tab. It said that Free Space was 25,025,314,816 bytes. (I had pointed out, in an email to Hugh (above), that his webpage had both this figure and the …315,816 figure quoted in the previous paragraph.) It seemed I needed about 25GB worth of files to fine-tune this. Fake File Generator would accept only eight decimal places, so I was on my own.
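The gap between the advertised 25GB and the roughly 23.3GB that an operating system reports is purely a matter of units. As a minimal sketch in Python (the function names are made up for illustration), both conventions can be applied to the Free Space figure that ImgBurn reported:

```python
# Convert a raw byte count into decimal (GB) and binary (GiB) units.
# The byte count is the Free Space figure ImgBurn showed for a blank BD-R.

def to_decimal_gb(num_bytes: int) -> float:
    """Gigabytes as manufacturers count them: 1 GB = 1000**3 bytes."""
    return num_bytes / 1000**3


def to_binary_gib(num_bytes: int) -> float:
    """Gibibytes, which Windows labels "GB": 1 GiB = 1024**3 bytes."""
    return num_bytes / 1024**3


free_space = 25_025_314_816
print(f"{to_decimal_gb(free_space):.2f} GB")   # manufacturer-style figure
print(f"{to_binary_gib(free_space):.2f} GiB")  # operating-system-style figure
```

The same byte count comes out as about 25.03 in decimal units and about 23.31 in binary units, so neither figure is wrong; they just count differently.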
As in a prior mission, I used Tahionic Disk Tools Toolkit to create several files filled with random characters: one containing 20GB, five containing 1GB each, and ten in each of the following sizes: 100MB, 10MB, 1MB, 100KB, 10KB, 1KB, 100B, 10B, and 1B. Windows Explorer > right-click > Properties reported that the 20GB file consisted of 20.0GB (21,474,836,480 bytes — which, divided by 1,024, and likewise for its quotient twice over, did equal 20). I used Bulk Rename Utility to rename these sets with filenames indicating their sizes, and conducted spot checks to confirm that the junk file creation and renaming had proceeded correctly. I noticed that “Size” and “Size on disk” indicated by Windows Explorer > Properties differed only for those files smaller than 100KB. When formatting the HDD, I had used the default NTFS allocation unit size (4096 bytes), so each of these smaller file sizes would take up extra disk space, up to the appropriate multiple of 4,096. I added 25GB worth of those files to ImgBurn and clicked the Build icon. It said,
Test Mode might not be supported by the current media (BD-R).
Data COULD actually be written to the disc.
Would you like to continue anyway?
This was not ideal. I tried replacing the blank BD-R with a previously burned BD-R, and then clicked Yes. I got an error: “This disc is not empty.” I bailed out of ImgBurn and tried the portable version of CDBurnerXP x64. To include the 20GB dummy file, I had to go into its Disc > Change file system > UDF option. Once that was done, I was able to play with the dummy files I had just created. The net result was that, in terms of the sum of the size of the individual files as reported in Windows Explorer and in CDBurnerXP, I was able to squeeze 23,306,161 KB onto this BD-R. The CDBurnerXP status bar reported it as 23859.38MB. Windows Explorer > Properties reported it as 23.3 GB (25,017,091,072 bytes, with size on disk of 25,017,106,432 bytes).
As a second test, I told Tahionic to create 15,000 files of 3MB each. CDBurnerXP reported that the blank BD-R could hold 12,162 of these 3MB files. CDBurnerXP’s status bar said that totaled 23,856.69MB, whereas basic arithmetic (i.e., 12,162 x 3MB) said it would be more like 36,486MB. That seemed like a pretty large discrepancy. I tried using CDBurnerXP > File > Save compilation as ISO file. That crashed the program. I tried again. This time, it worked. According to Windows Explorer > Properties, the resulting ISO had a Size, and also a Size on disk, of 23.2GB (25,015,549,952 bytes). ISO Opener confirmed that I could extract those 12,162 files back out of the ISO. I was not sure how those files could total to only ~23GB.
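One way to look at that discrepancy is to divide the size of the ISO by the number of files it held. The following back-of-the-envelope check in Python uses only the figures quoted above. It ignores whatever overhead the ISO's own file system adds, so the true per-file figure would be slightly smaller, and it says nothing about why the files came out at that size.

```python
# Implied average file size, from the ISO size and the file count.
iso_bytes = 25_015_549_952   # ISO size reported by Windows Explorer
file_count = 12_162          # number of files CDBurnerXP said would fit


def implied_file_size_mib(total_bytes: int, count: int) -> float:
    """Average size per file in MiB (1 MiB = 1024**2 bytes)."""
    return total_bytes / count / 1024**2


avg = implied_file_size_mib(iso_bytes, file_count)
print(f"{avg:.2f} MiB per file")
```

The result is a little under 2 MiB per file. That would be consistent with the generated files actually being about 2MB each rather than 3MB, though that is only an inference from the totals.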
At any rate, the results of those two tests (in the two preceding paragraphs) were pretty close to each other, insofar as both reported that the disc could hold around 23,856MB to 23,859MB. It was time to see if I could determine the maximum burnable size for a VeraCrypt container. I went into VeraCrypt > Volumes > Create New Volume > Create an encrypted file container > Standard VeraCrypt volume. I gave it a name (realizing afterwards that including the word “VeraCrypt” in the name could be helpful for someone trying to figure out what it contained or how to crack it). I went with the default encryption options, and saw that VeraCrypt was willing to let me specify this container’s size in KB, MB, GB, or TB. The more precise measurements (KB or MB) would come closer to filling the disc completely.
Based on my two tests, I decided to try a VeraCrypt container of 23,855MB. (It would be easier to remember 25 billion bytes or 23.2GB; the 23,855MB number would merely squeeze out a bit more space.) In the Volume Format section of the VeraCrypt configuration process, I saw a checkbox with the word Dynamic next to it. A search in VeraCrypt’s documentation yielded an indication that a dynamic container would grow as more data was added to it, but would not shrink when data was removed. The documentation said performance of this sort of volume would be “significantly worse.” I wasn’t sure whether that would be true in a container burned onto an optical disc, but it didn’t matter; I didn’t need the Dynamic option. Also, since I would be combining most if not all of my smaller files into RAR files, I might have increased the cluster size; but there was going to be no potential for fragmentation, either, so I doubted a larger cluster size would help performance significantly. So I left that setting at Default. The resulting VeraCrypt container measured 23.2GB (25,013,780,480 bytes).
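That byte count follows directly from the size entered in VeraCrypt, on the assumption (borne out by the result) that VeraCrypt’s “MB” means 1,048,576 bytes. A quick Python check, with a made-up function name:

```python
# Size in bytes of a container specified in VeraCrypt's "MB" units.
# ASSUMPTION: VeraCrypt treats 1 MB as 1024 * 1024 bytes.
MIB = 1024**2


def container_bytes(size_mb: int) -> int:
    """Return the container size in bytes for a size given in MB."""
    return size_mb * MIB


size = container_bytes(23_855)
print(f"{size:,} bytes")
```

Working in MB rather than GB is what makes it possible to land this close to the disc's limit.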
In CDBurnerXP, I dragged the VeraCrypt container file to the staging area. (Note, again, that big files might require the Disc > Change File System setting to be UDF.) With the container in the staging area, the banner bar at the bottom said we had 3.22MB to spare. I decided it would be OK to allow that little bit for slop, in case not all blank BD-R discs allowed exactly the same amount of space. (As it turned out, all of the discs I used did report exactly 3.22MB to spare.) At this point, I had found the CDBurnerXP banner bar was too useful to abandon, so I decided to use CDBurnerXP rather than ImgBurn to do the actual disc burning.
Filling the Container: Archive File Size
Now that I had my VeraCrypt container, I could think more precisely about the sizes of the compressed WinRAR archives that I would be sticking into it. My uncompressed files were a random lot, in terms of size: one folder might contain files totaling only 5MB, while another might contain 50GB. There could be hundreds of folders, so I wouldn’t want to go through and compress each of them individually. Even if I had the time and energy for that, doing so might give me a bunch of incompatibly sized smaller archives. For example, I might have ARC1.RAR (16GB), ARC2.RAR (15GB), and ARC3.RAR (12GB). With my 23GB VeraCrypt container, it would take 75GB worth of BD-R discs to hold those 43GB worth of files.
Split (a/k/a multivolume) archives would provide a size solution. That is, I could tell WinRAR to compress the entire set of files (or some subset) into a single RAR file, splitting it into 23GB segments. Then each segment would almost perfectly fill the VeraCrypt container, resulting in minimal waste of BD-R discs. I didn’t like splitting because, without WinRAR’s recovery volume option, a defect in one volume could leave the remaining volumes in the series unreadable. It would also be impossible to tell, just by looking at the label, which disc might hold a particular set of files. I didn’t work through the question of exactly how one would find a specific file within a multivolume archive, in this combined WinRAR/VeraCrypt arrangement.
To avoid the minor risks and possible complications of split archives, I decided to set WinRAR’s default profile, for this project, so that it would create no RAR archives larger than the capacity of the VeraCrypt container. In many cases, my WinRAR archives would be significantly smaller than the container’s capacity. I would try to combine these archives into the container in such a way as to fill it. There would usually be some empty space left over. I decided to fill that space with MP3 files. I had a bunch of them, in widely varying sizes, and in practice I found that these could be combined to fill the discs very nicely. I didn’t need to compress them, as I had found that MP3s were not very compressible. The only drawbacks were that (1) the folder containing these MP3s, in the VeraCrypt container, would not be in its proper location, within my hard disk drive’s (HDD) directory tree, if I did restore the full contents of these BD-Rs to HDD, and (2) I could not know in advance which BD-R might contain these MP3s. So if I wanted to restore one of these MP3s, in the worst case I would have to hunt through the entire set of BD-Rs to find it.
To avoid posing risks to my files, and to clarify what remained to be done in this project, I did not work directly with my original files. Instead, I copied those files to a separate partition. I put those copies into the same locations, at the same levels, as the originals. For example, there would be a copy of D:\MP3s\Song.mp3 in E:\MP3s\Song.mp3, not in E:\Folder 1\MP3s\Song.mp3. I wanted the copied set to be exactly the same as the original set (except for the different drive letter) because I was going to tell WinRAR to store the full pathnames along with the archived files. In previous experience, using relative rather than full pathnames had produced a mess, when I restored files from WinRAR archives.
If there were files and folders that I did not want to burn onto BD-R, now was the time to delete them from the drive containing the files to be backed up. Once that was done, I could produce a list of the files being archived. To do that, I opened a command window (best done by using Ultimate Windows Tweaker > Additional Tweaks > Show “Open Command Window Here” and then right-clicking on drive E and choosing the Open Command Window Here option). In the command window, I entered this command: “DIR /A-D /S /B > D:\dirlist.txt” (without the quotation marks). I gave dirlist.txt a more informative name (“List of Files Burned to BD-R”), and planned to put a copy of it into the VeraCrypt container before each disc burn. As it turned out, at this point I overlooked a set of files that I did not want to back up on BD-R. I made a note of those in another text file, and put that onto the last disc in the series.
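For anyone who would rather not use the command window, roughly the same list can be produced with a short Python script. This is a sketch rather than an exact equivalent of the DIR command: the order of the entries may differ, and the function names and the paths in the comment at the end are only illustrative.

```python
import os


def list_files(root: str) -> list[str]:
    """Return the full path of every file under root (folders excluded)."""
    paths = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            paths.append(os.path.join(folder, name))
    return paths


def write_listing(root: str, out_path: str) -> int:
    """Write one path per line to out_path; return the number of files."""
    paths = list_files(root)
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
    return len(paths)


# Example call (hypothetical paths, mirroring the command above):
# write_listing("E:\\", "D:\\dirlist.txt")
```

As with the DIR command, the output file should go somewhere other than the drive being listed, so that the list does not include itself.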
Next, I set WinRAR to delete files after compression. This would prevent unwanted duplicates: it would be impossible to archive the same file twice. So as I went through the process of creating WinRAR archives, the size of the copied set would get smaller and smaller, until there was nothing left. Meanwhile, the set of RAR files would get larger and larger, though I would be simultaneously burning BD-Rs to get rid of those.
To identify folders that were of a good size for compression, I used TreeSize. This program was very useful. It functioned much like Windows Explorer. I could right-click on the desired folder and choose the WinRAR compression option, right there within TreeSize. When WinRAR finished compressing the selected folder, and deleted the files that had been compressed, the TreeSize display would be updated, so that I could easily choose the next folder to compress.
To avoid creating split archives, I didn’t try to compress folders that were much larger than 23GB, unless I knew they contained a type of file that would compress well. It was hard to know what compression ratio I was getting. I didn’t bother writing down the stats for various folders, since their contents varied quite a bit. WinRAR wasn’t ideal for this: its compression dialog would vanish as soon as the assigned task was completed. But in my observation, the compression percentages tended to be in the ranges mentioned in the preceding post: that is, typically between about 82% and 99%, depending on the types and combinations of files being compressed. At the settings specified above, with other tasks sometimes underway on the computer, it could take an hour or more to produce one 23GB RAR file from a folder containing a large number of files.
Along with the larger RAR files I was creating, in the neighborhood of 20-23GB, I also tried to compress a number of smaller folders. That gave me a good pool of variously sized RAR files that I could use to fill the VeraCrypt container. So if a large folder was compressed into an 18GB RAR file, I might put that into the container along with a 5GB RAR file to make a total of 23GB.
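Matching archives to containers by eye worked well enough here, but the same idea can be sketched as a simple packing routine. The Python below uses the first-fit decreasing heuristic, which is a stand-in for the manual process described above, not a record of it. The archive sizes in the second example are hypothetical.

```python
def pack_first_fit_decreasing(sizes_gb, capacity_gb):
    """Group archive sizes into containers of the given capacity.

    Largest archives are placed first; each goes into the first
    container with enough room, or into a new container if none fits.
    Returns a list of containers, each a list of archive sizes.
    """
    containers = []
    for size in sorted(sizes_gb, reverse=True):
        if size > capacity_gb:
            raise ValueError(f"{size}GB archive is larger than the container")
        for contents in containers:
            if sum(contents) + size <= capacity_gb:
                contents.append(size)
                break
        else:
            # No existing container had room, so start a new one.
            containers.append([size])
    return containers


# The three archives from the earlier example: each needs its own disc.
print(pack_first_fit_decreasing([16, 15, 12], 23))

# A hypothetical pool with more small archives packs much more tightly.
print(pack_first_fit_decreasing([18, 5, 16, 15, 12, 7, 8, 11], 23))
```

The first call returns three single-archive containers, which is the wasteful case described earlier. The second shows why a pool of variously sized archives helps: the same 23GB limit can be met exactly when there are enough small pieces to pair with the large ones.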
I had already seen that the VeraCrypt container occupied 23.2GB of space on a hard disk or BD-R disc. It would occupy that same amount of space, regardless of whether it was full or empty, regardless of efforts to compress it. What I needed to know now was, how much will that container hold?
To answer that, I mounted the container in VeraCrypt. (To mount it, I selected an unused drive letter (Q:) > Select File button > navigate to the container file > Mount.) So now I had drive Q, and according to Windows Explorer, that drive had 24,918,663,168 bytes (23.2GB) of free space, after Windows 7 created the inevitable $RECYCLE.BIN on drive Q. (To temporarily prevent anything else from going into the Recycle Bin, I tried the advice of going to the Windows desktop > right-click on Recycle Bin > Properties > select “Don’t move files to the Recycle Bin. Remove files immediately when deleted.” The Recycle Bin would still appear in my BD-R, but that was supposed to keep it empty. Unfortunately, it didn’t work. There were also more technical and drastic alternatives.)
The difference between the number just cited and the 25,013,780,480 bytes occupied by the container (above) was 95,117,312. Windows Explorer > Properties provided a sort of explanation: it said that 90.4MB of the disc was Used Space. In other words, the available space in the VeraCrypt container appeared to be about 90MB less than the size that I had specified when creating the container. The available space dropped a bit further when I added the “List of Files Burned to BD-R” file to the container. In the end, after further tinkering, I wound up setting WinRAR to split archives if they were larger than 24,850,000,000 bytes. An archive that large would still fit into the VeraCrypt container, with just a few megabytes to spare.
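The arithmetic behind that difference can be laid out in a few lines of Python. One part of this sketch goes beyond what was measured above: it assumes that a standard VeraCrypt file container reserves 128 KiB for headers at the start of the file and another 128 KiB for backup headers at the end. That assumption is marked in the code.

```python
# Where the space in the container goes, using the figures quoted above.
container_file = 25_013_780_480     # size of the container file on the HDD
free_when_mounted = 24_918_663_168  # free space Windows showed on drive Q

# ASSUMPTION: header areas of 128 KiB at each end of the container file.
HEADER_AREAS = 2 * 128 * 1024

overhead = container_file - free_when_mounted
filesystem_used = overhead - HEADER_AREAS

print(f"{overhead:,} bytes not available for files")
print(f"{filesystem_used / 1024**2:.2f} MiB used inside the mounted volume")
```

Under that assumption, the used space inside the mounted volume works out to about 90.46 MiB, which is in line with the 90.4MB that Windows Explorer displayed.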
By this point, I felt that this project called for several adjustments to the general-purpose WinRAR settings discussed in the other post. To accommodate these changes, I created a new profile in WinRAR, and made it (for now) the default. In this profile, I checked the box for solid archives, so as to realize a supposedly significant improvement in compression. This move did not seem very risky, given the relatively sheltered nature of these discs. Also, since I was not using a recovery record, it seemed that compression could make a small but cumulative difference in filetypes that I would not bother compressing in general usage, so I revised the WinRAR list of filetypes to exclude from compression. I turned off the options to use WinRAR’s recovery feature and archive testing. In retrospect, I did have time for the latter, and probably should have left it on, even though it had never caught a bad archive. As noted above, I set this default profile to store full file paths. I did not set a password.
With that done, I was able to begin compressing folders. For each folder or set of folders being compressed, I gave WinRAR brief but descriptive names, incorporating not only the name of the individual folder but also some information on its parent folders (so that, for example, I would not wind up with ten archives named “Photos.rar”). I wrote those descriptive names on the labels of the BD-R discs. Experience suggested that this would usually be enough to help me find a particular file. If not, each disc would have that file containing the full list of files being archived. So even if things had changed and I no longer remembered where that file had been located at the time when the disc was burned, that list would help me find it.
Burning the BD-R Discs
At this point, the process became fairly straightforward. As noted above, I just had to mount the VeraCrypt container, fill it with RAR files to the extent possible, fill the small bits of remaining space with MP3s, dismount it, burn it to BD-R, mount it, empty it out, and repeat the cycle.
Note that this was not the same as filling the VeraCrypt container and then burning the contents of that container. If I burned the unmounted container, it would be encrypted; anyone wanting access to it would have to know its password to mount it. By contrast, if I burned the contents of the mounted container, the result would be the same as if I had burned the contents of any other disk drive: the BD-R would contain those contents without VeraCrypt encryption. To make sure I was doing this right, every now and then I would look at the contents of a BD-R disc that I had just burned, make sure that I could not get at those contents without mounting their container in VeraCrypt, and make sure that, once mounted, the files in that container worked properly.
To burn the discs in CDBurnerXP, I put an empty BD-R disc in the drive and dragged the VeraCrypt container to the staging area in CDBurnerXP. When I did that, the status bar told me I had only 3.22MB left. I clicked the Burn button in CDBurnerXP. It gave me some options. The label on my BD-R disc container said 6x, so I changed the speed from the default 8x and, since I had time and was using cheap BD-Rs, I selected the “Verify data” option. CDBurnerXP tended to remember these settings from one disc to the next, but not necessarily from one session to the next. In most cases, it took 27 minutes to burn and verify a disc, and 16:30 without verification. But in a few cases — due, perhaps, to other demands on the system — it took more than an hour. I encountered no data errors, coasters, or other problems with these discs.
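CDBurnerXP’s “Verify data” option does the checking automatically. For an independent spot check, one could also compare a hash of the container file on the HDD with a hash of the copy on the burned disc. The Python below is a sketch of that idea, not a description of how CDBurnerXP verifies a burn; the paths in the comment at the end are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks.

    Reading in chunks keeps memory use low even for a 23GB container.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str, path_b: str) -> bool:
    """True if the two files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (hypothetical paths for the HDD copy and the disc copy):
# same_contents(r"E:\Container.hc", r"F:\Container.hc")
```

Reading a full container back from the disc takes a while, so a check like this adds time comparable to the verification pass itself.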
Speaking of time requirements, it could take up to 40 minutes to move a 23GB RAR file into the VeraCrypt container, as VeraCrypt required some time to encrypt it. This delay would have been much longer if I had chosen one of VeraCrypt’s more esoteric encryption options. If I had been creating precisely sized split archives, I think I could have eliminated this delay by setting WinRAR to create the archives directly in the VeraCrypt container, and to pause after creating each volume, so that I could unmount, burn, and remount the container.
I had previously used a Sharpie pen to write on BD-R labels, and that hadn’t seemed to cause any data loss. I noticed that the Sharpie ink did spread out a bit over a period of several years, but the labels were still legible. I didn’t have any markers especially designed for discs, and I wondered if the oil from a ballpoint pen, or the graphite from a pencil lead, would get onto the underside of the disc stacked on top of this one, and maybe interfere with data reading. So I went with the Sharpie again.
After burning the first BD-R, I wanted to empty out the VeraCrypt container and get it ready for the next RAR file. But when I tried to remount it as drive Q, I got an error:
WARNING: The host file/device is already in use!
Ignoring this can cause undesired system results including system instability. All applications that might be using the host file/device (for example, antivirus or backup applications) should be closed before mounting the volume.
I told it to mount anyway, but it failed with another error: “Error: Cannot mount volume. The host file/device is already in use. Attempt to mount without exclusive access failed as well.” Selecting a different drive letter did not help. A search led to the suggestion to go into Device Manager and set the mounted drive to Offline and then back to Online, but the VeraCrypt container was not mounted. Other suggestions involved rebooting (I had a bunch of things running at the moment, and didn’t want to do that) or killing Windows Explorer. It turned out the culprit was CDBurnerXP. I thought I had killed it, but it turned out to be still alive. Simply put, I could have the container mounted in VeraCrypt, or I could be burning it in CDBurnerXP, but I could not let both of those programs put their hands on the container at the same time.