Efficient Hourly Data Drive Mirroring with Robocopy

I had a bunch of data files on a Windows 10 desktop computer. I wanted to make hourly backups. The question was, how should I proceed? For reasons described here, I chose to use a Robocopy script, shown below.

Contents

Background and Setup
Choosing Mirroring Software
The Robocopy Script
Making Robocopy Faster
Benchmarks

Background and Setup

Instead of a mirroring program like Robocopy, one alternative was to use something like Acronis or Macrium to supplement a full drive backup with differential or incremental backups on a schedule. I decided against that approach. I didn’t want to rely on third-party software that might switch from being free to being paid (as had happened with Macrium some years earlier, before they switched back to offering a free version), or that might switch from being good to being semi-fubar, as had happened with Acronis. Moreover, as I recalled, these programs would put all backed-up files into a single compressed file, one per session. So if I wanted to know whether there was an earlier backup of MyFile.doc, I couldn’t use my preferred Everything file finder to immediately locate and view all copies; I would have to go rooting through those Acronis or Macrium backup files.

Note: combining the backup into a single compressed file could be much faster than file-by-file copying if the source folders contained large numbers of files (e.g., 10K+ per folder). A relatively complex solution would create a ZIP in the source drive, move that to the target drive, and then unzip it there (see also SuperUser).

My preferred solution was to have a separate backup folder for every hour of the day, numbered from 00 (i.e., midnight) through 23 (i.e., 11 PM). Each such folder would contain a full copy of all files and folders on my DATA (D) drive. The DATA drive contained materials that were in active use, at least in the sense that I would intermittently rename, rearrange, edit, delete, or otherwise tinker with them. That was different from my COLDSTORAGE (E) drive. I might add new archival materials to E, but those would be adequately captured in my regular (cautious) backup scheme, which ran every few days.
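As a rough illustration of that naming scheme, here is a minimal Python sketch (not part of my actual setup, which uses a batch file). It assumes the hourly folders live under M:\MDATA, as in the script later in this post; the function names are made up for the example.

```python
from datetime import datetime
from pathlib import PureWindowsPath

# Assumed layout, following the script later in the post: M:\MDATA\00 ... M:\MDATA\23
MIRROR_ROOT = PureWindowsPath(r"M:\MDATA")


def hour_folder_names():
    """Return the 24 zero-padded folder names, '00' through '23'."""
    return [f"{h:02d}" for h in range(24)]


def target_for(moment):
    """Return the mirror folder that corresponds to the clock hour of `moment`."""
    return MIRROR_ROOT / f"{moment.hour:02d}"


if __name__ == "__main__":
    print(hour_folder_names())
    print(target_for(datetime.now()))
```

Zero-padding keeps the folders in clock order when sorted by name, which is the same reason the batch script pads single-digit hours.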

So the present plan was to supplement my regular backup scheme with hourly mirroring of files and folders that had some chance of being actively used. The working capacity of my computer’s spare 4TB internal hard disk drive (HDD) was about 3.6TB. So as long as the contents of the DATA drive didn’t exceed ~150GB, I would be able to fit 24 copies of it onto that 4TB MIRROR (M) drive.
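The arithmetic behind that ~150GB figure can be sketched as follows. This is only a back-of-the-envelope check in Python, assuming round numbers (3.6TB treated as 3,600GB) and ignoring file system overhead; the function names are hypothetical.

```python
def max_size_per_copy_gb(usable_gb, copies=24):
    """Largest source size (in GB) that still lets `copies` full copies fit."""
    return usable_gb / copies


def copies_fit(source_gb, usable_gb, copies=24):
    """True if `copies` full copies of the source fit in the usable space."""
    return source_gb * copies <= usable_gb


if __name__ == "__main__":
    usable = 3600  # ~3.6TB working capacity, expressed in GB (rounded)
    print(max_size_per_copy_gb(usable))
    print(copies_fit(150, usable))
    print(copies_fit(160, usable))
```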

When I say I wanted to mirror the DATA drive on the MIRROR drive, I mean that, if a file existed in a specific folder on the DATA drive, an exact copy should exist in the corresponding folder on the MIRROR drive; and if a file didn’t exist in a folder on DATA, there shouldn’t be a copy of it in the corresponding folder on MIRROR. After running the Robocopy script, I used Beyond Compare for spot checks to verify that this seemed to be happening as desired.
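That definition of a mirror can be expressed as a simple check. The following Python sketch is not what Beyond Compare does; it is a much cruder stand-in that compares only relative paths and file sizes, not file contents or timestamps. The function names are invented for the example.

```python
import os


def relative_files(root):
    """Map each file's path (relative to `root`) to its size in bytes."""
    found = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            found[os.path.relpath(full, root)] = os.path.getsize(full)
    return found


def mirror_differences(source, mirror):
    """Return (missing, extra, size_mismatch) lists of relative paths.

    missing:       in source but not in mirror
    extra:         in mirror but not in source
    size_mismatch: in both, but with different sizes
    """
    src, dst = relative_files(source), relative_files(mirror)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    mismatch = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, mismatch
```

If all three lists come back empty, the target folder at least looks like a mirror of the source at the level of names and sizes.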

Later, it occurred to me that I could also copy the COLDSTORAGE files that had changed. Since few files on COLDSTORAGE would change from one hour to the next, this would not require much additional drive space. The relevant command (Robocopy with the /M flag, as in the script below) would select files by their Archive attribute instead of comparing the files on the source drive (E) against those on the target drive (M), and would therefore take much less time than a Robocopy mirroring command. With an additional command to remove archived files more than two days old, this addition seemed likely to take very little additional disk space.
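The cleanup step (removing copies more than two days old) can be sketched in Python as follows. This is an analogue of the idea, not a translation of the command I actually used: it measures age in elapsed seconds since the file's modification time, which may not match how a Windows command counts days. The function names and the dry-run default are assumptions made for the example.

```python
import os
import time


def files_older_than(root, days, now=None):
    """Return paths under `root` last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return sorted(old)


def prune(root, days=2, dry_run=True):
    """Report old files; delete them only when dry_run is False."""
    victims = files_older_than(root, days)
    if not dry_run:
        for path in victims:
            os.remove(path)
    return victims
```

Running with dry_run=True first shows what would be deleted, which seems prudent for anything that removes files from a backup drive.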

In addition to the space constraint, there was also a speed constraint. The computer would have to be able to make a copy of the changed contents within an hour, before the next hour’s mirroring began. For example, the 10 AM session would have to finish mirroring the changed contents of DATA to the 10 folder on MIRROR before 11 AM, when a new mirroring session would begin to update the 11 folder on MIRROR. I didn’t nail down a precise value for this constraint. It would depend in any case on whether I was copying many small files or a few large files. A bit of rough experimentation suggested a ceiling of ~100GB per hour. (Later, I explored ways of speeding up Robocopy. See below.)

That (100GB) was a bit below the size of the set of files and folders that I had decided to put into the DATA drive. So, to avoid piling up two or more running script iterations, all competing with each other for access to the DATA and MIRROR drives,  I decided not to use my Robocopy script for the first-time filling of these folders, which involved making 24 full copies of the DATA drive. Instead, I made an altered copy of the Robocopy script, designed to run even-numbered backups, one after the other. That is, it started by copying DATA to the 00 folder; when it was done with that, it copied DATA to the 02 folder; then the 04 folder; and so on. Once that was done, I could proceed with running the hourly Robocopy script. On the first day, it would be working hard to fill the odd-numbered folders (01, 03, 05 …), but it would be able to coast during the even-numbered hours, since the corresponding folders were already filled and not much had changed on DATA since then.
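The altered first-fill script is not reproduced in this post. As a hedged sketch of the idea only, the following Python builds one Robocopy command per even-numbered folder and runs them strictly one after another, so that no two copies compete for the drives. The drive letters and the M:\MDATA root follow the script below; the reduced set of Robocopy flags and the function names are assumptions.

```python
import subprocess


def seeding_commands(source="D:\\", mirror_root="M:\\MDATA", step=2):
    """Build Robocopy commands for folders 00, 02, ..., 22, in run order."""
    commands = []
    for hour in range(0, 24, step):
        target = f"{mirror_root}\\{hour:02d}"
        # /MIR mirrors the tree; /R and /W limit retries and wait time
        commands.append(["robocopy", source, target, "/MIR", "/R:2", "/W:5"])
    return commands


def run_sequentially(commands, runner=subprocess.run):
    """Run each command only after the previous one has finished."""
    for command in commands:
        runner(command)  # subprocess.run blocks until the command exits


if __name__ == "__main__":
    # Print the plan rather than running it; running requires Windows.
    for command in seeding_commands():
        print(" ".join(command))
```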

Choosing Mirroring Software

Once I had my disks and folders sorted out, I needed to choose the software that would do the mirroring. As already indicated, I did want mirroring — that is, a more or less exact copy of the DATA drive in each of the hourly folders on the MIRROR drive; I did not want a tool that would combine all changed files into a single archival file. The objective now was to choose software to update mirror folders each hour automatically.

GoodSync, and programs like it, offered one possible solution. Presumably most such programs would be capable of remembering a one-way sync task, so that I would just have to open the program and run the task, or might even be able to run it (or perhaps 24 versions of it) on a schedule, so as to mirror DATA to the correct numbered mirror folder. I had used GoodSync with generally good results some years earlier. But GoodSync was looking a bit rich for my blood at this point. Beyond Compare was another paid option that I had bought years earlier and continued to use in my cautious backup system (above).

FreeFileSync seemed to be among the more popular free synchronizing alternatives recommended by various sources: 4.4 stars from 4,715 raters at Softpedia; strongly preferred (by 461 votes) over Robocopy (34 votes) at AlternativeTo. (See also SyncFolders, with 4.1 stars from 105 raters on Softpedia but only 21 votes at AlternativeTo.) FreeFileSync had a simpler set of command-line options than Robocopy; for my purposes, those options appeared to offer less control.

Another approach would be to set up a file copying program. For a straight-across file copying operation, from my DATA drive to these 24 mirroring folders, it seemed there were many tools capable of improving greatly on the Windows COPY command, or on a manual copy-and-paste instruction. Various sources (e.g., Beebom, Sysprobs) seemed to say that TeraCopy was one of the best. Some older posts preferred Bvckup (4.5 stars from 98 raters at Softpedia). Unfortunately, Bvckup was no longer free. It appeared that the version of Bvckup 2 that would compete with Robocopy’s features would cost $50 per workstation. Those who did pay the price had very positive words for Bvckup.

I was reluctant to invest time in yet another proprietary program that might be pricey, that might function as something of a black box requiring a time-consuming investment in its own quirky syntax, or that might give me unpleasant surprises down the line (e.g., SyncToy: see DPReview; GroovyPost; Overclockers). It seemed that the best solution, for me, would involve learning how to write a command that would automate the process as desired. I wasn’t afraid to invest a little time in learning the syntax. Past efforts of that nature (e.g., learning Windows batch scripting) had paid off well, for a long time.

In my earlier inquiry, Robocopy appeared to offer the best automated solution that didn’t involve black-box or proprietary software (see related tools at Softpedia). That still seemed to be the case. For example, TeraCopy and Robocopy were the GUI and command-line options considered in a recent article in MakeUseOf; and the dominant advice in a SuperUser discussion favored Robocopy over XXCopy (see also SaasHub). The best command-line alternative to Robocopy may have been to run the Linux rsync tool (231 votes at AlternativeTo; see also AlternativeMe) on Windows. I did not explore the recommended methods for doing that. A Reddit discussion investigated relative speed advantages of Robocopy and rsync.

I would want a command that I could run from a batch file. I had already figured out how to create a batch file (HOURLY.BAT) that would run various commands every hour. To run my Robocopy mirroring script, I would just need to add one line to HOURLY.BAT.

The Robocopy Script

With those thoughts and preparations in place, I was ready to set up and run my Robocopy script. To do so, I opened Notepad, entered the following lines, and saved the file as RobocopyMirrorHourly.bat. The file was long, but most of its contents were just explanatory notes:

:: RobocopyMirrorHourly.bat

:: ***** General Remarks *****
:: This script performs hourly mirror backups of DATA drive (D) to target folders on MIRROR drive (M)
:: The target (TARG) folders are named for 24 clock hours (e.g., the 14 folder contains the 2 PM backup)
:: Additional feature: this script also backs up newly changed files on COLDSTORAGE (drive E)
:: Requires folders named M:\MDATA\00 through 23 and M:\MCOLD\00 through 23

:: ***** Identify Target Folder on MIRROR Drive *****
:: Get first two characters in the 24-hour TIME variable
set hour=%time:~0,2%
:: If it's before 10 AM, the hour will be a single digit, so pad it with a leading zero
if %hour% LSS 10 set hour=0%hour:~1,1%

:: ***** Run Robocopy to Mirror DATA *****
:: The ATTRIB command undoes a Robocopy bug that converts the target folder into a hidden system directory
::      See https://blog.coeo.com/how-to-prevent-robocopy-from-hiding-your-files-and-how-to-fix-it-when-it-does
::      That article's suggested Robocopy flag of /A-:SH didn't work here
:: Set the TARG variable to identify that numbered folder, in a form that Robocopy can use
set targ="M:\\MDATA\\%hour%\\"
robocopy D:\ %targ% /MIR /SEC /SECFIX /R:2 /W:5 /REG /XD $RECYCLE.BIN "System Volume Information"
attrib -s -h M:\MDATA\%hour%

:: ***** Run Robocopy to Copy Changed Files on COLDSTORAGE ***** 
:: Mere folder moves don't change archive bit, thus they don't increase the quantity of files copied
:: Before running this, turn off archive bit for all files on COLDSTORAGE (drive E) so you don't make a full copy:
:: To do that, switch to E: and use attrib -a /s
:: ForFiles command deletes files in target that are older than 2 days - regular backups already have those
:: Then use Robocopy MOVE to delete the target's empty folders. See https://stackoverflow.com/a/30138960/711879
set targ=M:\MCOLD\%hour%\
ForFiles /p %targ% /s /d -3 /c "cmd /c del @file /q"
set targ="M:\\MCOLD\\%hour%\\"
robocopy %targ% %targ% /S /MOVE
robocopy E:\ %targ% /S /M /R:2 /W:5 /REG /XD $RECYCLE.BIN "System Volume Information" 
attrib -s -h M:\MCOLD\%hour%

:: ***** Robocopy Syntax *****
:: MIR         mirror
:: S           copy non-empty subdirectories
:: MOVE        moves files and directories, deleting them from source after copying 
:: M           copy files with Archive attribute set, then reset the attribute
::             Note: using that may confuse other backup software that depends on Archive bit
:: SEC, SECFIX include file and folder security permissions in copy
:: R           retries (default is 1 million)
:: W           wait time between retries (in seconds; default is 30)
:: REG         set R & W as default values in registry
:: XD          exclude these directories, if they exist
:: Generally   see https://www.computerhope.com/robocopy.htm

:: ***** Not Using Now *****
:: /LOG:D:\Test.log    direct output to log file Test.log
:: NFL, NDL    don't list files or directories being checked
:: COPYALL     copy all file information (e.g., security, attributes)
:: ZB          not recommended: see https://stackoverflow.com/questions/20982968/what-is-robocopys-restartable-option
:: FP          include full pathname in output
:: V           produce verbose report
:: IPG         slows Robocopy to prevent it from hogging bandwidth - e.g., /IPG:750 makes Robocopy very slow
::             see https://superuser.com/questions/1614053/reducing-command-priority-robocopy-slows-system
:: MT          multithreading - see https://raywoodcockslatest.wordpress.com/2020/12/11/hourly-backup/#RF
::             Note: /MT:32 gave Robocopy control, slowed other processes so that I wanted to use /IPG
::             Using MT with IPG returns an error: The /IPG option cannot be used with the /MT option.

The earlier post provided some notes that were useful at this point. First, as indicated there, I used NirCmd to run that Robocopy script in a hidden window, so that I wouldn’t be interrupted by the abrupt, hourly appearance of a big, clunky command window. To make NirCmd work, I put a copy of nircmd.exe in C:\Windows, and then added this line to HOURLY.BAT to run the Robocopy script each hour:

nircmd exec hide "C:\Batch File Storage Folder\RobocopyMirrorHourly.bat"

That worked. After the first day or two of filling the 24 hourly folders and checking with Beyond Compare, it looked like Robocopy would require as little as a few minutes each hour to verify that the current hour’s folder on MIRROR captured the current state of files and folders on the DATA drive.

Making Robocopy Faster

Before getting too far into the file copying process, I experimented with improving copying speed by compressing the MIRROR drive.  The concept was that, if you have a strong CPU, you can use it to make things easier for the HDD. For instance, the CPU might squeeze a 100MB file down to 70MB, saving the HDD some of the space and also some of the time needed to copy the file. To enable compression, I could go into Windows File Explorer > right-click on the drive > Properties > General tab > Compress this drive to save disk space. This would be best done before filling the drive. That way, files would be compressed automatically when they were added to the 24 folders.

I was using VeraCrypt to encrypt drive contents. There didn’t seem to be any problem with using that built-in Windows file compression on a VeraCrypt drive; it was just a question of whether that would speed things up. In my experiment with ~800GB of files of mixed type and size, compression saved only about 1% of space. At the time, I guessed that VeraCrypt already incorporated a compression algorithm that was nearly as good as the one built into Windows. That guess was probably wrong: VeraCrypt encrypts data but does not compress it, so a more likely explanation is that many of my files were already in compressed formats. Either way, there seemed to be little benefit, for me, from making the CPU do all that extra work to compress files on a VeraCrypt drive.

Later, with further reading, it appeared that Robocopy results could vary greatly with different command-line settings. A SuperUser answer provided an outline of the possibilities. With the aid of additional reading, I adopted several of those suggestions, as indicated in the Robocopy script (above). Others seemed unlikely to improve the situation. In particular, my files were of mixed sizes, so the /J option appeared unhelpful (see SuperUser), and another SuperUser answer said there were very few reasons to use the /NOOFFLOAD option. Although the documentation said /LOG would improve speed, I doubted it would make much difference when, as in this case, Robocopy would mostly be changing very few files, especially since the script would be running in a hidden window displaying nothing onscreen. If I needed a logfile for some reason, I could enable it, but otherwise I didn’t want that extra file lying around.

That left /R, /W, /REG, and /MT as the settings most likely to affect Robocopy’s speed. I set the first three as advised on various sites. The /MT flag was a little more complicated. MT was short for multithreaded copying. According to PureInfoTech,

Typically, when you copy files using File Explorer, you’re only copying one file at a time, but with multithreaded enabled, you can copy multiple files at the same time better utilizing the bandwidth and significantly speeding up the process.

If you don’t set a number when using the /MT switch, then the default number will be 8, which means that Robocopy will try to copy eight files at the same time, but the tool supports 1 to 128 threads.

In the command shown in this guide, we’re using 16, but you can set it to a higher number. The only caveat is that the greater the number, the more processing power and bandwidth will be utilized. If you have an older processor and an unreliable network connection, it could cause issues, as such make sure to test the command before executing the command with a high number of threads.

Whatsabyte seemed to say that a CPU would have two threads per core. The Speccy utility (4.6 stars from 142 raters at Softpedia) confirmed that my quad-core CPU had eight threads. How-To Geek seemed to say that two logical threads per physical core was the concept of hyperthreading, introduced by Intel in 2002. Wikipedia said that, starting in 2017, Intel began to move away from hyperthreading in some CPUs for security reasons.

A Tom’s Hardware post clarified that CPUs have cores that handle threads: a thread is a single software process, not a part of the CPU. That same post suggested that a CPU core can typically handle between 1.1 and 1.6 threads, depending on the type of task — that, for instance, a six-core Ryzen CPU, which apparently did not have hyperthreading, would handle between 6.6 and 9.6 threads (see also Microsoft). The ensuing debate (see also Quora) seemed to indicate that the types of software best designed to utilize so many threads would include some games, video encoding, data compression and encryption, and computer-aided design. Another Quora answer explained that a physical core would handle more than one thread when the first thread was blocked (e.g., waiting for RAM to become available): the core would go to work on another thread. Apparently having both threads blocked at the same time was too rare to justify the CPU overhead of running a third thread on that core. According to another answer,
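The arithmetic in that Tom’s Hardware estimate is easy to reproduce. This small Python sketch multiplies a core count by the 1.1 and 1.6 threads-per-core figures quoted above (those multipliers are that post's rule of thumb, not a measured property of any CPU), and also prints the number of logical processors that Python reports for the current machine.

```python
import os


def thread_capacity(cores, low=1.1, high=1.6):
    """Return the (low, high) estimate of threads that `cores` can handle."""
    return cores * low, cores * high


if __name__ == "__main__":
    low, high = thread_capacity(6)  # the six-core example from the post
    print(f"{low:.1f} to {high:.1f} threads")
    # os.cpu_count() counts logical processors, so a quad-core CPU with
    # two threads per core would be expected to report 8 here.
    print(os.cpu_count())
```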

To change threads, you normally have to empty the registers into the cache, write that back to the main memory, then load up the cache with the new values and load up the registers. Context switches hurt performance significantly.

ExtremeTech emphasized that the operating system, not the hardware, would allocate threads to cores. A StackOverflow discussion offered numerous additional insights. A TenForums comment suggested using taskmgr (i.e., Windows Task Manager) > Details tab > right-click on a column heading > Select columns > check the Threads box > OK. This revealed that some programs (especially Chrome and antivirus) had literally dozens of threads running. On the other hand, even a relatively complex file comparison tool like Beyond Compare or DoubleKiller had only two threads.

Benchmarks

Several sources proposed that the optimal number of threads for Robocopy would be determined by the speed of the CPU. Several said the best answer for a specific application would be provided by testing. In that spirit, using the same source and target drives (i.e., DATA and MIRROR), I ran my standard Robocopy command (i.e., the command shown in the script, above), with two variations: (1) I tried different MT settings, and (2) I designated a single source folder rather than the entire D (DATA) drive. I tried those settings on a source folder containing two large files (i.e., a 4.3GB Acronis (.tib) drive image file and a 4.1GB (.m2t) video file). Then I ran it again, with those same settings, this time with a source folder containing many small files (i.e., 10,789 .wav files comprising 3.8GB). Note: none of these files were in subfolders: all were in the designated source folder.

The set of many small files seemed to challenge the system more than the set of two large files. I watched the progress in the target folder using Windows File Explorer. I didn’t scroll, but I could see the total number of files rising, indicated at the lower left corner of the window. As values exceeded MT:32, Windows seemed to be struggling more to keep tabs on the flood of files: the File Explorer progress bar refreshed itself more often. I also noticed that the process was faster, regardless of the MT value, for the first half of the file list. That is, average throughput rates would apparently have been higher if I had copied significantly fewer than 10,789 files — or, perhaps, if some of those files had been in subfolders.

My results were as follows:

Within my testing, MT value had no effect on the copy speed for the two large files. At the other extreme, MT value had a notable effect on copy speed for the many small files. At MT = 32, there was a 53% improvement over the default value of MT = 8. (The table lists Median times because, in a few cases — and especially at the optimal value of 32 — the first result seemed anomalous, so I ran the test once or twice more at the same setting and used the median value.)
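For anyone wanting to repeat this kind of test, the procedure could be automated along the following lines. This Python sketch is not what I ran (I timed the commands by hand); the list of thread counts, the reduced flag set, and the function names are all assumptions. Note that the target folder would need to be emptied between runs, since Robocopy skips files that already exist unchanged in the target.

```python
import statistics
import subprocess
import time


def robocopy_command(source, target, threads):
    """Build a Robocopy mirroring command with a given /MT thread count."""
    return ["robocopy", source, target, "/MIR", f"/MT:{threads}", "/R:2", "/W:5"]


def median_seconds(action, runs=3, clock=time.perf_counter):
    """Call `action` `runs` times and return the median elapsed time."""
    timings = []
    for _ in range(runs):
        start = clock()
        action()
        timings.append(clock() - start)
    return statistics.median(timings)


def benchmark(source, target, thread_counts=(1, 8, 16, 32, 64, 128)):
    """Time the copy at each thread count (hypothetical list of values).

    The caller is responsible for clearing `target` between runs;
    that step is deliberately left out of this sketch.
    """
    results = {}
    for threads in thread_counts:
        command = robocopy_command(source, target, threads)
        results[threads] = median_seconds(lambda: subprocess.run(command))
    return results
```

Using the median rather than the mean keeps a single anomalous run from distorting the figure for a given setting.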

Since there appeared to be no penalty for using a higher MT value on large files, it made sense to set MT to 32. Thus, I originally set MT = 32 in the Robocopy script (above). Note again that having smaller numbers of files in a folder could yield even more dramatic improvements. That may be why Fosketts reported a significantly higher transfer rate for a mixed set of files with MT = 32. Possibly both of the cases tested here (i.e., many small files and a few large files) were worst-case tests of Robocopy, though for different reasons. Since this test was based on Robocopy, I could not say whether other programs allowing specification of threads would yield similar results.

Later, I found that MT = 32 slowed other processes to the point that Robocopy was hogging system resources. At first, I didn’t realize that MT was the culprit: I tried using various IPG settings (e.g., /IPG:750) to slow Robocopy down. Eventually I found that I didn’t need IPG and didn’t have system slowing if I didn’t use MT (or at least not /MT:32). See the notes in the Robocopy script (above).

Using XXCOPY to Copy Files from Multiple Subfolders to One Folder

I was using Windows 7. I had files in multiple subfolders within D:\Folder\Subfolder A. I wanted to copy all *.wav and *.mp3 files from those subdirectories to the D:\Target directory. This was apparently called directory (or folder) “flattening.”
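To make the idea of flattening concrete, here is a minimal Python sketch of the same operation. It is not how XXCOPY works internally, and its handling of duplicate names (adding a numbered suffix so that both files are kept) is my own assumption for the example. The target folder should be outside the source tree, or the walk may pick up files it has just copied.

```python
import os
import shutil


def flatten_copy(source_root, target, extensions=(".wav", ".mp3")):
    """Copy matching files from all subfolders of `source_root` into `target`."""
    os.makedirs(target, exist_ok=True)
    copied = []
    for folder, _dirs, files in os.walk(source_root):
        for name in sorted(files):
            if not name.lower().endswith(extensions):
                continue
            stem, ext = os.path.splitext(name)
            dest = os.path.join(target, name)
            counter = 1
            # Assumption: on a name clash, keep both files by adding a suffix.
            while os.path.exists(dest):
                dest = os.path.join(target, f"{stem} ({counter}){ext}")
                counter += 1
            shutil.copy2(os.path.join(folder, name), dest)
            copied.append(dest)
    return copied
```

For the folders in this post, the call would look something like flatten_copy(r"D:\Folder\Subfolder A", r"D:\Target").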

After some flailing around, with looks at COPY and XCOPY, I decided to try XXCOPY (rated four stars at Softpedia). This command-line tool, free for personal use, offered many command-line options.

It had been a couple of years since I had even looked at XXCOPY, and possibly the first time I had used it since the 1990s. At the time of writing this post, I was not sure what additional options I might ideally want or need for this task. I hoped to get into this job and get out quick, so I tried the simplest combination I could manage. I may add to this post later, if I use XXCOPY again in the next 20 years.

For my immediate task, I decided to combine the necessary commands in a batch file. That way, I could save the file and not have to research this matter again, at least not for this particular task. The batch file I built (with guidance from Ken at XXCOPY, located after a couple of searches) was as follows:

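@echo off
:: Gather all WAV and MP3 files from the subfolders of "Subfolder A" into D:\Target
:: The first command creates the log (>); the second appends to it (>>)
xxcopy "D:\Folder\Subfolder A\*.wav" D:\Target\ /SGO >  "XXCOPY Error Log.txt"
xxcopy "D:\Folder\Subfolder A\*.mp3" D:\Target\ /SGO >> "XXCOPY Error Log.txt"
cls
echo.
echo.
echo Search XXCOPY Error Log.txt for "Copy Failed" messages.
echo I believe those result when the path is too long.
echo.
echo.
pause
exit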

To use that batch file, I saved the code in a file called “Run This to Copy MP3s.bat” and double-clicked on it in Windows Explorer. I could also have typed the file name on the command line (with quotation marks, because the name contained spaces).

At this point, I did not verify that the copy got everything; in fact, as noted in the remarks (above), some attempts produced “Copy Failed” errors. It appeared the solution to that problem would be to shorten the paths of the files that failed to be copied. Perhaps in some later attempt I will need to verify that everything copies correctly, and may update this post accordingly.
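Searching the log by hand gets tedious, so a small script could do it. The following Python sketch pulls out the “Copy Failed” lines and, separately, flags paths at or beyond the traditional 260-character Windows path limit. I have not confirmed the exact format or encoding of XXCOPY's log output, so the marker text, the UTF-8 guess, and the function names are all assumptions.

```python
def failed_copy_lines(log_path, marker="Copy Failed"):
    """Return the log lines that contain `marker` (case-insensitive)."""
    hits = []
    # errors="replace" avoids a crash if the log is not really UTF-8
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if marker.lower() in line.lower():
                hits.append(line.rstrip("\n"))
    return hits


def paths_at_or_over_limit(paths, limit=260):
    """Return the paths whose length reaches the classic MAX_PATH limit."""
    return [p for p in paths if len(p) >= limit]
```

If the failed files all turn up in the second list, that would support the guess that path length was the cause.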

Incidentally, my next step was to convert the copied WAV files to MP3 for copying into an MP3 player. Based on previous experience and a review of the help file, it appeared that Boxoft WAV to MP3 Converter would be fine for that purpose, now that I had copied all of the WAVs into a single folder. It also appeared that this Boxoft product would do individualized conversion (involving e.g., files in different folders) if I wanted to work up individual command lines (using e.g., the Excel approach described in that previous post). But the Boxoft command line did not permit wildcards — along the lines of “convert *.WAV to *.MP3” — so I could not include a command to that effect in the batch file shown above.