Bulk Testing and Printing Large Numbers of PDFs (Raw Version – Batch Tutorial)

Note: a later post provides a short version of key points from this post.


I looked for ways to test large numbers of PDFs in Windows 7. I found three that worked: CorruptedPDFinder; the File > Create PDF > Batch Create Multiple Files procedure within Adobe Acrobat; and a command-line PDF-printing approach using Adobe Reader. For more information on that command-line option, see the Recap at the end of this post.

Most of this post explores various PDF-printing approaches, leading up to the choice of that Adobe Reader approach as the best of the lot. While I found those PDF-printing approaches to be far slower than the CorruptedPDFinder and Acrobat approaches, they had the advantage of thoroughness. It seemed that readable but partly corrupted PDFs would be especially susceptible to detection when every page of every PDF was being run through a printer, even if the output was purely digital.

Some portions of this post explore non-printing alternatives, both free and commercial. No doubt there are many printing and non-printing approaches other than those explored here. That said, it appeared that the non-printing alternatives discussed below were at least representative, if not dominant, among the approaches available to most users. For both the printing and non-printing approaches, the following text raises many thoughts and points out many technicalities that may be of interest to others who seek good PDF-testing tools.


I had a large number of PDF files scattered around my drive. I wanted to see if any of them were corrupt. Several years earlier, I had posted a writeup of similar efforts; now I wanted to update that and see if there were any relevant new tools or techniques. (I was using Windows 7, though other Windows and in some cases non-Windows operating systems would also be able to use techniques discussed here.) I tried these approaches on small and large sets of PDFs from various sources. The largest set tested here consisted of about 55,000 random PDFs.

The following notes may be haphazard in places. This testing proceeded in fits and starts over a period of months; thus, some sections or comments may be inconsistent with others. Even so, portions of this post may be useful for those seeking a tutorial in batch programming, as many steps are explained in detail. Another post provides a partial simplification of certain key points discussed below.


Testing with CorruptedPDFinder
PDFsam and Sejda Merging Programs
Testing PDFs in Adobe Acrobat
Other Batch Printing Approaches
The PRINT Command
Bullzip and bioPDF Approaches
Microsoft XPS Document Writer
Sumatra PDF
Another Search for Possibilities
Alternatives to Acrobat
Using One PRINT Command per PDF
Other One-Command-per-File Approaches
Revised Adobe Reader Approaches
Comparing PDF Page Counts


Testing with CorruptedPDFinder

My first few searches led to a program called CorruptedPDFinder, available on Softpedia under the awkward name of Recursive Finder of Corrupted PDF Files. I downloaded and ran that program. It quickly scanned a designated folder, and its subfolders, and detected one problematic PDF out of a total of 221. It allowed me to open that PDF (or, I should say, to attempt to open it) by double-clicking on that entry in the list. When I did that, Adobe Acrobat (my PDF reader) gave me an error message:

There was an error opening this document. The file is damaged and could not be repaired.

I would have liked right-click options to delete individual files, and to output the list of files to a text file, but at least CorruptedPDFinder did offer options to delete the corrupted files or to move them to another folder.

Altogether, from the whole set of 55,000 PDFs, CorruptedPDFinder identified a total of 34 corrupt files. All 34 were indeed corrupt. In a few cases, I got the error message just shown, when I tried to open them in Acrobat; but for most, the error message was this:

Acrobat could not open [filename] because it is either not a supported file type or because the file has been damaged.

Finding these corrupted files led to a separate pursuit, described in another post, in which I explored various PDF repair options.

PDFsam and Sejda Merging Programs

I wanted to see if other programs would find corrupted PDFs that CorruptedPDFinder had overlooked. To test that, I decided to try using a different kind of tool. This tool was PDFsam (short for PDF Split and Merge). PDFsam was not a PDF tester, but I thought it might identify PDFs that were too corrupt to merge. It had been a while since I had used PDFsam.

I downloaded and installed PDFsam 2.2.4. It came with Soda PDF. I was not paying attention; during the sneaky installation process, I clicked on the wrong thing and found myself saddled with Soda. It turned out that Soda was trialware. Nice to know it existed, if I was looking for an alternative to Acrobat; but at the moment, at least, I was sticking with freeware. Unfortunately, Soda did not want me to uninstall it. I had to use Revo Uninstaller to get rid of it.

Anyway, I began with the PDFsam GUI. I went into the Merge/Extract plugin and clicked its Add button. It showed me the usual Windows Explorer type of dialog, where I could select individual files. That was not going to work, for files scattered among many folders. As in my previous use, I decided to switch to the command line approach. (To open command windows, I commonly used the Windows Explorer context menu option to “Open command window here,” made available through the “Additional Tweaks” menu in Ultimate Windows Tweaker.) Using the PDFsam command line had been a complicated experience, last time around. A search for documentation led to the PDFsam Wiki. It was not encouraging.

I wound up viewing a user’s question of whether it was possible to merge PDFs using PDFsam on the Windows command line. The answer said that Sejda was the engine behind PDFsam, and that Sejda had a command-line interface.

In Sejda, I looked at the documentation. It included a merge option, applicable to “a collection of PDF documents.” It was not clear whether that collection would have to be in one folder. It appeared unlikely that the merge option was going to search for all PDFs scattered around a drive. But possibly it would accept a file listing all PDFs.

So I downloaded and unzipped Sejda. As more or less advised in the instructions, I opened a command prompt in its bin folder and ran sejda-console.bat. That just listed Sejda’s available commands. To get more information on the merge command, as advised, I typed “sejda-console -h merge.” That gave me an earful:

Given a collection of pdf documents, creates a single output pdf document composed by the selected pages of each input document taken in the given order.

Example usage: sejda-console merge -f /tmp/file1.pdf /tmp/file2.pdf -o /tmp/output.pdf -s all:12-14:32,12-14,4,34-:

Usage: sejda-console merge options

[--addBlanks] : add a blank page after each merged document if the number of pages is odd (optional)

--bookmarks -b value : bookmarks merge policy. {discard, retain, one_entry_each_doc }. Default is 'retain' (optional)

[--compressed] : compress output file (optional)

[--copyFields] : input pdf documents contain forms (high memory usage) (optional)

[--directory -d value] : directory containing pdf files to merge. Files will be merged in alphabetical order. (optional)

[--files -f value...] : pdf files to operate on: a list of existing pdf files (EX. -f /tmp/file1.pdf or -f /tmp/password_protected_file2.pdf:secret123)

[--filesListConfig -l value] : xml or csv file containing pdf files list to concat. If csv file in comma separated value format; if xml file <filelist><file value="filepath" /></filelist> (optional)

[--help -h] : prints usage information. Can be used to detail options for a command '-h command' (optional)

[--matchingRegEx -e value] : regular expression the file names have to match when the directory input is used (Ex -e "test(.*).pdf"). (optional)

--output -o value : output file (required)

[--overwrite] : overwrite existing output file (optional)

--pageSelection -s value : page selection script. You can set a subset of pages to merge as a colon separated list of page selections. Order of the pages is relevant. Accepted values: 'all' or 'num1-num2' or 'num-' or 'num1,num2-num3..' (EX. -f /tmp/file1.pdf /tmp/file2.pdf -s all:all:), (EX. -f /tmp/file1.pdf /tmp/file2.pdf /tmp/file3.pdf -s all:12-14:32,12-14,4,34-:) to merge file1.pdf, pages 12,13,14 of file2.pdf and pages 32,12,13,14,4,34,35.. of file3.pdf. If -s is not set default behaviour is to merge document completely (optional)

--pdfVersion -v value : pdf version of the output document/s {2, 3, 4, 5, 6 or 7}. Default is 6. (optional)

So it did look like the --filesListConfig option would allow me to give Sejda a list of PDFs to combine. I would assemble that list using the DIR command and a spreadsheet, using techniques described in another post.
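For readers who would rather script that list than build it in a spreadsheet, here is a minimal sketch in Python. It is not what was used in this project. It assumes the XML layout quoted in the help text above (<filelist><file value="filepath" /></filelist>) is all Sejda needs, which was not tested here; the function names and any paths passed to them are hypothetical.

```python
import os
from xml.sax.saxutils import quoteattr


def find_pdfs(root):
    """Return sorted full paths of every .pdf file under root, including subfolders."""
    found = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(folder, name))
    return sorted(found)


def write_filelist_xml(pdf_paths, list_path):
    """Write paths in the <filelist><file value="..." /></filelist> layout."""
    with open(list_path, "w", encoding="utf-8") as out:
        out.write("<filelist>\n")
        for path in pdf_paths:
            # quoteattr supplies the surrounding quotes and escapes &, < and >
            out.write("  <file value=%s />\n" % quoteattr(path))
        out.write("</filelist>\n")
```

A call such as write_filelist_xml(find_pdfs(r"D:\Folder"), r"D:\Folder\filelist.xml") would then produce a file that could be tried with the -l option.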

The next question was, do I have to actually create that merged PDF? Because an attempt to test several thousand PDFs by merging them into a single file would be very slow. It would also probably drag down system performance, and might eventually crash the program if not the machine. Unfortunately, there did not appear to be an option for a dry run, nor did the documentation seem to say that Sejda would make a first pass to check each PDF before trying to merge — which would be irritating, if you got two hours into a merge project and then it crashed because Sejda was only now discovering an uncooperative input file.

To test that, I worked up a Sejda command to combine two PDFs, and watched what happened. It took a while to figure out how to write the command, but this worked:

sejda-console merge -f D:\Folder\PDF1.pdf D:\Folder\PDF2.pdf -o D:\Folder\Merged.pdf

(Quotation marks might have been necessary if there were spaces in the file or folder names.)

That command produced Merged.pdf. When that command ran, it provided information about what it was doing. I noticed that it was creating a temporary buffer in C:\Users\[username]\AppData\Local\Temp, where Ray was the username in my case. So apparently it would need adequate space on drive C for the PDFs being merged. I also noticed that the operation would fail (Merged.pdf would not be produced) if Merged.pdf already existed. So presumably I could put a small Merged.pdf in the output folder to defeat double-writing of a large output PDF: first to C:\Users, and then to the output folder.

I wasn’t sure how Sejda would react if I fed it a bad PDF. Ideally, it would produce information on that. I replaced PDF1.pdf (a real PDF file) with a text file that I created in Notepad and renamed as PDF1.pdf, and then I tried a variation on the command:

sejda-console merge -f D:\Folder\PDF1.pdf D:\Folder\PDF2.pdf -o D:\Folder\Merged.pdf > D:\Folder\SejdaOut.txt

That time, Sejda sent no information to the screen. Instead, SejdaOut.txt said approximately this:

> Configuring Sejda 1.0.0.M9

> Loading Sejda configuration form default sejda.xml

> Starting task execution . . .

> Created output temporary buffer C:\Users\Ray\AppData\Local\Temp\SejdaTmpBuffer243728859549573182.pdf

> Opening input I:\Current\PDF1.pdf

> Task execution failed. An error occurred opening the reader.

> Failure caused by: java.io.IOException: PDF header signature not found.

So it appeared that merging PDFs was not going to be a good way to test PDFs — at least not with Sejda (and therefore not with PDFsam), because Sejda was going to terminate as soon as it hit a bad PDF. In this case, it never got to PDF2.pdf. Also, it was creating its temporary buffer PDF early in the process. I was not sure whether it was immediately attempting to write the input PDFs into that temporary PDF. If so, that would make the process vastly slower and, in some configurations, might fill the C drive.
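The "PDF header signature not found" failure points to a very cheap test that needs no merging at all: a PDF normally starts with a %PDF- signature and ends with a %%EOF marker. The following Python sketch checks only those two things. It would catch gross cases like the renamed Notepad file, but it says nothing about readable-but-damaged pages, so it is at most a first filter. The function name and the 1024-byte scan window are choices made for this illustration.

```python
def quick_pdf_check(path, scan_bytes=1024):
    """Return a list of problems found by two cheap structural checks."""
    problems = []
    with open(path, "rb") as f:
        head = f.read(scan_bytes)          # start of the file
        f.seek(0, 2)                       # jump to the end to learn the size
        size = f.tell()
        f.seek(max(0, size - scan_bytes))  # back up to the last chunk
        tail = f.read()
    if b"%PDF-" not in head:
        problems.append("no %PDF- header signature")
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker near end of file")
    return problems
```

An empty list means only that the file passed these two checks, not that it is sound.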

Not to say that a programmer couldn’t find a way to reconfigure this. I noticed, for instance, the reference (in the foregoing quote) to “Sejda configuration form default.” Perhaps the default could be changed, or another .xml guidesheet could be used. But I didn’t know how to do that, and didn’t want to invest the time to learn. For my purposes, it seemed time to look elsewhere.


Printto

In my previous writeup, I had gotten some mileage out of a file called printto.exe. This, it turned out, had been available from bioPDF, which was apparently the source of the free Bullzip printer. My previous go-round had not ultimately been very successful, but I saw they had a new version of Printto, so I downloaded it.

To see the syntax, I typed “printto /?” on the command line. That confirmed it hadn’t changed: printto filename [printer], where printer was optional. I went into my list of printers (see following paragraph) and made sure Bullzip PDF Printer was the default printer. (When I ran into some of the problems described below, I tried installing CutePDF Writer (with Ghostscript) and setting it as the default. But CutePDF seemed slow, and it did not seem to have an option to skip the prompt for the output filename. Nonetheless, there probably were other virtual PDF printers, free or commercial, that could have substituted for Bullzip. Another way of manipulating the output filename may have been to set the file properties, using something like Adobe Acrobat (perhaps in a batch mode), so that file printing would default to the filename previously specified in those properties.)

(For some reason, my Control Panel > Devices and Printers item was taking a long time to load, and when it did load, it was coming up blank. I was getting better results by going into Windows Explorer, looking into the left (Folders) pane, and selecting the All Control Panel Items > Printers item there. Future references to the Printers dialog in this post refer to that view.)

Next, I went to the Options icon in the Bullzip folder in my Start Menu and changed some of my default settings. The purpose of these changes was to enable Bullzip to print rapidly and automatically, without intervention by me. I didn’t care about the printed output; I just wanted Bullzip to try to print each PDF, and to produce error messages if it couldn’t print it. (I found that leaving the mouse pointer at a certain spot near the taskbar would automatically close Bullzip’s “Document Created” notices after each successful PDF print.) My Bullzip options changes:

  • General tab: Set File name to X:\TempPDF\<title>. All boxes unchecked on General tab.
  • Dialogs tab: don’t show dialogs. Suppress errors.
  • Document tab: screen quality.
  • Image tab: 150 x 150 resolution.
  • Actions tab: don’t open the document after creation.

Now all I needed was the printto command. There were two possible options. One was to use a wildcard to print all files within a folder. I tried “printto *.pdf.” That brought an error: “Invalid file name specified.” Obviously there wasn’t a “/s” option to print all PDFs in subfolders as well. Instead, it seemed, I would have to work up a batch file that would contain a specific command for each PDF file that I wanted to test — that is, every PDF on my drive, in any folder. Shortcomings in that approach moved me to develop a looping batch file instead. But that turned out to be more complex than expected, as described in another post. Moreover, in early testing the resulting batch file was very slow, and it did not capture useful identifying information for potentially flawed PDFs.
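As an illustration of the one-command-per-file idea, here is a hedged Python sketch that writes such a batch file. It relies only on the printto filename [printer] syntax shown above. The ECHO line before each command is an addition meant to leave a trace of which file was being processed, and the function names are hypothetical. Percent signs in filenames are doubled because a batch file would otherwise read them as variable markers.

```python
import os


def batch_lines(pdf_path, printer=None):
    """Return the batch-file lines for one PDF: an echo, then the printto call."""
    quoted = '"{}"'.format(pdf_path.replace("%", "%%"))
    command = "printto " + quoted
    if printer:
        command += ' "{}"'.format(printer)
    return ["echo Testing " + quoted, command]


def write_printto_batch(root, batch_path, printer=None):
    """Write one printto command per PDF found under root; return the count."""
    count = 0
    with open(batch_path, "w") as out:
        out.write("@echo off\n")
        for folder, _subdirs, names in os.walk(root):
            for name in sorted(names):
                if name.lower().endswith(".pdf"):
                    for line in batch_lines(os.path.join(folder, name), printer):
                        out.write(line + "\n")
                    count += 1
    return count
```

Running the resulting batch file with its output redirected to a text file would give a log in which each printto attempt follows the name of its PDF. Filenames with characters outside the console code page may need separate handling.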

Testing PDFs in Adobe Acrobat

I got the idea to try PDFsam (above) because I had noticed a related feature in Adobe Acrobat ($100-200, and worth every penny, especially if a deal comes along or if cheaper alternatives don’t work). When I told Acrobat to merge PDFs, it would test those PDFs and notify me if any were corrupt and thus could not be properly merged. In theory, this approach would have the advantage of skipping the time- and space-consuming printing; I would get the corruption warning immediately, at the assembly stage.

Unfortunately, I ran into problems when experimenting with file-merging possibilities in Acrobat 9. For one thing, I was not successful in using Acrobat to search for PDFs among various folders. Instead, I would have to use Everything to search for all *.pdf files, select the ones I wanted to test, copy them to a single folder, and test the copies. I would have to search for the corresponding originals if Acrobat had a problem with any of those copies. In this approach, identically named files in different folders would overwrite one another when put into the same folder, so I would have to run a prior test for duplicate filenames with a duplicate detector like DoubleKiller.
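The copy-to-one-folder step, with its overwrite risk and the later hunt for originals, could also be scripted. The sketch below is only an illustration in Python, not the Everything and DoubleKiller procedure described above: it gives colliding names a numeric suffix and writes a CSV that maps each copy back to its source. The function name, the suffix style, and the CSV columns are all invented for this example.

```python
import csv
import os
import shutil


def copy_flat(pdf_paths, dest_dir, map_path):
    """Copy each PDF into dest_dir under a unique name; log copy -> original."""
    os.makedirs(dest_dir, exist_ok=True)
    # Windows filenames are case-insensitive, so track names in lowercase
    used = {name.lower() for name in os.listdir(dest_dir)}
    with open(map_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["copy_name", "original_path"])
        for src in pdf_paths:
            stem, ext = os.path.splitext(os.path.basename(src))
            name, n = stem + ext, 1
            while name.lower() in used:
                n += 1
                name = "{} ({}){}".format(stem, n, ext)
            used.add(name.lower())
            shutil.copy2(src, os.path.join(dest_dir, name))
            writer.writerow([name, src])
```

If a copy later turns out to be bad, the CSV gives the path of the original without another search.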

There was another problem with this approach. I found that Acrobat would really bog down when trying to merge my big test set of 55,000 PDFs. I made several attempts. Each attempt took literally days: Acrobat slowed to a crawl after the first 20,000 to 30,000 PDFs, and ultimately crashed each time before processing the full set. The solution was to break those PDF copies out into five sets of around 11,000 PDFs each, and process them one at a time. Acrobat would whip through those subsets pretty rapidly.
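Splitting a long file list into a fixed number of roughly equal sets is simple to script. A minimal Python sketch, with a hypothetical function name:

```python
def split_into_sets(items, set_count):
    """Split items into set_count consecutive groups of nearly equal size."""
    base, extra = divmod(len(items), set_count)
    sets, start = [], 0
    for i in range(set_count):
        # the first `extra` groups each take one leftover item
        size = base + (1 if i < extra else 0)
        sets.append(items[start:start + size])
        start += size
    return sets
```

With 55,000 paths and set_count=5, each group holds 11,000 paths, matching the sets described above.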

Acrobat offered several ways to merge PDFs. One approach was to create a PDF portfolio. In Acrobat 9, I could do that via File > Create PDF > Merge Files into a Single PDF > PDF Portfolio (in the upper right-hand corner of the window). That approach offered an Options button. One option, I found, was to uncheck the box that said, “Continue combining if an error occurs.” Unchecking that box brought up all kinds of error messages, including requests for file passwords. This was good, in the sense that Acrobat was catching problems, but it was bad, in the sense that I was not getting a list of problematic files upon program completion. I would rather not sit there and manually click through (and perhaps write down the name of) each problematic file that Acrobat encountered.

Those requests for passwords did make me wonder what CorruptedPDFinder (above) had done when it encountered passworded PDFs. It would seem that CorruptedPDFinder must have ignored them, or may have applied only the most superficial tests of corrupted contents.

When creating a PDF portfolio in that way, I also had a choice of output file sizes. At first, I thought it would be fastest to specify the smallest file size. Eventually I realized that this might mean maximum compression, with processor delays if the CPU could not keep up. But I thought that choosing the largest file size could also introduce disk-writing delays. My searches did not produce a clear discussion of these tradeoffs. Eventually I just decided to go with the default, medium-sized output file. (Not surprisingly, I also found that Acrobat went much more slowly when some other program was hogging system resources.)

Acrobat offered other ways to combine files (and, presumably, to test them along the way). One way was to select a bunch of files in Windows Explorer and use the right-click context menu option (“Combine supported files in Acrobat”) added during Acrobat installation. Another way was via File > Combine > Assemble PDF Portfolio. This approach offered a Specify File Details option, but that proved unhelpful for present purposes: it did not include an option for indicating whether a given file was corrupt.

Acrobat also offered some approaches that did not involve merging PDFs. One that I did not test was File > Export > Export Multiple Files. Yet another approach: File > Create PDF > Batch Create Multiple Files. I tried that with the full set of 55,000 test PDFs. Acrobat said, “Press OK when all the desired documents have been added.” I pressed OK and, when prompted, designated an output folder. This approach did not seem to care how many files I was processing. It filled the output folder with copies of the PDFs from the input folder. When it was done, it gave me approximately this:

Warnings and Errors

Cannot open document: [filename]

Object label badly formatted: [filename]

Unrecognized object name: [filename]

These error messages highlighted a total of eight PDFs. There did not appear to be a way to save (or to copy and paste) the list of errors. One could produce a list by running a set of cropped screenshots of that window, each time scrolled a bit more, through a panoramic image maker (e.g., Microsoft Image Composite Editor (free)) and OCRing the resulting image, but the inability to resize the error window would make that a real chore if the error list was long. It would probably be easier just to retype the list manually, as I did in this case.

I tried opening the PDFs named in those error messages. I would have had to open them in another PDF reader (e.g., Foxit, Sumatra) if I had kept the error window open; Acrobat would not display other PDFs until that window was closed. In Acrobat, the effort to read the named files produced the following results:

  • For the one PDF labeled as “cannot open document,” Acrobat said, “There was an error opening this document. The file is damaged and could not be repaired.”
  • For the multiple PDFs labeled as “object label badly formatted,” Acrobat gave diverse results. In some cases, the file would open and its contents would display without apparent problems (though in at least one case the document was too long to page through to make sure each page was displaying). In one case, however, the contents were completely blank: I was looking at a 31-page PDF, in Acrobat, with nothing on any pages; and after leaving the document open for a minute, I got an error: “There was a problem reading this document.” In another case, the first pages of the doc were fine, but later pages were blank. Another one immediately popped up the “problem reading this document” error along with another error: “There was an error processing a page.” In that case, the first half of the document was blank; the second half was OK.
  • The one PDF labeled as “unrecognized object name” opened and looked OK, although here again the document was too long to page through manually to verify each page.

This test demonstrated that Acrobat had found PDF problems that CorruptedPDFinder had missed. I was not able to test whether the reverse might also be true — whether CorruptedPDFinder had found problems that Acrobat would miss — because, by this point in this long project, I had replaced the bad PDFs found by CorruptedPDFinder with good backups, and no longer had the bad PDFs to compare.

I was incidentally curious as to whether this approach had created output PDFs that were perfect matches of the input PDFs. For the most part, the answer to that was no: a DoubleKiller scan detected exact duplicates in only about 12% of the output PDFs. A second scan, seeking files with the same names and with similar sizes (not varying by more than 4KB), accounted for most of the output PDFs. Further scans, allowing greater variation in file sizes, indicated that there could be substantial differences in size, between input and output PDFs, without any obvious difference in quality and without the kinds of problems sought here. In at least some cases, that may have been due to this reprocessing of PDFs that had been saved with earlier and less competently compressive versions of Acrobat. In the end, after deleting all output PDFs that were within 5MB of the size of their corresponding input (original) PDFs, I had a few dozen left. In Windows Explorer, I selected and tried to open all of these. They all opened. I concluded that the substantial differences in file size were not due to disappearance of content. The only files that the foregoing Acrobat procedure had failed to reproduce in the output folder were the eight PDFs with the problems discussed above.
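The same kind of comparison can be approximated in a few lines of Python. This is a rough sketch, not a description of how DoubleKiller works internally: it calls two files identical only if their sizes and SHA-256 digests match, and otherwise sorts them by whether their sizes fall within a tolerance (4KB by default, mirroring the second scan above). The function names and category labels are invented for this example.

```python
import hashlib
import os


def file_digest(path):
    """SHA-256 of a file, read in 1MB chunks so large PDFs do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_pair(original, copy, tolerance_bytes=4 * 1024):
    """Return 'identical', 'similar size' or 'different size' for two files."""
    size_a, size_b = os.path.getsize(original), os.path.getsize(copy)
    if size_a == size_b and file_digest(original) == file_digest(copy):
        return "identical"
    if abs(size_a - size_b) <= tolerance_bytes:
        return "similar size"
    return "different size"
```

As the results above suggest, a "different size" verdict is a reason to open the file and look, not proof of lost content.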

In short, the Acrobat procedure described above (File > Create PDF > Batch Create Multiple Files) did identify several PDFs with real problems that CorruptedPDFinder had not detected.

Other Batch Printing Approaches

People had cooked up various ways to use Adobe Acrobat, Adobe Reader, or non-Adobe installed printers (e.g., Bullzip) to print PDFs en masse. For example:

  • Kurt Pfeifle offered a seemingly simpler and newer Adobe Reader alternative.
  • Someone pointed out that Windows Explorer (in Windows 7, but apparently not in Windows 8) offered the option of selecting a group of PDFs and right-clicking to get a print option — but someone else said (as I confirmed by trying it) that that was an option for at most 15 files at a time. But Superb Tech offered a registry hack to eliminate that limit.
  • A preliminary search suggested that there might not be an easy way to add a printer to the Windows Explorer context menu “Send to” item.

I tried several of those, as follows:

Registry Hack. As advised, I added the following key:

Windows Registry Editor Version 5.00

Later, after rebooting (or after killing and restarting Explorer.exe via Task Manager (i.e., Ctrl-Alt-Del or Start > Run > Taskmgr.exe and then File > New Task > explorer.exe)), I found that the change had worked: I could right-click and select Print for those 55,000 PDFs. As in the Acrobat approaches discussed above, there seemed to be two ways to go, with Bullzip as my default printer: either set it up to ignore error messages (see discussion of Printto, above) and do a comparison of input and output folders (using e.g., DoubleKiller) when the printing process completed, or else don’t suppress errors and view each Bullzip complaint as it arose, one file at a time. I opted for the latter approach. But when I went to select all files > right-click > Print, I got nothing. No files were printed, even when I let it sit overnight. I was tempted to try again with Acrobat as my default printer, but as indicated in the following paragraph, I decided to try a different Adobe printing method instead.

Reader-Based Commands. I had found that Adobe Reader conflicted with Acrobat in some regards, and thus had uninstalled it. However, for the occasion, I downloaded and installed the latest version. I started Reader and went through its licensing agreement. I did not need to change its settings, because sources said Reader would use the default printer unless I specified some other. Having already set up Bullzip for the purpose, it seemed I would already have configured the printer as needed. As above, I had to adjust the settings as desired within Bullzip’s default print profile; other profiles did not seem to work properly. With Adobe Reader thus prepared, I tried Kurt Pfeifle’s approach (above). Now that I had copies of the PDFs in a single folder, I did not need to work from a file listing the PDFs to be tested, so I went with this version of Kurt’s command:

for %i in (*.pdf) do (start /wait "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe /t" "%i" "Bullzip PDF Printer" && taskkill /f /im AcroRd32.exe /t)

reflecting the location of AcroRd32.exe on my machine. Sources indicated that the /t option was the print option for Adobe Reader. This command included a TASKKILL command (with its own /t option) as a way of killing Reader after it printed the file. Otherwise (without the /wait option), Reader sessions would proliferate until a maximum number of windows were open. Unfortunately, this command did not work: it did not produce PDFs in the target folder that I had designated in Bullzip’s Options dialog, and it also failed to kill those proliferating Reader sessions. (The Acrobat Wrapper program described below did not alleviate these problems.) I was not able to work out the bugs within the time available for this project. (A previous post had used a PAUSE command for a comparable purpose, but of course that would be impractical with many thousands of PDFs.)
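One detail that may have contributed to that failure: START treats its first quoted argument as a window title, so the quoted Reader path in that command may never have been run as a program. That is a guess, not something tested here. As an alternative to the START /WAIT plus TASKKILL pairing, the following Python sketch runs one command per file and kills it if it has not exited within a time limit. It assumes the AcroRd32.exe /t file printer argument order used in the command above. Whether killing the main process also clears any helper processes Reader starts is an open question. The function names are hypothetical.

```python
import subprocess


def build_reader_command(reader_exe, pdf_path, printer):
    """Argument list for a Reader print call: reader /t <file> <printer>."""
    return [reader_exe, "/t", pdf_path, printer]


def run_with_timeout(command, timeout_seconds):
    """Run command; return 'exited' if it ended on its own, 'killed' if not."""
    try:
        # subprocess.run kills the child itself when the timeout expires
        subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "killed"
    return "exited"
```

Passing an argument list instead of one command string also sidesteps the quoting of paths that contain spaces.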

PDF2Printer. I selected Bullzip as my printer, changed Bullzip’s default profile to the minimalistic settings described in the Printto section (above), and selected the folder containing the PDF copies. The box listing the files to be printed was filled immediately. I clicked the box that specified a maximum of three documents in the print queue, hoping this meant that the program would hold off on loading other documents, not that it would skip them. I clicked Print. PDF2Printer began generating new PDFs in the output folder as the names of input PDFs disappeared from the input file list. The output files bore an extra .pdf extension. If I wanted to do exact filename comparisons (in e.g., DoubleKiller), I would have to use something like Bulk Rename Utility (or just a DOS command) to remove those duplicative extensions. PDF2Printer ran for a while and then gave me an error message:


Run-time error ’70’:

Error writing status file “. Error 70: Permission denied.

Accompanying that error dialog, I saw another that said this:

PDF Printer

An error occurred.

Error 1008: Ghostscript timed out

Source: GUI

Internal hint: Run converter to create PDF file

The programs did not say what file had caused this error. I was not sure whether to hope that the program would pause to let me figure out where the problem was, but that was moot: PDF2Printer was forging ahead regardless. If I was to identify the files that provoked those error messages, apparently I would have to wait to see whether the output folder was missing its versions of any PDFs existing in the input folder.
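That after-the-fact comparison of folders could be done with a short script. The Python sketch below lists input PDFs that have no counterpart in the output folder, after removing the extra .pdf extension mentioned above. It assumes each output file is named after its input file, which depends on how the virtual printer is configured. The function names are hypothetical.

```python
import os


def strip_extra_pdf(name):
    """Turn 'report.pdf.pdf' into 'report.pdf'; leave other names unchanged."""
    if name.lower().endswith(".pdf.pdf"):
        return name[:-4]
    return name


def missing_outputs(input_dir, output_dir):
    """Return input PDF names that have no counterpart in output_dir."""
    produced = {strip_extra_pdf(n).lower() for n in os.listdir(output_dir)}
    return sorted(
        n for n in os.listdir(input_dir)
        if n.lower().endswith(".pdf") and n.lower() not in produced
    )
```

The names it returns are candidates for closer inspection. A file can also be missing because printing was interrupted, not because the PDF is bad.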

Later, I experienced another error: “Printer selected is not valid.” The accompanying error message was much like the one quoted above, naming Ghostscript, but in this case the last two lines said, “Source: WriteStatusFile” and “Internal hint: Write status file.” In this case, unlike the previous error, the printing process did not continue while those messages were displayed, but rather waited until I clicked OK. I was not sure whether a set of PDFs producing a large number of these error messages would eventually choke the system, preventing completion of a batch printing job that would be finding errors with many PDFs.

When I clicked OK on those error messages, printing seemed to have terminated. But then I noticed that the drive on which these PDFs were stored was still active. Icons in the system tray (i.e., bottom right-hand corner of the screen) seemed to indicate that Bullzip was still working. This raised the possibility that perhaps the message about the invalid printer was related to an attempt to print an especially large PDF. Here, again, I was not able to tell what file had caused the problem. A closer look at the system tray icons suggested that two of the three PDFs I had allowed PDF2Printer to keep in the queue at any one time had, in fact, finished printing. I was not sure why their icons were still lingering in the system tray. I went into my list of printers (in Windows Explorer) and saw that something — PDF2Printer, apparently — had changed my default printer from being Bullzip PDF Printer to being my Brother printer. That was scary: I might have walked away and returned to find hundreds of pieces of paper lying around the floor. It seemed I should have disconnected the Brother printer before attempting this project. Hovering or right-clicking on the Brother printer clarified the mystery about the icons: I had two PDFs in queue for the Brother, and somehow it had also been put offline. So yes, the printing process had ceased, at least until the Brother came back online: PDF2Printer had done its thing with the first 396 of my 55,000 PDFs, but that was as far as it was going to go (and as far as I was willing to let it go) on this round. It appeared that PDF2Printer had encountered something in one of the PDFs that it considered incompatible with Bullzip, and had thus taken it upon itself to try a different printer instead. When I terminated all that, PDF2Printer gave a final “Out of memory” error. I uninstalled PDF2Printer.

The PRINT Command

The foregoing experiments discouraged me from further pursuit of batch printing options based on Adobe Reader: not to say that it could not be done, but that, among other things, Reader seemed to introduce additional complexities. Likewise, it seemed that I might not need to bother with third-party programs like PDF2Printer or Printto, as I had belatedly recalled that Windows had its own built-in PRINT command. It seemed that PRINT might provide a simple and direct way of batch-printing PDFs.

According to the help provided by typing PRINT /? on the command line, the syntax was as follows:

PRINT [/D:device] [[drive:][path]filename[...]]

Help also stated that PRINT “prints a text file,” which would imply it was not useful for printing PDFs. But various comments seemed to indicate otherwise. Possibly the more important part of that quote was the letter “a”: it appeared that PRINT might print only one single file, not multiple files. In that case, I would have to issue one command per file, in either of the two forms discussed above: by developing a separate command for each PDF to be printed, or by using a FOR loop that would repeat the same command for each separate PDF.

The first task was to specify the printer device. One source said there were two ways to do that. They appeared to overlap, so I decided to do both. This involved going into my Printers view (in Windows Explorer) > right-click the desired printer (in my case, Bullzip). In Properties, I would be using two tabs. First, the Sharing tab > Share this printer > Share name: Bullzip. Second, the Ports tab > check LPT1 (or any other LPT or COM port that might be available) > Apply. I noticed, in the Ports tab, that BULLZIP had been added and was already checked; this replaced that. I was not sure whether the LPT1 option would be necessary, but now I would play with it and find out.

The recommended syntax, in those two approaches, was to use PRINT /D:LPT1 filename or else PRINT /D:\\ACER\BULLZIP filename, where ACER was the name of my computer (see Control Panel > System) and where LPT1 was the port name and BULLZIP was the share name that I had just assigned. A wildcard did not work on the first approach (i.e., PRINT /D:LPT1 *.*). But a wildcard did work on the second approach (i.e., PRINT /D:\\ACER\BULLZIP *.*), at least in the sense of listing files that were supposedly being printed. Unfortunately, the result of that second approach was merely to try to print all of the files in the folder into one output file, on which Bullzip then choked and produced an error, as I could see in Printers > Bullzip > right-click > Open.

To decide between the two approaches just described, I tried both PRINT /D:LPT1 x.pdf and PRINT /D:\\ACER\BULLZIP x.pdf, where x.pdf was an existing PDF in the folder where I was executing these commands. At this point, I discovered that the printing error of the previous paragraph was due, in part, to my simultaneous attempt to use both of those approaches. When I gave up on the LPT1 approach (that is, when I went into Printers > Bullzip > right-click > Properties > Ports tab > check BULLZIP, which had apparently been added by the foregoing change in the Sharing tab), PRINT /D:\\ACER\BULLZIP x.pdf did produce an output PDF. With that change in place, the wildcard (i.e., PRINT /D:\\ACER\BULLZIP *.*) also worked, in the sense of completing the printing process, but it only gave me a blank one-page PDF.

So now it was clear: I needed to use the “PRINT /D:\\ACER\BULLZIP filename.pdf” syntax, and it would work with PDFs. The next question was whether I should use a FOR loop or a long list of commands, one per file. The FOR loop would be more efficient, if I could get it to work, and a previous post described the efforts I had already made with it, so I decided to try that. I put the FOR loop into a file called PRINTER.BAT:

@ECHO OFF
SETLOCAL EnableDelayedExpansion
IF EXIST _ErrorLog.txt DEL _ErrorLog.txt
REM Start KILLER.BAT in its own minimized window
START "Killer" /min Killer.bat
FOR /r %%i IN (*) DO (
SET /a filecount=!filecount!+1
ECHO Printing file no. !filecount! : %%~nxi
IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" PRINT /D:\\TP\BULLZIP "R:\PDFCopies\%%~nxi" >> _ErrorLog.txt 2>&1
FOR /l %%m IN (1,1,120) DO (IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" TIMEOUT /t 1 > NUL)
)
REM Close all command windows, including the Killer window
TASKKILL /f /im cmd.exe /t

Note the use of a second FOR loop to introduce a delay of up to two minutes (i.e., 120 seconds) while allowing time for the PDF to print, before giving up and moving on. Here was the separate KILLER.BAT file invoked by that PRINTER.BAT script:

:start
TASKKILL /f /im acrobat.exe
GOTO :start

Unlike the batch file provided in that previous post, the FOR loop in PRINTER.BAT referred, not to a file (e.g., PDFlist.txt) containing a list of files to be printed, but rather (and more simply) to all files in the current directory. Hence, in this version, there was no need for tokens or delimiters. Also, the IF NOT EXIST line was modified to use PRINT instead of Printto, and to refer to BULLZIP instead of “Bullzip PDF Printer.” In addition, several commands drew upon the %%~nxi variable. Note again that certain settings (e.g., R:\PDFCopiesOut as the destination directory) were set in Bullzip’s options.

I put both of those batch files in the folder where I had placed the test PDFs to be printed. They ran but, for some reason that I was not able to resolve despite posting a question on it (which, at this writing, remained unanswered), the output filename proposed by PRINTER.BAT was not being processed. Instead, Bullzip was printing every file without a name: the output file was named simply “.pdf.”

Bullzip and bioPDF Approaches

A search led to a Bullzip webpage offering a User Guide for Bullzip PDF Printer version 2. This webpage was not the same as the webpage that opened when I clicked the Documentation shortcut that had been installed on my computer along with the Bullzip printer. The Documentation webpage did not provide actual documentation; it referred me instead to a bioPDF documentation page, with a warning that information on the latter might not be entirely accurate. I was inclined to try that bioPDF page nonetheless, because the former (i.e., the Bullzip User Guide page) did not start out well: it said that I would need “the CONFIG utility” to change the Bullzip PDF printer settings, but did not explain what that utility might be, or where I might find it.

I searched the bioPDF site for information on the CONFIG utility. A Command Line Interface page said that the syntax was as follows:

CONFIG.EXE /S name value | /R name | /C

This told me, first of all, that I was looking for a file called CONFIG.EXE. A search of my computer located that file in C:\Program Files\Bullzip\PDF Printer\API\EXE. The description accompanying that syntax said that I could use /S to set a setting, /R to remove a setting, or /C to clear all settings. I wasn’t sure how those options would interact with the Options dialog provided in the Bullzip installation folder. It seemed I might want to revisit that GUI dialog after running Bullzip command-line tools, so as to make sure that my usual settings were restored after I was finished with this special-purpose PDF-checking project.

The Bullzip User Guide said that recent versions of Bullzip would allow me to invoke the basedocname macro, on the command line, to produce the file name without an extension. The bioPDF page seemed to say that the macro name was case-sensitive: evidently it would not work if I typed Basedocname instead of basedocname. I was not sure how to put the available information together to construct the proper command. Another bioPDF page offered sample files to illustrate. But these turned out to be Visual Basic (VB) scripts, not batch files. I noticed that they dated from 2008, and that they appeared to use Printto (above).

Another bioPDF page offered two non-VB ways of printing PDFs using Bullzip. One involved using Acrobat Reader; the other did not. In the Acrobat Reader approach, bioPDF offered a program called Acrobat Wrapper. The page describing Acrobat Wrapper, and offering it for download, acknowledged that recent versions of Acrobat Reader would ignore the /t switch (above) that was supposed to close Acrobat Reader after Reader printed a PDF. The page indicated that Acrobat Wrapper would modify the registry so that the /t switch would work again: evidently it was called a “wrapper” because, you might say, it would add a layer, calling Reader to print the PDF and then closing Reader after the PDF was printed. This would apparently eliminate the need for my KILLER.BAT file (above). The version history on that webpage indicated that the Wrapper had last been updated to work with Adobe Reader X (i.e., Reader version 10). It was not clear whether that update would also work with more recent versions, such as the version 11 that I had installed. I did not look into the question of whether Adobe Reader X was still available for download. I did have an archive copy of Reader X, but I opted instead to start with the non-Reader approach. When I did later download and unzip that Acrobat Wrapper program, I saw that it consisted of an executable (.exe) file, not a registry editing (.reg) file, and that it was of some length. It seemed that its registry modifications could be extensive. I noted, also, that its name seemed to indicate that it was version 11, suggesting that the Wrapper program (but not the version history) may have been updated for Adobe Reader 11 after all. But as noted above, installation of this seemingly new version did not yield success in the modified form of Kurt Pfeifle’s command.

In the approach that did not involve using Adobe Reader, that bioPDF page said that I could use the PDFCMD command. I found the corresponding executable (PDFCMD.EXE) in C:\Program Files\Bullzip\PDF Printer. For some reason, it was present on only one of the two machines on which I had installed Bullzip. Apparently one of the steps described above had put it onto just one of those two machines.

The drawback of this PDFCMD approach, according to the bioPDF page, was that its output quality would be unacceptable for some purposes. That was not a concern for purposes of this PDF-testing project; I just wanted to know if the PDF was readable enough to be printed in any form. The webpage offered this example:

PDFCMD command=printpdf input="C:\Temp\A.pdf"

On the command line, I typed PATH > pathlist.txt, and then opened the resulting pathlist.txt file in Notepad and searched for Bullzip. Bullzip was not found. In other words, there was no Bullzip folder in my system’s PATH. So a batch file referring to PDFCMD would not work unless I put a copy of PDFCMD.EXE in the folder where I was running my PRINTER.BAT file.

I made a system backup and then ran the Acrobat Wrapper program. That, it seemed, would eliminate the need for KILLER.BAT (which I also would not need when using the non-Adobe Reader approach). I revised PRINTER.BAT to reflect that change and to incorporate PDFCMD. I had to remove the “.pdf” extension following the smarttitle macro in Bullzip’s Options dialog > General tab > Output filename. With those changes in place, I achieved success with this version of PRINTER.BAT:

@ECHO OFF
SETLOCAL EnableDelayedExpansion
IF EXIST _ErrorLog.txt DEL _ErrorLog.txt
FOR /r %%i IN (*) DO (
SET /a filecount=!filecount!+1
ECHO Printing file no. !filecount! : %%~nxi
IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" PDFCMD command=printpdf input="R:\PDFCopies\%%~nxi" docname="%%~nxi" >> _ErrorLog.txt 2>&1
FOR /l %%m IN (1,1,120) DO (IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" TIMEOUT /t 1 > NUL)
)

Unfortunately, this approach was inordinately slow, at least on an underpowered machine with other processes running. It was so slow that it failed to print a number of larger PDFs that exceeded its 120-second cutoff — whereas the Adobe Reader script had opened numerous PDFs (but, as noted above, had not saved them in the output folder) within a matter of seconds. It seemed advisable to continue to search for a faster and less cumbersome method of testing PDFs.
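The 120-second cutoff in that batch file amounts to polling the output folder until the printed copy shows up or time runs out. As a rough sketch of the same idea, here is how that wait could look in Python. Python is used purely for illustration (the batch files in this post did not use it), and the function name wait_for_output and its parameters are hypothetical.

```python
import os
import time


def wait_for_output(path, timeout_s=120.0, poll_s=1.0):
    """Poll until `path` exists. Return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A call such as wait_for_output(r"R:\PDFCopiesOut\x.pdf", timeout_s=600) would give large PDFs more time than the batch file allowed. Note that the mere existence of the output file does not prove the printer has finished writing it; this sketch shares that limitation with the IF NOT EXIST test.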

Microsoft XPS Document Writer

Shashank Bhat suggested using Microsoft XPS Document Writer to batch-print PDFs. I tried incorporating the specific suggestion into the foregoing command, as follows:

for %i in (*.pdf) do (start /wait "C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe /t" "%i" "Microsoft XPS Document Writer")

But in Printers > Microsoft XPS Document Writer > right-click > Properties, I did not see an option to select an output folder, and there also seemed to be no such option within the Adobe Reader printer. Also, as above, when I used the /wait option, it opened only one PDF onscreen at a time, waiting for me to kill my PDF viewer before it would open the next one.

Sumatra PDF

Sumatra PDF appeared to be an alternative to Adobe Reader. The command reference seemed to indicate that, like Adobe Reader, Sumatra would print to a designated or default printer (e.g., Bullzip). I hoped that it would do that without opening a PDF reader, as the Adobe Reader commands (above) had done. It appeared that the command I wanted was something like this:

for %i in (*.pdf) do (start /wait SumatraPDF.exe -print-to-default "%i" >> _ErrorLog.txt)

where Bullzip was my default printer, with its options set to specify the desired output folder and filename format.


This version of the command showed no path for SumatraPDF.exe because, as with other portable programs (above), I could just put a copy of SumatraPDF.exe in the folder containing the PDFs to be processed. I could have added a -silent option to the Sumatra command shown here, but I wanted _ErrorLog.txt to contain any error messages that might arise.

I ran that command successfully. It printed PDFs much more quickly than the looping batch file (PRINTER.BAT, above) had done. Of course, the batch file had the advantages of checking to see whether a file had already been printed, and of skipping over it if so, whereas the command shown here would just start again from the beginning. The batch file also showed the number of the file being printed, so that I could compare progress in the command window against the number of output PDFs reported on the status bar of an adjacent Windows Explorer session, to make sure things were staying at least somewhat in harmony. Given those advantages, I decided to plug the command into the batch file, producing this version of PRINTER.BAT:

@ECHO OFF
SETLOCAL EnableDelayedExpansion
IF EXIST _ErrorLog.txt DEL _ErrorLog.txt
FOR /r %%i IN (*) DO (
SET /a filecount=!filecount!+1
ECHO Printing file no. !filecount! : %%~nxi
IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" START /wait SumatraPDF.exe -print-to-default "R:\PDFCopies\%%~nxi" >> _ErrorLog.txt 2>&1
)

At first, I removed the CLS line near the top, because I was no longer tinkering with it by running and watching it on the command line, and the TIMEOUT line near the bottom, because I had belatedly recalled the /wait option and thought I would not need that alternate means of delaying progress until printing was done. But then I saw that printing had become inordinately slow. There had been large (i.e., several hundred MB) PDFs that took an hour or more to print, and even a one-page PDF could take a minute or more. In the first 6.5 hours, the program was running at a rate of about 100 PDFs per hour. I hoped for much better than that. The batch file had also generated several error messages like those shown by PDF2Printer (above). (The _ErrorLog.txt innovation proved useless in this case; _ErrorLog.txt remained empty.)

I also noticed that, by that point, the batch file was working on file 658, but the output folder in Windows Explorer was reporting only 639 completed PDFs, and I had only had a few error messages. Moreover, the system had become very sluggish, taking a minute or more to respond even to a simple command. This was a reminder to start Task Manager (taskmgr.exe or Ctrl-Alt-Del) and see what was running. In Task Manager, I noticed that CPU usage was at 100%. Moo0 System Monitor indicated that the CPU burden was due primarily to gswin32c.exe. That was Ghostscript. So the slowdown was due primarily to my PDF printing process itself, not to something else. Nonetheless, in Task Manager, I killed unnecessary processes, some of which may have been holdovers from my previous experimentation. I also ran msconfig.exe and turned off unnecessary services and startup programs.
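The gap between the number of files attempted and the number appearing in the output folder could be pinned down by name rather than by count. The following is a minimal sketch of that comparison in Python, used here only for illustration. It assumes that both folders are flat and that the printer writes each output PDF under the same name as its input; the function name missing_outputs is hypothetical.

```python
import os


def missing_outputs(input_dir, output_dir):
    """Return sorted names of input PDFs with no same-named file in output_dir.

    Names are compared case-insensitively, as Windows file systems do.
    """
    done = {name.lower() for name in os.listdir(output_dir)}
    return sorted(
        name
        for name in os.listdir(input_dir)
        if name.lower().endswith(".pdf") and name.lower() not in done
    )
```

Running missing_outputs(r"R:\PDFCopies", r"R:\PDFCopiesOut") after the print queue has drained would list the files that never produced output. Those would be the candidates for closer inspection, though a file could also be missing simply because the printer was still working on it.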

To end the slowdown, I killed the command window in which the PRINTER.BAT file had been running. Yet the system continued to be sluggish. It appeared that Bullzip was continuing to process holdover PDFs, making up some of that gap between the number of files that had supposedly been printed and the number actually appearing in the output folder. The system tray contained a dozen or more Bullzip icons, some showing that printing of a certain file was still underway and others indicating that printing had completed. During this catch-up period, I noticed, in Task Manager, that there were several sessions of gswin32c.exe, and also several sessions of Bullzip’s gui.exe. I wondered if removing the /wait option and restoring the TIMEOUT line in PRINTER.BAT would limit Bullzip and Ghostscript to running just one current session. If not, perhaps I should add a TASKKILL line focused on those specific processes. These reflections yielded a revised PRINTER.BAT file (below).

I encountered an error when the batch file tried to print a big PDF that had remained stubbornly unprinted into the output folder. That error message was as follows:

Printing problem

Cannot print this file.

A search led to a statement that this error message meant that either the PDF was broken beyond repair or that it was set to disallow printing. There appeared to be a workaround for the latter problem, but it was beyond my present abilities and time resources. This particular file did seem to have a problem in the copy that I was working on, on the test computer, but not in the original, located on another machine. On the test computer, I could print it using Acrobat PDF, but not using Bullzip. Possibly it had gotten messed up somehow during my PDF tinkering or repeated printing retries, or perhaps there was a problem with the software on the test computer.

The version of PRINTER.BAT that I ended up with was as follows:

@ECHO OFF
SETLOCAL EnableDelayedExpansion
FOR /r %%i IN (*) DO (
SET /a filecount=!filecount!+1
ECHO Printing file no. !filecount! : %%~nxi
IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" START SumatraPDF.exe -print-to-default "R:\PDFCopies\%%~nxi"
FOR /l %%m IN (1,1,30) DO (IF NOT EXIST "R:\PDFCopiesOut\%%~nxi" TIMEOUT /t 1 > NUL)
TASKKILL /f /t /im gswin32c.exe & TASKKILL /f /t /im gui.exe & TASKKILL /f /t /im sumatrapdf.exe
)

Unfortunately, there were still multiple problems with it. My knowledge of batch files and/or the peculiarities of SumatraPDF were such as to defeat my effort within the time available.

Foxit Reader


It was reportedly possible to batch print PDFs from the command line using Foxit Reader. Sadly, a search led to an indication by a Foxit moderator that, as of December 2014, Foxit 7 was unable to print silently (i.e., without opening the output PDF in a Foxit Reader session). It was possible that a batch file using TASKKILL, as attempted with other programs (above), would resolve that difficulty, but I was not inclined to pursue it.

A search suggested that it was still possible to download copies of version 6, but I had no luck in multiple tries: it seemed that all reliable sources had gone to version 7. For those who had or could find version 6, or if ever Foxit repaired version 7, it seemed (with help from Dave Brooks) that a sample command would look something like this:

START "" /wait "C:\Program Files (x86)\Foxit Software\Foxit Reader\Foxit Reader.exe" /t \filename.pdf "Brother MFC-7840W"

where I would probably replace the Brother printer with Bullzip.

Another Search for Possibilities

I had not yet found many workable ways to test a bunch of PDFs. I ran another search and came up with some more possibilities:

  • Schotbi suggested trying pdftotext in Linux. It seemed that possibly a Windows implementation of Linux would offer that program.
  • Multivalent spoke of Validate, which appeared to be a Linux-based command-line tool for testing PDFs at various levels of detail.
  • OriginalGriff pointed out that, once files were confirmed as non-corrupt, one could save a hash value (e.g., SHA, MD5SUM) for each file and verify, at future times, that those values had not changed. This would apparently be vastly easier and faster than opening and printing (above) or otherwise testing each page of each PDF.
  • In another precautionary remark, Zdzichu said that the solution, going forward, would include use of a filesystem (e.g., btrfs in Linux) that would use embedded data checksums and self-healing via redundant copies of files. But someone else remarked that files can go bad over a period of time, potentially defeating this sort of protection.

These suggestions, and the reading that I did while browsing these and other possibilities, seemed to confirm that a thorough file check, like that performed by a file printing process (above), was likely to be the best way of verifying that a PDF was fully functional. (Ideally one would find a way to confirm not only that the files had printed, but that each page contained valid content rather than blanks.) It also seemed that Linux (and perhaps Mac) had worthy tools not explored here. There was also the option of spending money for a good tester; that thought brought to mind the Total PDF Printer ($50) option noted earlier, with the risk that it would not work as advertised or would not work as well or as comprehensively as one might hope. Finally, it did seem advisable to develop a checksum system, so as to avoid the need for future time-consuming file-checking processes.
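To make the checksum idea concrete, here is a minimal sketch of a hash manifest in Python (chosen only for illustration; nothing in this post was actually done this way). It records a SHA-256 value for each PDF under a folder and, at a later date, reports which files have changed or disappeared. The function names are hypothetical, and SHA-256 is just one reasonable choice among the hash types mentioned above.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that very large PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each PDF's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                full = os.path.join(folder, name)
                manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(old, new):
    """Return (changed, missing): paths whose hash differs, and paths now gone."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    return changed, missing
```

The manifest would be built once, right after the files had been confirmed as good, and saved somewhere safe (for instance, as a text or JSON file). A changed hash shows only that a file is no longer identical to the verified copy; it does not show which copy is the good one.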

Alternatives to Acrobat

At this point, Adobe Acrobat had been my only workable alternative to CorruptedPDFinder. I wondered if any Acrobat competitors would offer ways of bulk-testing PDFs. PCWorld listed several commercial alternatives, starting with Nitro Pro ($140). I was not sure whether any of these programs would offer the kind of functionality I was seeking. Lawyerist listed a few more. There was a possibility that a LibreOffice Bulk Converter could be made to convert input PDFs to output PDFs. There may have been more that I could have done with CutePDF. But by this point, it seemed I was running out of ideas and grasping at straws.

Using One PRINT Command per PDF

The foregoing batch and command-line printing approaches had all tried to use a single command, within some kind of looping structure, to print all files within a folder or list. It occurred to me that I had not yet tried the brute-force approach of creating a batch file that would contain one explicit command for each PDF to be printed — containing, for example, a thousand lines of code if I wanted to print a thousand PDFs. I wasn’t sure if this approach would make any difference, but it seemed worth a try.

As indicated in the preceding sections of this post, I had tried a number of different PDF printers and printing commands. These included Printto, Adobe Reader, PRINT, Bullzip, PDFCMD, and Sumatra. It seemed that a command to print a PDF to another folder, using any of these tools, would follow this logic:

If R:\PDFCopies\filename.pdf does not exist, skip to the next file; otherwise, print it, and delete that file after verifying that a copy of it has been printed to R:\PDFCopiesOut.

I had made a backup of the PDFCopies folder, which itself (as noted above) contained mere copies of my original PDFs from various folders. Once a copy of filename.pdf was printed into the output folder (R:\PDFCopiesOut), I would remove it from the input folder (R:\PDFCopies). That way, if the script got interrupted and I had to start over again, it would not spend time on PDFs that had already been tested by successfully printing to the output folder. (The deletion concept would assume that the files in question were not set to read-only. I would later find it necessary to right-click in Windows Explorer and change the Properties of multiple PDFs that generated “Access is denied” errors when set to read-only.)
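The removal step just described can be sketched separately from the printing step. The following Python function (an illustration only; the names are hypothetical) deletes each input PDF that already has a same-named copy in the output folder, clearing the read-only flag first so that the deletion does not fail with “Access is denied.” The requirement that the output file be non-empty is my own added assumption, intended as a crude guard against zero-byte output.

```python
import os
import stat


def prune_printed(input_dir, output_dir):
    """Delete input PDFs that have a same-named, non-empty file in output_dir.

    Returns the list of names removed from input_dir.
    """
    removed = []
    for name in sorted(os.listdir(input_dir)):
        if not name.lower().endswith(".pdf"):
            continue
        source = os.path.join(input_dir, name)
        printed = os.path.join(output_dir, name)
        if os.path.isfile(printed) and os.path.getsize(printed) > 0:
            os.chmod(source, stat.S_IWRITE)  # clear the read-only attribute
            os.remove(source)
            removed.append(name)
    return removed
```

This should be run only against the working copies (R:\PDFCopies in this post), never against the original PDFs, since it deletes files without asking.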

So now the question was whether I could squeeze all of that logic into a single command using one of those PDF printing tools. Using PRINT as a relatively simple tool, it seemed the desired command (keeping the logic if not the arrangement sketched above) would be something like this:

IF NOT EXIST "R:\PDFCopiesOut\[filename.pdf]" IF EXIST "R:\PDFCopies\[filename.pdf]" START /wait PRINT /D:\\TP\BULLZIP "R:\PDFCopies\[filename.pdf]" & IF EXIST "R:\PDFCopiesOut\[filename.pdf]" DEL "R:\PDFCopies\[filename.pdf]"

That command would assume, again, that I had set Bullzip (my default printer) to put its output files in R:\PDFCopiesOut. It might be necessary to add a TASKKILL command at the end, if I was using Acrobat Reader or some other program that had to be terminated manually. (Parentheses around the DEL command would apparently be necessary if I did append something like that TASKKILL command.)

The DEL step could also have been done with a separate set of commands, instead of being combined here, and that might have been preferable if I had intended to conduct any comparison of, say, input and output file sizes. But previous tinkering (above) had suggested that input files had been saved with varying degrees of quality, whereas Bullzip would be saving the output files at a single preset level of compression, so I would not expect input and output files to be consistently of the same size. My printing test would be limited rather to input PDFs that flatly failed to print.

To work up my desired commands for the set of 55,000 PDFs being tested, I used a DIR command and an Excel spreadsheet, and pasted the Excel results into Notepad to create my new version of PRINTER.BAT, as described in more detail in another post.
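For readers without Excel, the same one-command-per-file list could be generated by a short script. This is a sketch in Python (used only for illustration; I built my own list with DIR and Excel as stated). The function names are hypothetical, the default folders and the PRINT command are taken from the examples in this post, and only the print-if-needed part of each command is generated. Percent signs in file names are doubled because a batch file would otherwise treat them as variable markers.

```python
def batch_escape(name):
    """Double any percent signs so a batch file treats them literally."""
    return name.replace("%", "%%")


def build_commands(names,
                   in_dir=r"R:\PDFCopies",
                   out_dir=r"R:\PDFCopiesOut",
                   print_cmd=r"START /wait PRINT /D:\\TP\BULLZIP"):
    """Return one batch-file line per PDF name."""
    lines = []
    for name in names:
        safe = batch_escape(name)
        source = f'"{in_dir}\\{safe}"'
        target = f'"{out_dir}\\{safe}"'
        lines.append(
            f"IF NOT EXIST {target} IF EXIST {source} {print_cmd} {source}"
        )
    return lines
```

The resulting lines could be joined with line breaks and saved as PRINTER.BAT. Swapping in a different tool would just mean passing a different print_cmd value, such as "SumatraPDF.exe -print-to-default".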

Unfortunately, the PRINT command was still unable to produce a proper filename when used with Bullzip as described here. If I used Bullzip’s <docname> or <basedocname> macro, PRINT produced something called LocalDownlevelDocument.pdf, regardless of the input filename. If I used Bullzip’s <title> or <smarttitle> macro, PRINT produced something called .pdf, without any filename.

This raised the question of whether I could use something other than Bullzip as my printer. Sumatra had not installed as a printer per se, and I had not installed CutePDF, so the only other PDF printer shown in my Windows Explorer > Printers dialog was Adobe PDF. So I set that printer’s Properties to Shared (as described above) with a name of ADOBE, and retried my PRINT command that way (i.e., with a reference to PRINT /D:\\TP\ADOBE). That resulted in the usual command-line affirmation that my file “is currently being printed,” but no output file resulted.

Trying again, I installed CutePDF Writer, taking care to decline and skip the crapware offers during installation. Now my Printers dialog showed CutePDF Writer as an option. (It had set itself to my default printer during installation; I undid that.) I set it to shared as CUTEPDF and tried the PRINT command again. Once again, there was no output file. In the Printers dialog, I saw that CutePDF was registering an error. Investigation revealed that CutePDF had choked while attempting to print Local Downlevel Document.pdf.

I concluded that PRINT was not going to work that way — with, that is, a reference to a shared printer. As described above, there was also the option of using PRINT with a designated (not necessarily shared) printer. Following that approach, I went back into the Properties for CutePDF > Ports tab > LPT1. Then I changed my print command to use PRINT /D:LPT1. That gave me an error: “Unable to initialize device LPT1.” There had been indications that PRINT was able to print only ASCII text; a search yielded the conclusion that this must be the explanation.

Other One-Command-per-File Approaches: Printto, PDFCMD, Sumatra, Adobe

I tried replacing PRINT with Printto. This required a copy of Printto.exe in the input folder, where the batch file was running. It also required a somewhat different command. I started with just Printto [filename], using Bullzip as the default. I hoped that, with the filename specified instead of using a wildcard (above), Printto would work. It did. Unfortunately, as above, despite settings to the contrary in Bullzip’s Options dialog, the command also opened an Adobe Acrobat session. (It did that even with Adobe Reader set as my default PDF reader.) It would not progress to the next command until I killed that Acrobat session.

To avoid opening Acrobat or Reader, I tried specifying CUTEPDF as my printer: Printto [filename] CUTEPDF. It did recognize the shared printer name, but the result was even worse: in addition to opening Acrobat, it also opened a Save As dialog (though perhaps I would have been able to defeat that by specifying a default output folder and filename format somewhere in CutePDF’s options, as I had done in Bullzip’s options).

There was still an option of trying to kill those unwanted Adobe (Acrobat or Reader) sessions shortly after they were created. I would not be able to do it by adding TASKKILL to the print command because, as above, the command processor would not reach the TASKKILL command until the unwanted Acrobat (or Reader) session was killed. I would need a separately running KILLER.BAT tool (or a working alternative along the lines of the Acrobat Wrapper, above) to take care of that.

Before taking that route, I tried PDFCMD instead of PRINT or Printto. Once again, I would have to put a copy of PDFCMD.exe in the folder where I would be running the command. The syntax (above) seemed to be

PDFCMD command=printpdf input="R:\PDFCopies\[filename]" docname="R:\PDFCopiesOut\[filename]"

Despite that specification of an output directory and filename, it appeared that it was also necessary to specify the output folder and file naming macro in Bullzip’s Options dialog as well; neither was sufficient by itself. The Bullzip output specification was R:\PDFCopiesOut\<basedocname>.pdf. With those settings, the command worked. The documentation explained that, when used this way — unlike the other approaches just discussed — PDFCMD did not open an unwanted Adobe session, but instead simply sent a bitmap to the printer. I was not sure whether the bitmapping process would explain the slowness I had encountered when working with PDFCMD previously (above).

I had yet to examine Sumatra, among the foregoing PDF-printing tools that might be worth trying in a file-by-file command arrangement. The command format appeared to be approximately like this:

SumatraPDF.exe -print-to-default "R:\PDFCopies\[filename]"

Here, again, I would need a copy of SumatraPDF.exe in the directory where the command was being run. Using the same Bullzip settings as in the immediately preceding test of PDFCMD, this command worked too. I was not sure why Bullzip was able to produce output without opening an unwanted Adobe session when Sumatra was the tool being used, whereas Bullzip had opened unwanted sessions when using Printto and would presumably also do so if I reverted to a command-line attempt to use Adobe Reader.

It occurred to me that I had not tested that last assumption by actually trying Adobe Reader in this context, so I did that now, with a command like this:

"C:\Program Files (x86)\Adobe\Reader 11.0\Reader\AcroRd32.exe" /t "[filename]"

Unlike my earlier attempt to use Adobe Reader (above), this time I did not specify an output printer, and that was OK: the specified PDF was printed anyway in the output folder designated in the Options dialog for Bullzip (still the default printer). It worked when I used the same form of command with multiple distinct file names, one after another. One, but only one, unwanted session of Adobe Reader would be opened. I could live with that.

This success raised the question of whether I could run AcroRd32.exe as I had run the other programs just discussed, by putting a copy of AcroRd32.exe in the folder where the command was being processed. An attempt to do that triggered this dialog:

Adobe Reader Protected Mode

Adobe Reader cannot open in Protected Mode due to an incompatibility with your system configuration. Would you like to open Adobe Reader with Protected Mode disabled?

I decided just to cancel out of that. (That message also recurred when I added the folder containing AcroRd32.exe to the system’s PATH and used a command referring simply to AcroRd32.exe without specifying its full path. At that point, I indicated that Reader should always open with Protected Mode disabled, and that took care of that message.)

Next, I tried a modified version of the single-line FOR loop approach presented under the Reader-Based Commands subheading (above), but again it froze after creating the first PDF and would not proceed until I had killed the newly created and unwanted Adobe Reader session. I thought this meant that the problem was in the loop — that my project would succeed if, and only if, I used a batch file containing identical commands for each PDF to be printed, one PDF at a time, and that this approach would work with Adobe Reader as it had worked with PDFCMD and Sumatra.

But then I realized the problem was in the /wait flag. Sure, any of these approaches would work in isolation, with one command manually entered after another. But as soon as I automated the process, print commands would be executed much faster than PDFs could be printed, unless I added /wait — and if I did that, the continued existence of that one Adobe Reader session would prevent any further commands from running. Whenever Adobe Reader was involved, it seemed, there would have to be a separate KILLER.BAT process running simultaneously.

So I was down to PDFCMD and Sumatra. Both had been slow, in the looping batch file (above), but neither had fired up any Adobe Reader sessions. Moreover, the two seemed to take different approaches, insofar as PDFCMD used a bitmap approach. I thought that perhaps both could offer something to a PDF test-printing effort.

Starting again with PDFCMD, I worked up a full command that looked like this:

IF EXIST "R:\PDFCopiesOut\[filename]" (IF EXIST "R:\PDFCopies\[filename]" DEL "R:\PDFCopies\[filename]") ELSE (START /wait PDFCMD command=printpdf input="R:\PDFCopies\[filename]" docname="R:\PDFCopiesOut\[filename]" && DEL "R:\PDFCopies\[filename]")

That was really gangly. Its awkwardness stemmed in part, no doubt, from my lack of batch programming expertise, but also from some difficulty in getting batch commands to work on a single line as desired. There was a particular problem with the /wait option, which once again did not seem to be working as expected: while I wanted the computer to focus on printing one PDF at a time, there were invariably several going on at once — and that, I feared, was the cause of the slowdowns that I had experienced in previous tries. After much tinkering, I gave up on making that command work with PDFCMD. The revised version for Sumatra looked like this:

IF EXIST "R:\PDFCopiesOut\[filename]" (IF EXIST "R:\PDFCopies\[filename]" DEL "R:\PDFCopies\[filename]") ELSE (IF EXIST "R:\PDFCopies\[filename]" START /wait SumatraPDF.exe -print-to-default "R:\PDFCopies\[filename]" && DEL "R:\PDFCopies\[filename]" && BEEP 250 500)

The BEEP command ran BEEP.EXE, a little portable utility I downloaded after a search, so as to notify me that something had finally printed, after long waits during which I was working elsewhere. (The numbers following BEEP are there to set its frequency in hertz and its duration in milliseconds.) I guessed that it had to go after the final DEL command, so that DEL would not execute unless printing was successful. Later, I would mute the speaker, or plug in my headset, so that the beep would not bother me unless I wanted to hear it.

That set of commands ran, one iteration per file. This approach was as slow as the looping batch files described above, however, and it lacked some of those files’ advantages. Sumatra gave me the occasional “Cannot print this file” message (above), but otherwise it worked away at the list. Some hours later, I got recurrent “Couldn’t initialize printer” errors (why, I could not tell), and those seemed to mark the end of the process: clicking OK on one produced a delay and then another of the same, without any more files being printed. I killed and restarted the job. Within a minute or two, it was back at that same point, with those same errors. A search led to no solutions. But then I got a notice from the computer:

Low Disk Space

You are running very low on disk space on PROGRAMS (C:).

There was also a problem message in the Action Center:

Troubleshoot a file access problem

The problem occurred because Windows could not access a file on your computer that it required.

I ran WinDirStat to see where the problem might be. It looked like I had accumulated a huge amount of material in C:\Users\Ray\AppData\Local\Temp\Bullzip\PDFPrinter. So it seemed that the Sumatra printer error was actually due to Bullzip’s accumulation of unnecessary material. There were items in that folder dating back some days. I selected all that were more than 24 hours old and used Shift-Del to delete them without putting them into the Recycle Bin, so as to make some space on drive C. I also emptied the Recycle Bin. I went back to look at some of the ones that were left. A few, especially of the most recent ones, contained very large PostScript (.ps) files. I opened one that was more than 2GB in size. Acrobat Distiller (my default program for .ps files) said “Cannot open this file. The file is being used by another process.” I used Unlocker to unlock it and tried again. After struggling for a half-hour and getting only to page 77 of that 2GB file, Distiller gave up with error messages: “Error: ioerror; OffendingCommand: imageDistiller.” I tried again with a 400MB file in another subfolder in that Bullzip PDFPrinter folder. That one worked for another half-hour or more and ended with the same error. I also tried a file called printer.pdf in that same subfolder. Adobe Reader said, “There was an error opening this document.” It appeared that Bullzip was saving copies of nonworking files in that subfolder — that perhaps it was failing to purge them after trying and failing to print them.

I deleted the remaining files in that Bullzip PDFPrinter folder and restarted the printing process. But soon the “Couldn’t initialize printer” problem was back. This time, oddly, right-clicking > Properties in Windows Explorer said that drive C was once again running out of space, whereas WinDirStat reported that there was a lot of extra space. I thought perhaps Windows was confused, so I did a cold reboot and tried again. But no, not confused: drive C was full again, and had become so very quickly. WinDirStat found that the Bullzip Temp folder had already accumulated another 10GB of material, but that didn’t account for the full drive. Eventually I found the other culprit: C:\Windows\Temp\BullZip\PDF Printer. It seemed I might need to add, to KILLER.BAT, a repeating command to delete leftover files in these two directories. But then, as I watched those folders, I saw 10GB of disk space vanish while Sumatra and Bullzip tried to print a 300MB PDF. There seemed to be a problem here.
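The kind of cleanup described above (deleting temp files older than about a day) could also be scripted. What follows is only a minimal Python sketch, not something used in this project: the function name purge_stale and its 24-hour default are assumptions made for illustration, and the folder passed in would be whichever temp folder was filling up.

```python
import os
import time


def purge_stale(folder, max_age_seconds=24 * 3600, now=None):
    """Delete files under `folder` last modified more than max_age_seconds ago.

    Hypothetical helper for illustration. Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    deleted = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    deleted.append(path)
            except OSError:
                # The file may be locked by the printer or already gone; skip it.
                continue
    return deleted
```

A sketch like this only removes old files; it would not help in a case where a single print job consumes many gigabytes within minutes.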

Meanwhile, something got screwed up with my system somehow, either from the printing project or from my efforts to recover from it, because ultimately I had to use the Windows 7 Recovery CD and run CHKDSK to get the machine running again. Then it seemed I was getting firmware error messages! Surely the software had not caused this. Had some kind of hardware/firmware problem been responsible for any of my troubles in this post, at least recently? It was suspicious that this problem emerged only now, after two years of using this laptop. Anyway, I updated the BIOS (which amounted to a reinstallation of the existing BIOS, since there had been no recent updates) and resumed. Then, once I was back in the saddle, to speed things up a bit, I put the larger (>40MB) PDFs into a separate folder for special treatment; I wanted to see if I could progress more rapidly through the bulk of the PDFs.
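Setting aside the larger PDFs could be done by sorting on size in Windows Explorer, or with a short script. The following Python sketch is one hedged way to do it; the function name move_large_pdfs and the folder arguments are hypothetical, and the 40MB default simply mirrors the cutoff mentioned above.

```python
import os
import shutil


def move_large_pdfs(input_dir, large_dir, threshold_bytes=40 * 1024 * 1024):
    """Move PDFs bigger than threshold_bytes from input_dir into large_dir.

    Hypothetical helper for illustration. Returns the names that were moved.
    """
    os.makedirs(large_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        # Skip subfolders (including large_dir itself) and non-PDF files.
        if not os.path.isfile(path) or not name.lower().endswith(".pdf"):
            continue
        if os.path.getsize(path) > threshold_bytes:
            shutil.move(path, os.path.join(large_dir, name))
            moved.append(name)
    return moved
```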

Revised Adobe Reader Approaches

Seeing the problems I was having with Sumatra and Bullzip, it seemed I really should try again with Adobe Reader. This time, I thought, I might return to an approach using a version of KILLER.BAT.

For all its bulk, I found that I was preferring the single-file-command approach over the FOR loop approach to my batch file: it seemed more transparent, less esoteric, easier to troubleshoot. (Again, by “single-file-command,” I mean that the batch file would contain one command line for each PDF being processed — so if there were 10,000 PDFs to test, there would be 10,000 command lines in the batch file.) So now I tried a version of the single-file command for Adobe Reader:

IF EXIST "R:\PDFCopiesOut\[filename]" (IF EXIST "R:\PDFCopies\[filename]" DEL "R:\PDFCopies\[filename]") ELSE (IF EXIST "R:\PDFCopies\[filename]" START /min Killer.bat && START /wait AcroRd32.exe /t "R:\PDFCopies\[filename]" && DEL "R:\PDFCopies\[filename]")

without the ending BEEP option shown in the previous version. (Again, PDFCopies was the input folder, containing the PDFs to be examined, and PDFCopiesOut was the output folder, containing the PDFs that had been printed. The “Copies” part of those names emphasizes that these were just copies of the original PDFs.) Of course, AcroRd32.exe had to be on the system’s PATH (see above, just after the discussion of the Adobe Reader Protected Mode error). The version of KILLER.BAT that I devised for this purpose looked like this:

TASKKILL /f /t /im AcroRd32.exe

This ran really well. PDFs began flying out of the computer like coins from a slot machine on your lucky day. I tinkered with the TIMEOUT setting in KILLER.BAT (the delay that elapsed before TASKKILL fired), starting with ten seconds and discovering that three seconds was too fast: Adobe Reader would not yet have gotten started, it seemed, and therefore the TASKKILL command would not take effect and progress would come to a standstill until I manually killed Reader. I finally settled on four seconds as the magic number, at least on this computer. (That time period referred to the duration of the Adobe Reader session. The resulting Bullzip printing session could take some seconds longer; sometimes several of them would be piled up in the system tray, trying to finish while the printing command raced on ahead.)

This Adobe Reader approach was the only PDF-printing approach I had tried that seemed stable enough to run for hours without needing supervision, and fast enough to have a serious chance of getting through a mountain of PDFs. In a one-hour test period (bearing in mind that I had removed the PDFs larger than 40MB), this setup printed a total of 704 PDFs.
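A batch file with one command line per PDF has to be generated somehow. As a hedged illustration (in Python rather than batch, purely for convenience), the sketch below builds one such line per filename, using the same single-file Adobe Reader command shown above. The names TEMPLATE and build_batch_lines are hypothetical, and the folder defaults are the ones used in this post.

```python
# One command per PDF, copied from the single-file Adobe Reader command above.
TEMPLATE = (
    'IF EXIST "{out}\\{name}" (IF EXIST "{inp}\\{name}" DEL "{inp}\\{name}") '
    'ELSE (IF EXIST "{inp}\\{name}" START /min Killer.bat && '
    'START /wait AcroRd32.exe /t "{inp}\\{name}" && DEL "{inp}\\{name}")'
)


def build_batch_lines(names, inp=r"R:\PDFCopies", out=r"R:\PDFCopiesOut"):
    """Return one batch command line for each PDF filename in `names`."""
    lines = []
    for name in names:
        if not name.lower().endswith(".pdf"):
            continue
        # A literal percent sign must be doubled inside a batch file.
        safe = name.replace("%", "%%")
        lines.append(TEMPLATE.format(inp=inp, out=out, name=safe))
    return lines
```

The resulting lines could be written to a .bat file with an ordinary file write. Filenames containing other characters that are special to the command interpreter would need further handling that this sketch does not attempt.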

It seemed that it should be possible to combine this single-file approach with the simpler one-line commands considered above. (As detailed above, “one-line” means that there would be no need for a batch file. I would enter just one command, on the command line, and it would process every PDF in a folder, regardless of whether there were 15 or 150,000.) For that, I came up with this long command (using the same version of KILLER.BAT as above):

FOR %i IN (*.pdf) DO (IF EXIST "R:\PDFCopiesOut\%~nxi" (IF EXIST "R:\PDFCopies\%~nxi" DEL "R:\PDFCopies\%~nxi") ELSE (IF EXIST "R:\PDFCopies\%~nxi" START /min Killer.bat && START /wait AcroRd32.exe /t "R:\PDFCopies\%~nxi" && DEL "R:\PDFCopies\%~nxi" && BEEP 250 500))

That worked too, and all it required was for me to run that one line in the folder where the PDFs were stored. I might have thought of it earlier, if this project had not been pursued on such a disjointed basis and in physically adverse conditions. Note that this one-line command assumed that Adobe Reader (i.e., AcroRd32.exe) was installed and was included in the system’s PATH; otherwise, the command could be altered to specify the location of AcroRd32.exe.

Since that command deleted input files from the input folder (R:\PDFCopies) as soon as printed output PDFs were found in the output folder (R:\PDFCopiesOut), the number of input files remaining would continue to shrink. So the command could be interrupted and, upon restarting, would immediately attack the remaining PDFs, as distinct from starting over at the top of a long list.

After producing about 23,000 output PDFs in the first day or two, the system began slowing down. Suddenly the four-second KILLER.BAT delay was no longer enough; repeatedly the process hung up on an open Adobe Reader session that KILLER.BAT had failed to close. Changing to a five-second delay made little difference. It did not help to move thousands of output PDFs into a subfolder, where the system would not have to deal with them as it constantly tried to update the display. Instead, the system crashed with a bad pool header BSOD. I knew of no particular reason why that should have happened, so I just restarted the system and resumed. Perhaps a system problem was the reason for the slowdown: after resuming, a four-second delay was adequate.

As the process continued, I did get an occasional error message: “You are trying to print a protected document. This is not allowed.” But the process continued and did ultimately finish: when the batch command concluded, there were no PDFs left in the PDFCopies (i.e., input) folder referred to in the foregoing command. There had been some interruptions in the process, so I was not able to calculate an average print time per document. Overall, the command had processed 55,633 input PDFs and had produced 53,777 output PDFs, leaving me to wonder how many of those 1,856 unprinted PDFs were bad.

The one-line command (above) would automatically delete input PDFs whose filenames were already represented in the output PDF folder. So a redo would take only a fraction of the time, and would provide confirmation of the foregoing numbers. I took a directory listing (DIR /a-d /s /b), just in case the numbers did not match up the second time around. Then, from backup, I restored another set of the same PDFs to the input folder and ran the command again. This time, I ran it without the concluding DEL command. That way, especially after a second rerun, I expected the input folder to contain those PDFs that had not been printed into the output folder.

The redo was thus largely a matter of waiting while the command deleted the many input PDFs it found already created in the output folder, after printing some that had not been printed in the first round; and keeping an eye on the “protected document” error messages, which popped up, one after another, much more quickly when there were not a dozen or a hundred printable PDFs buying time in between. (I did wish that the Adobe Reader command line options had included a “quiet” mode that would spare me such messages.) The fact that PDFs were printed in the second round, but not in the first round, suggested the existence of a hole in my one-line command: somehow input PDFs seemed to be getting deleted without producing output PDFs. In other words, it appeared that what I was doing as a double-check should actually be a part of the procedure: line it up and run it again, as a fast way of reducing the pile of unprinted items left over from the first round.

The second round left 1,936 files in the input folder. With all the printing that had seemed to be going on, I was surprised that there were actually more files left this time than there had been after the first round. I ran the command again, without reloading the input folder from backup, just to let it sweep out any other files that had been printed but not deleted, and to see whether it would print any others. The answer was yes on both counts. This suggested that the concluding DEL command had not worked as intended — that somehow it had deleted input PDFs that had not in fact produced output PDFs. The explanation seemed to be that Windows felt it had successfully sent the Acrobat Reader print command; whether the file was actually printed was not its concern. So I repeatedly re-ran the long command (above) without that ending DEL command. Without that ending DEL, the number of files remaining in the input folder would not necessarily be reduced when an output PDF was created, but that was OK — the ones that had printed would be deleted next time I re-ran the command. Eventually I realized that I could also run just the first part of the command, to quickly sweep out input PDFs that had already been reproduced in the output folder.
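The "sweep" idea (removing any input PDF that already has a same-named counterpart in the output folder, without trying to print anything) can be expressed compactly. Here is a minimal Python sketch of that logic, offered as an illustration rather than as the method used in this post; sweep_printed is a hypothetical name.

```python
import os


def sweep_printed(input_dir, output_dir):
    """Delete each input PDF that has a same-named file in output_dir.

    Hypothetical helper for illustration. Returns the names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(input_dir)):
        if not name.lower().endswith(".pdf"):
            continue
        src = os.path.join(input_dir, name)
        if os.path.isfile(src) and os.path.isfile(os.path.join(output_dir, name)):
            os.remove(src)
            removed.append(name)
    return removed
```

Like the IF EXIST test in the batch command, this checks only that an output file exists, not that the output is a sound PDF.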

I could see that some PDFs were not loading successfully in the Adobe Reader window, when it would open up, and that Bullzip print jobs in the system tray were sometimes terminating without producing a pop-up bubble notice informing me that the print job had succeeded. These two adverse indicators seemed to suggest that, at least for some files, the TIMEOUT in KILLER.BAT was set too low. I played with it but decided not to adjust it as long as there was a steady stream of bubble notices telling me that other PDFs were being successfully printed. I made that decision because, for many PDFs, even 15 seconds was not long enough to permit a successful print.

I noticed that sometimes a bubble notice would inform me that a PDF had been printed, but the count of files in the output folder did not increase, and those supposedly printed PDFs were not deleted from the input folder on the next round. Upon closer inspection, I found that the filenames of these PDFs were messed up. For example, one file had a name like Filename..pdf, with two dots immediately before the extension, which was apparently a no-no. The explanation there was that PDFs were indeed being produced in the output folder, but they were being produced with corrected filenames, so no output PDF bore the incorrect name that would have justified deletion of the corresponding input PDF. Fixing the filenames and retrying solved that problem. After that fix, to make sure the output folder contained only files whose names existed in the input folder (after refilling the latter from backup), I used Beyond Compare with a filename-only comparison.
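Both checks described above (spotting names like Filename..pdf, and comparing the two folders by filename only) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for Beyond Compare: the function names are hypothetical, the only naming problem tested for is the doubled dot mentioned above, and names are compared case-insensitively because Windows filenames are case-insensitive.

```python
import os


def suspicious_names(names):
    """Return PDF names whose stem ends in a dot, e.g. 'Filename..pdf'."""
    flagged = []
    for name in names:
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".pdf" and stem.endswith("."):
            flagged.append(name)
    return flagged


def unmatched(input_names, output_names):
    """Return input names that have no identically named output file."""
    have = {n.lower() for n in output_names}
    return sorted(n for n in input_names if n.lower() not in have)
```

Feeding these functions the results of os.listdir on the input and output folders would give a quick list of files to rename and a list of files still lacking output.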

I would have liked to identify, in advance, the PDFs producing the “protected document” error messages — so that those messages, proliferating when unwatched, would not pose a threat of swamping the system. If nothing else, it would have been nice if those error messages had specified the files to which they were referring, so that at least I could identify them manually and move them to another folder for separate treatment. Alternately, if Bullzip had produced a log of its results, I could have consulted that to identify the offending files. I wondered if perhaps I could make my own list of problem files by using a different version of the long command, like this:

FOR %i IN (*.pdf) DO (IF EXIST "R:\PDFCopiesOut\%~nxi" (IF EXIST "R:\PDFCopies\%~nxi" DEL "R:\PDFCopies\%~nxi") ELSE (IF EXIST "R:\PDFCopies\%~nxi" START /min _Killer.bat && START /wait AcroRd32.exe /t "R:\PDFCopies\%~nxi" >> _ErrorLog.txt))

But the error messages seemed to be produced by Bullzip, not by the command: “protected document” messages came and went, yet _ErrorLog.txt remained empty. So now the preferred form of the long command was as shown just above, but without the “ >> _ErrorLog.txt” part.

There was another possibility. In Windows Task Manager (taskmgr.exe), I noticed that the “protected document” error messages were instances of gui.exe, the Bullzip interface program. That raised the idea of modifying _KILLER.BAT (prefixing an underscore to its name so that it would remain near the top of the list in Windows Explorer) to terminate instances of gui.exe as well:

TASKKILL /f /t /im AcroRd32.exe
TASKKILL /f /t /im gui.exe

That worked. Of course, not much new material was being printed; I had already had a few runs with TIMEOUT = 4. The remaining PDFs seemed to need more time. I settled in to recurrent runs of this revised version of the long command and _KILLER.BAT, each time with successively longer timeout settings, until all that remained were the protected PDFs. There would be undesirable delays, doing it that way: _KILLER.BAT would be waiting 15 seconds (or a minute, or five minutes) for PDFs that were simply not going to print due to DRM protection. But that was OK. I was willing to let that computer spend another few days at it, rather than spend my time manually logging the protected files. When nothing more would print automatically, I would be pretty sure I was down to a residual core of protected PDFs calling for special measures.

At this point, following the four-second runs mentioned above, along with several brief and interrupted prior experimental runs with longer timeouts, there were only 730 unprinted PDFs remaining in the input folder, out of a starting total of 55,633. Now, after another run with a timeout of 30 seconds (taking more than six hours), the input folder was down to 431 PDFs. Much of that decrease was due to the successful printing and elimination of numerous academic journal articles, which tended to be of relatively similar size (typically around 15-20 pages), for which a timeout of 30 seconds was enough in most cases.

At some point, it occurred to me that I might want to farm out more of the project to _KILLER.BAT, at the risk of making that filename a misnomer. For example, if _KILLER.BAT looked like this:

TASKKILL /f /t /im AcroRd32.exe
TASKKILL /f /t /im gui.exe
:: BEEP 250 500
FOR %%j IN (*.pdf) DO (IF EXIST "R:\PDFCopiesOut\%%~nxj" DEL "R:\PDFCopies\%%~nxj")

then the long command would become a much shorter command, like this:

FOR %i IN (*.pdf) DO (START /min _Killer.bat && START /wait AcroRd32.exe /t "R:\PDFCopies\%~nxi")

The potentially wasteful deletion sweep (i.e., the FOR loop near the end of _KILLER.BAT) would run only for the PDFs remaining in the input folder. As such, this adaptation might not be ideal at the start of a project dealing with many thousands of PDFs. But at this point, when there were only a few hundred files in the input folder, that deletion sweep would be almost instantaneous. (Note that the BEEP command in this revised _KILLER.BAT was commented out. If I wanted it back, I could just remove the two colons from the start of the line. An advantage of putting such commands in _KILLER.BAT rather than in the long command was that I could edit _KILLER.BAT while the main command was running — changing its timeout, for instance — and the change would become effective as soon as I saved the revised _KILLER.BAT.)

For some reason, _KILLER.BAT failed to delete a small fraction of files from the input folder, after they had successfully printed to the output folder. I verified that the filenames were exactly the same. I was not able to figure out what the problem there might be. The output PDFs did open successfully. I had to delete those files from the input folder manually.

I decided to try another run with a 60-second timeout. But that didn’t seem to be having much of an effect. I tried again with a 120-second timeout. But now I noticed an odd thing: some files were still not printing. It was not that they were running out of time. The Adobe and Bullzip processes would seem to complete. But there would be no resulting file, and no bubble message announcing a successful printing. Nor was there any pile-up of unprinted PDFs in the Bullzip print queue. The job would just sort of go away. Often, it was not clear that the job even started.

It seemed I was approaching the point of needing to take a semi-manual loop through the remaining input PDFs. I was able to do this with the aid of a feature of the TIMEOUT command: it would say something like, “Waiting for 90 seconds, press a key to continue . . .” The countdown would begin: 90 – 89 – 88 etc. Pressing a key at any point would curtail the countdown and move us along to the next file. Hence I was able to take a manual look at those PDFs that did not seem to be cooperating. I could also move protected files into a separate input subfolder. Given this capability, I set _KILLER.BAT to allow a 1000-second timeout, and ran the shortened “long” command (above) again.

I started that semi-manual loop but did not complete it. Nonetheless, it gave me a chance to inspect some PDFs that were printing but were not then being deleted from the input folder. Adobe Reader was tied up at the time, but I was able to use Adobe Acrobat (and would also have been able to use Sumatra or Foxit) to open the PDF and verify that it was OK. Also, I noticed that some printable PDFs that had not been able to print within a shorter timeframe did ultimately print within two minutes. To avoid having to sit there and watch, I changed _KILLER.BAT to begin with this line:

TIMEOUT 100 && BEEP 250 500

followed by a longer TIMEOUT or just a PAUSE command. The beep would draw my attention away from my ongoing work on my other computer at the point when most input PDFs would have completed printing, thrown up a “protected document” error, or otherwise done whatever they were going to do.

There were various options for what would display onscreen, as I worked through this somewhat manual examination of these residual files. One option was to change the AcroRd32.exe command line, so that the output PDF would appear onscreen. Another was to use the command line for Everything, the file-finding tool, so that Everything would open and would display the file being printed, so that I could see when and if a copy of that file appeared in the output folder. (A batch file command to run Everything in this way would apparently use something like “everything.exe -s [filename].” The everything.exe file would have to be in the system’s path or in the folder where the batch file was being run.) I tried this:

FOR %i IN (*.pdf) DO (START _Killer.bat && START Everything.exe -s "%~nxi" && START /wait AcroRd32.exe /t "R:\PDFCopies\%~nxi")

Some PDFs, after long delays, produced a “Fatal Error” message: “Acrobat failed to connect to a DDE server.” Although that message referred to Acrobat, evidently it was a holdover from Adobe Reader’s previous name (i.e., Adobe Acrobat Reader), because Adobe Acrobat was actually able to open at least some of these files. The problem seemed to be just that the files in question were too large for Reader. A search led to other suggestions as well: two instances of Reader trying to run simultaneously; conflicts with Adobe Air; need for a registry edit; try running Acrobat as a different (Admin) user; use Acrobat’s Help > Repair command (also available via Control Panel > Programs and Features); could be an antivirus problem; in Windows Explorer, try Tools > Folder Options > File Types > PDF > Advanced > Open > Change > uncheck Use DDE; change the logon; and so forth.

Next, it occurred to me that I could set up my command so that it would not need a preset delay (e.g., four seconds) before moving on to the next PDF; it could just sit there and wait for the printed output, for however long that might take. Consolidating several insights, I worked up a revised long command:

FOR %i IN (*.pdf) DO (IF EXIST "R:\PDFCopiesOut\%~nxi" (IF EXIST "R:\PDFCopies\%~nxi" DEL "R:\PDFCopies\%~nxi") ELSE (SET printing="R:\PDFCopies\%~nxi" && SET printed="R:\PDFCopiesOut\%~nxi" && START _Killer.bat && START Everything.exe -s "%~nxi" && START /wait AcroRd32.exe /t "R:\PDFCopies\%~nxi"))

At least at my level of batch programming skill, it was unfortunately necessary to repeat the directory names as shown, adding to the command’s length. Accompanying that long command, _KILLER.BAT had grown:

:: The :resumer and :waiter label positions are inferred from the GOTO targets
SET /a counter=0
GOTO waiter
:resumer
TASKKILL /f /t /im AcroRd32.exe
TASKKILL /f /t /im gui.exe
RD /s /q "C:\Users\Ray\AppData\Local\Temp\BullZip\PDF Printer"
EXIT
:waiter
ECHO Printing %printing%
ECHO Waited about %counter% seconds so far ...
TIMEOUT 4 /nobreak && SET /a counter+=4
:: Leave the loop if the input is gone, because by now I might have moved it manually
IF NOT EXIST %printing% GOTO resumer
IF EXIST %printed% (DEL %printing% & GOTO resumer)
IF %counter% EQU 60 BEEP 250 500
IF %counter% GEQ 3600 (MOVE /y %printing% R:\PDFCopies\TestManually\ && GOTO resumer)
GOTO waiter

In this case, I set a one-hour (3600-second) cutoff. EQU (“equal to”) and GEQ (“greater than or equal to”) were among the available IF statement comparison options. The EQU comparison used here would sound a beep at the 60-second mark, drawing my attention back to the computer running this program, so that I could see whether the attempt to print the current PDF had yielded a relatively immediate error message. The RD line took care of the problem where drive C would fill up (above) and I would get error messages because of material accumulating in C:\Users\Ray\AppData\Local\Temp\Bullzip\PDFPrinter.
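The waiting logic in that version of _KILLER.BAT (check every few seconds for the printed file, and give up at a cutoff) is a plain polling loop. The Python sketch below shows the same idea in isolation; it is an illustration only, wait_for_output is a hypothetical name, and the exists and sleep parameters are there solely so the loop can be exercised without real files or real delays.

```python
import os
import time


def wait_for_output(printed_path, cutoff=3600, step=4,
                    exists=os.path.exists, sleep=time.sleep):
    """Poll every `step` seconds until printed_path exists or cutoff is reached.

    Hypothetical helper for illustration. Returns (found, seconds_waited).
    """
    waited = 0
    while waited < cutoff:
        sleep(step)
        waited += step
        if exists(printed_path):
            return True, waited
    return False, waited
```

With the defaults, a file that appears after about 32 seconds is noticed at the next four-second check, rather than after a full preset delay.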

It seemed that this approach might have multiple advantages. For one thing, it would eliminate unnecessary delays for PDFs that would print in, say, 32 seconds, rather than wait for the entire preset TIMEOUT delay (e.g., 60 seconds) to elapse. In addition, I had noticed, along the way, that some of the pop-up bubble messages, indicating that Bullzip had finished printing a document, were telling me that a printed PDF consisted of zero pages. I thought that problem (discussed further below) might arise when the preset cutoff (e.g., four seconds) occurred at precisely the wrong instant in the printing process. This line of thought suggested that the four-second cutoff approach (above) was perhaps not the best way to reduce the large initial stack of input PDFs. To minimize the risk of creating nonworking output PDFs, I built an additional one-second TIMEOUT into the penultimate line of _KILLER.BAT (above). Here’s an annotated screenshot of how it all looked when it was running:

[Screenshot: PDF Program Screenshot 01]

(A couple of updates to the notes printed in that image: (1) the RD line (above) eliminated the need to keep the PDF Printer folder open. (2) The PDFCopies copy would not necessarily vanish from the Everything window exactly one second later.)

Ideally, for purposes of verifying that the printing process had completed successfully, it seemed that one might want one’s input PDFs to have the same characteristics as one’s output PDFs — the same dots per inch, and so forth. If it was thus possible to produce output PDFs identical to the input PDFs, one could perform an exact comparison, using something like DoubleKiller or Beyond Compare, to identify any defective output PDFs. But no doubt that sort of solution would work only for some PDFs. At any rate, I was not inclined at the moment to start a separate project of learning how to verify that the quality of output PDFs, transformed into some standardized configuration, would be equal to the quality of the input PDFs.

These reflections suggested that I should take a look at the output folder, to see whether there were any output PDFs with obvious problems. After all, it would not be satisfactory to have gone to all this trouble to test input PDFs, only to find that, for instance, a certain kind of input PDF was producing defective output PDFs. That sort of outcome could mean that this approach to PDF testing was at least a partial failure. I decided to defer that inquiry until later (below), however; I wanted first to finish doing whatever I could to print the variously problematic PDFs in the input folder.

There were other difficulties. Sometimes, the Title in an input PDF’s Properties would be preset to something other than the name under which I had saved the file. That Title would evidently preempt the filename in printing. Thus, the output PDF’s name would not match, and manual intervention would be necessary in order to let the long command progress to the next file. Occasionally, I would have to move an input PDF generating an error message into either the ProtectedDocs or TestManually folders. [Later, I would find that PDFInfoGUI (below) could quickly produce a nice list identifying filename-Title mismatches for all PDFs in a folder.]

With the foregoing adjustments and delays, the long command ran for several days. When it was done, I had a few dozen PDFs in each of the TestManually and ProtectedDocs subfolders. I found that some in each had security restrictions that were not enforced by passwords and thus could be changed in Adobe Acrobat. Doing that enabled me (after saving and reopening the document’s Properties) to change its other aspects as well. A few PDFs could be modified by using Acrobat > File > Save a Copy. Even so, I was left with several dozen PDFs that used passwords to enforce their internal prohibitions against printing. For these, a search led to a variety of approaches:

  • Adobe Acrobat could have handled it, if all files in the folder had the same password and if I had known what that password was.
  • One comment raised the prospect that those “protected document” messages were due to Digital Rights Management (DRM) restrictions, and thus that printing would be feasible via tools that ignored DRM. Suggested possibilities included Sumatra and Okular (see also PDFCMD, above).

Following the advice on an About.com page, I started with GuaPDF. I started ten sessions and let each run for about eight hours on a different passworded PDF. None reported completion at the end of that time; only two had reached GuaPDF’s reported 99% completion mark. I killed those sessions and tried again with FreeMyPDF. That required manual uploading of each file individually. Except for one or two large files (FreeMyPDF’s maximum was 250MB per file), that proceeded at a rate of maybe two PDFs per minute. FreeMyPDF was able to unlock all of the PDFs I submitted to it except for one large one (~150MB). It appeared the FreeMyPDF uploader was timing out before that file could be fully uploaded. Others with faster Internet connections might not have that problem. There were some locked PDFs that weren’t important to me; I did not bother trying to unlock those.

Having unlocked those PDFs, I was able to reset their names and other properties so that they, too, would be amenable to batch printing, and then I ran them through the same long command (above). Then I manually printed to PDF those files in the TestManually folder. For that purpose, in some cases it seemed I had better luck with Adobe PDF rather than Bullzip PDF in the printer dialog. Ultimately, I was left with just a handful of PDFs that could not be printed by either automated or manual processes, due to various problems within the PDFs themselves. I held those separate, for re-comparison with the CorruptedPDFinder and Acrobat Merge approaches (above). Otherwise, as I verified by running Beyond Compare comparisons between folders, all PDFs were accounted for.

So that concluded my effort to test my PDFs by printing a copy of each of them to an output PDF folder. There had been some rough spots in the process, so I ran it again, using the long command and the version of _KILLER.BAT last shown above. This time I excluded the PDFs that I’d had to print manually, and I had by now removed the protections that had prevented me from printing other PDFs. Seeing how quickly most PDFs were processed, I also reduced the maximum TIMEOUT setting from 3600 to just 300 seconds and, as reflected in that version of _KILLER.BAT, I reduced the other TIMEOUT from five seconds to four. Those changes left me with a few dozen files that had not printed within the 300-second timeout; most of those did print when I increased the TIMEOUT to 1200 seconds. Overall, the process still took days.
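The two-pass idea described here (a short timeout for most files, then a longer one for the stragglers) can be sketched in a few lines of Python. This is not the batch command or _KILLER.BAT used above; it is a different technique that relies on Python's subprocess timeout, and the function name and the print command passed to it are hypothetical placeholders.

```python
import subprocess
import sys

def run_with_timeouts(command, timeouts=(300, 1200)):
    """Run `command` once per timeout until one attempt finishes in time.

    Returns the timeout (in seconds) under which the command completed,
    or None if every attempt ran out of time. subprocess.run() kills the
    child process when the timeout expires, then raises TimeoutExpired.
    """
    for seconds in timeouts:
        try:
            subprocess.run(command, timeout=seconds, check=False)
            return seconds
        except subprocess.TimeoutExpired:
            continue  # too slow this time; retry with the next, longer limit
    return None

if __name__ == "__main__":
    # Stand-in for a slow print job: a child Python process that sleeps briefly.
    slow_job = [sys.executable, "-c", "import time; time.sleep(1)"]
    print(run_with_timeouts(slow_job, timeouts=(0.2, 30)))
```

A file that returns None under both limits would be a candidate for the TestManually folder.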

I wondered how much of a difference a faster machine would make. If time had permitted, I would have done another run, to see how quickly the two machines would print a sample set of, say, 300 PDFs selected from my large set of files. Unfortunately, at this point I had to move on to the next phase.

Comparing PDF Page Counts

With the limits described above, I now had a complete set of output PDFs, corresponding to the backup set of input PDFs. I decided to test those output PDFs in terms of page count. That is, it seemed there must be a way of finding out the number of pages in a PDF; perhaps I could compare the pages in the input and output PDFs, to see whether anything had been lost in the print test of my original PDFs.

A search suggested that there might be different ways to count pages in a stack of PDFs. There were GUI programs for the purpose (e.g., Tiff-Tools Tiff/PDF Page Counter; Tiff & PDF Page Counter FREEWARE 2.0; TTFA PDF Page Counter), and there were also tools and techniques that could achieve the same end (e.g., Nirsoft SysExporter; various programming solutions; pdfinfo in the bin32 or bin64 folder in Xpdf).

I decided to start with the GUI programs. Unfortunately, my attempts to download a couple of those programs were blocked by my antivirus program. TTFA’s (short for Tech Tips For All) PDF Page Counter installed OK. There was a video explaining it, but that seemed unnecessary; the program offered only a few options. Sadly, it crashed repeatedly when I started it and clicked its Add Files button.

I tried again, this time with pdfinfo.exe. I had already downloaded Xpdf, so I had that program’s 32-bit and 64-bit versions, and I also downloaded Pdfinfo from Softpedia. These were not the same program. When I typed “pdfinfo” on the command line, the 32- and 64-bit versions from Xpdf both claimed to be running pdfinfo version 3.04 by Glyph & Cog. Both seemed to offer the same command-line options. I was not sure why those two programs were not the same size. In contrast to the dozen or so command-line switches offered by those Xpdf programs, Pdfinfo from Softpedia gave me just one syntax option: pdfinfo path. At first I took that to mean that I would type something like “PDFinfo D:\Folder\” to get information on the PDFs in D:\Folder. But no, judging from a screenshot that seemed to provide the only documentation or explanation, I would have to specify the individual PDF. That seemed to be how the versions of pdfinfo obtained from Xpdf would work too. So I would need to work up a set of commands, or a looping command or script, that would run pdfinfo on each PDF; and then I would have to figure out how to capture the multiline data from each PDF and isolate the “Title” and “Pages” lines.
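For readers who want to try the scripted route, here is a minimal sketch in Python of the loop just described. It assumes the Xpdf pdfinfo executable is on the PATH and that it prints one “Key: value” pair per line (as in “Pages: 12”); the function names are my own, and this is not the approach ultimately used in this post.

```python
import subprocess

def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key:   value' lines into a dictionary.

    Only the first colon splits key from value, so values that contain
    colons (titles, dates with times) are kept whole.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def title_and_pages(pdf_path, pdfinfo_exe="pdfinfo"):
    """Run pdfinfo on one PDF and return (title, page count or None)."""
    result = subprocess.run(
        [pdfinfo_exe, str(pdf_path)],
        capture_output=True, text=True, errors="replace",
    )
    info = parse_pdfinfo(result.stdout)
    pages = info.get("Pages", "")
    return info.get("Title", ""), int(pages) if pages.isdigit() else None

if __name__ == "__main__":
    # Sample text shaped like pdfinfo output (made up for illustration).
    sample = "Title:          Annual Report: 2012\nPages:          12\nEncrypted:      no\n"
    print(parse_pdfinfo(sample))
```

Calling title_and_pages on each file found by a directory walk would give the same two columns that the rest of this section works with.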

That seemed like a lot of work. A discussion thread led instead to PDFInfoGUI, a frontend for pdfinfo. This impressive little tool seemed to have figured out how to do all that hard work of getting useful data out of pdfinfo. It quickly analyzed thousands of PDFs in a folder selected at random, including its subfolders. Its analysis indicated not only the number of pages but also a variety of other items (e.g., keywords, file size, creation date, encryption status). I was able to select multiple individual or grouped rows using Ctrl- or Shift-Click (or all rows via Ctrl-A) and export to a CSV file that, opened in Excel, could be swiftly pared down to just the filename and the number of pages. I wondered, for future reference, whether this information from PDFInfoGUI could also be used to detect whether a set of PDFs had been saved in PDF/A format or had been OCRed.

At this point, incidentally, I saw that one of these programs had succeeded in starting an installer for the Findwide Toolbar. I didn’t know what that was, and I didn’t know which program had managed this. I only hoped that killing the installer would put an end to it.

So now I ran PDFInfoGUI against the “before” and “after” sets of PDFs — that is, those input PDFs that I had printed to PDF, using the technique described above, and the output PDFs that resulted. The question, again, was whether the PDF printing process had created output documents containing the same numbers of pages.

That comparison ran much more slowly on the test machine, using files on a drive in an external USB dock, than it had run in my initial trial on my fast machine. That raised, again, the question of whether some of the processes described above would also run much more quickly on a different computer. There was another speed quirk: I noticed that PDFInfoGUI ran much more quickly when I disconnected the network cable connecting my two computers. I was not sure whether speed would also be affected by a wireless network connection, or whether this sort of effect was peculiar to this computer.

At this point, I was returning to this project after some weeks away. My analysis of the resulting spreadsheet identified problems that I might have understood more readily if I had been fresh on this material. By now, however, I was no longer so familiar with the troublesome filenames and other issues that had arisen during this epic journey.

Within that spreadsheet, displaying the output of PDFInfoGUI tests on both the input and output folders, I saw a couple different kinds of problems. First, there were files that existed in the input folder but not in the output folder, and vice versa. There were also files that existed in both folders, but that did not have the same numbers of pages.

The first problem, involving filename mismatches, was largely a problem involving the tilde (~) character. I had several dozen PDFs whose names contained tildes. In my Microsoft Excel 2010 spreadsheet comparing PDFInfoGUI results from the input and output PDF folders, it appeared that the VLOOKUP function did not work properly with tildes. So I had to do some alternative Excel tinkering to do those comparisons.
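The same comparison can be sketched outside Excel. The Python below assumes two CSV exports with columns named “Filename” and “Pages” (assumed names; the actual PDFInfoGUI export headers may differ) and reports files found on only one side plus files whose page counts disagree. Because it matches names as exact dictionary keys, a tilde is just another character. Excel's lookup functions, by contrast, treat the tilde as an escape character for wildcards, which is a likely explanation for the VLOOKUP trouble.

```python
import csv
import io

def load_page_counts(csv_text, name_column="Filename", pages_column="Pages"):
    """Read CSV text into {filename: page count}; None if the count is missing."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pages = (row.get(pages_column) or "").strip()
        counts[row[name_column]] = int(pages) if pages.isdigit() else None
    return counts

def compare_page_counts(input_counts, output_counts):
    """Return (only in input, only in output, {name: (input pages, output pages)})."""
    only_in_input = sorted(set(input_counts) - set(output_counts))
    only_in_output = sorted(set(output_counts) - set(input_counts))
    different = {
        name: (input_counts[name], output_counts[name])
        for name in sorted(set(input_counts) & set(output_counts))
        if input_counts[name] != output_counts[name]
    }
    return only_in_input, only_in_output, different

if __name__ == "__main__":
    # Made-up rows for illustration only.
    input_csv = "Filename,Pages\nnotes~draft.pdf,12\nreport.pdf,465\nold.pdf,3\n"
    output_csv = "Filename,Pages\nnotes~draft.pdf,12\nreport.pdf,1\nRenamed Title.pdf,3\n"
    print(compare_page_counts(load_page_counts(input_csv), load_page_counts(output_csv)))
```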

There was one anomaly. Somehow, the PDFInfoGUI output filelist included an entry called “ecordchklst.wp            [PFP#790659330].” I was not able to find any file of this name on my computer. I could not tell how it got included. Possibly it was an artifact of some PDFInfoGUI procedure.

Some of the PDFs that existed in the input folder but not the output folder were actually in an input subfolder labeled “May Be Bad PDFs.” I had identified these as bad in my previous testing (above). These did not appear in the output folder because I did not attempt to print them as output PDFs, having seen that they could bog down the PDF printer for long periods of time and yet produce nothing.

A few PDFs existed in the output folder but not in the input folder because they had been printed under the name given in their Properties. As described above, for some files I had gone into Acrobat > File > Properties > Description and had seen that the files in question had a Title that was not the same as their filename; and in most if not all cases, my automated PDF printing process had chosen the input file’s Title, not its filename, as the name of the output PDF. The output from PDFInfoGUI did display both the filename and the title, so that tool would provide a fast way of identifying PDFs whose Titles might need to be conformed to their filenames, for purposes of avoiding this problem in the future.
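A check for such filename-Title mismatches could look something like the following Python sketch. It takes (filename, Title) pairs, such as one might pull from the exported list, and flags any PDF whose Title is set to something other than its filename. The function name is hypothetical, and the rule that an empty Title is harmless (on the assumption that the printer then falls back to the filename) is my assumption, not something verified above.

```python
import os

def title_mismatches(rows):
    """Return (filename, title) pairs where a non-empty Title differs from the filename.

    The Title is compared against the filename both with and without its
    extension. Empty Titles are skipped, on the assumption (not verified)
    that the printer then uses the filename.
    """
    flagged = []
    for filename, title in rows:
        stem = os.path.splitext(filename)[0]
        title = (title or "").strip()
        if title and title not in (stem, filename):
            flagged.append((filename, title))
    return flagged

if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        ("Tax Return 2012.pdf", "Tax Return 2012"),
        ("Lease.pdf", "Microsoft Word - Document1"),
        ("Scan0001.pdf", ""),
    ]
    print(title_mismatches(rows))
```

Files flagged this way could have their Titles conformed before a batch print, so that output names match input names.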

Those difficulties accounted for all of the PDFs that appeared in the input folder but not the output folder, or vice versa. I still had to deal with the situation where, according to the information provided by PDFInfoGUI, my output PDF did not contain the same number of pages as the input PDF.

At this point, I was able to address the concern (above) arising from bubble messages that had popped up during the PDF printing process, informing me that some files were being printed with zero pages. Those messages appeared to be incorrect, resulting apparently from a bug in Bullzip PDF Printer: there were no zero-page output PDFs. There was, however, one case where PDFInfoGUI reported no count of pages in an output PDF. Inspection revealed that that file was corrupted. I was not sure why PDFInfoGUI balked at that one; it did produce page counts (although perhaps not accurate ones) for other corrupt PDFs.

I was now looking only at what had happened to the attempt to produce output PDFs from valid, good-quality input PDFs. From my test set of more than 55,000 good input PDFs, only 73 (i.e., about 0.13%) produced output PDFs with the wrong number of pages. So for the overwhelming majority of input PDFs, the printing process seemed to produce valid outputs. Manual spot checks offered further confirmation that output PDFs were readable. Combined with the earlier discoveries that PDF printing could identify problems with input PDFs (above), it seemed, in general, that it could make sense to print PDFs as a way of testing them.

In every one of those 73 cases where the numbers of pages in input and output PDFs differed, the input PDF had more pages. About half (37) of those output PDFs consisted of only one page, even though the input PDFs contained as many as 465 pages. Spot checks indicated that those single output pages were entirely blank. When I looked at the corresponding input PDFs, I saw that some had visible imperfections, where, for example, an image seemed to have been partly corrupted. Possibly all had imperfections, some of which might not be noticeable upon casual inspection. I was not sure whether that sort of imperfection would cause the one-blank-page PDF output. There was a possibility that there was simply something wrong in Bullzip PDF Printer, or in some other aspect of the testing process I used, or that perhaps the weird output resulted from some kind of ephemeral system irregularity (a/k/a glitch).

PDFInfoGUI also informed me that the PDF printing process had produced 36 output PDFs consisting of more than one page, but still having fewer pages than the corresponding input PDFs. The results here seemed to be all over the map. For instance, on one hand the output PDFs in two cases contained only 7% as many pages as the input PDFs; the latter had 88 and 1,735 pages, respectively. On the other hand, two output PDFs contained 97% as many pages as the corresponding input PDFs. Both of those input PDFs consisted of more than 100 pages. Manual inspection of those four examples demonstrated that pages were being output in good quality; the problem was just that, for some reason, some pages were being skipped. This could be valuable testing information, if the pages not printed contained some kind of flaw.
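The split between one-page outputs and partial outputs, along with the percentage of pages that survived, is simple arithmetic. A minimal Python sketch follows; the function name and the sample page counts are made up for illustration, not taken from my files.

```python
def classify_shortfalls(different):
    """Split {name: (input pages, output pages)} into two lists.

    single_page: (name, input pages) where the output has exactly one page.
    partial:     (name, input pages, output pages, percent of pages kept).
    Entries with missing counts, or with no shortfall, are ignored.
    """
    single_page = []
    partial = []
    for name, (in_pages, out_pages) in sorted(different.items()):
        if not in_pages or out_pages is None or out_pages >= in_pages:
            continue
        if out_pages == 1:
            single_page.append((name, in_pages))
        else:
            percent = round(100 * out_pages / in_pages)
            partial.append((name, in_pages, out_pages, percent))
    return single_page, partial

if __name__ == "__main__":
    sample = {"a.pdf": (465, 1), "b.pdf": (200, 194), "c.pdf": (100, 7)}
    print(classify_shortfalls(sample))
```

Sorting the partial list by its last column would put the worst cases first for manual inspection.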

It appeared, then, that the combination of PDF printing and PDFInfoGUI had given me a fairly thorough way of testing PDFs and of identifying those that had problems.


Recap

This long post explores multiple ways of testing PDFs. I found two that worked: CorruptedPDFinder and Acrobat’s File > Create PDF > Batch Create Multiple Files procedure. I was not sure how thoroughly those had worked, however. Most of this post is devoted to the search for another PDF-testing approach, preferably one that would actually print the PDFs. I did not want to print large numbers of documents on paper, so I was looking for a way to print the PDFs to produce output PDFs.

Among the multiple methods I considered, by far the best was the one described at the end of the section on Adobe Reader PDF Printing approaches. That amounted to a single (long) command line that could be entered once, in the folder containing the PDFs to be tested, and that would automatically process all of those PDFs. That command assumed the existence of a separate output folder where printed copies of those PDFs could be saved. I also kept a backup copy of the input PDFs being tested. It would probably make sense to leave original PDFs where they were, and to do these tests on copies of those PDFs. PDFs across multiple drives could easily be located and copied to a single folder using the Everything file finder.

My testing did find corrupted PDFs. The three methods emphasized here — CorruptedPDFinder, Adobe Acrobat’s Batch Create procedure, and a command line approach using Adobe Reader — did not necessarily find exactly the same PDFs. It appeared, in addition, that PDFInfoGUI provided a way to verify that the PDF-printing method was producing output PDFs with the same numbers of pages as the input PDFs. PDFInfoGUI seemed to identify some PDFs that had produced output PDFs with the wrong number of pages. Possibly those PDFs had additional problems not identified by these other methods.

I was not able to conduct a careful comparison of the extent to which these three preferred approaches achieved inconsistent results, due to the amount of time and the degree of disorder that unfolded during the weeks of intermittent testing described above. But I did conclude that I had done a relatively thorough job of seeking ways to identify PDFs whose contents might be partially or entirely corrupt. With appropriate use, these approaches could alert a user to restore a good copy from backup before it was too late.

5 Responses to Bulk Testing and Printing Large Numbers of PDFs (Raw Version – Batch Tutorial)

  1. Dear Mr. Woodcock,

    Let me start by complimenting your very engaging blog. Regarding this post here, I specifically admired your thorough and well designed challenge set-up.
    As the author of CorruptedPDFinder, I would like to add a short comment. As you justly noticed, CorruptedPDFinder was not designed for “deep screening” of PDF files. It is, indeed, merely a ‘quick and dirty’ tool that can fetch PDF files in a folder (and sub-folders) meeting specific attributes that allows it to discriminate corrupted from non-corrupted PDF files. Unfortunately, as I did not receive any functional feedback on this tool thus far, I did not improve it further.
    Perhaps you would be interested in helping to improve CorruptedPDFinder. I could take a look at some representative PDF files that CorruptedPDFinder missed and attempt to improve its detection specificity. Otherwise, as it is open-source, you could also contribute to its programming code directly.

    If you are open to the idea, please contact me.

    Again, my sincere appreciation for your work.

    Kind regards,
    CG Silva.

  2. rocky vander says:

    Hello Sir,
    Thanks for such valuable post. I have to get a Bulk PDF Printer software for my firm. Can u suggest one? Also Kindly share the review about PrintConductor as well. Here is the link – http://www.print-conductor.com/articles/bulk-printing-documents.html

  3. Sam Tyler says:

    CorruptedPDFinder batch scans pdf files and identifies them as either ok, possibly corrupt, or corrupt. The program can delete or move corrupt files but not files identified as possibly corrupt. This aspect limits the program’s overall utility when scores or hundreds of files are under examination. CorruptedPDFinder earns 2 out 5 stars at Softpedia. That might be the reason why.

  4. maratovich says:

    Additionally : Automatic batch printing PDF files on different printers. Compose pages.
