Windows 7: Batch Printing HTML Files as PDF

In a previous post, I explored ways to print many MHT files.  I wanted to revisit the matter, perhaps finding a more streamlined way to complete the project.  This time, I was working with HTM and HTML files rather than MHT files.

I began with the PrintHTML option mentioned in a comment to that previous post.  For some reason, it wasn’t working for me.  Some of the webpages I found led me to wonder whether PrintHTML worked in Windows 7 at all.  I had previously used VeryPDF, but I believed there was a limit on how many files you could convert with that program before you would need to buy a copy.  VeryPDF offered a bewildering variety of PDF tools, and I was not sure how much they all cost, but one of the recommended ones cost $299.  I decided to keep looking.

I flailed around quite a bit.  I looked at HTMLDOC, but it turned out to date from 2006, and thus apparently did not support more modern webpages.  I looked at a number of webpages suggesting various command-line approaches that did not work.  I did not keep notes, but I did burn up several hours on methods that other people said would work, and that seemed like they should work, but that simply did not work for me.  Possibly those people were using Windows XP; possibly there was some other reason that escaped me.

Ultimately, I came back to the wkHTMLtoPDF approach described in the previous post.  There appeared to be a newer version, so I downloaded and installed the Windows version (wkhtmltox-0.11.0_rc1-installer.exe).  As I could see from the Everything file finder, the installation gave me C:\Program Files (x86)\wkhtmltopdf\wkhtmltopdf.exe.  I put a copy of that file in C:\Windows.  That way, it would run on the command line in any folder where I might be working.  To test it, I typed “wkhtmltopdf” on the command line in another directory.  It gave me an error:

wkhtmltopdf.exe - System Error

The program can’t start because libgcc_s_dw2-1.dll is missing from your computer.  Try reinstalling the program to fix this problem.

The actual problem, I found, was that I had not copied that DLL, along with three other .dll files, from C:\Program Files (x86)\wkhtmltopdf to C:\Windows.  Once I took care of that, I was back in the game.  I started with a simplified version of the command I had tried last time:

start /wait wkhtmltopdf "D:\Workspace\BIOS.htm" "D:\Workspace\Output\Testfile.pdf"

where BIOS.htm was the name of a file I was trying to print.  It worked, but it defaulted to the European A4 paper size, so I had to add the options I had used last time:

start /wait wkhtmltopdf -s Letter -T 25 -B 25 -L 25 -R 25 --minimum-font-size 10 "D:\Workspace\BIOS.htm" "D:\Workspace\Output\Testfile.pdf"

and that worked.  Unlike the situation I’d had last time, with MHT files, it looked like these HTML files were going to print directly into a nice-looking PDF file, with no further hassle.
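For anyone who would rather drive wkhtmltopdf from a script than type the command by hand, here is a minimal Python sketch that assembles the same command line used above. The function names build_args and convert are hypothetical, my own rather than part of wkhtmltopdf, and the sketch assumes wkhtmltopdf can be found on the PATH, as it could once the .exe and its DLLs were in C:\Windows. subprocess.run waits for the program to finish, which plays the role of start /wait.

```python
import subprocess

# Options from the command above: US Letter paper instead of the A4 default,
# margins of 25 on all four sides, and a minimum font size of 10.
PAGE_OPTIONS = [
    "-s", "Letter",
    "-T", "25", "-B", "25", "-L", "25", "-R", "25",
    "--minimum-font-size", "10",
]


def build_args(source, target, exe="wkhtmltopdf"):
    """Return the argument list that converts one HTML file to one PDF."""
    return [exe, *PAGE_OPTIONS, source, target]


def convert(source, target):
    """Run wkhtmltopdf on one file and block until it finishes.

    check=True raises CalledProcessError if wkhtmltopdf exits with a
    non-zero status, so a failed file is not silently skipped.
    """
    subprocess.run(build_args(source, target), check=True)
```

For example, convert(r"D:\Workspace\BIOS.htm", r"D:\Workspace\Output\Testfile.pdf") would repeat the test above. Passing a list rather than one long string lets Python handle the quoting of paths that contain spaces.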

Now I had to come up with my list of commands, one for each file to be converted to PDF.  For this, I used my usual combination of DIR and Excel to produce a set of commands that I could then paste into Notepad, save with a name like converter.bat, and execute.  This process gave me a series of commands, each like the one shown above.
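The DIR-and-Excel routine could also be replaced by a short script.  Here is a minimal Python sketch of that alternative, not the method I actually used: it scans one folder for .htm and .html files and writes a converter.bat containing one start /wait wkhtmltopdf line per file, with each PDF named after its source file.  The function names are hypothetical, and writing full paths into each command, so that the batch file can be run from any folder, is my own assumption.

```python
import ntpath
from pathlib import Path

# Same options as the command shown above.
OPTIONS = "-s Letter -T 25 -B 25 -L 25 -R 25 --minimum-font-size 10"


def is_html(name):
    """True for .htm and .html files, in any letter case."""
    return name.lower().endswith((".htm", ".html"))


def batch_line(html_path, output_dir):
    """One converter.bat command: same base name, .pdf extension, in output_dir."""
    # ntpath handles Windows-style paths even if the script runs on another OS.
    stem = ntpath.splitext(ntpath.basename(html_path))[0]
    pdf_path = ntpath.join(output_dir, stem + ".pdf")
    return f'start /wait wkhtmltopdf {OPTIONS} "{html_path}" "{pdf_path}"'


def write_converter(source_dir, output_dir, bat_name="converter.bat"):
    """Write one command per HTML file in source_dir; return how many were written."""
    files = sorted(
        p for p in Path(source_dir).iterdir() if p.is_file() and is_html(p.name)
    )
    lines = [batch_line(str(p), output_dir) for p in files]
    # newline="\r\n" turns each "\n" into the CRLF ending usual for batch files.
    with open(bat_name, "w", newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)
```

Run on Windows, write_converter(r"D:\Workspace", r"D:\Workspace\Output") would write converter.bat in the current folder, ready to execute.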

Actually, I used that sort of procedure for several different steps:  to identify the .HTM and .HTML files to be converted; to write batch commands that moved them all to D:\Workspace; to produce the PDFs; and then to move the PDFs back to where the .HTMs and .HTMLs had been, to replace them.  Since the commands in converter.bat did not specify a source path, I had to run that file in the folder containing the HTMs.  That went without any major problems.  End of project.
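The gather, convert, and return steps can also be planned in one place.  The Python sketch below is an illustration, not the procedure I followed; plan and move_commands are hypothetical helpers, and the D:\Workspace and D:\Workspace\Output defaults simply mirror the test command.  It writes move commands for both directions and refuses to proceed when two files would collide in the shared workspace, a risk that comes with pulling files from many folders into one.

```python
import ntpath


def plan(originals, workspace=r"D:\Workspace", output=r"D:\Workspace\Output"):
    """Map each original HTML path to (original, workspace copy, new PDF, final PDF)."""
    steps, seen = [], set()
    for original in originals:
        name = ntpath.basename(original)
        stem = ntpath.splitext(name)[0]
        # All files share one workspace and one output folder, so two files with
        # the same base name (in different folders, or x.htm beside x.html)
        # would overwrite each other. Windows names ignore case, hence lower().
        if stem.lower() in seen:
            raise ValueError(f"name clash in workspace: {name}")
        seen.add(stem.lower())
        steps.append((
            original,
            ntpath.join(workspace, name),
            ntpath.join(output, stem + ".pdf"),
            ntpath.join(ntpath.dirname(original), stem + ".pdf"),
        ))
    return steps


def move_commands(steps):
    """Batch lines that gather the HTML files, and lines that send the PDFs back."""
    gather = [f'move "{o}" "{w}"' for o, w, _, _ in steps]
    return_pdfs = [f'move "{p}" "{f}"' for _, _, p, f in steps]
    return gather, return_pdfs
```

The conversion commands themselves would run between the two lists: first the gather moves, then wkhtmltopdf on each workspace file, then the moves that put each PDF where its HTML file used to be.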


4 Responses to Windows 7: Batch Printing HTML Files as PDF

  1. kurtosis says:

    You’ve got some very helpful stuff here, but for someone like me who’s wary of putting time into command line operations, mainly because it always seems to mean a painful learning experience, I’ve found something else that worked for converting about 4000 .mht files to PDF. This was after I’d considered macros in Firefox, macros in Word, AutoIt, and who knows what else. I found the Firefox add-on based on wkhtmltopdf, but for me it constantly crashed. I already use the PDF Xchange printer driver, which nearly always does a good job of printing web pages directly from the browser to PDF. However, I couldn’t find any way of getting it to batch print my mht files, as it needed to load each one into Internet Explorer or Firefox first (with another Firefox extension to read mht files) for manual printing of each file. What I ended up doing was to install Peernet File Conversion Center and use it to batch print to the PDF Xchange driver. I got the nice and generally reliable print output of the driver, and by tweaking the driver settings (for example, unchecking a box that was bringing up a “Save as…” dialogue) the process was automated: after one click to run the job, it printed a whole directory to PDFs with the text already recognized (unlike some programs). One nice touch was that I got a list of files that for whatever reason failed to print, so I could go and do those few manually. You can get a lite version of the driver and a time-limited demo version of the conversion program, so: end of my project!

  2. Ray Woodcock says:

    CNET lists Peernet as $190 shareware.

  3. Paul Gabor says:

    After a lot of work along basically the same lines, your solution (wkhtmltopdf) worked very well!
    Thank you!

  4. sunk818 says:

    This works great for converting MHT to HTML:

    It is free, and you can use 7-Zip Portable to open the archive, retrieve MHTM Converter.exe, and run it standalone.

    I don’t mind wkhtmltopdf making A4 by default. Most PDF programs should resize the paper to Letter automatically for you. If there is a way to save the parameters to an .ini file, that would be better for me going forward, though…
