Capturing a Long Webpage: Merging Hundreds of Screenshots

At times, I wanted to capture an entire discussion thread, or other long webpage, in a single image or document. This post discusses what I learned about achieving that.

Contents

Summary
PDF Solutions
Capturing a Long Webpage in Screenshots
Choosing Panorama Software
Trying ICE and Photoshop
Trying Other Stitchers
Another Try with ICE
Dumb (Manual) Appending
Editing Composites
Final Manual Merge


Summary

I was not able to find a PDF solution capable of capturing a webpage longer than the equivalent of 22 pages of single-spaced text (or less than half that, for some purposes). The webpage in question ran to the equivalent of more than 35 pages. Hence, I sought to capture it as an image file. I did this by running a batch file that took screenshots as I scrolled down the page, and then combining those screenshots into a single image. The combination process was troublesome: Microsoft Image Composite Editor did a great job, once I figured out how to work around its weak spots, but it still left me with four long PNG images that neither it nor any other program seemed to be able to merge into one final image. I crossed that final hurdle with the aid of ImageMagick commands.

PDF Solutions

The best solution, for my purposes, would have been one very long PDF page. Unfortunately, Acrobat had a maximum page length of 200 inches (i.e., about 22 pages measuring 8.5″ x 11″ with normal margins), and this document was well over 30 pages. Moreover, files printed to Acrobat would cut off at 78″ in my experience (or 82″ according to another source), at least when printing to PDF via Distiller. Even worse, Acrobat would perform optical character recognition (OCR) only on pages no more than 45″ long.
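As a rough check on those figures, the arithmetic can be sketched as follows. This assumes that "normal margins" means one inch at the top and bottom of each 11-inch page, leaving 9 usable inches per page; that margin figure is an assumption, not something stated above.

```python
# Usable text height per letter-size page, ASSUMING 1-inch top and bottom
# margins (the post says only "normal margins").
PAGE_HEIGHT_IN = 11.0
MARGIN_IN = 1.0
USABLE_IN = PAGE_HEIGHT_IN - 2 * MARGIN_IN  # 9 inches of text per page


def pages_that_fit(limit_in: float) -> int:
    """Whole pages of text that fit on one continuous page of limit_in inches."""
    return int(limit_in // USABLE_IN)


def inches_needed(pages: int) -> float:
    """Length of one continuous page holding the given number of text pages."""
    return pages * USABLE_IN


print(pages_that_fit(200))  # 22 -> Acrobat's 200-inch maximum
print(pages_that_fit(78))   # 8  -> the cutoff seen when printing via Distiller
print(pages_that_fit(45))   # 5  -> the OCR limit
print(inches_needed(35))    # 315.0 -> a 35-page webpage exceeds all three
```

Under that assumption, a 35-page webpage needs roughly 315 inches, well beyond each of the limits mentioned.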

I saw reports that version 1.7 of the PDF format had a maximum length of 15 million inches, but there was another problem: regardless of what the PDF specification could theoretically accommodate, I would need an image or other source containing the full contents of the webpage I was trying to capture. The webpage itself was an obvious candidate; unfortunately, Acrobat and screen capture add-ons (e.g., for Firefox, my preferred browser) were limited in the length of what they could capture from webpages.

At this point, it seemed that, if possible, I should start by trying to create one long image capturing the full length of the webpage. Ideally, as I say, I would make that image into a PDF and OCR it, so as to create searchable text; but if the PDF part fell through, at least I would have the image itself.

Capturing a Long Webpage in Screenshots

As summarized in a previous post, I could use my Shotshooter.bat batch file to take screenshots automatically at a specified rate. To do this, I would take several preparatory steps:

  • Get the desired webpage set up in my browser, starting at the top of that page.
  • Start Shotshooter, and view its output folder to make sure it was capturing screenshots.
  • Start Windows Task Manager via taskmgr.exe, and have it positioned to display the Processes tab > nircmd.exe, so that I could terminate just by right-clicking on nircmd.exe > End Process.
  • Rotate the orientation of the screen 90 degrees (in Windows 7, I did that via Control Panel > Display > Change Display Settings > Orientation > Portrait). This would expose more of the desired webpage in each screenshot. Ergonomically, to make this work, since I didn’t have a rotating screen and didn’t feel like standing my monitor on its edge, I had to turn my head 90 degrees. Once I did that, the necessary mouse motions felt relatively normal.

With those preparations, all I had to do was scroll down the webpage. I was not sure exactly how often Shotshooter would be able to take screenshots. The pace seemed to depend on system resources, and it was not necessarily consistent. But I would set it at 50000 500 (i.e., keep shooting for 50,000 seconds, one shot per 500 milliseconds).
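The two numbers in that setting (total duration and interval) describe a simple timed loop. The sketch below is not Shotshooter's actual code; `shoot` and its `capture` callback are hypothetical names, and the screenshot itself is left to whatever function is passed in. The clock and sleep functions are parameters only so that the loop can be exercised without waiting.

```python
import time
from typing import Callable


def shoot(duration_s: float,
          interval_ms: int,
          capture: Callable[[int], None],
          sleep: Callable[[float], None] = time.sleep,
          clock: Callable[[], float] = time.monotonic) -> int:
    """Call capture(n) about once per interval_ms until duration_s has passed.

    Returns the number of captures made. The real pace depends on how long
    each capture takes, which is why the interval is only approximate.
    """
    start = clock()
    count = 0
    while clock() - start < duration_s:
        capture(count)              # hypothetical: save screenshot number `count`
        count += 1
        sleep(interval_ms / 1000.0)
    return count


if __name__ == "__main__":
    # Dry run: print file names for two seconds instead of taking screenshots.
    shoot(2, 500, lambda n: print(f"shot_{n:05d}.png"))
```

A setting of 50000 500 would correspond to `shoot(50000, 500, ...)`: far longer than needed, so that the capture keeps running until it is stopped by hand.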

As I scrolled down the webpage, Shotshooter would be taking screenshots. Note that I say I was scrolling down, not paging down. The photo stitching described in the next section seemed most successful when there was a lot of overlap between one screenshot and the next. I might have preferred paging down, rather than scrolling down, if my intention was just to convert each individual screenshot to a separate PDF page or other short file, as distinct from stitching them all together into one very long file.

Previous experience suggested that the situation would be complicated if the successive screenshots were not entirely consistent with one another. So, for example, if a screenshot included an embedded video, I believed it would be best to have that video paused, not running. Otherwise, each screenshot would differ, as the video continued to display different scenes, and the photo stitching software would become confused, because it would be looking for exact overlap, from one image to the next. As we shall see, I later found that this was not always an issue.

My initial belief was that, if the video or other changing element could not be paused in advance, it might be necessary to go through the screenshots manually, and keep only those screenshots (or those portions of screenshots) that showed exactly the same scene from the video — taking care to preserve enough overlap so that the stitching software would have enough material to work with. I found that IrfanView was very useful for moving through large numbers of images with just one click or keypress to move on. Editing out the unwanted pieces was just a matter of drawing a rectangle (via Shift-drag with the mouse left button) and then cropping everything outside that rectangle (via Ctrl-Y); then Ctrl-S to save the changed image.

Often, webpages would have material at the top, bottom, or sides, separate from the text or other material that I would want to stitch together. This side material would not change as I scrolled down: for instance, the title of the website might continue to appear on each screenshot, and likewise for various advertisements and other material on the sides and bottom. This extraneous material would confuse the photo stitching software. The software would expect that the title would appear on the first screenshot, but not on the second one. In other words, what appeared in the lower part of one screenshot should be identical to what appeared in the upper part of the next screenshot.
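The overlap idea can be illustrated with a toy model in which each screenshot is a list of rows. This is only a sketch of exact-overlap matching, not how ICE or Photoshop actually work (they use feature matching); `overlap_rows` is a hypothetical helper.

```python
def overlap_rows(upper, lower):
    """Largest k such that the last k rows of `upper` equal the first k of `lower`.

    Returns 0 when the bottom of one screenshot never matches the top of the next.
    """
    max_k = min(len(upper), len(lower))
    for k in range(max_k, 0, -1):
        if upper[-k:] == lower[:k]:
            return k
    return 0


# A ten-row "webpage", captured twice while scrolling down three rows.
page = [f"row {i}" for i in range(10)]
shot1 = page[0:6]
shot2 = page[3:9]
print(overlap_rows(shot1, shot2))  # 3: rows 3, 4, and 5 appear in both

# The same two screenshots, each with a fixed site title at the top.
titled1 = ["SITE TITLE"] + shot1
titled2 = ["SITE TITLE"] + shot2
print(overlap_rows(titled1, titled2))  # 0: the repeated title breaks the match
```

With the fixed title in place, the top of the second screenshot no longer continues from the bottom of the first, which is the kind of inconsistency that cropping is meant to remove.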

The first step in getting rid of that side material was to make sure that each screenshot was taken in a consistent manner. I would want to be scrolling down the page while Shotshooter was capturing its screenshots; I would not want to be scrolling from left to right, resizing the window, or otherwise disrupting the simple, steady downward progression of screenshots. The second step in getting rid of that side material, in IrfanView, was to view a representative screenshot > draw a rectangle around the place where each screenshot would be displaying the material that I wanted to keep > go to File > Batch Conversion/Rename > Advanced > Crop > Get Current Sel. This would insert, into IrfanView, the coordinates of the rectangular area that I wanted to save from each screenshot. Of course, before proceeding with the batch processing effort, I would want to have a backup of my originals, and would also need to configure other settings in these IrfanView dialogs.
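The batch crop amounts to applying one fixed rectangle to every screenshot. A minimal sketch of that idea, again treating a screenshot as a list of rows (here, strings) rather than a real image file; `crop` and the sample coordinates are illustrative, not IrfanView's internals.

```python
def crop(rows, left, top, width, height):
    """Keep only the rectangle starting at (left, top) with the given size."""
    return [row[left:left + width] for row in rows[top:top + height]]


# Two toy screenshots with a title row, a sidebar column, and a footer row.
shots = [
    ["TITLE-", "Stext1", "Stext2", "FOOTER"],
    ["TITLE-", "Stext2", "Stext3", "FOOTER"],
]

# The same rectangle is reused for every screenshot, which only works if the
# window was never resized or scrolled sideways between captures.
rect = dict(left=1, top=1, width=5, height=2)
cropped = [crop(shot, **rect) for shot in shots]
print(cropped)  # [['text1', 'text2'], ['text2', 'text3']]
```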

[Image: IrfanView screenshot]

Just as it was advisable to make sure that each screenshot overlapped perfectly with the ones preceding and following it, it would also be best to make sure that the entire webpage was ready for capture. So if this was the kind of webpage that would only show some of its results, waiting for the user to click “More” or keep moving downwards, it would be best to do that in advance: before starting the screen capture, keep paging down until the bottom is reached, so that the whole thing is available for viewing.

Note that the technique described in this post could allow capture, in a single image, of webpages whose information would normally be spread out across multiple webpages. For instance, the AutoPagerize add-on in Firefox (or some later improvement thereof) would apparently combine the results of websites like Google Search, where the user ordinarily sees numbers at the bottom of the screen (1 2 3 . . .) and has to click on the number (or on “Next”) to see more results.

After making and editing my screenshots, so that only the desired material remained, initially I believed that I needed to delete duplicates. It seemed advisable to do so, to reduce the load on (i.e., the potential for failure by) the photo stitching software (below). Shotshooter had been busy, during those moments when I dawdled: it could easily have created several (or several dozen) identical images. To get rid of them, after making a backup, I used VisiPics at its strictest settings. Note that VisiPics > Tools > Auto-select would automatically mark for deletion all but one of the images it considered identical, saving the user the trouble of having to click manually on each unwanted duplicate.
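For the simplest case, where the duplicates are byte-for-byte identical files, a hash comparison is enough. The sketch below is a swapped-in technique, not what VisiPics does (VisiPics compares image similarity); `find_exact_duplicates` is a hypothetical helper, and it only reports candidates rather than deleting anything.

```python
import hashlib
from pathlib import Path


def find_exact_duplicates(folder, pattern="*.png"):
    """Return files whose contents exactly match an earlier file in the folder.

    Files are visited in name order, so the first of each identical group is
    kept and the later ones are reported as duplicates.
    """
    seen = {}          # content hash -> first path with that content
    duplicates = []
    for path in sorted(Path(folder).glob(pattern)):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
    return duplicates
```

Two screenshots of the same unmoving screen will usually, though not necessarily, come out as identical files; anything that differs by even one pixel would be missed by this approach. As with the cropping step, it would be wise to work from a backup before deleting the files returned.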

Choosing Panorama Software

To capture the entire webpage in a single image, the next step was to merge all those screenshots. Experience had taught me that the panorama software that I had used — principally Adobe Photoshop (via File > Automate > Photomerge) and Microsoft’s free Image Composite Editor (ICE) — would be easily confused by inconsistencies in the areas of overlap, from one image to the next, so — for these programs, at least — I believed it was important to observe the precautions noted above. My version of Photoshop, at least, would also be overwhelmed by more than a smallish number of images, so in this exploration I planned to work principally with ICE.

Those two were not the only programs capable of merging photos. Carl Cheo (2014) listed a number of alternatives: Hugin, Autopano, Autostitch, ArcSoft Panorama Maker, PTGui, PanoramaPlus, PhotoStitcher, and Panoweaver. From his description, the most appealing ones for my purpose seemed to be PTGui ($86; “can create gigapixel panoramas from thousands of images”) or Panoweaver (probably the $100 Standard Edition, for my needs, but with a $400 Professional Edition alternative), and possibly also PanoramaPlus ($50) or PhotoStitcher ($20). Lifehacker (Purdy, 2008) offered tips on taking panorama photos, and recommended Hugin, but that advice had apparently been superseded by the time of Cheo’s article. (See also AlternativeTo.) Gizmo (2015) preferred ICE as the best stitching freeware, with secondary mention to Autostitch, Windows Live Photo Gallery, and Panorama Perfect Lite. Beebom (2016) named Hugin, ICE, Kolor Autopano, Autostitch, PTGui, PanoramaPlus, ArcSoft, and Panoweaver. MakeUseOf (Coelho, 2016) offered a list including online as well as installed programs: ICE, Windows Live Photo Gallery, Autostitch, Hugin, Dermandar, and Google Photos. Wikipedia provided a seemingly comprehensive list and comparison of several dozen photo stitching programs.

My feeling, after reviewing those recommendations, was that ICE headed enough lists, and was received with enough enthusiasm, that if it wouldn’t do the job for me, I would probably have to explore the trial versions of PTGui and PhotoStitcher, with perhaps some halfhearted attempts with one or two other free alternatives.

There was, by the way, a question of how large a photostitching project might be. A search led to an Adobe discussion (2008) in which someone asked for recommendations regarding a project involving up to 15,000 images. One participant referred to a New Yorker article (Preston, 2005) describing a photography project in 1998 for New York’s Metropolitan Museum of Art. The purpose of the project was to capture a digital representation of certain medieval tapestries. The tapestries — indeed, the article itself — are works of art. The data for the digital images filled more than 200 CDs. It took two mathematicians and a specially designed supercomputer to achieve a full-size 13-foot reconstruction of one such tapestry. Great story. But I digress.

Someone else in that Adobe discussion recommended Autopano Pro, for working with more than 100 images at a time. Even in 2008, though, someone else was recommending ICE. And that’s where I was going to begin.

Trying ICE and Photoshop

This time around, it seemed I had done a relatively good job of preparing the images so that they could fit together. After cropping, deleting duplicates, and taking the other steps described above, I had 190 PNGs, ranging in size from 46KB to 233KB. Those small sizes were probably due to the fact that these particular images captured mostly text onscreen, as distinct from images. The small filesizes probably contributed to my success, such as it was.

I started a new ICE session and imported all 190 PNGs. ICE accepted them. I clicked Stitch. ICE proceeded with its steps of aligning and compositing images. It succeeded in producing a single image. The image displayed a slight curvature or S-shape varying from true vertical. Of the various forms of projection ICE offered, the transverse ones seemed to do best. I chose Transverse Cylindrical and exported the resulting file as a PNG. The file was 23.1MB and measured 920 x 52018 pixels. IrfanView was able to display the whole thing.

Aside from the bowing (i.e., curvature), the only immediately apparent problems arose in connection with the two images that I had edited to remove conflicting screenshots of videos playing. ICE did merge those, but it did not preserve a constant text size. Instead, the ones that I had edited were allowed to swell. The text sizes were no longer the same, and the paragraphs didn’t align.

I tried again, hoping to combine those edited images with the ones immediately preceding and following them. ICE did a better job in that case, but still not great. With just a handful of files to combine, Photoshop did a better job than ICE. So I saved the two Photoshop merges as PNGs and included those two PNGs in the larger ICE project, in place of the images that I had merged. But now the project became complicated. Material was getting left out. Things got a little confusing.

I made another start, this time using ICE only on those sets of images that I had not individually edited. In other words, I was trying the theory that ICE worked best with images that were all of the same size and type. That seemed to work: the ICE segments seemed to come together OK. (I say “seemed” because this was the equivalent of about 37 pages single-spaced, and I was trying to avoid repeatedly having to read or even skim through it.) Then, in ICE, I was able to combine the small groups of photos that I had manually edited, surrounding two videos where I’d had to delete dissimilar repeated images. This left me with five composite or panorama images. I tried combining them in ICE. Two could combine; the other three couldn’t. So now I had four composites to combine manually.

I created a large, empty image in Photoshop. I opened the first of the four composites in IrfanView. I hit Ctrl-C to copy from IrfanView and Ctrl-V to paste the first composite into my empty Photoshop image. I pasted the second composite into Photoshop and attempted to move it into position, near the bottom of the first one. This gave me an error: “Could not complete the move command because the result would be too big.” A search led to an old (2004) indication that Photoshop had limits of 30,000 pixels or 2GB. Another source repeated the 30,000-pixel limit as of 2013, though others were talking about much larger size limits (e.g., 300,000 pixels) circa 2016. Windows Explorer > Properties > Details tab told me that the first of my four composites had dimensions of 555 x 17627 pixels. I started over with an empty Photoshop image measuring 1,000 x 300,000 pixels at 72 pixels per inch. Photoshop said the resulting file would be 1.68GB. I added the second composite with no problem. But again, when I attempted to move it even the slightest bit, I got the “too big” error. Actually, I got the error when trying to move even the first image added to the blank.

Trying Other Stitchers

I went back to the list of recommended software alternatives. Sticking with the free options for the moment, I saw on Softpedia that Autostitch 2.2 (updated October 2013, 3.5 stars from 9 raters) lagged Hugin (September 2016, 5.0 stars from 1 rater). I downloaded and installed the 64-bit version of Hugin. It crashed the first two times I started it and hit the “Load Images” button, after which Windows 7 popped up a Program Compatibility Assistant notification that Windows had applied compatibility settings and hopefully this would fix things. It did: we progressed. I handed Hugin the four composites that I’d been wrestling with. Now I got another notice, this one telling me that Hugin had generated a debug report. I clicked OK. Hugin disappeared. I restarted the program again. This time, Hugin accepted the four composite PNGs, except that it said, “No or only partial information about field of view was found in image file [name of first composite].” I had no idea what the hell they were talking about. I clicked Cancel. Same thing for the three other composites. I clicked through the Hugin tabs. Greek!

I could have looked for Hugin tutorials, but instead I bailed out and turned to Autostitch. Softpedia said it was good only with JPGs. We had not traveled these many miles, finally boiling down those 190 PNGs into just four, only to be told they would have to be converted before Autostitch would do anything with them. Or perhaps we had. The Softpedia review enthused that the program was “packed with limited features,” which sounded about right: apparently it wouldn’t even let me name the output folder or file. Given the 3.5 stars, I was not optimistic. But I tried it. IrfanView converted the final four PNGs to JPG, and I fed them to Autostitch. Well, after Hugin, it was certainly easy to use. It quickly completed its task and proudly showed me something that would not be entirely out of place in a modern art display. Next, please!

I decided to try a commercial program. PhotoStitcher was cheaper than PTGui, so I started there. It installed swiftly. I went to Edit > Add Images > Stitch. It asked for a warp surface. I believed the question, here, was whether this was perhaps a 360-degree panorama or something. I chose Plane. It said, “Cannot stitch, not enough key points.” Well, how about if I started over, giving it the original (cropped) images, before I had even tried to edit the ones with video elements? I seemed to have misplaced the set from which I had removed duplicates, so this was a somewhat larger set of 227 PNGs. At the same time, just to be sure, I ran ICE again, on this exact same set of 227 PNGs. Between the two, I much preferred ICE’s progress monitor, which led me to believe that the program was not simply hung. Both took a while. The difference was, PhotoStitcher eventually crashed, while ICE improbably produced a representation of the entire set of PNGs. Let it not be assumed that a commercial product is necessarily a more capable product. When I uninstalled PhotoStitcher, I agreed to answer their question as to why I was uninstalling. Their webpage notified me that I was uninstalling version 1.6, and asked if I would like to try version 3.0. But back at their download page, they were offering me version 2.0 — and the Changelog on that page only went as high as 1.6. They seemed confused.

I looked at what ICE had given me. Some pretty significant curvature and, unfortunately, it was a mess — it started in the middle and ended in a different middle. I mean, the pieces were not in order. So why hadn’t it just balked and given me a broken set of pieces, as it had done in other projects? My guess was that it was able to focus on only so much. If you gave it too much material, it would lose focus and would see things as being similar, and therefore stitchable, even if they weren’t.

I decided to try the free trial version of PTGui. Their product comparison page indicated that the Pro version had some advantages over the standard version, but I didn’t know enough about photo stitching to understand those advantages, and their purchase page indicated that the price difference was $86 vs. $162. I doubted I was going to spend either, but I was pretty sure I was not going to spend the latter. They offered video tutorials and a support/FAQs page. I went to Load Images and designated the 227 PNGs. It gave me a notice, “EXIF data was not found in the image(s).” I did not know or care about Exif data for these images, so I clicked Cancel. It gave me the option of selecting images, but I just went with the default next step: Align Images. After a moment, it said,

PTGui has analyzed your images but was not able to match all of them. You will need to add a few control points before the panorama can be stitched. These control points tell PTGui which parts of which images should overlap.

I agreed to go ahead and add control points now. It gave me a list of a large number (estimate: about 140) of “orphaned” images. In addition, it provided a list of clusters, which it described as follows:

In the following groups, images are linked by control points within each group, but there are no control points that link between an image in one group and an image in another group.

There were many such groups (e.g., it said slides 0, 1, 2, 3, and 4 formed a cluster, and also slides 5 and 6). It appeared that these clusters might account for all of the remaining PNGs that were not in the first group of ~140. In other words, out of the box, PTGui was not readily identifying links that ICE and Photoshop had been able to detect. I watched most of their video, “How to Stitch a Panorama,” but did not obtain any additional enlightenment. The advice was to identify at least four control points between each pair of related photos, so providing the information that PTGui needed would have been an enormous task in this case: identifying about 1,800 control points altogether (i.e., 4 x 2 x 227).
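The size of that manual task can be worked out in a couple of ways. The sketch below reproduces the 4 x 2 x 227 estimate and, as an alternative reading that is my assumption rather than anything PTGui states, counts each adjacent pair once and then counts the clicks needed to mark each point in both images of the pair.

```python
IMAGES = 227
POINTS_PER_PAIR = 4  # the tutorial's advice: at least four per related pair

# The estimate in the text: four points toward each of an image's two neighbors.
estimate_in_text = POINTS_PER_PAIR * 2 * IMAGES

# Alternative count (an assumption): a vertical strip of 227 images has 226
# adjacent pairs, and each control point must be marked in both images.
adjacent_pairs = IMAGES - 1
control_points = POINTS_PER_PAIR * adjacent_pairs
clicks = control_points * 2

print(estimate_in_text)  # 1816
print(control_points)    # 904
print(clicks)            # 1808
```

Either way, the manual effort comes to roughly 1,800 placements, and more if some screenshots overlap with images beyond their immediate neighbors.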

I went back to Hugin. This time, I fed it the 227 relatively raw images, instead of the final four composites. (To my dismay, I had either lost or not kept my backup of the more advanced set of images, from which I had removed duplicates.) Hugin informed me that it lacked camera and lens data for the first image in the set. It was insisting on that information. There was no lens — these were screenshots — so, as before, I could only click Cancel. It didn’t seem to have done anything, so I clicked Load Images again. It seemed unresponsive, but apparently it was just thinking. Then it indicated that it had loaded the 227 images. I went through the Preview and other tabs. There didn’t seem to be anything I was supposed to be doing, but also no obvious way to proceed. The tutorial said I might have to spend a lot of time stitching together my control points. It referred to “Hugin’s Control Points tab” — but there wasn’t one. Back in the Assistant tab, I saw that the buttons were numbered: 1. Load Images. 2. Align. 3. Create Panorama. So I should have ignored the other tabs and just clicked Align. When I did that, Hugin indicated that it was looking for control points. The Assistant appeared to stall after image 226. I assumed it was just thinking. Eventually, it said, “Detection took 1128.78 seconds,” or about 19 minutes. It told me that it had written its output to a temp file, and said, “Statistically cleaning of control points…” (sic).

At that point, Hugin stayed in orbit for hours. In fact, it never did come back to Earth. It was still running when I had to reboot the computer for an unrelated task, half a day later. Before shutting down, I scrolled back through the information presented in that dialog. Most of it seemed to consist of reports of points of commonality between each image and each other image. Example: “i55 <> i80 : Found 7 matches.” Apparently 7 matching points was not enough to persuade Hugin that image 55 belonged next to image 80. Earlier in this listing of findings, it said “i55 <> i56 : Found 21 matches.” That was presumably more like what it was looking for; after all, image 55 did belong with the image immediately following it. But I could not be sure: Hugin also found that “i55 <> i59 : Found 21 matches.” Whoa! Confusion! No wonder Hugin got lost within itself. If it was looking for similarities the size of a single letter of text, it should have found hundreds of matches between i55 and i56.
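The ambiguity in those match counts can be shown by reading the quoted lines back in and asking which image is the strongest partner for image 55. This is a minimal sketch that assumes the log lines have exactly the form quoted above; `best_partners` is a hypothetical helper, not part of Hugin.

```python
import re
from collections import defaultdict

# Matches lines of the form quoted above, e.g. "i55 <> i80 : Found 7 matches."
MATCH_LINE = re.compile(r"i(\d+) <> i(\d+) : Found (\d+) matches")


def best_partners(log_text, image):
    """Return (partners, count): the images sharing the most matches with `image`."""
    scores = defaultdict(int)
    for a, b, n in MATCH_LINE.findall(log_text):
        a, b, n = int(a), int(b), int(n)
        if a == image:
            scores[b] = max(scores[b], n)
        elif b == image:
            scores[a] = max(scores[a], n)
    if not scores:
        return [], 0
    top = max(scores.values())
    return sorted(p for p, n in scores.items() if n == top), top


log = """\
i55 <> i56 : Found 21 matches.
i55 <> i59 : Found 21 matches.
i55 <> i80 : Found 7 matches.
"""
print(best_partners(log, 55))  # ([56, 59], 21): a tie, so the count alone cannot order them
```

For screenshots of text, where similar letter shapes recur everywhere, a raw match count of this kind evidently says little about which image really comes next.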

Another Try with ICE

While Hugin was thinking, it occurred to me to try again with the set of 227 images in ICE, taking them in several smaller groups, and then trying to combine those, without the confusion that I had apparently caused by trying to edit the images containing inconsistent snapshots of embedded videos.

I started with the images preceding the first embedded video. It seemed to succeed, but the result was wedge-shaped, with a wide top and a narrow bottom, when of course I would want the webpage to be portrayed in the shape of a straight vertical line, displaying the same width from top to bottom. I went back to the Import phase and tried various Simple and Structured Panorama options. The best choice seemed to be Simple Panorama > Camera Motion > Rotating Motion. That worked. Inspection confirmed that the images had been combined in the correct order. But I hadn’t had to deal with that until now — I had just used Auto-orientation — so apparently there was something unusual about this particular set.

Then I combined all of the images containing the first embedded video, even if the video portion showed incompatible scenes. To my surprise, that worked too. Maybe it helped to have multiple images that were otherwise identical: in that case, apparently ICE was smart enough to overlook the changing video scenes — to just choose one of them, and go with it. So it seemed I had caused my own problems, in the previous effort, by editing the images to remove all but one of the various video scenes. Better to just give ICE a set of identically formatted images, and let it figure out the right answer for itself.

Next, I repeated those steps for the remaining images, up to the second embedded video, and again with all of the images showing the second embedded video, and finally with the images after the second embedded video. That last set was large — 125 files — and it completely confused ICE: the resulting panorama looked like a pile of spaghetti. I tried again with the first 46 images in that set of 125. Still no go: a nice curlicue. I wasn’t sure why ICE was having a problem with these, when it had sailed through a much larger set previously. Another try, this time with 25 images. That worked. I wondered whether maybe the problem was with something in that set, so I tried the next 50 images. That worked — and, for some reason, the result did not have the slight weaving (a/k/a bowing or curvature) I had seen previously. But the final set of 50 images did not work, so there, again, I had to take them 25 at a time. Even that failed. Likewise just 15 images. I wondered if ICE was getting lost in its memories. I killed it, restarted it, and tried again with the entire final set of 50 images. But no, that still produced the same set of linear fragments I had seen previously.
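Splitting a long run of screenshots into smaller groups is easy to automate. In the sketch below, `batches` is a hypothetical helper; the default one-image overlap between consecutive groups is my assumption (so that neighboring composites share some content), and `overlap=0` gives a plain 25-at-a-time split like the one described above.

```python
def batches(items, size, overlap=1):
    """Split items into consecutive groups of at most `size` items.

    Each group repeats the last `overlap` items of the previous group.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not items:
        return []
    step = size - overlap
    groups = []
    start = 0
    while True:
        groups.append(items[start:start + size])
        if start + size >= len(items):
            break
        start += step
    return groups


files = [f"shot_{i:03d}.png" for i in range(125)]
groups = batches(files, 25)
print(len(groups))                       # 6
print(groups[0][-1] == groups[1][0])     # True: neighbors share one screenshot
print(len(batches(files, 25, overlap=0)))  # 5
```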

I wondered what could be causing the problem. As I viewed the remaining images, the only possible culprits I could spot were a few embedded images. Had I been wrong in previously thinking that videos were the cause of confusion in ICE — was the program actually balking whenever it found an image in these captures of webpage text? I tried that hypothesis, merging all of the files before the first (remaining) embedded image, and then all of the files containing that embedded image — and likewise with the files before, containing, and following the other remaining embedded image. This worked, although there was some relatively severe curvature in one of these composite results, and as far as I knew ICE itself lacked a good way of correcting it.

[Image: composite showing severe curvature]

Those efforts left me with eleven composite PNGs, containing my 227 individual screenshots. Unlike the first set of composites, these were all constructed entirely of identically formatted images. Would ICE be able to combine these into one long panorama? It failed in my first try, attempting to combine them all at once. It was able to combine the first six composites, but they were confused. I tried again with the first three. Still confused. First two? This time, “Could not stitch together any of the input images.” I tried again with the full set of eleven, specifying Simple Panorama > Rotating Motion. How about Structured Panorama > Rotating Motion > 2% Vertical? I couldn’t find a working solution. Starting from identically formatted images was not the answer. Once again, it appeared that ICE would get me only partway to my goal.

Dumb (Manual) Appending

All of these techniques had involved reliance on the software, to detect where the images should be connected. But now that I was down to just a handful of composites, it seemed another approach might work. What if I edited the composites so that the one ended exactly where the next one began — if, that is, I removed all overlap — and then used some program to butt them up against one another, allowing B to pick up where A left off?
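In the toy row-based model, that "dumb" append is just list concatenation after trimming the overlap. The sketch below assumes the amount of overlap is already known (for instance, from cropping by eye); `butt_join` is a hypothetical helper, and the width check reflects the fact that strips of different widths will not line up.

```python
def butt_join(upper, lower, overlap=0):
    """Append `lower` beneath `upper`, dropping its first `overlap` rows.

    Both strips must have the same width, or the join would be ragged.
    """
    widths = {len(row) for row in upper} | {len(row) for row in lower}
    if len(widths) > 1:
        raise ValueError(f"strips differ in width: {sorted(widths)}")
    return list(upper) + list(lower[overlap:])


a = ["r0", "r1", "r2"]
b = ["r2", "r3", "r4"]               # repeats the last row of a
print(butt_join(a, b, overlap=1))    # ['r0', 'r1', 'r2', 'r3', 'r4']
```

No matching or alignment is attempted here: B simply picks up where A leaves off, so any error in the cropping shows up as a duplicated or missing row at the seam.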

I would have been glad to do that in Photoshop, but the files were too big. But maybe all I needed was some command-line solution that would tell the computer to just Combine Files A and B.

A search led to a suggestion that I interpreted as this DOS command: COPY A.PNG+B.PNG C.PNG. That produced C.PNG, but it wouldn’t open, presumably because COPY joins the bytes of the two files rather than their rows of pixels, and the result is not a valid PNG. Several other search results pointed toward ImageMagick. My previous attempt to use ImageMagick hadn’t gone well, but I thought I might try again. Another post describes the steps that I took to pursue that approach.
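The exact commands from that other post are not reproduced here. As a hedged sketch of the general idea, ImageMagick's `-append` operator stacks its input images from top to bottom. The helper names below are hypothetical, the file names are placeholders, and `magick` is the ImageMagick 7 executable (older installations provide `convert` instead).

```python
import subprocess


def append_command(inputs, output, binary="magick"):
    """Build the argument list for stacking images vertically with -append."""
    if len(inputs) < 2:
        raise ValueError("need at least two images to append")
    return [binary, *inputs, "-append", output]


def run_append(inputs, output, binary="magick"):
    """Run the command; requires ImageMagick to be installed and on the PATH."""
    subprocess.run(append_command(inputs, output, binary), check=True)


print(append_command(["A.png", "B.png"], "C.png"))
# ['magick', 'A.png', 'B.png', '-append', 'C.png']
```

Unlike a byte-level copy, this decodes each image and writes a new one, so the result opens normally; it does nothing about overlap, which would have to be cropped away beforehand.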

By the time I finished that, it had occurred to me that I could probably use ICE to achieve the same thing. It seemed that, at the Import stage, I would have to specify Structured Panorama with two rows and one column, so that the images would line up vertically. I could also play with the Vertical overlap option (check “Preview Overlap”) to see how much the two images were overlapping. Presumably vertical overlap would be zero, and overlap preview would be unnecessary, if I had already cropped the top of the second image to fit precisely at the end of the first one.

Having gotten familiar with the ImageMagick approach by this point, I didn’t like the ICE approach as much; it didn’t seem to give me as much precise control over horizontal as well as vertical match-up between the two images. That said, at my beginner state of ImageMagick expertise, ICE would seemingly make it a lot easier to achieve a manual merge of multiple images. But when I clicked the button to stitch the first two images in ICE, I saw that it had stitched the two images so that the second one was much wider than the first. It appeared there was going to be a learning curve, for me, with ICE too — if, indeed, it could even do what I was trying to do.

In any case, if it was going to be a manual process, it seemed that I might as well work with the composite images that I had created in the first approach (above), where I had used Photoshop to combine a few problematic screenshots containing video or other images, and had thus managed to boil down the whole set into just four PNGs. That would seem to be a lot easier than figuring out the commands and dimensions needed to manually join the eleven composites that I wound up with in my second attempt.

Editing Composites

Before attempting a final merger of any images, I wanted to review the question of how bad the curvature was in some of the longer composites, and what I could do to reduce it.

By far the worst example was the one shown above, curving sharply to the right in its middle. Results from a search suggested that (among others) Photoshop offered a Warp command. To use it, with the image onscreen, I had to go to the bottom right corner of the Photoshop window > right-click on the Background layer > Duplicate Layer > click on the image > menu pick: Edit > Transform > Warp. To explore the Warp possibilities, I could start by going to the Warp: Custom box at the upper left corner of the screen, clicking on the drop-down arrow, and choosing various options. This was just experimental: the changes would not be applied yet, and I could revert to the starting point by choosing the None option.

Among those drop-down options, I could choose Custom. This would divide the image into a 3×3 grid, with handles around the perimeter. I could click on those handles and drag them in various directions in pursuit of the desired anti-curvature effect. It was easier to see what I was accomplishing when I hit Ctrl-+ (i.e., Ctrl-Plus) a few times to zoom in. At first, it appeared that the solution for this particular curvature was to drag the four handles at the top somewhat to the right, and likewise the four handles at the bottom, so as to fill the empty space at the top and bottom right corners. But as I proceeded, I saw that I actually had to move all of the handles somewhat, to prevent distortion at various places around the image. (Note: there was also the option of postponing this editing until the end, after all PNGs had been combined.)

Photoshop was not cheap. Apparently Photoshop Elements did not have the Warp feature. Microsoft Paint didn’t seem to have anything of the sort. Lightweight freeware editors (e.g., SharpShot) would also probably tend not to. There may have been some other less expensive commercial software options. GIMP reportedly had only a limited capability along these lines. It appeared that BeFunky might be able to do it. I found a discussion suggesting other possibilities.

Final Manual Merge

So ICE had taken me as far as it could, giving me (in the first try) a set of four composite PNG files; and I had done whatever editing I was going to do with them — removing vertical curvature, in particular, and deleting overlapping material. Those four PNGs now captured the entire content of the webpage, and they contained no duplication. I could go ahead and butt them up against one another manually.

To do that, to summarize the solution reached in the other post, I used an ImageMagick “convert” command to create a blank canvas long enough to accommodate all four PNGs. I was able to determine the needed length by using Windows Explorer > right-click > Properties > Details for each image, adding up their lengths, and then allowing at least a few hundred pixels extra for both length and width. Once I had that blank canvas — referred to, here, as Canvas.png — I was able to use a series of ImageMagick “composite” commands to put these four PNGs in place on the canvas, one at a time.
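
The canvas arithmetic described above can be sketched as a small shell function. This is only an illustration, assuming a POSIX-style shell (e.g., Git Bash or WSL on Windows); the function name canvas_size, the 300-pixel margin, and the image dimensions shown are all hypothetical, not the values from my actual files.

```shell
#!/bin/sh
# canvas_size MARGIN WxH [WxH ...]
# Prints "WIDTHxHEIGHT" for a canvas that fits the images stacked vertically:
# width = widest image + MARGIN, height = sum of all heights + MARGIN.
canvas_size() {
    margin=$1
    shift
    max_w=0
    total_h=0
    for dims in "$@"; do
        w=${dims%x*}   # text before the "x" (width)
        h=${dims#*x}   # text after the "x" (height)
        if [ "$w" -gt "$max_w" ]; then
            max_w=$w
        fi
        total_h=$((total_h + h))
    done
    echo "$((max_w + margin))x$((total_h + margin))"
}

# Hypothetical dimensions, for illustration only (not the real files).
canvas_size 300 700x1000 710x2500 705x1500   # prints 1010x5300
```

The printed value is in the form that the -size option of “convert” expects. As an alternative to reading Properties > Details in Windows Explorer, ImageMagick’s “identify” command with a format string such as "%w %h" should report each image’s width and height, though I did not use that here.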

Determining the vertical location of each successive PNG was not hard. In the other post, I described how I used IrfanView to view the images and determine the right locations for the top left corner of each successive image. Having already edited out the overlap, I could arrive at the vertical location by just making a list of the Properties > Details for each image, and then doing the math. (It turned out that I should pay particular attention to differences in the horizontal dimensions, because somehow some of these PNGs had gotten slightly resized, and I could use those horizontal differences to tell me the percentage of resizing in IrfanView needed to make them all visually consistent.) The horizontal coordinate took more trial and error: I made trial runs and looked at the results manually to work out the best value. The resulting commands were as follows:

convert -size 800x59000 xc:black Canvas.png
composite A.png Canvas.png Canvas.png
composite -geometry +0+17498 B.png Canvas.png Canvas.png
composite -geometry +5+23881 C.png Canvas.png Canvas.png
composite -geometry +0+25939 D.png Canvas.png Canvas.png

In each of those composite commands, the next file (i.e., A, B, C, D) would be added to the updated version of Canvas.png. These steps worked: I ended up with a good final PNG image capturing the entire webpage.
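
The vertical offsets in those commands are just running totals of the image heights. As a minimal sketch of that math, assuming a POSIX-style shell: the function name stack_offsets is hypothetical, and the three heights below are inferred by subtracting successive offsets in the commands above, which holds only if the images butt up exactly, with no gap or overlap.

```shell
#!/bin/sh
# stack_offsets HEIGHT [HEIGHT ...]
# Prints, on one line, the y-offset of each image in a vertical stack,
# followed by the y-offset at which the next image after them would begin.
stack_offsets() {
    y=0
    out=$y
    for h in "$@"; do
        y=$((y + h))
        out="$out $y"
    done
    echo "$out"
}

# Heights of A, B, and C as inferred from the offsets above (an assumption).
stack_offsets 17498 6383 2058   # prints 0 17498 23881 25939
```

The four numbers printed are the vertical coordinates for A, B, C, and D respectively. The horizontal coordinate (e.g., the +5 used for C) is not covered by this sketch; that one still came from trial runs.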
