I acquired more than 10,000 JPGs. It tentatively appeared that some of them would interest me, and some wouldn’t. Unfortunately, their filenames were often very uninformative. I would have to take a manual look at “Picture 001.jpg,” for instance, to see whether it was a keeper. I preferred to be able to sort photos by filename, where possible — doing a search for all files whose names referred to last year’s office picnic, for example.
Even when filenames were more informative, I suspected (and I would find) that these photos probably had much more information in their metadata. (A discussion of metadata follows.) I wanted to get that information out into the filename, where possible, so that my efforts to sort files by filename would capture most of those whose present filenames were not highly informative.
This post describes the steps that I took to make the names of these JPGs more informative and to sort them as desired. Some of these steps required some work. Depending on the size and nature of the project, however, and on the user’s skill level and interest in learning how to do this sort of thing, these steps may be helpful.
Early sections of this post take pains to describe detailed steps in the use of commands and Excel formulas. Later sections are written in a more summary style, on the assumption that users proceeding step-by-step will have learned what they need by that time. Some users may want to skim the details of those early sections. If you find better tools or techniques for these tasks, please mention them in the comments.
Getting Metadata with ExifTool
Streamlining the Collection
Matching Up the File Count
Renaming Files to Include Folder Name Information
Picking Out the Most Useful ExifTool Metadata
Redundancy and Renaming
Reducing Duplicates and Other Cleanup
Categorizing, Combining, Presenting, Storing, and Disposing
To start with, I made a backup of the files I would be working on. To keep the backup copies out of reach of file tools that might otherwise find them in any folder (e.g., in a search for *.jpg in the Everything file finder), I made the backup in the form of a ZIP file.
I took Gizmo’s advice to download and install Zoner Photo Studio 18 (not 19) (rated 4.1 stars by 156 users at Softpedia). Gizmo felt this image viewer had better features than other tools mentioned by Gizmo, Tom’s Guide, AlternativeTo, or others. Zoner was adequate for my purpose, anyway, which was simply to see what metadata the JPGs might contain. To do that, in Zoner’s left-side column, I selected Computer and browsed to the relevant folder, and then moused over the (i) circle at the upper left corner of each displayed image file. I could also see more detailed information about the JPG by clicking on it and viewing Zoner’s right-hand pane.
Seeing the information was fine, but I wanted to use it to rename the file. To do that, I tried Zoner’s export option (Menu > Information > Data Import/Export > Export Descriptions). That gave me a choice of exporting either the Title or the Description of some or all files. I did this from the top folder containing the files of interest, checked the box marked “Preserve remaining files’ descriptions,” and sent the output to ZonerOutput.txt. That put a file called ZonerOutput.txt in each subfolder under that top folder. I had a lot of subfolders, so this approach was going to require me to find a way to assemble the contents of all those ZonerOutput.txt files into one list, in order to complete my renaming project. I could have done that, using Q-Dir and Everything as detailed in another post, but I decided to look for something different.
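Had I gone the Zoner route, assembling those scattered ZonerOutput.txt files would have been scriptable. A minimal Python sketch of that assembly step (the folder path is whatever top folder Zoner exported into):

```python
import os

def gather_exports(top_folder, export_name="ZonerOutput.txt"):
    """Walk every subfolder and concatenate the lines of each export file."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(top_folder):
        if export_name in filenames:
            path = os.path.join(dirpath, export_name)
            with open(path, encoding="utf-8", errors="replace") as f:
                lines.extend(f.read().splitlines())
    return lines
```

One combined list, ready to paste into a spreadsheet, instead of one file per subfolder.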
In a recent effort to verify MP3 files, I had found that TagScanner (4.5 stars at Softpedia) could export metadata on a folder full of MP3s. Unfortunately, it seemed limited to MP3s. MediaInfo, another tool coming to my attention in that effort, could handle a variety of audio and video formats, but apparently not still images. The situation seemed similar with FFmpeg and ffprobe.
Getting Metadata with ExifTool
Further exploration led to ExifTool (4.7 stars on Softpedia), which ran from the command line, and Exif Pilot (3.0 stars), which had a GUI. Given the sharp difference in ratings, I started with ExifTool. The installation instructions suggested, as a first step, to put the downloaded executable (named exiftool(-k).exe, but renamed by me to be simply ExifTool.exe) in the same folder as the JPGs (or, optionally, put it in C:\Windows, or add it to the computer’s PATH). I also needed a preferred way to open a command line in that folder.
Then I had to work up the proper ExifTool command syntax. I didn’t know much about tags, and the documentation was not helpful. I found the best way to get information about the tool was to use a targeted Google search. One such search led, for instance, to an explanation that the -r option would tell ExifTool to recurse through subdirectories. A proffered list of ExifTool Tag Names baffled me, as it did not seem to include any of the several tags that other pages in the documentation characterized as legitimate ExifTool tags (e.g., Title, CreateDate). Eventually I gathered that different sources (e.g., Adobe) had potentially incompatible sets of valid tags. Piecing together hints from the FAQs and the Examples page and a separate discussion, I came up with two commands:
exiftool -csv -r > ExifTitles.csv
exiftool -r -filename -title > ExifTitles.txt
My understanding was that the first one would export all tags from the files in that folder and its subfolders, while the second would export only the named tags (i.e., filename and title). In that case, the first would be better for getting everything, but would have the drawback of producing a potentially large number of diverse tags, while the second would provide only what I asked for. The CSV file output would ordinarily open in the system’s default spreadsheet program (Excel, on my machine), while the TXT file output would open in a text editor, such as Notepad.
I tried the first command. The resulting CSV file was empty. The command line returned an error:
Error creating temp file C:/Users/Ray/AppData/Local/Temp/par-526179//exiftool_do
So ExifTool can’t print the help documentation.
I thought maybe the problem was that ExifTool was trying to save the CSV file in that C: folder, rather than in the D: folder where the pictures were located. But then, no, that wouldn’t make sense; the CSV file was created in the D: folder. In response to the questions raised in the only relevant page that turned up in my search, I found that the named file (C:\Users\Ray\AppData\Local\Temp\par-526179\exiftool_doc.txt) already existed, that the file contained the help documentation, that I could write changes to it, and that I had Full Control permissions to its folder. The point of the error message, anyway, seemed to be that my command syntax was incorrect, and that for some reason ExifTool was unable to open the documentation.
That particular documentation TXT file was more readable than the version that got streamed on the command line, so I reviewed it. It seemed to say that I was not required to specify a tag (e.g., Title) when I was just reading the JPG file data. But apparently I did have to specify a file. I tried again:
exiftool -csv -r *.* > ExifTitles.csv
That seemed to work. The command line response was, “3 image files read.” But it hadn’t restricted itself to image files. The CSV file indicated that ExifTool had gamely attempted to read tags from the three files in the top-level folder. Those files happened to be ExifTool.exe, ExifTitles.csv (then in formation), and a RAR archive file. The CSV spreadsheet contained more than 60 columns, with titles like CompanyName and FileCreateDate. I would have typed *.JPG in the command, to limit the files examined, but I wasn’t absolutely sure that all of the image files in these folders were in JPG format.
More to the point, it seemed the recursion option (-r) had not worked: I was getting results only from the top-level folder. Neither the documentation text file nor the results of a search seemed to require me to specify the top-level folder, except that I did see one cryptic statement that might have indicated the opposite (“Only meaningful if FILE is a directory name”). I didn’t know why I would have needed to specify the folder; plainly, the command was producing results for the files in the correct top-level folder. But I tried again anyway:
exiftool -csv -r "D:\JPGs Folder" > ExifTitles.csv
That took some minutes to run. I wasn’t sure if I could have added *.JPG to that command, if I had wanted to. The command produced no errors. When it was done, it said, “327 directories scanned. 10237 image files read.”
Streamlining the Collection
At this point, I expected to begin analyzing the ExifTitles.csv file, toward the main goal of renaming the JPG files. My first looks into that CSV spreadsheet reminded me, however, that in a new and unfamiliar set of files, some could be confused and/or duplicative. It soon appeared that I would want to do some pruning and reorganization, and then re-run the ExifTool command, before getting into the renaming phase.
The ExifTitles.csv file produced by the foregoing command was almost 12MB. It contained 10,237 rows, each seeming to name a file. That did not quite match up with the 10,245 files reported by Windows Explorer > Properties. A search for *. (not .) in Everything, sorted by Path, confirmed that there was one file without a filename extension. I tried adding a .jpg extension (making sure to remember the name, so I could find it once it vanished from that Everything view). That worked: IrfanView (my default image viewer) could read it.
The Excel CSV spreadsheet itself provided a possible explanation for the discrepancy between the 10,237 and the 10,245 numbers. Filtering for FileTypeExtension confirmed that these folders did contain BMP, MPEG, PNG, and THM files, in addition to the predominant JPG files and the ExifTool EXE file. I was least familiar with the THM extension. Evidently those were thumbnails produced in connection with video files. The spreadsheet reported their size as 160×120. When I tried to open one, IrfanView (configured with my preferred options) said,
[Filename] is a JPG file with incorrect extension !
I said Yes. IrfanView did then display an image. I saw then that the image did seem to contain a thumbnail for a similarly named MPG file in one of these subfolders. I decided to delete these two THM/JPG files.
I wanted to convert the other image files (i.e., BMP and PNG) to JPG. But would I be losing any metadata if I did so? One question was whether those files contained any data worth worrying about. The spreadsheet did offer a Title column — but that column, and the large majority of the 297 types of metadata captured in the spreadsheet’s columns (as I saw with a horizontal sort of columns, using =COUNTIF([column],"<>"&"") to count nonblank entries per column), confirmed that there was little to no metadata in those image files. So, for each of them, I used Everything to search for the filename; I right-clicked > Open Path (to open the folder where I would want to create the replacement JPG); I double-clicked (to open IrfanView) > File > Save as > Save as type: JPG; and then, in Everything, I deleted the previous file and moved the new JPG into the correct folder.
That left a few MPG files, with names like MOV00025.MPG. I didn’t want to lose the contextual information provided by the original folder name, so in Everything I renamed them to include that information in the filename (e.g., MOV00025 – Family Vacation.MPG), and then moved them to a separate folder, so that they would no longer be included in the ExifTool file information output.
Of course, the spreadsheet already listed all those non-JPG file types, so that didn’t explain the discrepancy in total file counts. Then I realized — dumb mistake — that I had created my Excel spreadsheet and backups in the top-level folder where the JPGs were located. I moved those (and the original CSV) to another folder, and put ExifTool.exe in C:\Windows. This, I hoped, would leave me with nothing but JPGs in these folders.
Now I ran DoubleKiller to find exact duplicates. (Note: I was using DoubleKiller Pro. In case anyone wonders, I don’t get paid or otherwise rewarded for any product endorsements.) With an exact match, one file in a matching pair wouldn’t have metadata that the other lacked, so I could delete one of the two without losing anything. But before doing that, if the folder names provided any further information about the content of the files, in DoubleKiller I selected the file to keep > F2 > added the folder name information to the filename.
Later, I realized that I should have postponed that step, because the file I chose to delete on the basis of filename might have contained metadata not visible in DoubleKiller. My subsequent steps to bring metadata into the filename (below) would have taken care of that. On the other hand, if I’d had another collection of files with the same filenames but differing in quality, this might have been the time to do a DoubleKiller name comparison, before taking steps (below) that would change filenames.
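DoubleKiller’s exact-match logic (byte-identical contents, regardless of filename) can be approximated with a content hash. A rough Python sketch, with the file-reading step injectable so it can be tested without touching the disk:

```python
import hashlib
from collections import defaultdict

def find_exact_duplicates(paths, read_bytes=lambda p: open(p, "rb").read()):
    """Group file paths whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for p in paths:
        # Identical bytes produce identical digests, whatever the filename.
        digest = hashlib.sha256(read_bytes(p)).hexdigest()
        groups[digest].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

As the post notes, byte-identical copies carry identical metadata, so any one member of a group can be kept safely.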
Matching Up the File Count
Now I re-ran the ExifTool command with a slight change, to put the output CSV file in a separate folder:
exiftool -csv -r "D:\JPGs Folder" > "D:\Another Folder\ExifTitles.csv"
This time, the command reported 10,200 image files read. Windows Explorer > Properties reported 10,203 files. To try to eliminate that remaining discrepancy of three files, I opened the CSV, saved it as an Excel spreadsheet, and added a second worksheet to it. In that worksheet, I pasted the results copied from FileList.txt, which I produced with this command:
DIR "D:\JPGs Folder" /s > "D:\Another Folder\FileList.txt"
In that worksheet, I inserted column A, labeled it Index, and filled its cells with numbers 1 through 12843, to remember the original order of the rows. (Ways to add those numbers: see sources on series fill; or enter 1 in the first row and 2 in the second row, then highlight the rest of the space to be filled and use Alt-E I S Enter; or make each cell equal to the preceding cell plus one, and then “fix” the values (i.e., convert the formulas to fixed numbers, so they won’t change after sorting) using Copy and then Paste Special (e.g., select cells > Alt-E C > Alt-E S V > Enter > Enter).)
I added column C to detect lines in the DIR command output that specified a new directory, and to capture and repeat that directory name until another new directory name was encountered. Example: in cell C5, I put this:
=IF(ISERROR(FIND(" Directory of D:\",B5)),C4,MID(B5,15,LEN(B5)))
I added column D to detect rows that contained <DIR> information, which I didn’t need. The formula in this case was just a simple test, along the lines of the FIND formula above, for the <DIR> text in column B.
I copied those formulas from cells C5 and D5 to all rows of the spreadsheet, and then fixed their values (see above), sorted the table by column D, and deleted all rows containing <DIR>. I sorted the table by column B (containing the original output) and deleted all rows not containing filenames. I deleted column D, and put these formulas in columns D, E, F, G, and H, copying them down to all rows:
D2: =MID(TRIM(B2),21,LEN(B2))
E2: =FIND(" ",D2)
F2: =LEFT(D2,E2-1)
G2: =MID(D2,E2+1,LEN(D2))
H2: =RIGHT(G2,4)
These formulas gave me file size in column F, filename in column G, and extension (e.g., JPG) in column H. I could filter the table on those three columns to detect files that didn’t belong. There were two such files. It looked like I, or the person assembling the files, had inadvertently included some files from the System Volume Information folder in Windows. I deleted them from this set of JPG folders and from the spreadsheet. That left me with a discrepancy of only one file. I wasn’t sure what that might be due to. I decided to move on.
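For readers who prefer scripting to formulas, the same DIR parsing can be sketched in Python. This assumes the English-locale DIR layout (date, time, size, name) shown in my output; other locales would need a different pattern:

```python
import re

# Matches a file line: date, time (with optional AM/PM), comma-grouped size, name.
DIR_LINE = re.compile(r"^\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}(?:\s+[AP]M)?\s+([\d,]+)\s+(.+)$")

def parse_dir_output(lines):
    """Return (directory, size_bytes, filename) tuples from `DIR /s` output."""
    current_dir = None
    results = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(" Directory of "):
            current_dir = line[len(" Directory of "):]  # like column C above
        elif "<DIR>" in line:
            continue  # skip subdirectory entries, like column D above
        else:
            m = DIR_LINE.match(line.strip())
            if m:
                size = int(m.group(1).replace(",", ""))
                results.append((current_dir, size, m.group(2)))
    return results
```

Header and summary lines simply fail the regex and drop out, which replaces the sort-and-delete passes in the spreadsheet.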
Renaming Files to Include Folder Name Information
Before I could safely eliminate exact duplicates in this photo collection, I would need to make sure that I wasn’t losing any file information contained in the names of the folders where the person had organized these files. To do that, I continued with (or could have re-created) the spreadsheet in which I was parsing the output of the DIR command, as described in the previous section.
For this particular set of files and folders, it looked like I might be well advised to keep all of the subfolder names, because they were well-organized and informative. So, for example, I wanted “D:\JPGs Folder\2015\April 15 Meeting\Picture 001.jpg” to become “D:\JPGs Folder\2015 – April 15 Meeting – Picture 001.jpg.” This would obviously be much more informative than just “Picture 001.jpg,” when it came time to review DoubleKiller results and decide which exact duplicates to discard. To achieve that renaming, I put these formulas in these cells and, as above, I copied them down to apply to all rows:
I2: =FIND("\",$C2,10)+1
J2: =MID($C2,I2,LEN($C2))
K2: =SUBSTITUTE(J2,"\"," - ")&" - "
L2: =K2&G2
M2: ="ren "&CHAR(34)&C2&"\"&G2&CHAR(34)&" "&CHAR(34)&L2&CHAR(34)
The formula in cell I2 would tell me where the first backslash (“\”) appeared, after the part of the file path that I didn’t want to preserve (i.e., the uninformative “D:\JPGs Folder”). J2 could then give me the rest of the pathname. K2 replaced the backslash with the ” – ” separator. L2 combined the file path and filename. Finally, M2 used CHAR(34) to insert quotation marks in a way that would not confuse Excel. The formula in M2 gave me this:
ren "D:\JPGs Folder\2015\April 15 Meeting\Picture 001.jpg" "2015 - April 15 Meeting - Picture 001.jpg"
That was a command that Windows would recognize. I selected and copied every formula in column M — that is, all of these file renaming commands — and pasted them into Notepad. I saved that file as Renamer.bat, so that Windows would treat it as a runnable batch file, and then I ran it. (Again, make sure you have a backup!) To run it, I could double-click it in Windows Explorer, invoke it from another command, or type its name at the Windows command line.
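The column M logic amounts to a small function. A Python sketch that reproduces the flattening (the folder and file names here are just the example from above):

```python
def make_ren_command(top_folder, full_dir, filename):
    """Build a `ren` command that folds subfolder names into the filename."""
    # Drop the uninformative top-level prefix, keep the rest of the path.
    relative = full_dir[len(top_folder):].lstrip("\\")
    new_name = relative.replace("\\", " - ") + " - " + filename
    old_path = full_dir + "\\" + filename
    return 'ren "{}" "{}"'.format(old_path, new_name)
```

One such line per file, written to a .bat file, gives the same Renamer.bat described above.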
So far, the files were still in separate folders. I wanted them all in one folder. This could raise a problem of name conflicts. For instance, there might be something called Picture 001.jpg in Folder A, and there might also be something called Picture 001.jpg in Folder B. If I tried to put them both into the same folder, I would have a problem: one would overwrite the other, or else one would refuse to go. Fortunately, there was a solution. Now that I had nothing but JPGs in this set, I could use Everything to search for *.jpg, sort the results by Path, select all found JPGs, and paste them into the target folder in Q-Dir. Q-Dir was a Windows Explorer replacement — but, unlike WinEx, if it found duplicate filenames, it would offer to save them with unique filenames (e.g., Picture 001 (2).jpg, Picture 001 (3).jpg).
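Q-Dir’s conflict numbering can be mimicked if you script the move yourself. A sketch of that scheme:

```python
import os

def unique_name(filename, taken):
    """Return filename, or 'name (2).ext', 'name (3).ext'... if already taken."""
    if filename not in taken:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 2
    while "{} ({}){}".format(stem, n, ext) in taken:
        n += 1
    return "{} ({}){}".format(stem, n, ext)
```

Checking each candidate name against the set of names already in the target folder avoids both overwrites and refusals.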
Once that was done, I should have had nothing but empty subfolders left. To make sure, I used Remove Empty Directories (RED) (3.7 stars at Softpedia; top-ranked at AlternativeTo) to delete the empties and show me what might be left. It found seven files with a .jpeg rather than .jpg extension, which my targeted Everything search had not detected. I renamed and moved those too. Windows Explorer then showed a total of 10,200 files in the folder and (after another round with RED) no subfolders.
Moving the files to one folder was optional. I preferred it, at least in this case. Having the files in one folder made them more visible than they had been when they were sectioned off into various subfolders. I might have felt differently about it if I had been convinced that they were already organized the way I wanted them.
Picking Out the Most Useful ExifTool Metadata
Now, at long last, I could try to get these JPGs’ metadata into their filenames. To do that, as above, I ran the ExifTool command to produce ExifTitles.csv:
exiftool -csv -r "D:\JPGs Folder" > "D:\Another Folder\ExifTitles.csv"
After running that command, ExifTool reiterated that it had read 10,200 image files. That matched the number of files counted by Windows Explorer > Properties. The number of types of metadata provided had also declined, probably because we were no longer dealing with EXE and BMP and other types of files, containing other sorts of information.
At this point, the CSV contained 10,200 rows and 242 columns. Each column appeared to contain a piece of data, or metadata, about the file listed on a given row. As above, I sorted the columns horizontally, from largest to smallest count, so as to encounter first the ones that were most frequently used, within these 10,200 JPGs.
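The horizontal usage sort boils down to counting nonblank cells per column. A Python sketch of the same COUNTIF idea, run against the CSV text:

```python
import csv
from io import StringIO

def column_usage(csv_text):
    """Return {column_name: count of non-empty cells}, most-used first."""
    rows = list(csv.DictReader(StringIO(csv_text)))
    counts = {}
    for name in (rows[0].keys() if rows else []):
        counts[name] = sum(1 for r in rows if (r[name] or "").strip())
    # Sort columns by descending usage, like the horizontal sort in Excel.
    return dict(sorted(counts.items(), key=lambda kv: -kv[1]))
```

Reading the result left to right corresponds to walking the sorted spreadsheet from the most-populated column to the least.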
Among those 242 columns, the most frequently used ones tended to be those offering the kinds of information that was visible or at least available in Windows Explorer for virtually any file. These included FileName, Directory, SourceFile (i.e., full path, combining name and directory), FileModifyDate, FilePermissions, FileSize, FileType (e.g., JPEG), and FileTypeExtension (e.g., jpg).
The spreadsheet also contained columns offering other types of metadata that one might expect of an image file. These included, for instance, BitsPerSample, ColorComponents, EncodingProcess, ImageHeight, ImageWidth, and MegaPixels. These, too, were available for every file in the list.
I could imagine that some of that information could be useful, when deciding whether to discard a duplicate. I might prefer to keep a file with higher resolution, for example. I was able to eliminate some of these by filtering the spreadsheet: this demonstrated that all of the files for which information was available had BitsPerSample = 8; ColorComponents = 3; EncodingProcess = Baseline DCT, Huffman coding; ResolutionUnit = inches; and Compression = JPEG (old-style). Also, at this point I did not really understand, and had not yet encountered a reason to care about such specifications as YCbCrSubSampling, ExifByteOrder, ModifyDate, Orientation, ThumbnailImage, ThumbnailLength, ThumbnailOffset, or About (which turned out to contain a UUID) — at least for purposes of including such information in the filename. Similarly, I could imagine situations where it might matter which device or program had created or edited the JPG, but I didn’t need that information now.
Those considerations reduced the number of kinds of metadata that I needed to look at more carefully. And then, as I proceeded rightward across my horizontally sorted spreadsheet, I found myself looking at increasingly exotic types of metadata, found in very few JPGs. For instance, data on GreenTRC appeared in only 146 of those 10,200 files, and CanonFirmwareVersion appeared in only nine. There did not seem to be a reason to include this information in the names of these JPGs.
Now my spreadsheet contained about 15 columns of data. Each of these columns held information that appeared in the tags of at least 3,000 JPGs. I saved the spreadsheet at that point, and continued work in a copy of it — because now it was time for surgery. Specifically, I deleted the columns just discussed, containing data that I did not expect to use.
It appeared that I might be able to reduce the number of potentially relevant columns further, because there appeared to be some duplication. After filtering and viewing the available information for a while, I added another row at the top of the spreadsheet, and in that row, above each column, I added a category name; and then I resorted the table horizontally to put those categories together. Those categories, and the types of metadata arising under them, were as follows:
Date: FileAccessDate, FileCreateDate, FileModifyDate, ModifyDate, DateAcquired, CreateDate, DateTimeOriginal
Title: Title, XPTitle, ImageDescription, Description
Keywords: LastKeywordXMP, Subject, XPKeywords
There were also a handful of other surviving metadata types (e.g., Megapixels) that did not seem to repeat information found in other columns. To identify and eliminate duplicates within the three categories just shown, I proceeded as follows:
- Date. Some of the date columns contained entries for every file, so it made sense to start with these: I might be adding date information to the name of each file. Now that I had renamed files to include folder names (above), I saw that many of the JPG filenames already included at least an indication of the year of the photo. I tried to see if I could get anything more specific from the date columns. I could, but most of the dates were from the past year or two, whereas it was clear that these photos were much older. I did find that the CreateDate and DateTimeOriginal, which generally seemed to be identical to one another, often provided the oldest dates. Ultimately, though, I concluded that the date information was unreliable, and decided to stick with the year information (if any) included in the filename.
- Title. After the Date, the Title columns tended to be the next most frequently used among these JPG files. Title and XPTitle data were provided for 6,495 files; ImageDescription for 6,426; and Description for 3,123. I inserted comparison columns and used formulas to highlight those whose contents were identical. I started with Title (Column M) and XPTitle (Column N), and put a formula into Column O to flag any row where the two differed.
When I filtered on Column O, I saw all blanks: the contents of Title and XPTitle were identical in every case. I deleted the redundant XPTitle column and conducted similar comparisons with the other title-related columns. In the case of ImageDescription, I saw that some device or software had inserted “My beautiful picture” as the default description for many images. Otherwise, it developed that Title and ImageDescription were almost entirely exclusive, presumably reflecting two different origins: when one had content, the other would almost always be blank. So after a few edits, those could be merged into a new Revised Title column. Some of the same observations applied to the Description column, which had far fewer entries: basically, the Description column added nothing. With these efforts, the Revised Title now contained data on 7,074 files. I could use that data, but I would need to do more work with it (below).
- Keywords. The LastKeywordXMP and Subject columns proved to have completely identical contents, providing information for 3,112 files. The same was true between LastKeywordXMP and XPKeywords. Hence, I kept LastKeywordXMP and discarded the other two.
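The Title merge described above reduces to a small rule: drop the boilerplate default, then prefer Title over ImageDescription (the two being almost entirely exclusive). A sketch; the default-description set holds just the one string I encountered:

```python
# Default descriptions inserted by some device or software, worth discarding.
DEFAULT_DESCRIPTIONS = {"My beautiful picture"}

def revised_title(title, image_description):
    """Merge two mostly-exclusive title columns into one Revised Title."""
    title = (title or "").strip()
    desc = (image_description or "").strip()
    if desc in DEFAULT_DESCRIPTIONS:
        desc = ""
    # When both have content (rare), Title wins.
    return title or desc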
Taken together, the Revised Title and LastKeywordXMP columns contained data on 7,161 of the total of 10,200 JPGs. It appeared that the keyword and title columns were not largely redundant, that both could be informative, and that the keyword information tended to be more general and should thus come first in the filename.
Finally, there was the question of which other bits of metadata should be included in the filename. The remaining columns in the spreadsheet included ImageHeight, ImageWidth, ImageSize, Megapixels, XResolution, and YResolution. I verified that the first two were correctly captured in ImageSize, and therefore deleted them. For the other three, I saw that XResolution and YResolution data were missing for 78 files. I verified that X and Y resolutions were equal for the remainder. It was thus possible to calculate resolutions (which I doubted I would need anyway), given ImageSize and Megapixels. So I decided to keep just those last two.
Redundancy and Renaming
Those steps gave me the elements that I planned to combine into more informative filenames. After the existing filename (which now included the folder name), I proposed to append LastKeywordXMP, followed by Revised Title, ImageSize, and Megapixels, in this model form:
Folder – Filename – LastKeywordXMP – Revised Title (Width x Height, MP).jpg
But now I saw some redundancy in those fields. For one thing, the contents of LastKeywordXMP were already included in many file or path names. With the file name in column C and LastKeywordXMP in column D, I put a FIND-based formula in cell E2 to convert LastKeywordXMP to a blank wherever the filename already contained it.
This dramatically reduced the number of LastKeywordXMP entries. It appeared that field was overwhelmingly redundant. Moreover, sometimes LastKeywordXMP contradicted Revised Title entries, if any were present. I decided to use LastKeywordXMP only where Revised Title was blank.
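The FIND-based blanking translates directly to code. This sketch assumes a case-insensitive match is wanted, since Windows filenames are case-insensitive:

```python
def keyword_if_new(filename, keyword):
    """Return the keyword only if it is not already part of the filename."""
    if keyword and keyword.lower() in filename.lower():
        return ""  # redundant: the filename already says this
    return keyword
```

Applied down the column, this blanks exactly the entries the FIND formula caught.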
Now that filenames included the folder name, I could see redundancy there as well. For instance, “October 13 Session Members.jpg” might have been in a folder called “October 13 Session.” Simply combining path and filename had given me files that, in that example, would contain both of those references to the October 13 Session. But while adding material to filenames would merely make them redundant, reducing filenames could create name conflicts (e.g., producing two files called “Members Present.jpg”). So I decided to postpone the task of trimming existing filenames for now.
So now I had a complete list of the longer filenames that I wanted to give to these JPGs. It would soon be necessary to abbreviate some of those longer filenames. But I did not want to lose the detailed comments that someone had added for some photos. The comments might be too long to serve as filenames per se, but someone might still find their information valuable, even irreplaceable. So I exported the list of long filenames to a text file and saved it for future reference.
Then, I did have to trim some of these proposed filename additions. The maximum effective path length in Windows was 255 characters, and some of these new names exceeded that. Ultimately, I would want much shorter names — because, aside from being difficult to read in a window taking up less than the entire screen, I couldn’t predict how many layers deep they might be buried in some subfolder somewhere, thereby potentially exceeding the 255-character limit. In that case, undesirable things could happen. Windows could refuse to move files to an excessive depth, thereby giving me problems when I didn’t need problems; Windows could truncate filenames to 8.3-character short forms (e.g., 2012-1~0.jpg) that would remove meaning entirely. But it wasn’t ideal to fine-tune all of the filenames now. I would be taking some cleanup steps (below) that might delete a bunch of these files. For now, I limited my manual rephrasing efforts to those instances where the combination of these filename elements would not fit within the Windows 255-character limit.
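A scripted guard for the 255-character limit might look like this; it trims only the appended metadata, never the original filename (the limit and the folder layout are as described above):

```python
MAX_PATH = 255  # effective Windows path limit discussed in the text

def fits(folder, name):
    return len(folder) + 1 + len(name) <= MAX_PATH  # +1 for the backslash

def trim_to_fit(folder, base_name, extra, ext=".jpg"):
    """Shorten the appended metadata until folder\\name fits in MAX_PATH."""
    name = base_name + extra + ext
    if fits(folder, name):
        return name
    room = MAX_PATH - (len(folder) + 1) - len(base_name) - len(ext)
    return base_name + extra[:max(room, 0)] + ext
```

Blind truncation like this is only a fallback; as noted, names near the limit deserved manual rephrasing instead.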
Once that was done, I created another file called Renamer.bat (see above), and (after making another backup) I used that batch file to rename these JPGs. It seemed I had to run it more than once to get it to make the desired changes, though maybe my system was just slow to update the displayed file list for so many files. Regardless, even after several runs, some filenames remained unchanged. I used a command (DIR /a-d /s /b > filelist.txt) to capture the new file listing, put that list into Excel, and used VLOOKUP to see which files still had their previous names. This effort revealed that 534 files had not been renamed. For many, the reason appeared to be that the proposed new filename contained characters that were not allowed in Windows filenames (i.e., \ / : * ? " < > | ).
I should have taken care of that earlier, because now I saw I had created some problems. In a copy of the spreadsheet, I removed the rows containing the names of files that had been successfully renamed. This allowed me to focus on the files that had not been renamed, and reduced the time the spreadsheet needed to recalculate. Now I could see that I had gotten varying results for those disallowed characters. In the case of the ? symbol, the files had been renamed, but in some cases, for some reason, the ? had been replaced with a letter, typically p or g. So if I wanted those to be correctly named, I had to rename them manually now, while I could still see that the p or g did not belong. I renamed some, but then saw that, in most cases, the renaming had proceeded and the ? had simply dropped out.
Renaming did not occur, with or without the ? symbol, where the proposed filename also contained others from the foregoing list of prohibited symbols. Using Excel’s SUBSTITUTE function, I revised the spreadsheet to replace those prohibited characters with alternatives (e.g., replacing / with – ). That would not prevent someone from embedding some other strange character in a JPG tag, but hopefully such characters would either be rare or be common enough to catch my notice early in any future project of this nature.
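The SUBSTITUTE step amounts to a character-replacement table. Here is a minimal sketch; the text only mentions replacing / with a hyphen, so the other substitutions below are assumptions chosen to keep the result readable.

```python
# Sketch of the SUBSTITUTE step: map each character Windows forbids in
# filenames to a harmless stand-in. Only the / -> - mapping comes from the
# text; the rest are illustrative assumptions.
REPLACEMENTS = {
    "\\": "-", "/": "-", ":": ";", "*": "+",
    "?": "", '"': "'", "<": "(", ">": ")", "|": "-",
}

def sanitize(name):
    """Replace characters not allowed in Windows filenames."""
    for bad, good in REPLACEMENTS.items():
        name = name.replace(bad, good)
    return name
```

Running every proposed filename through such a function before generating Renamer.bat would have avoided the partial-rename problems described above.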
After another round with a new Renamer.bat, I took another file list (DIR) and excluded the names containing “MP).jpg” — those would be the final characters of a successfully renamed file (see the model filename form, above), and would probably not occur in many other filenames. Nothing was left over: it seemed that everything had been renamed.
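That final check is a one-line filter. This sketch assumes, as the text does, that “MP).jpg” marks a successfully renamed file; whatever the filter keeps is what still needs attention.

```python
# Sketch of the final verification: drop every name ending in "MP).jpg"
# (the signature of a successfully renamed file, per the model filename
# form) and report what remains.

def not_yet_renamed(listing):
    """Return the names that lack the renamed-file suffix."""
    return [name for name in listing if not name.endswith("MP).jpg")]
```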
Reducing Duplicates and Other Cleanup
As noted above, I had already made a premature effort to remove duplicate files, by running a DoubleKiller search for exact duplicates. It was not necessary to run a DoubleKiller search for files with duplicate names, because exact duplicate names would not have been able to coexist in the same folder. I tried a few other DoubleKiller comparisons (e.g., not exact matches, but same filenames for the first 80 characters, plus no more than a 1KB difference in file size), but these were not rewarding.
It was time to try VisiPics (3.5 stars at Softpedia; favorite in its class at AlternativeTo). In my experience, VisiPics was generally better than its competitors. I found that VisiPics worked best at its Basic setting. Sometimes a setting a notch or two looser would be helpful, but usually those looser settings would produce too many false positives — unless, that is, the user actually wanted to be prompted to choose just one among several similar but not identical photos. In this case, even at the Basic setting, VisiPics found duplicates when the photos were actually not identical. In previous usage, it had seemed less likely to do that with lower-quality photos. On the other hand, I found that VisiPics continued to show near-duplicates even down to its loosest comparison setting. Generally, below its strictest settings, VisiPics did not seem very useful for comparing images of documents.
(Note: sometimes it seemed that VisiPics needed to be closed and reopened in order to function properly, but that may not be correct, and doing so would remove everything from its scan memory, making it start over for any further comparisons involving the same files. Although I wasn’t sure, it didn’t appear that the VisiPics memory would include the sets of photos already viewed. Note also that VisiPics would scan the Recycle Bin. To avoid repeating the same tasks and perhaps inadvertently deleting desired photos, it could be advisable to deselect or empty the Recycle Bin before proceeding.)
For some reason (possibly overlong filenames), VisiPics examined only 8,240 of the 10,200 JPGs in this set. Among those 8,240, on the Basic setting, VisiPics found 183 images that, in its view, looked similar to one or more others in the set, though not necessarily of the same size or quality. I moused over the thumbnails it showed me, watching the status bar and being reminded that VisiPics could benefit from a redesign: it could display the comparative photo information right next to the thumbnails, show larger thumbnails, and offer the option of automatically checking all duplicates in a folder whose JPGs I had repeatedly marked, since some sets (e.g., pre-edit) were generally inferior. I marked for deletion the duplicates with inferior characteristics (in terms of, e.g., resolution or filename information). As with DoubleKiller (above), there was also the possibility of using VisiPics to compare this set of JPGs against any other image sets that might contain duplicates. Before deleting files in VisiPics, though, I right-clicked to rename the ones that would remain, whenever necessary to preserve file information from the duplicate I was about to delete.
Another kind of cleanup involved revisiting filenames. There were some patterns, in this set (as renamed, above), involving excessive filename length and repetition of certain information. If I had been working with a smaller number of files, such that I could page down through the proposed changes to see what I was doing, I might have used Bulk Rename Utility (4.0 stars at Softpedia; surpassed by ReNamer at AlternativeTo). Instead, I opted for another series of changes as above: first, a backup; then a filelist (produced by DIR) imported into Excel, massaged to produce the desired new filenames, and then applied via a Renamer.bat file. This approach had the advantage of showing me, in the spreadsheet, what the new filenames would look like, before making the changes.
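The DIR-to-Excel-to-Renamer.bat pipeline boils down to generating one REN line per file. This is a minimal sketch of that last step, assuming a list of (old, new) name pairs such as the spreadsheet would produce; the example pair is hypothetical.

```python
# Sketch: build the lines of a Renamer.bat file from (old, new) filename
# pairs, as the spreadsheet did. Quoting each name guards against spaces.

def make_renamer(pairs):
    """Return a list of REN commands, one per (old, new) pair."""
    return ['ren "{}" "{}"'.format(old, new) for old, new in pairs]

# hypothetical usage; write the result with something like:
# open("Renamer.bat", "w").write("\n".join(lines))
lines = make_renamer([("Picture 001.jpg", "2010 picnic 001 (3MP).jpg")])
```

One advantage of generating the batch file programmatically, as with the spreadsheet, is that the proposed commands can be inspected before anything is actually renamed.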
With the shorter filenames, I retried VisiPics. I also tried Easy Duplicate Finder, Awesome Duplicate Photo Finder, and dupeGuru. In this file comparison, as in previous comparisons, I found that VisiPics and Awesome were the only ones worth the effort: each captured a number of near duplicates that the other had missed. It was just as well that I had tried VisiPics before Awesome, because Awesome did not give me the option of renaming files to preserve information from the names of the duplicates I was deleting.
Categorizing, Combining, Presenting, Storing, and Disposing of Images
Finally, it was time to do something useful with the main body of JPGs. For some projects, the first step could be to divide them into subfolders whose names indicated the groups and subgroups they belonged in (e.g., Vacations, Meetings). In this case, where the filenames were clear, I was able to use the Windows Explorer or Q-Dir (above) file listings to cut and paste large numbers of JPGs into the desired folders. There was also the option of using Everything to search for keywords: I could sort the search results by Path, and cut and paste groups of pictures into the desired folder.
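The keyword-based sorting could also be scripted. This is a sketch under stated assumptions: the files sit in one flat folder, and any file whose name contains the keyword (case-insensitively) is moved into a matching subfolder; the folder and keyword names are hypothetical.

```python
# Sketch of keyword sorting: move every file whose name contains a keyword
# into a subfolder of the same root. Assumes a flat folder of JPGs.
import os
import shutil

def sort_by_keyword(root, keyword, dest_name):
    """Move files whose names contain keyword into root/dest_name."""
    dest = os.path.join(root, dest_name)
    os.makedirs(dest, exist_ok=True)
    moved = []
    for name in os.listdir(root):
        src = os.path.join(root, name)
        if os.path.isfile(src) and keyword.lower() in name.lower():
            shutil.move(src, os.path.join(dest, name))
            moved.append(name)
    return moved

# hypothetical usage:
# sort_by_keyword(r"C:\Photos", "picnic", "Picnics")
```

As with the renaming steps, running this against a backup first would guard against keyword matches that move files somewhere unexpected.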
Ultimately, it was still necessary to view each individual photo. Sorting by filename did assist in getting most into the right category, but sometimes (after making a backup) viewing would tell me that a particular photo was in the wrong place, or that it needed to be rotated 90 degrees, fixed up, or deleted. I had found that IrfanView could do all of those things quickly, while allowing me to move from one photo to the next with a single tap on an arrow key.
It was pretty tedious to tap my way through 10,000 photos — repeatedly, in some cases, for those that had to be resorted and/or subsorted into more specific topical groups. This experience encouraged me to offer a suggestion or, perhaps, a warning. People who accumulate large numbers of photos, without any effort to distinguish the important from the unimportant, are essentially leaving it to the viewer to organize and dispose of them as s/he sees fit. In a mass of photos, even a diligent, well-intentioned viewer, possessing limitless time to fool with such things, could easily overlook or misunderstand some of what s/he is seeing. If there is any chance that photos will be shared, it would be advisable to make sure their file and folder names are informative, and that junk photos are deleted.
As I was going through that process, I selected a small number of photos for use in a video. In my own recent video work, amateurish yet rewarding, I had been using Adobe Premiere Elements to combine photos with audio and video materials. Video editing software would allow panning and zooming inside an individual photo, fades between photos, and special effects. Of course, video is very space-intensive: you’re using 30 frames per second to repeatedly capture a still image. Preparing a good video is also time-intensive: I could devote several minutes, or more, to the preparation of a single photo. So video treatment would usually make sense only for the most interesting and high-quality photos. Also, good video editing software was not usually free. My exploration of such software had persuaded me that the Linux-based free alternatives were no match for programs like Premiere Elements, and the best Linux-based alternatives (e.g., Lightworks) also appeared on some lists of the best free Windows editing tools (but see e.g., TechRadar).
Experience suggested that I might want to create the video(s) first, before deciding what to do with the large mass of JPGs. The choice of files used in the video would not be finalized until the video was complete. New questions and ideas could arise during the video creation process. It might develop that I would not use some of the photos I expected to use, or that I might need more photos on a certain subject. After the video was done, there remained the question of how to deal with the large majority of photos. I had several options:
- Simply delete them. For some sets of photos, those that were not good enough to include in a video might not be worth keeping.
- Archive them in a subfolder. I disliked this option because these photos could appear again if I ran a duplicate detector (e.g., DoubleKiller, VisiPics) — which might be good for some users and some situations — and also because their long filenames would be a recurrent hassle if some future file arrangement moved them to a deeper folder.
- Archive them in a compressed file — referred to here as a ZIP file, although non-ZIP compression formats (e.g., RAR, 7z) would offer superior compression and features. I liked WinRAR, which was not free, but 7-Zip was a free and in some ways superior alternative. For files like these JPGs, whose format already involved sophisticated compression, an attempt at further compression could counterproductively produce a file that would take more disk space than the original files. That would be more likely when using WinRAR’s recovery feature as a sort of built-in backup, and less likely when using the option (available in WinRAR but apparently not in 7-Zip) to create a solid archive. It would also be possible to combine files into a single file without any attempt at further compression, using the Store compression level (available in WinRAR, 7-Zip, and probably others).
- Combine them in a PDF. With good PDF creation software (I used Adobe Acrobat), the PDF file’s settings could be configured to display just one image per page, and to keep the Bookmarks panel open, so as to display at least a portion of the filename. It would be important to make sure the resulting PDF would capture the image at an acceptable level of quality.
- Combine them in a slideshow. Note that video and slideshow treatment would reduce photos from their original resolution (e.g., 3456 x 2304 pixels, for an 8MP image) to no more than 1920 x 1080, since the latter would be the maximum viewing size on typical computer screens. This would matter if there might later be a desire to take a magnified view of any photos.
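The Store-level archive option above can be sketched with Python’s standard zipfile module, which supports the same no-compression method as WinRAR and 7-Zip. The file paths here are hypothetical; the point is that already-compressed JPGs are bundled without any further (and likely counterproductive) compression attempt.

```python
# Sketch of the Store-level archive option: bundle already-compressed JPGs
# into a ZIP without recompressing them (ZIP_STORED = the "Store" level).
import zipfile

def archive_stored(zip_path, files):
    """Add the given files to a new ZIP using the Store method."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in files:
            zf.write(f)

# hypothetical usage:
# archive_stored("party_photos.zip", ["Our Party at Mike's 01 (3MP).jpg"])
```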
Of course, I could mix and match those options — deleting some JPGs, putting some in PDFs, and putting others in slideshows. For some of these options, a search on my computer for an individual filename (e.g., “Our Party at Mike’s”) would no longer turn up anything, if the desired file(s) had been combined into a single archive file, PDF, or slideshow. In this case, as detailed in another post, my exploration of slideshow options culminated in production of a PDF slideshow.