I was using Thunderbird on Windows 7. I wanted to export a bunch of emails from Thunderbird to EML format. (At some later point, I would probably convert those EMLs to PDFs, building on lessons learned in previous attempts.) This post describes my Thunderbird-to-EML export efforts.
I wasn’t currently using the latest version of Thunderbird. I would have been happy to upgrade, but there was a question of whether my Thunderbird add-ons would work with that version. I viewed my currently installed add-ons via Thunderbird’s menu: Tools > Add-ons. I decided that the extensions I was most concerned about were ImportExportTools and Remove Duplicate Messages (Alternate) (RDMA). A look at their webpages confirmed that those two add-ons were compatible with Thunderbird 31, the latest version. (Otherwise, Mozilla said that I could download older versions of Thunderbird from two different archival sources.) So I installed Thunderbird 31. Those two extensions were updated automatically during the Thunderbird installation.
My next step was to delete duplicate emails using RDMA, so that I would not have unnecessary extra copies of my emails. Duplicates could arise if, for example, I went into my Hotmail account and moved the copies of the Sent folder to the Inbox, for downloading to Thunderbird. (I might do that if I had sent emails from Hotmail’s web interface without using Thunderbird as my Hotmail interface: my copy of the sent email would be in Hotmail’s website, not in Thunderbird — but copies of other emails, sent through Hotmail via Thunderbird, might appear in both places.)
As described in more detail in the post on my previous attempt to remove duplicate emails, I began the duplicate detection process by making sure I had a backup of my Thunderbird installation, in case something went wrong. Then I configured RDMA via Tools > Add-ons > RDMA > Options > Message Comparison tab. I began with all boxes checked and Seconds as the timeframe, so as to identify exact duplicates and verify that the add-on was working as expected. I put all of the emails that I wanted to duplicate-test into the same folder; I was not sure whether or how well RDMA would work across multiple folders. In Thunderbird’s main view, I selected that folder > right-click > Remove Duplicates.
The first time I ran this, it took quite a while for RDMA to finish its work. The status bar at the bottom of the Thunderbird window indicated that it was still “Searching for duplicate messages.” Then I got a “Warning: Unresponsive Script” error. I told it to Continue. Eventually an output window opened, showing me the duplicates that RDMA had found. I marked and deleted duplicates in that output window. (Again, see the previous post for more detail.)
I repeated that process, each time unchecking one or more boxes in the RDMA configuration window and then examining the results in the output window, until I was satisfied that I had eliminated most duplicates. As in the previous versions of Thunderbird and RDMA, it appeared that checking just Author, Recipients, Send Time, and Subject, with a Seconds time resolution, would yield a comparison right at the borderline between detecting all duplicates and wasting too much time unchecking non-duplicates.
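For readers curious about what that four-field comparison amounts to, here is a minimal Python sketch. It is not how RDMA works internally (I have not examined its code); it only illustrates the idea of treating two messages as possible duplicates when their Author, Recipients, Send Time (to the second), and Subject all agree. The function names are hypothetical, and the sketch assumes every message has a parseable Date header.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def duplicate_key(raw_message):
    """Build a comparison key from Author, Recipients, Send Time, and Subject."""
    msg = message_from_string(raw_message)
    # Assumption: the Date header exists and parses; drop sub-second detail.
    sent = parsedate_to_datetime(msg["Date"]).replace(microsecond=0)
    return (msg["From"], msg["To"], sent, msg["Subject"])


def find_duplicates(raw_messages):
    """Return lists of positions whose keys are identical (possible duplicates)."""
    groups = defaultdict(list)
    for position, raw in enumerate(raw_messages):
        groups[duplicate_key(raw)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]
```

Because the message body is not part of the key, two different messages sent in the same second with the same headers would be flagged together, which is why reviewing the results by eye remains necessary.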
Now I was ready to begin exporting. First, I did a search to see if any new tools had emerged since the last time I had done this. The search suggested that there were commercial tools available for the purpose, at prices of $50 and even $69, but it was not clear that they would do a better job of producing appropriately named EML files than I could do myself, and it appeared that the free Thunderbird ImportExportTools add-on (above) was still drawing lots of positive reactions. In my brief review, the most tempting alternative was Aid4Mail MBOX Converter. If I had not already worked out an approach that achieved what I wanted using ImportExportTools, I might well have tried that Aid4Mail alternative.
My goal, in this process, was to produce files following this naming convention: Date-From-To-Subject.eml. For example, one of my emails might get exported and ultimately saved as 2014-11-15 Email from Joe Blow to Jane Doe re Friday Night Preparations.eml. My previous notes suggested that, if the ImportExportTools upgrade had not changed anything too dramatically, my first steps would be to export individual emails to EML files and also to export the index of all emails.
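As an illustration of that naming convention, the following Python sketch builds the target filename directly from an EML file's headers. This is an alternative to the spreadsheet route described in this post, not what I actually did. The function names are hypothetical, and the sketch assumes a single address in each of the From and To headers and a parseable Date header.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def display_name(header_value):
    """Prefer the display name in an address header; fall back to the address."""
    name, address = parseaddr(header_value)
    return name or address


def proposed_filename(raw_message):
    """Return a name of the form 'YYYY-MM-DD Email from X to Y re Subject.eml'."""
    msg = message_from_string(raw_message)
    day = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m-%d")
    sender = display_name(msg["From"])
    recipient = display_name(msg["To"])
    return f"{day} Email from {sender} to {recipient} re {msg['Subject']}.eml"
```

The result would still need cleaning before use as a Windows filename, since a Subject line can contain characters that Windows does not allow.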
To export the individual emails, I began by going into Thunderbird > Tools > Add-ons > ImportExportTools > Options. I adjusted those as desired. (Among other things, I chose a customized filename format consisting of Date-Name (Recipient)-Subject.) Then I right-clicked on the Thunderbird folder that I had named Export, containing all my non-duplicative emails to be exported. There, I chose ImportExportTools > Export all messages in the folder > EML format, and pointed to the Exported EMLs folder that I had created for this purpose in Windows Explorer. The status bar indicated how many messages were being exported. ImportExportTools proceeded to fill a subfolder within that folder with EMLs. I moved them to the top level in that Exported EMLs folder; no need for extra subfolders.
I noticed that, in one of those subfolders, ImportExportTools had given me a file called index.html. When I opened this, I saw that it contained all of the data I wanted: Subject, From, To, and Date. Now the challenge was to link those lines of data with the actual files, so that I could use the former to rename the latter. As in my previous effort, the EMLs in that folder seemed to be listed in nearly if not exactly the opposite order from the entries in this index.html file. So perhaps I could just invert the latter and use it to rename the former. To do that, I copied the contents of that index.html file, pasted them into Notepad, and then copied them again from Notepad into Microsoft Excel. In Excel, I added an Index column, to number each row with ascending numbers. I sorted to reverse the index order.
Now I had to see if that reversed list matched up with my exported EMLs. At the command prompt, I used “DIR /a-d /b > dirlist.txt” to obtain a list of those EMLs. I opened that dirlist.txt file in Notepad, made sure Format > Word Wrap was turned off, and copied those contents into another worksheet in the same Excel spreadsheet. (To create another worksheet, I used Alt-I W. An alternative: right-click the tab at the bottom of the Excel spreadsheet > Insert > Worksheet.) I renamed those two worksheets as Index (for the one containing the data from index.html) and EMLs (for the one containing the list of exported EMLs).
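The same file list could also be produced without the command prompt. Here is a rough Python sketch of a DIR /a-d /b equivalent: it lists files but not subfolders, bare names only. The function names are hypothetical. One difference to keep in mind is that this version sorts the names alphabetically, which may not match the order DIR would give.

```python
from pathlib import Path


def list_files(folder):
    """Return the bare names of files (not subfolders) in folder, sorted."""
    return sorted(entry.name for entry in Path(folder).iterdir() if entry.is_file())


def write_dirlist(folder, output_name="dirlist.txt"):
    """Write the file list to output_name inside folder and return the names."""
    # Leave the list file itself out of the list, in case it already exists.
    names = [name for name in list_files(folder) if name != output_name]
    Path(folder, output_name).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names
```

Calling write_dirlist on the Exported EMLs folder would leave a dirlist.txt there, with one filename per line, ready to paste into Excel.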
My first effort was to reduce the size of my task by getting the date and time data for both the Index and EMLs worksheets into the same format. As illustrated in the foregoing example of an email from Joe Blow to Jane Doe, I preferred the format YYYY-MM-DD HH.MM (i.e., year-month-day hour.minute). Setting up a column that would present the dates and times in that format required me to extract, for example, the value of the month, using an Excel formula that looked something like =MID(A1,5,2). In a new version of the spreadsheet (saving the prior one as a backup), I sorted both worksheets by their reformatted date columns, and then copied the contents of one worksheet into the other. This allowed me to make a direct comparison between the Index data and the EML filenames. I added a Match column so that I could mark those that matched (e.g., by unique dates and times), thereby reducing the size of my correlation task. I also added a Re-sort column. There, I wrote the Index numbers of the occasional batches of emails that had been sent out within the same minute under identical Subject lines, so that I could re-sort them after manually opening the EMLs and matching them up.
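The date reformatting and the Match column could be sketched in Python roughly as follows. This assumes the exported filenames begin with a YYYYMMDD-HHMM prefix (as in a name like 20140715-1016-Netflix-For Wed_ 42.eml) and that the index dates have already been converted to the YYYY-MM-DD HH.MM form. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def timestamp_key(eml_name):
    """Turn a leading '20140715-1016' into '2014-07-15 10.16'."""
    stamp = datetime.strptime(eml_name[:13], "%Y%m%d-%H%M")
    return stamp.strftime("%Y-%m-%d %H.%M")


def unique_matches(index_keys, eml_names):
    """Pair index rows with EML names when the timestamp is unique on both sides."""
    eml_counts = Counter(timestamp_key(name) for name in eml_names)
    eml_by_key = {timestamp_key(name): name for name in eml_names}
    index_counts = Counter(index_keys)
    return {
        key: eml_by_key[key]
        for key in index_keys
        if index_counts[key] == 1 and eml_counts.get(key) == 1
    }
```

Anything left out of the returned dictionary shares its minute with another email on at least one side, or has no counterpart, and so would still need to be matched by hand.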
Then I massaged the spreadsheet’s data. I did many searches and lookups to improve or replace existing From, To, and Subject values with more accurate and/or informative alternatives. I set up columns to identify the first and last characters of each proposed output filename, and altered some of those that began or ended with punctuation. I also wanted to replace certain other characters appearing in filenames provided by ImportExportTools, for several reasons:
- Windows would not permit files to be named using the Windows reserved characters (i.e., < > : " / \ | ? * ). (I wasn’t sure how ImportExportTools had managed to export EMLs that would contain such characters in their names.)
- I recalled having experienced problems with filenames using some other characters in some programs.
- I knew that sometimes I would search for a name spelled with, say, an e rather than a é, and thus would not find the desired file.
- I suspected that some characters (e.g., ! $ = % ) might introduce complications when trying to address filenames containing those characters in batch files or other commands.
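To make those concerns concrete, here is a small Python sketch that addresses the first, third, and fourth of them: it replaces the Windows reserved characters, folds accented letters to their plain equivalents (é becomes e), and optionally replaces extra characters such as ! $ = %. It is offered as an illustration under my own assumptions (replacing with a space, trimming trailing dots), not as a complete filename cleaner. The function names are hypothetical.

```python
import re
import unicodedata

RESERVED = '<>:"/\\|?*'  # the Windows reserved characters listed above


def fold_accents(text):
    """Replace accented letters with unaccented ones where Unicode allows."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sanitize(name, extra="", replacement=" "):
    """Clean a filename stem (no extension); extra lists more characters to replace."""
    unwanted = RESERVED + extra
    cleaned = fold_accents(name)
    cleaned = "".join(replacement if ch in unwanted else ch for ch in cleaned)
    cleaned = re.sub(r"\s+", " ", cleaned)  # collapse runs of whitespace
    return cleaned.strip(" .")  # no leading or trailing spaces or dots
```

For example, sanitize("50% off!", extra="!%") yields "50 off". The function is meant for the part of the name before .eml; run on a full filename, it would leave the extension's dot in place but would strip a trailing one.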
So I decided to search for a broader set of atypical characters. To do this, I set up a spreadsheet with rows and columns looking something like this:
That example does not show the filename produced by ImportExportTools (i.e., 20140715-1016-Netflix-For Wed_ 42.eml). It shows only the proposed replacement name, where I had already massaged the date and time data and had used the filename offered by the index.html file provided by ImportExportTools. The columns at the right side of that example show a few characters (i.e., ! # – . / 0 1 ) that might occur somewhere in a given filename. (To work up the full set, I created row 2 (which has now been deleted) containing ascending numbers 1 to 255. The row of characters displayed in row 2 here (previously row 3) was created by use of commands like =CHAR(AK2), which in this example would produce the ! sign. The characters shown here (e.g., ! # – ) could be made unchanging, so that row 2 (containing those ascending numbers) could be deleted, by using Excel’s Alt-E C, Alt-E S V Enter.) The top row shows a simple count of how many times the specified character appears in any of my filenames (e.g., =SUM(AK3:AK5855)). I could then hide those columns containing values that I did not want to change (e.g., 0 1 ) and those columns identifying characters not appearing in any of my filenames (e.g., column AY, in this example), so as to highlight those characters that I did want to change. Then I set up a column using Excel’s TRIM and SUBSTITUTE functions, to replace the undesired characters with desired ones. (For some of the least frequently occurring characters, it was faster to highlight the specific filenames using filters, and change them manually.)
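The character census and the TRIM/SUBSTITUTE step could be approximated in Python as follows. This sketch counts total occurrences of each character outside a set that I have arbitrarily called ordinary (letters, digits, and spaces), and then applies a replacement table. Both the ordinary set and the function names are my own assumptions for illustration.

```python
from collections import Counter
import string

ORDINARY = set(string.ascii_letters + string.digits + " ")


def character_census(filenames, ordinary=ORDINARY):
    """Count occurrences of every character that is not in the ordinary set."""
    counts = Counter()
    for name in filenames:
        counts.update(ch for ch in name if ch not in ordinary)
    return counts


def apply_replacements(name, replacements):
    """Substitute single characters, then trim and collapse spaces (like TRIM)."""
    translated = name.translate(str.maketrans(replacements))
    return " ".join(translated.split())
```

For instance, apply_replacements("For Wed_ 42", {"_": " "}) gives "For Wed 42". Sorting the census with counts.most_common() would put the most frequent atypical characters first, which is roughly what hiding the zero-count columns achieved in the spreadsheet.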
After making those changes, I used Excel to produce the commands needed for a batch file that would rename the EMLs. The basic command was something like this:
="ren "&CHAR(34)&C2&CHAR(34)&" "&CHAR(34)&D2&CHAR(34)
so as to surround the old (C2) and new (D2) filenames with quotation marks. I ran that batch file. It renamed the vast majority of exported emails. I saw, however, that several dozen emails were not renamed. In some cases, that was because of duplicate filenames. In other cases, however, it seemed that ImportExportTools may have failed to export the names of the emails. Some of those failures may have been due to the use of odd characters in the filenames; but for others, I was not sure what had gone wrong. These, I renamed by hand. And that was pretty much it.
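In hindsight, the duplicate-name and missing-name failures could have been spotted before running the batch file. The Python sketch below builds the same quoted ren commands from (old, new) pairs, but sets aside any pair whose new name is blank or collides with another new name. The comparison ignores case on the assumption that the files sit on a Windows drive, and the function name is hypothetical. It also doubles any % sign, since a literal % in a batch file has to be written as %%.

```python
from collections import Counter


def plan_renames(pairs):
    """Split (old, new) pairs into ren commands and pairs needing manual attention."""
    # Windows treats names that differ only by case as the same file.
    targets = Counter(new.lower() for _, new in pairs)
    commands, problems = [], []
    for old, new in pairs:
        if not new or targets[new.lower()] > 1:
            problems.append((old, new))
        else:
            # Inside a batch file, a literal percent sign is written as %%.
            line = f'ren "{old}" "{new}"'.replace("%", "%%")
            commands.append(line)
    return commands, problems
```

The commands list can be written to a .bat file, one command per line, while the problems list shows which emails would need to be renamed by hand.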