I bought a copy of Dragon NaturallySpeaking version 12, Premium edition, and installed it on my laptop. This post describes certain aspects of that experience.
Creating a Profile
As part of its installation, Dragon took me through its process of creating a profile. It asked about the primary speech device I expected to be using. I indicated that I planned to use Dragon to convert speech that I had recorded on a digital voice recorder (DVR). The installation process then instructed me to read a file into my recorder. It said I could read as much or as little as I wanted. I read the whole file. It took 19 minutes. I made a bunch of mistakes in the reading. Then I transferred that file from the recorder to the computer. I told Dragon to digest it or, as they called it, to “start training.” It said I should not use the mouse or keyboard until training was completed. I wasn’t watching the clock precisely, but I think it took about 25 minutes to process the file.
Then Dragon wanted to look in my email and my word processing documents. I said sure, go ahead. That was almost instantaneous, because I didn’t have many documents on the laptop. Dragon said it was going to give me an opportunity to turn it on to other materials, but I didn’t see that option. Dragon asked if I wanted to run various things on a schedule and then said my profile was ready for use. I was thinking, wait, I wanted to see how that 19 minutes of dictation turned out.
Choosing a Recorder
The next step, it seemed, was to try Dragon with some files I had dictated into my DVR. My long-running DVR of choice was the aging Olympus VN-960PC. That model was no longer available at retail. It was also not on Dragon’s Hardware Compatibility List. This suggested that Dragon might not fare well in its attempts to interpret my words. I considered switching to an Olympus VN-702PC. Given my irritation with Sony for gratuitously wiping out the audio on a YouTube video I had posted, I was not about to buy a Sony.
Ultimately, a careful look at the Hardware Compatibility List suggested that the most affordable and predominant entries were by Philips, so I bought a Philips DVT1000. I installed its batteries, connected its USB cable to my computer, and went into its Start.htm file in Windows Explorer. That gave me the User’s Guide. A search revealed no Dragon-specific contents, and a further search led to no immediately obvious guidelines on using the Philips with Dragon. Evidently I would just have to blunder ahead on my own.
On the Philips DVR, I went into Menu > Mic Sensitivity > Speech to Text. On the Quality settings, I set it to PCM WAV: I was used to working with WAVs; I had memory to burn; and it looked like Dragon could swallow WAVs as well as MP3s. I discovered that it recorded better to its internal memory than to the Micro SD card; that it took a second or so to get ready to record, and therefore missed the first couple of syllables I dictated; and that it would pick up the scraping sound of every finger touch on the unit. In other words, for speech recognition, I would have to place this unit in a cushioned space and use an external mic. The Olympus VN-960PC didn’t have an external memory option, but otherwise, in the regards just cited, it was superior to the Philips despite being a half-dozen years older.
I dictated a short piece of text into both the Olympus VN-960PC and the Philips DVT1000. I moved those files to the laptop where I had installed Dragon. Now, how could I get Dragon to try to convert them to text? There was an interruption in my work at this point. When I returned to this process, several days later, I remembered that I had tried the Transcribe button — the obvious starting point — but that there had been a problem with that. So I looked for the manual or other sources of information that would just lay out the process for me.
In that pursuit, several searches yielded no obvious guidance. Dragon’s Help feature, consulted for the topic of “Prepare Dragon to transcribe recordings,” told me to use a nonexistent “Digital audio recorder” option under Dragon’s Profile > Add Dictation Source menu pick. I didn’t seem to have gotten the User’s Guide with the program itself, though possibly it was somewhere on the installation disc, but I did find the place to download one from the Nuance website. It repeated the advice to use the “Digital audio recorder” option, but acknowledged that there was also a “Handheld or smartphone with recording application” option. I did have that option. But it seemed that this could give me inferior results. Why was there no “digital audio recorder” option? Their knowledgebase had no answer. When I clicked their E-mail Technical Support option, I got “Bad Request (Invalid Hostname).” I would have reported that to their Webmaster, but there was no webmaster or “Contact Us” link on that page. I tried the “Phone Technical Support” link, but it just put me back at the knowledgebase. Fortunately, I was still within my free tech support period, so I was able to report these issues to them.
Meanwhile, I went ahead with the “Handheld or smartphone with recording application” option. But it did not seem willing to work with audio (e.g., WAV) files. It said, “Recorder training has not yet been successfully completed for this user profile and dictation source.” It then led into a “Using a smartphone as a recorder” option. I didn’t want to go with that, so I canceled out. It occurred to me that maybe I was not seeing the digital audio recorder option because I had already added that to my profile. Maybe the menu options I had been looking at were intended just to add new dictation sources to profiles.
That was fine. So how could I get Dragon to look at — I mean, listen to — the audio recordings I had made? In the manual, the next main section after “Choose a Speech Device” was “Using the Dragon Sidebar.” It appeared to be intended for navigation and commands within Dragon. I didn’t need that. I was OK with just choosing a menu option. The next chapter in the manual was “Dictating Text.” We weren’t quite getting to the destination. I tried a search for “digital.” Aside from the sections just discussed, there was a brief mention of Sony’s Digital Voice Editor software, and an index entry for “digital audio recorder.” That’s it. The index entry pointed me to page 40, which just contained the stuff discussed above. I was on my own.
I went into Nuance’s forum for Dragon NaturallySpeaking for Windows. There were a couple of questions vaguely similar to things that I wanted to know about the DVR process, but nobody had answered those questions. It took me a minute to figure out that, if I wanted to post a new question, I couldn’t do it within the search results page; I had to go back to the forum’s home page and do it there. So I did. This generated some activity as forum members bickered over who was legitimate, and whether I was better advised to post my Dragon questions in the NaturallySpeaking forum at KnowBrainer. But anyway, it seemed that I might have made things worse by trying to add the smartphone option. Now, when I chose the Dragon bar option for Profile > Open User Profile, I saw that I had two entries: one for me with a digital audio recorder, and one for me with a handheld device. I wanted to get rid of the latter. But there was no button to do that in Profile > Manage User Profiles, though it was quite ready to delete my entire profile! But then I found the option in the nonintuitive location of Open User Profile > Source > Delete.
The Dragon bar said, “No user profile is open.” I went into Profile > Open User Profile. Now it said, “Dragon’s microphone is off.” Why was it telling me that, for a profile that indicated “Digital audio recorder” as the only input source? Now, with the aid of advice in the forum, I belatedly retried the Transcribe button. It gave me options to “Personalize how you transcribe.” Now I remembered: I had bailed out at this point before because it didn’t give me choices I liked. It was going to apply commands that I had dictated, such as “New line” and “All caps.” I hadn’t intended to dictate any commands. I just had my audio recording. I just wanted all of it transcribed, verbatim. I chose the least intrusive option here and, this time, I continued. I felt a little foolish: this was the answer, after all. If I hadn’t gotten interrupted, I would probably have proceeded directly with this, instead of all this fumbling around to find help. I just had to accept that Dragon might delete or move stuff in unpredictable ways because it thought my dictated words were commands.
So anyway, Dragon transcribed the file from the Philips recorder. It looked pretty good. It interpreted “wanna” as “will” — seemingly demonstrating a realization that “wanna” (which sounded nothing like “will”) was not good form, but erroneously making it “will” instead of “want to” — and it interpreted “as speech” to mean “a speech” even though the sound I made was not “uh speech” but rather more like “ass peach.” Still, pretty good.
I tried again with the same text as recorded by the old Olympus recorder. Dragon didn’t open another session of Dragon Pad. Instead, it put this second rendition into the same file as the one that it had just created when transcribing the dictation from the Philips, immediately after that previous dictation, without even a paragraph break between them. It produced exactly the same results from the Olympus as from the Philips. So far, there did not seem to be any reason to have purchased a DVR listed on the Dragon Hardware Compatibility List.
That first little dictation session was only a couple of sentences. I tried again, with a much longer speech. I was carrying the DVRs as I spoke, figuring that the Philips would pick up the noise of my hand brushing against its case, but this was the price of war: I needed to pace back and forth to inspire my expostulation. This was a random 3.5-minute soliloquy interrupted with pauses and various sounds of me coughing, water running, a door closing, and so forth. How would Dragon handle all that?
This time, I changed the option from using Dragon Pad: I told it to use “a selected window.” I think this is what it was referring to when, after designating the Philips recording for transcription, it told me to click on some other application. I clicked on Microsoft Word, where I hoped it would place the output. It told me — as it may have told me before, but I had been busy writing these words instead of watching the screen — not to touch the mouse or keyboard during transcription.
Halfway through its process, a batch file ran, starting some other programs. That may have interrupted Dragon. I wasn’t really sure what happened. It did output text into Word, but what it put there was bizarre. In my dictation, with all of its interruptions and noises, I had been talking about what I was going to do today. Somehow Dragon interpreted this as a speech about the Netherlands and Milosevic and the LAPD and some welders.
So, OK, I tried again. This time, I looked for an option to turn on a beep, or some other sound, that would prompt me to turn my attention back to the Lenovo ThinkPad Edge E430 (Intel i3-2350M CPU) where this was happening. But a search for “sounds” in Dragon’s Help file did not seem to turn up anything relevant. I went ahead with the transcription. When I did look back at the screen, I could see that Dragon was really struggling. It was going very slowly, apparently trying to figure out what those noises meant. I didn’t know exactly when the process ended. My guess was that it took maybe seven minutes. And yes, the output was still gibberish. The batch file interruption hadn’t caused it.
I tried transcribing the same speech, this time in the form recorded by the old Olympus DVR. I used Word to compare the two renditions. They were really different. Neither was usable. It looked like the Philips might have done a very slightly better job of making sense of all those noises. But in both cases, the output really was gibberish.
I tried another recording. This time, I spoke for just over one minute (1:06), without stray noises, just carrying both recorders in my hand. At the start of the dictation, I spoke slowly and clearly. Then I sped up. At the end, I was rattling off words quickly. I ran the recordings through Dragon. It took a bit less than two minutes (about 1:50) for each of the two recordings. This time, the quality was much better. Using Word’s ability to compare documents, I saw that
One thing I didn’t like about Dragon’s output was that it did not attempt to insert punctuation, even when I paused long enough to suggest the start of a new sentence. But on reflection, I decided that was just as well; I was quite likely to pause in the middle of sentences too. Dragon did a decent job of detecting and ignoring “uh.” It dropped an occurrence of “the” when I was talking at a medium-fast pace.
The Olympus did a better job than the Philips in most of the eight to ten instances when the transcription from the two recorders diverged. That is, Dragon seemed somewhat better able to understand the old Olympus. For example, I said this:
. . . and then I’m gonna speed up a little bit more and . . . .
The transcription from the Philips said, “. . . and then unlistenable that more and . . . .” That’s right: it inserted the word “unlistenable,” without brackets, and it dropped a bunch of words. The Olympus transcription, meanwhile, said, “. . . and then unless the other little bit more and . . . .” It seemed that a reader, alerted to the machine origin of the printed text, would likely be able to piece together more of the story from the Olympus output. Another example: speeding up at the end, I said,
. . . and, finally, uh, I’m gonna talk really fast and see if this thing can ca-, can keep up when I’m, you know, when the ideas are really coming out and maybe my voice changes a little d- in tone and I, um, I just have a lot to say.
To clarify, I misspoke, starting to say “catch up” and then changing my mind to “keep up,” and I also inserted a very slight dental sound, like a T or a D, just before the words “in tone.” How did the recorders handle this? First, the Philips version:
. . . and finally little growth existing contract can keep up whatever is really coming out and maybe my voice changes alone in tone and a decent a lot to say.
Now the Olympus:
. . . and finally will talk really fast and see of this and contact can keep up what I’m the only occasionally come out and maybe my voice changes old in tone and they just have a lot to say.
Neither was great with that bit of fast speech. But the Olympus output seemed better. I don’t know where “occasionally” came from, but the “old” in the last few words of the latter made some sense, given that I had made that semi-D sound. My conclusion was that there was no magic, and there was no guarantee, in the Dragon Hardware Compatibility List. A recorder that sounded better to the human ear would apparently tend to give somewhat better results in Dragon.
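For anyone who would rather put a rough number on "seemed better" than eyeball a Word comparison, here is a minimal sketch in Python. It scores each transcript of the fast passage against what I actually said, using the standard library's difflib at the word level. The function names and the tokenizing rule (lowercase, letters and apostrophes only) are my own assumptions for illustration; nothing here comes from Dragon, and a single ratio is a crude stand-in for a proper word error rate.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def similarity(spoken, transcript):
    """Word-level similarity, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, words(spoken), words(transcript)).ratio()


# What I said, and what Dragon produced from each recorder (quoted above).
spoken = (
    "and, finally, uh, I'm gonna talk really fast and see if this thing "
    "can ca-, can keep up when I'm, you know, when the ideas are really "
    "coming out and maybe my voice changes a little d- in tone and I, um, "
    "I just have a lot to say."
)
transcripts = {
    "Philips": (
        "and finally little growth existing contract can keep up whatever "
        "is really coming out and maybe my voice changes alone in tone and "
        "a decent a lot to say."
    ),
    "Olympus": (
        "and finally will talk really fast and see of this and contact can "
        "keep up what I'm the only occasionally come out and maybe my voice "
        "changes old in tone and they just have a lot to say."
    ),
}

scores = {name: similarity(spoken, text) for name, text in transcripts.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

A higher score means the transcript shares more words, in order, with the spoken text. Because the spoken version includes my "uh" and "um" and false starts, even a perfect cleaned-up transcript would score a bit below 1.0, so the numbers are only useful for comparing one recorder against the other on the same passage.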
The dispute in the Nuance forum (above), and my exposure to other materials along the way, had alerted me to the likelihood that there was a lot to know about training Dragon to recognize speech from a given source. It seemed that I could probably improve substantially on the results shown above, regardless of the particular recorder being used, if I devoted sufficient time to the task.
I decided to look into that question: to see, that is, whether I was correct in my long-held impression that good speech recognition tends to require a lot of time in which you give the device your text and then you tell it where it went wrong. One thing I learned almost immediately was that, at least according to David Wood, if I really wanted to get the best speech recognition from a DVR, I should go with Dragon NaturallySpeaking Professional ($350, or apparently $250 as an upgrade from version 12 Premium). He said that the Pro version would allow (a) an “auto-transcribe folder agent” and (b) “the ability of external dictation software such as Olympus ODMS to drive Dragon as an engine.” The mere fact that I had no idea what he was talking about was sufficient to persuade me that there was a whole world, there, with which I could probably develop basic fluency within a few days of dedicated work (and $350). A Vox Scripta page led me to think that the advantages of the Pro version would be especially noticeable to people who not only wanted to dictate words for conversion into text, but who also wanted to run software and enter data into it via voice commands.
Lunis Orcutt suggested, in that same thread, that the auto-transcribe folder agent was a tool that would process all of the audio files in a folder without requiring you to sit there and feed them one at a time. So if it came to a point of needing to transcribe a number of dictated files, it seemed the Pro version would be essential. Lunis also provided a link to his commentary on DVRs. He (and also David Wood) evidently considered the Olympus DS-7000 (~$500) the king of the speech recognition DVRs. His comments also supported my skepticism toward Dragon’s Hardware Compatibility List.
David Wood also seemed to say that I could train Dragon by feeding it corrections that I would supply in “a Full Text Control application.” He cited Word, WordPerfect, and DragonPad as examples. A search yielded indications that Full Text Control meant being able to dictate commands (in addition to text) effectively within application software. That sort of control was interesting but somewhat tangential to my interest in transcription accuracy. I was closer to the mark with an interesting Amazon review by Charles Bittner (“disabled comedian & gamer”). It appeared that practice and careful use could result in highly accurate dictation. But it still wasn’t clear whether and how I could train Dragon to have a better understanding of what I was saying.
Chapter 12 of the User Guide said the top ways to improve accuracy were: dictate carefully, import lists of words or phrases, learn from specific documents and emails, run accuracy tuning, perform additional training, turn off unused Dragon features, use Smart Format rules, make corrections, and save your user profile. Also, within Dragon Help, the Dragon Accuracy Center appeared to be a principal tool for improving accuracy. At this writing, these remained areas to be explored.