What Is Involved in Speech Recognition Training?

Article Details
  • Written By: Mary McMahon
  • Edited By: Shereen Skola
  • Last Modified Date: 31 March 2020
  • Copyright Protected:
    Conjecture Corporation

Speech recognition training familiarizes software with a user’s accent and speech patterns, which can improve both the accuracy and the speed of dictation. The time required depends on the program and on the level of accuracy the user needs. Beyond the initial training period, users can also correct the software periodically, adding words to its dictionary and teaching it to avoid common mistakes.

When voice recognition software is first installed, it may prompt the user to begin training. This phase can often be skipped, but the resulting accuracy problems may become a source of frustration. Initial speech recognition training usually takes around 15 minutes, and it may be possible to record more audio to give the software more nuance and detail. Users who want to start dictating immediately, to get a feel for the software, can choose to record additional audio samples later.


A good microphone and sound card are important in this process because they provide a clear, stable signal. The operator reads sample paragraphs to the program so it can match a known text with the voice patterns. Speech recognition training can also include requests to repeat words that sound similar so the system can learn to distinguish between them. People need to speak as naturally as possible to ensure greater accuracy; if they enunciate more carefully than usual because they feel self-conscious, the system learns a speaking style that differs from their everyday speech and may be less accurate when they dictate.

Voice recognition software can become more accurate with use. Many programs are designed to learn from corrections: a spoken cue like “correct that” signals that the program didn’t transcribe a word accurately. All of the audio files associated with speech recognition training are stored in a central location and can be backed up. Backing them up is advisable so users don’t have to repeat the training if they’re forced to reinstall the program or when they buy a new computer.
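As a rough illustration of the backup step, the Python sketch below zips a folder of training files into a timestamped archive. The folder layout and the function name backup_profile are assumptions made for this example; where a given program actually keeps its training data varies, so check its documentation for the real location.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_profile(profile_dir: Path, backup_dir: Path) -> Path:
    """Zip a voice-profile folder and return the path of the archive.

    Both paths are hypothetical; point profile_dir at wherever your
    speech recognition program stores its training files.
    """
    if not profile_dir.is_dir():
        raise FileNotFoundError(f"No profile folder at {profile_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name keeps older backups from being overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"voice-profile-{stamp}"

    # make_archive appends ".zip" and returns the full archive path.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=profile_dir)
    return Path(archive)


if __name__ == "__main__":
    # Demonstration with a throwaway folder standing in for a real profile.
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp) / "profile"
        profile.mkdir()
        (profile / "training-sample.wav").write_bytes(b"placeholder audio")
        print(backup_profile(profile, Path(tmp) / "backups"))
```

Keeping the backup folder on a separate drive or cloud storage means the archive survives a reinstall or a move to a new computer.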

Professionals who use a lot of specialized terminology in their work may need to spend more time on speech recognition training. Doctors, for example, may use formal anatomical terms and complex disease names when they dictate patient records. For them, training the program on a list of common terms at the outset can save time in the long run. Some programs also offer modules with some of these terms already loaded, so the software doesn’t need to be taught them.
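The idea of a custom term list can be sketched in a few lines of Python. This is only a stand-in: real dictation programs adapt their internal vocabulary and models, whereas this example simply replaces known mis-transcriptions in finished text. The function name apply_corrections and the sample phrases are invented for illustration.

```python
import re


def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace known mis-transcriptions with the intended terms.

    corrections maps a wrongly transcribed phrase to the correct term.
    Matching is case-insensitive and limited to whole words.
    """
    if not corrections:
        return transcript

    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {wrong.lower(): right for wrong, right in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], transcript)


if __name__ == "__main__":
    # Hypothetical mistakes a general-purpose vocabulary might make.
    medical_terms = {
        "my o cardial": "myocardial",
        "in fark shun": "infarction",
    }
    raw = "Patient has a history of my o cardial in fark shun."
    print(apply_corrections(raw, medical_terms))
```

A list like this is easy to share among colleagues in the same specialty, which is roughly the convenience that preloaded terminology modules provide.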


