Speech Recognition

Speech recognition is the process by which a computer listens to what you say and converts it into written text. Given how fast and powerful computers are, this might seem like a simple task, but it is exactly the opposite. Most recognition software can achieve an accuracy of between 98% and 99% when used under optimal conditions. Optimal conditions assume that users have speech characteristics that match the training data, can achieve adequate speaker adaptation, and work in a noise-free environment (e.g., a quiet office or laboratory space).

The two essential steps that a speech recognition system must take are training and decoding. There are two classes of speech recognition: speaker-independent systems, which have a small vocabulary of words and commands, and speaker-dependent systems, which have a very large vocabulary but need to be trained for each individual user. For a speaker-dependent system, this training phase might involve the user reading a book aloud to the computer while the system follows the words as they are spoken. It can also involve supplying pre-recorded speech and transcribing the audio into the corresponding text. Training a speaker-independent system involves collecting the different commands and configuring them for different accents, for the differences between male and female voices, and for slang, acronyms, articulation of words and temporal non-uniformity.

One intriguing obstacle that speech recognition must overcome is homophones, words that sound the same but have different meanings. The common solution to this problem is to examine the context in which the candidate words are used and choose the word that fits it. This solution can also be used in all forms of… middle of paper… of the item.

A recent application of voice recognition technology in entertainment is the horror film Last Call. When spectators purchase tickets, they are asked to provide their mobile number.
Before the start of the film, the database of telephone numbers for the screening is sent to the company. At some point during the film, an audience member's mobile phone rings, and it is up to that audience member to give directions to the character on screen. Remarkably, the film is controlled by the voice of an ordinary spectator. Furthermore, the recognition software must cope with the film's strong background noise.

Voice recognition has also reached the video game market. The distinguishing feature of these games is that the player controls the game entirely by speaking commands into a microphone to the characters on the screen. The commands are interpreted by the game's built-in voice recognition software.
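
To make the earlier point about same-sounding words more concrete, the idea of choosing a word by its context can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular product works: the tiny corpus, the homophone groups, and the function names (`candidates`, `context_score`, `disambiguate`) are all assumptions invented for the example, and a real recognizer would rely on a far larger language model.

```python
from collections import Counter

# Hypothetical toy corpus; a real system would learn from far more text.
CORPUS = (
    "please write a letter to your friend . "
    "turn right at the next corner . "
    "you were right about the answer . "
    "write your name at the top ."
).split()

# Hypothetical groups of words that sound alike to the recognizer.
HOMOPHONES = [{"write", "right"}, {"to", "two", "too"}]

# Count how often each pair of adjacent words occurs in the corpus.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def candidates(word):
    """Return every word that sounds like `word` (including itself)."""
    for group in HOMOPHONES:
        if word in group:
            return sorted(group)
    return [word]


def context_score(prev_word, word, next_word):
    """Score a candidate by how often it was seen next to its neighbours."""
    return BIGRAMS[(prev_word, word)] + BIGRAMS[(word, next_word)]


def disambiguate(words):
    """Replace each word with the sound-alike that best fits its context.

    The left neighbour is the already-corrected word; the right neighbour
    is taken as heard. Ties go to the alphabetically first candidate.
    """
    result = []
    for i, word in enumerate(words):
        prev_word = result[-1] if result else None
        next_word = words[i + 1] if i + 1 < len(words) else None
        best = max(
            candidates(word),
            key=lambda c: context_score(prev_word, c, next_word),
        )
        result.append(best)
    return result


print(" ".join(disambiguate("please right a letter".split())))
print(" ".join(disambiguate("turn write at the next corner".split())))
```

In this sketch the misheard "right" in the first phrase is replaced by "write" because "please write" and "write a" appear in the corpus, while the second phrase is corrected in the other direction. Counting adjacent word pairs is just one simple way to represent context; it was chosen here because it keeps the example short.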