Sound Recording Solutions


Speech Recognition

Speech recognition technologies allow computers equipped with a source of sound input, such as a microphone, to interpret human speech, for example for transcription or as an alternative method of interacting with a computer. Such systems can be classified according to:
- whether or not they require the user to "train" the system to recognize their own particular speech patterns;
- whether the system is trained for one user only or is speaker-independent;
- whether the system can recognize continuous speech or requires users to break up their speech into discrete words;
- whether the system is intended for clear speech material, or is designed to operate on distorted transfer channels (e.g., cellular telephones) and possibly with background noise or another speaker talking simultaneously;
- whether the vocabulary the system recognizes is small (on the order of tens or at most hundreds of words) or large (thousands of words);
- the context of recognition: digits, names, free sentences, etc.
Speaker-dependent systems requiring a short amount of training can capture continuous speech with a large vocabulary at normal pace with an accuracy of about 98% (getting two words in one hundred wrong) if operated under optimal conditions. Other "limited vocabulary" systems, which require no training, can recognize a small number of words (for instance, the ten digits) from most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations.
Commercial systems for speech recognition have been available off the shelf since the 1990s. Despite the apparent success of the technology, few people use speech recognition systems on their desktop computers. It appears that most computer users can create and edit documents and interact with their computer more quickly with conventional input devices, such as a keyboard and mouse, even though most people can speak considerably faster than they can type.
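The 98% figure above is most naturally read as word accuracy, i.e. one minus the word error rate (WER). WER counts the substitutions, deletions and insertions needed to turn the recognized text into a reference transcript, divided by the number of reference words. The text does not say how its figures were measured, so this definition is an assumption. A minimal Python sketch using a word-level edit distance (the function name `word_error_rate` is illustrative, not taken from any particular toolkit):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance on words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                       # deletion
                dist[i][j - 1] + 1,                       # insertion
                dist[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # A synthetic 100-word reference with two words misrecognized.
    reference = " ".join(f"word{i}" for i in range(100))
    hypothesis = reference.split()
    hypothesis[10] = "wrong"
    hypothesis[50] = "wrong"
    wer = word_error_rate(reference, " ".join(hypothesis))
    print(f"WER = {wer:.0%}, word accuracy = {1 - wer:.0%}")
    # prints: WER = 2%, word accuracy = 98%
```

Because insertions are counted, WER can exceed 100% on very noisy output, so quoting accuracy as 1 - WER is only a convention; published vendor figures may use a different definition.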
Using keyboard and speech recognition together, however, can in some cases be more efficient than using either input alone. A typical office environment, with a high level of background speech, is one of the most adverse environments for current speech recognition technologies, and speaker-independent large-vocabulary systems designed to operate in such environments have significantly lower recognition accuracy. As of 2005, the typical achievable recognition rate for large-vocabulary speaker-independent systems was about 80%-90% in a clean environment, but could be as low as 50% in scenarios such as a cellular phone call with background noise. Additionally, heavy use of the speech organs can result in vocal loading.
Speech recognition systems have found use where text must be entered extremely quickly. They are used in legal and medical transcription and in generating subtitles for live sports and current-affairs programs on television. In these applications the software is not applied to the original speech directly; instead, an operator re-speaks the dialog into software trained on the operator's voice. The operator also receives special training: first, to speak clearly and consistently to maximize recognition accuracy; second, to indicate punctuation by various techniques; and often also domain-specific training, especially in medical or legal contexts where the operator needs to know specialized vocabulary and procedures. In courtrooms and similar situations where the operator's voice would disturb the proceedings, he or she may sit in a soundproofed booth or wear a Stenomask or similar device.
Speech recognition is sometimes a necessity for people who have difficulty interacting with their computers through a keyboard, for example, those with serious carpal tunnel syndrome, impaired extremities, or other physical limitations.
Speech recognition technology is increasingly used for telephone applications such as travel booking and information, financial account information, customer service call routing, and directory assistance. Using grammar-constrained recognition, in which the recognizer accepts only utterances that match a grammar defined by the application, such applications can achieve remarkably high accuracy. Research and development in speech recognition technology has continued to grow as the cost of implementing such voice-activated systems has dropped and their usefulness and efficiency have improved. For example, recognition systems optimized for telephone applications can often report how confident they are in a particular recognition result; if the confidence is low, the application can prompt callers to confirm or repeat their request (for example, "I heard you say 'billing', is that right?"). Furthermore, speech recognition has enabled the automation of certain applications that cannot be automated with push-button interactive voice response (IVR) systems, such as directory assistance and systems that allow callers to "dial" by speaking names listed in an electronic phone book. Nevertheless, speech-recognition-based systems remain the exception, because push-button systems are still much cheaper to implement and operate.
Speech recognition systems are based on simplified stochastic models, so any aspects of speech that may be important to recognition but are not represented in the models cannot be used to aid recognition. Speech segmentation, the division of the continuous speech signal into elementary units, is a very difficult problem. It actually consists of two separate problems: the breakup and classification of the signal into a string of discrete "atomic" sounds (phonemes), and the division of that string into meaningful substrings (words or, more generally, lexical units).
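The second problem, dividing an unbroken string of units into words, can be illustrated with a toy dictionary-based segmenter. This is a minimal Python sketch under simplifying assumptions: it works on letters rather than phonemes, uses a small hypothetical lexicon, and simply enumerates every parse. A real recognizer would instead score competing hypotheses with acoustic and language models; the point here is only that a single unbroken string can have several valid word sequences. The function name `all_segmentations` is illustrative.

```python
from functools import lru_cache


def all_segmentations(text: str, lexicon: set[str]) -> list[list[str]]:
    """Return every way to split `text` into a sequence of lexicon words."""

    @lru_cache(maxsize=None)
    def parses_from(start: int) -> tuple[tuple[str, ...], ...]:
        # All segmentations of text[start:], memoized by start position.
        if start == len(text):
            return ((),)  # the empty suffix has exactly one (empty) parse
        parses = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in parses_from(end):
                    parses.append((word,) + rest)
        return tuple(parses)

    return [list(parse) for parse in parses_from(0)]


if __name__ == "__main__":
    # Hypothetical toy lexicon; letters stand in for phonemes.
    lexicon = {"no", "now", "here", "where", "nowhere"}
    for parse in all_segmentations("nowhere", lexicon):
        print(" ".join(parse))
    # prints:
    # no where
    # now here
    # nowhere
```

Choosing among "no where", "now here" and "nowhere" needs exactly the context, syntax and semantics discussed in the text. Memoizing by start position means each suffix is analyzed only once, although the number of parses can still grow exponentially with the length of the string.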
For most languages, the first task is already quite difficult, partly because of co-articulation, which causes phonemes to interact or combine, even across word boundaries. The second task is not trivial either, because in normal speech there are no pauses between words. An example often quoted in the field is the phrase "how to wreck a nice beach", which, when spoken, sounds very much like "how to recognize speech". Proper segmentation therefore depends on context, syntax and semantics, that is, on human knowledge and experience, and would thus require advanced pattern recognition and artificial intelligence technologies to be implemented on a computer.
Intonation and sentence stress can play an important role in the interpretation of an utterance. As a simple example, utterances that might be transcribed as "go!", "go?" and "go." can clearly be distinguished by a human, but determining which intonation corresponds to which punctuation is difficult for a computer. Most speech recognition systems cannot provide any information about an utterance beyond which words were pronounced, so information about stress and intonation cannot be used by the application that uses the recognizer. Researchers are currently investigating emotion recognition, which may have practical applications: for example, if a system detects anger or frustration, it can try asking different questions or forward the caller to a live operator.
In a system designed for dictation, an ordinary spoken signal does not provide enough information to create a written form that obeys the normal rules of written language, such as punctuation and capitalization. These systems typically require the speaker to say explicitly where punctuation is to appear.
The challenge for developers of Automatic Speech Recognition (ASR) engines is that end customers judge them on one criterion: did it understand what I said? That leaves little room for differentiation.
Vendors can also compete on areas such as multi-language support, tuning tools and integration APIs (the proposed MRCP standard or proprietary interfaces), but recognition quality is the most visible. Because of the complex algorithms and language models required to implement a high-quality speech recognition engine, it is difficult both for new companies to enter this market and for existing vendors to maintain the investment needed to keep up and move ahead.
Currently, Nuance Communications (formerly known as ScanSoft) dominates the speech recognition market for server-based telephony and PC applications. There are several small vendors, such as Aculab, Fonix Speech, Loquendo, LumenVox, Sensory Inc. and Verbio, but they are essentially niche players. Nuance's speech recognition business is actually composed of SpeechWorks and the products of several former niche players. IBM has also participated in the speech recognition engine market, but its ViaVoice product has gained traction primarily in the desktop command-and-control (grammar-constrained) and dictation markets. Nuance also makes Dragon NaturallySpeaking, a desktop dictation system with a theoretical recognition rate of up to 99 percent. In practice this rate cannot be reached, because people use a vocabulary of over 300,000 words and not all of these words are in the NaturallySpeaking vocabulary.
Philips Speech Recognition Systems is the market leader in enterprise healthcare speech recognition with its flagship product SpeechMagic, according to a December 2005 report by Frost & Sullivan. SpeechMagic is installed at more than 7,000 professional sites worldwide, supports 23 recognition languages and offers a portfolio of more than 150 specialized recognition vocabularies. For Mac users, iListen from MacSpeech, Inc. is available; based on the Philips speech engine, it has been shipping on the Mac since 2000.
Speaker-independent speech recognition embedded in mobile phones is one of the fastest-growing market segments. Grammar-based command-and-control and even dictation systems are now available in mobile handsets from operators such as Cingular Wireless, Sprint PCS, Verizon Wireless and Vodafone. VoiceSignal is the dominant vendor in this rapidly growing segment, and Microsoft, Nuance and IBM have also announced intentions to enter it.
The big software heavyweights, Microsoft (Speech Server) and IBM (references: main site, voice toolkit preview, eWeek article, older InternetNews article, new InternetNews article on VXML toolkits), are now making substantial investments in speech recognition. IBM claims to have put one hundred speech researchers on the problem of taking ASR beyond the level of human speech recognition by 2010. Bill Gates is also making very large investments in speech recognition research at Microsoft; at SpeechTEK, he predicted that by 2011 the quality of ASR would catch up with human speech recognition. IBM and Microsoft are still well behind Nuance in market share.
Start-ups are also making an impact in speech recognition, most notably SimonSays Voice Technologies. SimonSays, a Toronto-based company, has made several breakthroughs in robust server-based speech recognition. Though it currently has a smaller market share, it is certainly a company to watch. Low-cost speech recognition chips for toys and consumer electronics are supplied by Sensory Inc. and Extell Technologies.





Copyright © Sound Recording Solutions Inc, 1999-2008