Electronic Design

Speech Recognition Gets Intuitive, Finds New Life In Mobile Applications

Speech recognition has been one of those technologies that's perpetually "just around the corner." But after a recent IBM press event showcasing advances in speech technology and a spate of impressive applications, I'm convinced the age of speech recognition has dawned at last.

For years, IBM has pursued dictation applications. I tried IBM's ViaVoice for the PC some 10 years ago to dictate a magazine column. While it was fun and fairly efficient, there were too many MPMs (manglings per minute) to make it practical.

A decade later, IBM is still chasing the dictation holy grail. In 2001, the company says, speech input lagged human keyboarding by a factor of 10. Now, the company predicts we'll have machines that transcribe better than people by the end of the decade.

But all the while IBM has chased dictation's receding goal posts, the company also has been effectively re-channeling its speech expertise into "command recognition." These interactive phone-based applications have successfully moved speech recognition from the realm of sci-fi to an automation technique giving offshore call centers a run for their rupees. Tens of thousands of telephony and call-center applications now use speech input/output.

The proliferation of cell phones, PDAs, and other portable data devices has created mass-market opportunities for speech recognition and response systems. Bluetooth wireless headsets invite mobile users to multitask, naturally driving demand for hands-free application control or query and response. Automotive telematics create another opportunity for speech, enabling drivers to keep their eyes on the road while commanding an ever-expanding array of in-car electronics systems.

I was impressed by the demonstrations of IBM's Embedded ViaVoice 4.4 and its new freeform command recognition capabilities. The technology eliminates the need for users to memorize predefined control terms. Instead, it uses statistical language modeling and semantic interpretation to accept intuitive command phrases.

In contrast, my cell phone requires me to say, "call someone" to voice-activate the dialing function. If I forget the command and say "make a call" or "place call," it typically screws up and starts voice message playback. Then I have to take my eyes off the road, pick up the phone, perhaps swerve dangerously, and utter some choice freeform commands of my own.
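To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is not IBM's implementation: the intent names, the example phrases, and the functions `fixed_command`, `train`, and `classify` are all hypothetical, and a simple smoothed word-count model stands in for the far richer statistical language modeling and semantic interpretation a real product would use.

```python
import math
from collections import Counter

# Hypothetical example phrases for each intent; illustrative only.
TRAINING = {
    "dial": ["call someone", "make a call", "place call",
             "dial a number", "phone home"],
    "voicemail": ["play messages", "check voice mail",
                  "listen to my messages"],
    "radio": ["change station", "play some jazz", "tune the radio"],
}

# The rigid approach: one memorized phrase per function.
FIXED_COMMANDS = {"call someone": "dial"}


def fixed_command(utterance):
    """Return an intent only when the phrase matches exactly, else None."""
    return FIXED_COMMANDS.get(utterance.lower().strip())


def train(examples):
    """Count how often each word appears in each intent's example phrases."""
    counts = {
        intent: Counter(word for phrase in phrases for word in phrase.split())
        for intent, phrases in examples.items()
    }
    vocab = {word for word_counts in counts.values() for word in word_counts}
    return counts, vocab


def classify(utterance, counts, vocab):
    """Pick the intent whose word statistics best explain the utterance."""
    words = utterance.lower().split()
    best_intent, best_score = None, -math.inf
    for intent, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one smoothing keeps an unseen word from ruling out an intent.
        score = sum(
            math.log((word_counts[w] + 1) / (total + len(vocab)))
            for w in words
        )
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


COUNTS, VOCAB = train(TRAINING)
print(fixed_command("make a call"))            # None: not the memorized phrase
print(classify("make a call", COUNTS, VOCAB))  # dial
```

The fixed table fails the moment the wording drifts, while the statistical version scores every intent and picks the most plausible one, so "make a call" and "place call" both land on dialing.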

At least in the confines of the demo room, the freeform commands worked as advertised. Demos included integrated car audio, phone dialing, and navigation system control. Context recognition was announced for an XM Satellite Radio hands-free interface, based on Embedded ViaVoice integrated into VoiceBox Navigator from VoiceBox Technologies.

VoiceBox offers intelligent searches by determining the context of a user's requests, whether searching for music or asking for driving instructions. If I tried changing a radio station by saying "change station," the system might ask me to specify an FM frequency, a type of music, or whether I wanted to scan the available stations.
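The clarifying behavior described above can be sketched as a small slot-filling routine. This is only an illustration under stated assumptions, not VoiceBox's actual logic: the function `handle_radio_request`, the `GENRES` list, and the returned dictionary format are all hypothetical.

```python
import re

# Hypothetical list of music types the system knows about.
GENRES = ("jazz", "rock", "classical", "country")


def handle_radio_request(utterance):
    """Act on a radio request, or ask a follow-up question if it is ambiguous."""
    text = utterance.lower()

    # A number such as "101.5" is treated as an FM frequency.
    match = re.search(r"\b(\d{2,3}\.\d)\b", text)
    if match:
        return {"action": "tune", "frequency": float(match.group(1))}

    # A known music type selects a station by genre.
    for genre in GENRES:
        if genre in text:
            return {"action": "play_genre", "genre": genre}

    if "scan" in text:
        return {"action": "scan"}

    # Nothing specific was said, so ask instead of guessing.
    return {
        "action": "ask",
        "prompt": "Would you like an FM frequency, a type of music, "
                  "or a scan of the available stations?",
    }


print(handle_radio_request("change station")["action"])
print(handle_radio_request("tune to 101.5"))
```

The design choice worth noting is the final fallback: rather than guessing at an underspecified request, the system returns a question, which is what keeps a vague "change station" from triggering the wrong function.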

Beyond the vehicle, IBM sees a natural opportunity to make speech a "multimodal" option for grabbing "info on demand" on mobile devices. Several excellent demonstrations showed how speech could be used to efficiently fill data fields in PDA-enabled applications, like insurance claims inspection and mobile stock trading.

With today's college students coming of age with the cell phone, it's not surprising that the coolest applications were presented by Anne Bishop, director of Information Systems R&D at Wake Forest University. Named the "most wired" liberal arts university by Yahoo, Wake Forest has worked with IBM since 1995 as a "ThinkPad" school.

With 95% of its students carrying cell phones, Bishop says it makes sense for the university to integrate mobile technology into college life. So, Wake Forest offers PocketPC-powered smart phones and many services to enhance both academics and campus living.

The first applications that incorporate speech recognition involve the campus shuttle system and dorm laundry. For more about the unwired college lifestyle, go to www.electronicdesign.com and see Drill Deeper 12024.

See Associated Figure
