Lip-Reading Technology Knows What You Said

April 27, 2007

Video surveillance systems incorporating intelligent analysis already do a good job of tracking people, recognizing faces, and even interpreting physical gestures. But if a British research team has its way, surveillance system operators will have yet another new tool to use in their fight against crime and terrorism: automatic lip recognition.

Computer-based lip-reading technology would help video surveillance systems spot people planning a crime or terror attack by literally watching suspects’ lips for clues. Once it finds someone speaking certain key words or sentences, the system would automatically send an alert message to a central console, mobile phone, or other communications device. Police or security agents could then be dispatched to the scene to question the individual.
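As a rough illustration of the alerting step described above, here is a minimal Python sketch. It assumes the lip-reading stage has already produced a text transcript. The watch list, the `Alert` record, and the `find_alerts` function are hypothetical names invented for this example; the article does not describe how the real system would match words or deliver alerts.

```python
from dataclasses import dataclass

# Hypothetical watch list. The article does not say which words or
# sentences a real deployment would look for.
WATCH_PHRASES = ["example phrase", "another phrase"]


@dataclass
class Alert:
    camera_id: str   # which camera produced the transcript (assumed field)
    phrase: str      # the watch-list phrase that was matched
    transcript: str  # the original transcript text


def find_alerts(camera_id, transcript, phrases=WATCH_PHRASES):
    """Return one Alert per watch-list phrase found in the transcript.

    Matching is a simple case-insensitive substring test after collapsing
    whitespace. This is an assumption made for illustration only.
    """
    normalized = " ".join(transcript.lower().split())
    return [
        Alert(camera_id, phrase, transcript)
        for phrase in phrases
        if phrase.lower() in normalized
    ]
```

In a real system the returned alerts would be forwarded to a console or phone. A lip-reader that is right only about half the time would also need some confidence threshold before alerting, which this sketch leaves out.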

Richard Harvey, a senior lecturer in computer vision at the University of East Anglia in Norwich, England, is embarking on a three-year project that will collect lip-reading data. The information will then be used to create systems that can automatically convert lip motions into readable text.

The Home Office, the U.K. government department responsible for domestic security, is interested in the project, according to Harvey. So is the U.K. Engineering and Physical Sciences Research Council, which has awarded the venture a £391,814 grant.

Harvey says he and his researchers will investigate techniques for recognizing head positions, lip shapes, and their related sounds. “We have several methods for extracting what are called ‘features’ from the lips—sets of numbers that vary with the lip shape, but not with anything else,” he says, adding that the researchers won’t use any specialized cameras or computers in their work. “Our technology is very standard,” he says. “We are using standard speech recognition technology.”
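To make the idea of "features" concrete, the following Python sketch shows one simple way to turn lip contour points into numbers that depend on lip shape but not on where the mouth sits in the frame or how large it appears. This is not the research team's method, which the article does not describe. The `lip_shape_features` function and its input format (a list of `(x, y)` points) are assumptions made for this example.

```python
def lip_shape_features(points):
    """Compute position- and scale-independent numbers from lip contour points.

    `points` is a list of (x, y) tuples outlining the lips (assumed input).
    Coordinates are centred on their mean and divided by the mouth width,
    so moving or uniformly resizing the mouth leaves the output unchanged.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        raise ValueError("lip contour has zero width")

    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)

    # First feature: how open the mouth is relative to its width.
    features = [height / width]
    # Remaining features: each contour point, centred and scaled.
    for x, y in points:
        features.extend([(x - cx) / width, (y - cy) / width])
    return features
```

This sketch does not handle head rotation, which is one reason the researchers also plan to study head position.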

Current automated lip-reading systems, which require good lighting and static heads, are limited and relatively inaccurate. “We can lip-read between 10 and 30 utterances at the moment, with an accuracy of around 50%,” Harvey says. “Given the difficulty of lip-reading, that is regarded as pretty good. But obviously there is a huge way to go before we can handle natural speech.”

Yet Harvey feels that once the kinks are worked out, automated lip-reading could be applied to consumer and business products as well as video surveillance. Camera phones incorporating the technology, for example, would let users communicate in even the noisiest environments.

“Lip-reading is important because normal people lip-read all the time... in cars, aircraft, parties, offices, and so on,” Harvey says. “Therefore, lip-reading is useful as an adjunct to normal speech recognition.”
