High-speed speech calls for hardware
Rob A. Rutenbar, professor of electrical and computer engineering at Carnegie Mellon University, aims to create a computer chip that understands speech and processes it more quickly than current software can.
Imagine a computer understanding everything you say, regardless of how fast you speak or the words you use.
And while you're talking to that computer, it's also turning your words immediately into type.
Then imagine technology that can do this a thousand times faster than real time as a means of processing thousands of hours of recorded speech in a fraction of the time.
These are the goals of Rob A. Rutenbar, professor of electrical and computer engineering at Carnegie Mellon University and director of the national MARCO Focus Center for Circuit System Solutions.
And he's taking a whole new approach to accomplish these goals.
Dr. Rutenbar said current speech-recognition technology is approaching a ceiling, whether the task is understanding what people say, turning speech into text or filtering through hours of recordings to find key words or phrases.
His revolutionary approach is to transform computer "ware" from soft to hard.
Rather than rely on software, he's creating better hardware. His goal is to create a specialized computer chip that understands speech and processes it more quickly than current software is capable of doing.
"I came very late to this party," Dr. Rutenbar said, noting he's a "chip guy -- a silicon guy," while those working the last 20 years on speech recognition have been software experts.
But a software program sophisticated enough to recognize and process speech in real time is reaching its limits.
For now, phone companies use the software to provide phone numbers without operator assistance, and stores use it to gather information from callers. It recognizes and reacts to speech.
But Dr. Rutenbar said chips will be more efficient at these tasks. Computer graphics already are done with chips, he said, reflecting hardware's potential.
So he thought a chip also could be developed for speech recognition.
Already he and his CMU research team have produced an early prototype chip that can recognize 1,000 words, but not yet at real-time speed.
But he said the potential of his chip technology is boundless, and news of his success so far has created a buzz.
Teresa Meng, a professor in the electrical engineering department at Stanford University, described Dr. Rutenbar's research as the most advanced in speech recognition.
"I think it's phenomenal," she said. "He's put grammar and structure in the chip in a multistep recognition process to cast a fairly wealthy set of thinking into hardware.
"It's a very challenging and complex endeavor and that's why no one ever did it before."
She said it requires someone like Dr. Rutenbar who understands algorithms and circuitry.
His long-term goal, he said, is to develop a chip that can understand 50,000 words at 1,000 times real-time speed. That would mean filtering through 10 hours of taped archives to locate telltale words or phrases in a little over half a minute.
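The arithmetic behind that speed target is simple enough to check by hand or with a few lines of Python. The sketch below is only an illustration: the function name `processing_seconds` is hypothetical and not part of Dr. Rutenbar's work, and it assumes processing time scales linearly with the length of the recording.

```python
def processing_seconds(audio_hours: float, speedup: float) -> float:
    """Seconds needed to process `audio_hours` of recorded speech
    when the recognizer runs `speedup` times faster than real time.

    Assumption: processing time scales linearly with audio length.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_hours * 3600 / speedup


# Long-term goal cited in the article: 1,000 times real time.
ten_hours = processing_seconds(10, 1000)
print(f"10 hours of tape at 1,000x: {ten_hours:.0f} seconds")

# At the same speed, 1,000 hours of recordings take one hour.
thousand_hours = processing_seconds(1000, 1000)
print(f"1,000 hours of tape at 1,000x: {thousand_hours / 3600:.0f} hour")
```

At 1,000 times real time, 10 hours of tape works out to 36 seconds, in line with the roughly half-minute figure cited, and 1,000 hours of recordings would take one hour.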
Such technology is important for national security. It would allow faster analysis of wiretaps or help government agencies comb through thousands of hours of recorded intercepts that have yet to be processed.
"I want a box that can listen in one hour to thousands of hours of conversation," he said, describing the process as "audio mining."
There also are many commercial applications for such technology, he said.
Cell phones could have better voice recognition. One could also ask a DVD player to scan the movie "Terminator 2" and, within seconds, stop at the point where Arnold Schwarzenegger says his famous line, "Hasta la vista, baby."
"We think we can develop architecture of this thing to get five to 10 times faster than real time with 5,000 to 10,000 words," Dr. Rutenbar said, noting his expectations to reach that level within a year. "We're in the lab right now working on it.
"I think it's doable," he said. "It's generated excitement. We're on track to pull this off."
Dr. Rutenbar is in the second year of a four-year, $1 million National Science Foundation grant and also holds grants from the U.S. Department of Defense and the semiconductor industry to develop technology to assist Homeland Security.
Dr. Rutenbar recently gave the first public demonstration of his new chip at a Hot Chips conference at Stanford University.
He also is scheduled to present a paper on his technology tomorrow at Interspeech 2006, the Ninth International Conference on Spoken Language Processing, which CMU is hosting at the Westin Convention Center.
"Getting off the software track onto the hardware track was the way to liberate this technology to be all it can be," Dr. Rutenbar said.
First Published September 20, 2006 12:00 am