Max Huang says he has something cool to show me. I'm skeptical: he's
holding in his hand what looks like a PDA. It is a PDA, a
Compaq 3600, to be exact, unadorned and, to my eyes, unremarkable.
What's special is what's inside: this PDA understands what you say.
Huang and his colleagues at the Philips Speech Processing office
in Taipei, Taiwan, have streamlined the company's standard
speech recognition engine, meant for servers and PCs, to run
instead on a PDA. It's just a prototype, Huang says, but the
Mandarin-language recognizer can distinguish about 40 000
words and still not tax the Compaq's memory, power, or processing.
With it, Huang can access his address book, schedule appointments,
and dictate e-mail. Considering the alternative—poking
away at the device's tiny display with a skinny stylus—I'm
starting to be convinced: this does seem pretty cool.
To the extent that the average person is familiar with speech recognition,
she probably thinks of dictating reports to a PC, or maybe
dialing an automated call center for flight or train schedules.
Indeed, the speech industry has been pushing those kinds of
applications over the last decade.
But some of the most novel and most challenging work being done now
involves putting speech recognition where it was previously
thought infeasible: into toys and MP3 players, car navigation
and entertainment systems, and cellphones and PDAs. What's
enabling the migration of speech to smaller devices is, on
the one hand, efficient speech recognition engines that can
handle noise and variations in speech, and, on the other,
faster, bigger, and cheaper processors and memory chips on
which the engines can run.
The push for embedded speech comes at a time when manufacturers are
trying to cram ever more functions into ever smaller devices.
"There's just not enough room for all the buttons and displays,"
says Erik Soule, director of marketing for Sensory Inc. (Santa
Clara, Calif.), a developer of embedded speech products. A
voice interface that lets you say the name of that Beatles
song you want to listen to, rather than delving through your
iPod's multiple menus, offers a less frustrating alternative.
"We look at voice as a great complement to the visual and
touch user interfaces," Soule says.
Will consumers buy it? The Kelsey Group (Princeton, N.J.), one
of the few analyst firms that track embedded speech, thinks
so. In a white paper issued in July, Kelsey projected that
revenue from embedded speech software licenses will grow from US $8
million this year to $277 million in 2006, making embedded speech one of
the fastest-growing segments of the speech market. That said,
speech is not a business where good products translate into
easy profits: witness the 2001 collapse of Lernout and Hauspie,
until then an industry leader and holder of some of the best
technology around.
Still, a wide range of companies, large and small, are now getting into
the embedded speech market. These include established players
like IBM and Philips [ranked (5) and (24), respectively, among the
Top 100 R&D Spenders],
which both have higher-end speech recognition products and
decades of research experience. They also include smaller firms
like Sensory, Advanced Recognition Technology, and Voice Signal
Technologies, which focus on embedded technology
[see chart, Who's Getting Into Embedded Speech].
They're betting on a wide range of applications. A few, like voice
dialing, have already entered the mainstream, while others,
like voice-activated light switches and TV sets, remain a
novelty, and still others, like composing e-mail on your cellphone
and retrieving directions while driving, lie farther out on
the technological horizon.