Talking 'bout a computer revolution

Speech-recognition technology has been in development since the 1970s, but
only in the last two years has the software become truly viable for everyday
consumers. Chris Carrigg, a speech-recognition expert and director of
business development for the speech training company Say I Can, explains: "Up
until two years ago you ... had ... to ... talk ... like ... this."

Speech-recognition software, says Carrigg, used "discrete speech models" which could only
parse one word at a time. "Dragon NaturallySpeaking was the first to come
out with a natural speech program. Before that, it was so tedious to use
that the only people interested in it were disabled users who had to
use it."

With the advent of continuous speech recognition -- which began appearing in
commercial products about two years ago -- software has now learned to
recognize natural talking patterns, allowing users to dictate in their normal
voice. Lernout & Hauspie's Voice Xpress software, for example, uses a
statistical mapping model with language matching and word pairing to gauge
whether words fit together, essentially playing a guessing game with
unidentifiable words to determine whether they fit into the sentence you
just dictated.

The early adopters of speech-recognition software were, not surprisingly,
those suffering from hand injuries or otherwise incapable of typing -- journalists with repetitive stress injuries, for example. Doctors, lawyers and others in dictation-intensive professions picked it up next: Radiologists who needed to dictate notes into a recorder while peering through a microscope would instead talk into a speech-recognition device that plugged into the computer, and lawyers used the software to transcribe their endless
legal documents. The software companies have been catering to these niche
markets with products that boast legal or medical vocabularies. IBM
ViaVoice is currently the bestselling product, closely followed by Dragon NaturallySpeaking; Voice Xpress comes in third, and FreeSpeech 2000 from Philips is the latest entry on the market.

Speech-recognition software, however, isn't yet making a major splash with
everyday computer users; instead, it's still a niche product that is being
used by those who have a pressing need. It isn't that the products are
expensive; most start at $59 for a basic version. In all probability, many potential customers are intimidated by the awkwardness of a new interface and the time commitment involved in making it work. And as I said earlier, it's
still far from perfect software: I spent time practicing with two speech-recognition products, Dragon and Voice Xpress, and was both impressed and frustrated by the experience.

Using speech-recognition software is a two-way street: Not only must you
learn how to use the software; the software has to learn how to use you.
Explains David Nahamoo, director of research for human language technologies
at IBM, "First, you need to become familiar with the conversational
interface -- being able to actually talk to a system and understand what it
takes to interact with a machine through speech. Secondly, the machine has to
become used to and customize itself to the way that you ... are using
it."

The actual process of training these two products (and almost all speech-recognition software products) is quite similar -- you'll spend roughly a
half-hour setting up your computer system and headset and measuring
microphone and voice levels before moving into a training period. To train
the software, you read documents aloud (in my case, snippets from "Alice in
Wonderland") for anywhere from five minutes to half an hour, while the
software learns to recognize your voice -- a process called "enrollment."
(With some products, you can also upload documents that contain your typical
vocabulary, so that the software gets a sense of your writing style.) Then
you can start dictating documents.

Nahamoo idealistically estimates that a good software program will optimize
itself -- or as he puts it, "hit a plateau" -- within two to three hours of usage. The idea is that the more you use the software, the more it will understand your voice patterns, and the better it will perform. Sure enough, after using the software for several days, I saw a definite improvement -- although that was after four days, not two to three hours.

All of these products boast accuracy of 90 percent on up, but getting to that optimal recognition is a tricky, painful process -- in fact, there are entire books dedicated to explaining how to use the software correctly.

Yes, these products can quite accurately transcribe your words, but only after you've mastered the ins and outs of proper dictation, specific commands and the oddities of voice-activated computer controls. This can be a major time commitment, as I learned; and even when the software is operating at its optimum performance levels, it will still get one out of every 10 words (or so) wrong.

I used the Dragon software for four days, and it was an error-ridden process even after endless hours of corrections and careful dictation. For every sentence that I breezily dictated, I had to spend another minute or so attempting to delete the one mistake. To correct an
error midway through a sentence, for example, you use a string of commands:
"Select error," "Scratch that," "Delete previous character," "Move to end
of sentence." With each of these commands, there's also a chance that
the software will mis-hear you and accidentally transcribe the command --
"motorcycle penance" instead of "move to end of sentence" -- into your
sentence, necessitating yet another string of corrections.

In addition, even on my zippy new Pentium machine, there was a lag of a few seconds while the software tried to interpret my words -- and for me, at least, it's much faster to just type.
(Of course, I'm an unusually fast typist; those who are less speedy might find that speech software is much quicker than the old "hunt and peck" method.)

There are countless other small frustrations. The Voice
Xpress software, for example, seemed to be very sensitive about my
microphone and sound card drivers; although I got the software working on
one PC, I had problems with installing it on two other PCs. Another niggling
annoyance: You can't eat and dictate at the same time. Sure, you won't get
grease on the keyboard, but the crunching from your Fritos is picked up by
the microphone and appears in your text as some rather mysterious words.

The software is supposed to automatically adjust its microphone levels to
your environment, and screen out meaningless white noise. But my Dragon software did pick up the background noise of my office: The loud banter in the next cube showed up in my documents as gibberish. (When I accidentally left the microphone on while I went out to lunch, I came back to discover a stream- I also
learned early on that all of my office-mates can hear every word I say -- and
it's difficult to be a linguistic maestro (or to compose personal e-mails)
when you know everyone around you is listening. For that matter, I'm sure my
constant patter -- and swearing -- has been driving them nuts too.

Most important, as a journalist, it's not easy to compose an article
orally -- it's a bizarre feeling to verbalize sentences rather than let
words fall from your fingertips. Writing becomes a tedious, yet thoughtful,
act; you must think the whole sentence out before you say it, and be precise
in your speech -- and proper enunciation is rare in this age of mumbling. If
you aren't careful, it'll be an awfully slow process: Just the last two
sentences alone cost me three minutes of "scratch that" and "select
that" and "move to end of sentence."

In fact, using speech-recognition software can stunt the creative writing
process -- you end up feeling like a computer program, thinking in short
phrases with your voice as the command line. The natural cadence of my
sentences instead came out stiff and dry; my complex thoughts were
interrupted by a constant need to correct the mistakes the program had made.
I felt like an automaton, not an author.

This is a problem the software creators have witnessed, too. "In the case
of creative writing, we are noticing some of the challenges -- that
challenge is really designing an interface for composing where it's as
natural as possible," says Nahamoo. As it is, he says, users must more
carefully think out what they want to compose before they verbalize it --
which isn't necessarily a natural way of speaking in our rushed age.

But regardless of my complaints, the software does have a big upside. It's a
blessing to not have to use your hands; and you can lean back in your chair
with your eyes closed while you compose (as long as you open your eyes every
few sentences to make sure that your dictations weren't boffed). Overall,
it's far less stressful on your body; and it doesn't hurt your enunciation
either. Best of all, you don't have to worry about the proper way to spell
"accommodate"; the software automatically spells it correctly for you. Despite my
impatience with the program, I eventually came up with a solution that
seemed to satisfy even my need for speed: using my voice to dictate, and my mouse to navigate and make corrections. It's not as zippy as typing, but it still saves my wrists.
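
For readers curious about the "word pairing" guesswork described earlier, here is a minimal Python sketch of the general idea: count which words tend to follow which in some sample text, then prefer the candidate transcription whose word pairs look most familiar. This is an illustration only, not how Voice Xpress or any other product named here actually works. The sample text, the function names (`pair_score`, `sentence_score`, `best_guess`) and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

# Toy sample text (an assumption for illustration); a real product
# would learn from a vastly larger body of writing.
CORPUS = (
    "move to end of sentence . "
    "move to the end of the line . "
    "select the end of the sentence . "
    "the motorcycle is at the end of the road ."
).split()

PAIR_COUNTS = Counter(zip(CORPUS, CORPUS[1:]))  # how often word B follows word A
WORD_COUNTS = Counter(CORPUS)                   # how often each word appears
VOCAB_SIZE = len(WORD_COUNTS)


def pair_score(prev, word):
    """Add-one smoothed estimate of how likely `word` is to follow `prev`."""
    return (PAIR_COUNTS[(prev, word)] + 1) / (WORD_COUNTS[prev] + VOCAB_SIZE)


def sentence_score(words):
    """Average log-score over adjacent word pairs, so that candidates
    of different lengths can be compared fairly."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return float("-inf")
    return sum(math.log(pair_score(p, w)) for p, w in pairs) / len(pairs)


def best_guess(candidates):
    """Pick the candidate transcription whose word pairs fit together best."""
    return max(candidates, key=lambda text: sentence_score(text.split()))


if __name__ == "__main__":
    print(best_guess(["motorcycle penance", "move to end of sentence"]))
```

With this tiny sample text, the sketch favors "move to end of sentence" over "motorcycle penance" because the pairs in the former have been seen before. Commercial software combines this kind of language evidence with acoustic evidence from the microphone, which the sketch leaves out entirely.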
Copyright © 2000 Salon.com All rights reserved.