Talking 'bout a computer revolution

"Any device that human beings interact with has a potential for speech technologies -- speech in itself is a natural mechanism for people to interact with, far more than keyboards and mice," explains Tom Morse, senior director for telecom engineering at Lernout & Hauspie. "We see it not as a way to replace other interfaces, but to augment them."

Speech-recognition technology has been in development since the 1970s, but only in the last two years has the software become truly viable for everyday consumers. Chris Carrigg, a speech-recognition expert and director of business development for the speech training company Say I Can, explains: "Up until two years ago you ... had ... to ... talk ... like ... this." Speech-recognition software, says Carrigg, used "discrete speech models," which could only parse one word at a time. "Dragon NaturallySpeaking was the first to come out with a natural speech program. Before that, it was so tedious to use that the only people interested in it were disabled users who had to use it."

With the advent of continuous speech recognition -- which began appearing in commercial products about two years ago -- software has learned to recognize natural speech patterns, allowing users to dictate in their normal voice. Lernout & Hauspie's Voice Xpress software, for example, uses a statistical mapping model with language matching and word pairing to gauge whether words fit together, essentially playing a guessing game with words it can't identify for certain to determine whether they fit into the sentence you just dictated.
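The word-pairing idea can be illustrated with a toy bigram model: count how often each pair of adjacent words appears in some training text, then, when the recognizer can't decide between similar-sounding candidates, prefer the one that most often follows the previous word. This is a minimal sketch under stated assumptions, not Lernout & Hauspie's actual method. The tiny corpus, the add-one smoothing and the helper names `pair_prob` and `pick_word` are all hypothetical, and a real recognizer would combine these language scores with acoustic scores rather than rely on them alone.

```python
from collections import Counter

# Hypothetical toy training text; a real product would train on far larger corpora.
CORPUS = (
    "please move to the end of the sentence . "
    "move to the next line . "
    "select the previous word . "
    "scratch that and move to the end of the line ."
).split()

# Count single words and adjacent word pairs (bigrams).
unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(unigrams)


def pair_prob(prev: str, word: str) -> float:
    """Estimate P(word | prev) with add-one smoothing, so unseen pairs still get a small score."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB_SIZE)


def pick_word(prev: str, candidates: list[str]) -> str:
    """Among similar-sounding candidates, choose the one that best fits after the previous word."""
    return max(candidates, key=lambda w: pair_prob(prev, w))
```

Here `pick_word("the", ["and", "end", "ant"])` returns `"end"`, because "the end" occurs twice in the toy corpus while the other two pairs never occur. Smoothing matters because a pair the model has never seen should be unlikely, not impossible.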

The early adopters of speech-recognition software were, not surprisingly, those suffering from hand injuries or otherwise unable to type -- journalists with repetitive stress injuries, for example. Doctors, lawyers and others in dictation-intensive professions picked it up next: Radiologists who needed to dictate notes into a recorder while peering through a microscope could instead talk into a speech-recognition device that plugged into the computer, and lawyers used the software to transcribe their endless legal documents. The software companies have been catering to these niche markets with products that boast legal or medical vocabularies.

IBM ViaVoice is currently the bestselling product, closely followed by Dragon NaturallySpeaking; Voice Xpress comes in third, and FreeSpeech 2000 from Philips is the latest entry on the market.

Speech-recognition software, however, isn't yet making a major splash with everyday computer users; it's still a niche product used mainly by those who have a pressing need. It isn't that the products are expensive; most start at $59 for a basic version. More likely, many potential customers are intimidated by the awkwardness of a new interface and the time commitment involved in making it work. And as I said earlier, it's still far from perfect software: I spent time practicing with two speech-recognition products, Dragon and Voice Xpress, and was both impressed and frustrated by the experience.

Using speech-recognition software is a two-way street: Not only must you learn how to use the software; the software has to learn how to use you. Explains David Nahamoo, director of research for human language technologies at IBM, "First, you need to become familiar with the conversational interface -- being able to actually talk to a system and understand what it takes to interact with a machine through speech. Secondly, the machine has to become used to and customize itself to the way that you ... are using it."

The actual process of training these two products (and almost all speech-recognition software products) is quite similar -- you'll spend roughly a half-hour setting up your computer system and headset and measuring microphone and voice levels before moving into a training period. To train the software, you read documents aloud (in my case, snippets from "Alice in Wonderland") for anywhere from five minutes to half an hour, while the software learns to recognize your voice -- a process called "enrollment." (With some products, you can also upload documents that contain your typical vocabulary, so that the software gets a sense of your writing style.) Then you can start dictating documents.

Nahamoo optimistically estimates that a good software program will optimize itself -- or as he puts it, "hit a plateau" -- within two to three hours of usage. The idea is that the more you use the software, the better it will understand your voice patterns, and the better it will perform. Sure enough, I saw a definite improvement -- but only after four days of use, not two to three hours.

All of these products boast accuracy of 90 percent or higher; but reaching that optimal recognition is a tricky, painful process -- in fact, there are entire books dedicated to explaining how to use the software correctly. Yes, these products can transcribe your words quite accurately, but only after you've mastered the ins and outs of proper dictation, specific commands and the oddities of voice-activated computer controls.

This can be a major time commitment, as I learned; and even when the software is operating at its optimum performance level, it will still get one out of every 10 words (or so) wrong. I used the Dragon software for four days, and it was an error-ridden process even after endless hours of corrections and careful dictation. For every sentence that I breezily dictated, I had to spend another minute or so trying to fix the one mistake.
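The vendors' exact scoring method isn't given here. One common convention in speech research is word error rate: the minimum number of word substitutions, deletions and insertions needed to turn the software's transcript into what was actually said, divided by the number of words actually said. By that measure, one wrong word in 10 is a 10 percent error rate, or 90 percent accuracy. The sketch below assumes that convention; `word_error_rate` is a hypothetical helper built on a standard edit-distance table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between what was said and what was transcribed,
    divided by the number of words actually said."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words into the first j transcript words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # every reference word was dropped
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # every transcript word was spuriously inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion: a spoken word is missing
                dist[i][j - 1] + 1,         # insertion: an extra word appeared
                dist[i - 1][j - 1] + cost,  # substitution (or exact match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For instance, `word_error_rate("for every sentence that I dictated I fixed one mistake", "for every sentence that I dictated I fixed won mistake")` returns `0.1`: one substituted word out of 10, which a vendor would report as 90 percent accuracy.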

To correct an error midway through a sentence, for example, you use a string of commands: "Select error," "Scratch that," "Delete previous character," "Move to end of sentence." With each of these commands, there's also a chance that the software will mishear you and accidentally transcribe the command -- "motorcycle penance" instead of "move to end of sentence" -- into your sentence, necessitating yet another string of corrections.

In addition, even on my zippy new Pentium machine, there was a lag of a few seconds while the software tried to interpret my words -- and for me, at least, it's much faster to just type. (Of course, I'm an unusually fast typist; those who are less speedy might find that speech software is much quicker than the old "hunt and peck" method.)

There are countless other small frustrations. The Voice Xpress software, for example, seemed to be very sensitive about my microphone and sound card drivers; although I got the software working on one PC, I had problems installing it on two other PCs. Another niggling annoyance: You can't eat and dictate at the same time. Sure, you won't get grease on the keyboard, but the crunching from your Fritos is picked up by the microphone and appears in your text as some rather mysterious words.

The software is supposed to automatically adjust its microphone levels to your environment, and screen out meaningless white noise. But my Dragon software did pick up the background noise of my office: The loud banter in the next cube showed up in my documents as gibberish. (When I accidentally left the microphone on while I went out to lunch, I came back to discover a stream-of-consciousness rant about football embedded in my article, courtesy of a loud neighbor.)

I also learned early on that all of my office-mates can hear every word I say -- and it's difficult to be a linguistic maestro (or to compose personal e-mails) when you know everyone around you is listening. For that matter, I'm sure my constant patter -- and swearing -- has been driving them nuts too.

Most important, for a journalist it's not easy to compose an article orally -- it's a bizarre feeling to verbalize sentences rather than let words fall from your fingertips. Writing becomes a tedious, yet thoughtful, act; you must think the whole sentence out before you say it, and be precise in your speech -- and proper enunciation is rare in this age of mumbling. If you aren't careful, it'll be an awfully slow process: The last two sentences alone cost me three minutes of "scratch that" and "select that" and "move to end of sentence."

In fact, using speech-recognition software can stunt the creative writing process -- you end up feeling like a computer program, thinking in short phrases with your voice as the command line. The natural cadence of my sentences came out stiff and dry instead; my complex thoughts were interrupted by a constant need to correct the mistakes the program had made. I felt like an automaton, not an author.

This is a problem the software creators have witnessed, too. "In the case of creative writing, we are noticing some of the challenges -- that challenge is really designing an interface for composing where it's as natural as possible," says Nahamoo. As it is, he says, users must more carefully think out what they want to compose before they verbalize it -- which isn't necessarily a natural way of speaking in our rushed age.

But regardless of my complaints, the software does have a big upside. It's a blessing not to have to use your hands; and you can lean back in your chair with your eyes closed while you compose (as long as you open your eyes every few sentences to make sure that your dictation wasn't botched). Overall, it's far less stressful on your body; and it doesn't hurt your enunciation either.

Best of all, you don't have to worry about the proper way to spell "accommodate"; the software automatically spells it correctly for you. Despite my impatience with the program, I eventually came up with a solution that seemed to satisfy even my need for speed: using my voice to dictate, and my mouse to navigate and make corrections. It's not as zippy as typing, but it still saves my wrists.


Copyright © 2000 Salon.com All rights reserved.