Sunday, August 10, 2008

You really can tell your PC what to do

David Pogue tests and is impressed by the latest version of the voice-recognition software NaturallySpeaking

Of all the high-tech fantasies that sci-fi movies
tantalise their escapist audiences with, surely that bit about giving
your computer spoken orders is one of the most alluring.

Ever since Star Trek we’ve dreamed of being able to say,
“Computer, display all known sources of dilithium crystals in the
Kraxon Nebula!”

So far, the closest we can get is strapping on a headset
and dictating, using a program like Dragon NaturallySpeaking to do the
typing. This software is great for anyone who can’t type or doesn’t
like to. And it lets you speak the names of menu commands and “click”
links on a webpage.

But that’s not the same as telling the computer what to do in conversational English.

NaturallySpeaking 10, available in the US from last
Thursday, takes some baby steps in the right direction. It doesn’t turn
your computer into the Star Trek mainframe; it doesn’t know what you
mean by, for example, “Make this document shorter and funnier”.

But in its timid, conservative way, it takes voice control unmistakably closer to that holy grail of computing.

NatSpeak’s principal mission, though, is to type out,
into any Windows program, whatever you say. And in version 10, its
maker, Nuance, claims to have eked out yet another 20% accuracy
improvement.

I installed the program, donned the included headset and
clicked “Skip initial training”. (In the early days of speech
recognition, you had to read a 45-minute sample script to train the
program to recognise your voice. Today, the software is so good, you
can skip the training altogether.)

As a quick test, I read aloud the first 1000 words of
Freakonomics by Steven Levitt and Stephen Dubner into Microsoft Word.
Impressively enough, NatSpeak effortlessly transcribed words like “Ku
Klux Klan”, “futzed” and “Punic Wars”, but it did mistype
seven easier words (including “addition” instead of “edition” and “per
trail” instead of “portrayal”). Accuracy tally with no training: 99.3%.
Not too shabby.

Then I tried a second test: I read one of the
five-minute training scripts (a John F Kennedy speech), which is
recommended for even better initial accuracy. I again read the first
1000 words of Freakonomics, and the program mistyped five words.
Accuracy this time: 99.5%.
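Both tallies are simple arithmetic: the share of the 1000 words that came out right. Here is a minimal Python sketch, assuming "accuracy" just means correctly transcribed words divided by words read (the column doesn't spell out any formal scoring method); the function name `word_accuracy` is hypothetical.

```python
def word_accuracy(words_read: int, mistakes: int) -> float:
    """Percentage of dictated words transcribed correctly, to one decimal place."""
    # Assumption: each mistyped word counts as exactly one error.
    return round(100 * (words_read - mistakes) / words_read, 1)


# The two runs on the first 1000 words of Freakonomics
print(word_accuracy(1000, 7))  # no training: prints 99.3
print(word_accuracy(1000, 5))  # after the five-minute training script: prints 99.5
```

Seen this way, the training script shaved two errors off every thousand words.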

In both cases, the number of spelling mistakes was zero. People who use NaturallySpeaking never make typos, only wordos.

As you correct the mistakes with your voice — a speedy,
streamlined procedure — the program learns. Whether you skip initial
training or not, accuracy inches towards perfection over time.

One way that Nuance has improved accuracy is by
acknowledging, for the first time, that not everyone speaks alike.
Version 10 recognises eight accents: general (none), Australian,
British, Indian, Great Lakes (Buffalo, NY, to Chicago), Southeast
Asian, Southern US and Spanish. If you don’t specify, the program will
identify you automatically.

Isn’t that somehow politically incorrect? Should a
software program treat you differently depending on how you sound? Ah,
the heck with it. It’s dictation software. A little stereotyping can go
a long way.

Speed is another virtue in version 10. The program still
waits for a pause in your talking before it types, so that it can use
context to choose, for example, the correct homonym
(there/they’re/their). But that waiting period has been halved; text
appears almost instantaneously at each pause.

Better yet (and here’s where things start to get Star Trekky), the program understands more “natural language” commands.

For example, italicising something you’ve already typed,
say, the phrase “gas prices”, used to require three separate commands.
First, “Select gas prices”. Then, “Italicise that”. Finally, to move
your insertion point back where you stopped, “Go to end of document”.

In version 10, a single command does the trick:
“Italicise ‘gas prices’.” The program makes the change and returns to
where you stopped, all in a blink. The same trick also works with the
verbs “bold”, “underline”, “delete”, “cut” and “copy”. (Yes, “bold” is
a verb now.)

You can speak a series of new Search commands, beginning
with “Search computer for…”, “Search the web for…”, “Search
e-mail for…” and so on.

For example, “Search maps for Chinese restaurants near
Hoboken”, or “Search Wikipedia for Bay of Pigs”, or “Search images for
Gwyneth Paltrow”. These shortcuts work 100% reliably and do save you
time and typing. Next version: more of them, please.

And now, the NatSpeak Frequently Asked Questions:

“Does NaturallySpeaking work on a Mac?” Yes, but only
when the Mac is running Windows and you’re using a USB headset adapter.
It works fantastically in Boot Camp and fast enough in VMware Fusion,
a virtualisation program.

Of course, it might be simpler just to buy MacSpeech
Dictate, a Mac program that uses the same Dragon recognition
technology. The current version is fast and accurate, but it lags
behind NatSpeak in features and power; it doesn’t even let you make
corrections by voice, and therefore the accuracy never improves.

But version 1.2, with voice correction and voice spelling, is in testing now.

“Can I transcribe interviews with it?” No. NatSpeak knows
only one person’s voice: yours. It also requires a clean audio signal,
like the one from a headset microphone 1cm from your mouth.

“Can I dictate with a wireless Bluetooth earpiece?” Yes.
In fact, version 10 greatly expands the number of compatible earpiece
models (18 so far, listed at …). Accuracy may take a hit, …

“Can I dictate into a pocket recorder and transcribe it
later?” Yes. The setup is more involved, though: only some recorders
are compatible, and you have to record 15 minutes of training.

“Doesn’t Windows Vista come with speech recognition?”
Yes, and it’s really good — quite similar to NatSpeak, actually. But
Nuance says that, oddly enough, Vista has had virtually no effect on
NatSpeak sales.

I’m guessing that obscurity is part of the reason; most people aren’t even aware that Vista offers such a feature.

Vista doesn’t come with the required headset, either. Nor
does the Vista version offer the same accuracy, features or power as
NatSpeak, and it isn’t available in as many languages (NatSpeak comes in
English, French, Italian, German, Spanish, Dutch and so on).

NatSpeak is available in a number of editions. The
Standard edition ($100) has the same accuracy as the others, but it’s
just for bare-bones dictation.

To get the more advanced goodies described in this
review — the natural-language commands, Bluetooth mikes and recorders —
you need the Preferred edition ($200).

It also lets you design voice macros that type out
boilerplate text. For example, you can say, “Buzz off”, and it will
type: “Thanks for thinking of me! Unfortunately, I’m afraid I’m unable
to accept your kind offer at this time.”

There are also medical and legal editions ($1,600 and
$1,200), as well as a Professional edition ($900) for corporate
administrators who want to manage many NatSpeak installations from a
central server.

And now, if you’ll excuse me, I have some real work to
