Why doesn't speech recognition work?

I interviewed four heavy users of speech recognition to validate my hypotheses, and the results follow. My intention was to compare keyboard typing with speech input. The most interesting finding, however, is the privacy problem of speech input, which is described in the last part of this article.

Hypothesis: Users can speak, and the recognizer responds accurately in real time. It is practical for entering text every day.

Result: Speech recognition is practical to use, and users who start using it do not stop.

I asked how many years they had been using speech recognizers. One answered two years. Two answered 10 years. None of them has stopped using a speech recognizer since starting.

Hypothesis: Speech recognition is faster than typing.

Result: Speech is faster than typing.

All answered that speech is faster than typing in their scenarios.

Hypothesis: Users “think and speak” smoothly to compose sentences (there is no interference between cognitive resources, just as with “think and type”).

Result: Speech recognition is easy to use when there is already an idea to express, or for dictating draft statements. Using a speech recognizer while thinking takes some practice.

I asked what kinds of tasks they use a speech recognizer for.

  • One answered that he uses it for writing articles. This is a thinking-heavy scenario.
  • Two answered that they use it for simple tasks –
    • one enters translations of foreign-language text
    • and the other transcribes handwritten student compositions into digital form.
  • One answered with two scenarios –
    • one lies somewhere between the two cases above – reconstructing content from a voice recording of a seminar
    • the other is entering draft statements for later composition.

I asked whether they “think then speak” or “speak while thinking”. In three of the five scenarios, the speech recognizer is used when the thinking is already done to some extent.

  • One (the article writer) answered that he “thinks, then speaks”. He said it takes some practice.
  • Two (the person reconstructing seminar content and the translator) answered that they “speak while thinking”. The person who reconstructs seminar content said that, once he has an idea to express, keyboard versus speech is merely a difference of medium. The translator said that the keyboard and speech are on an equal footing.

There is a myth that “think and speak” causes interference between cognitive resources but “think and type” does not. I doubt it. When we converse with someone, we get an idea and then say it to the person. Sometimes I feel that I am talking while still thinking about what to say. Speaking follows thinking, but speaking is slower than thinking. With typing or writing, the outcome is a well-formed sentence, so it looks as if a beautiful sentence comes out of simultaneous thinking and typing. But typing and editing are actually a reflective conversation with the medium. Typing follows thinking; because typing takes time, the two only appear to happen simultaneously. In this respect typing is no different from speaking.

Hypothesis: As early adopters, users may use speech primarily to capture text. They spend time correcting the text while authoring a blog post or an article. Users who author documents with a speech recognizer take time to correct the text and are dissatisfied with the overall input speed.

Result: Speech users are satisfied that speech is faster than the keyboard even when correction time is counted.

I asked how long it takes them to correct the recognized text from 10 minutes of speech.

  • One answered 10 minutes. He uses the keyboard for correction.
  • One answered a few minutes. He uses the keyboard for correction.
  • One answered less than 10 minutes. He uses the keyboard for correction.
  • One (the article writer) answered that he corrects errors interactively, using a mix of speech commands and the keyboard. Correction takes time, but the overall turnaround time of speech input is satisfying.

All are satisfied that speech is faster than the keyboard even when correction time is counted.

Hypothesis: Users don’t use speech recognition for quick messaging or quick note-taking on the road, because they care about privacy and consider it socially impolite. Users don’t use speech recognition in the office, because it makes noise for others, raises privacy issues, and is socially impolite.

Result: Users don’t use speech recognition on the road or in the office, because they care about privacy, noise, social strangeness, and confidentiality.

I asked whether they use a speech recognizer on the road or in the office. Nobody does. They all use a speech recognizer only when alone in a closed space.
The reasons are:

  • (Two) Privacy
  • (Two) My voice becomes noise to others.
  • (Two) I feel shy about speaking to a computer on the road.
  • It is strange to talk to a computer. If I speak, someone will turn around and listen to me. Speaking is for human communication.
  • My words may be confidential, and may include private information.

In addition to privacy and being a source of noise for others, two further reasons were mentioned: the social strangeness of going against common sense, and confidentiality. Silent speech can solve the privacy, noise, and confidentiality issues.


