Why I Cannot Write a Novel With Voice Recognition Software (Updated x 3)

Every time I mention my RSI people suggest that I use voice recognition software. I do use it. And though I hate it I know that it has transformed gazillions of people’s lives. There are people who literally could not write without it. For them VRS is a wonderful transformative thing. Bless, voice recognition software!

I am well aware that what VRS is trying to do is unbelievably complicated. Recognising spoken language and reproducing it as written language is crazy hard.[1] The way we make sense of what someone says is not just about recognising sounds. We humans (and other sentient beings) are also recognising context and bringing together our extensive knowledge of our own culture every time we have a conversation. And even then there are mishearings and misunderstandings. Also remember that one of the hardest things for VRS is distinguishing between the speaker’s sounds and other noises. Humans have no problem with that.

I know my posts here about VRS have been cranky so I’ll admit now that there are moments when I almost don’t hate it: VRS is a much better speller than I am. That’s awesome. And sometimes its mistakes are so funny I fall over laughing. Who doesn’t appreciate a good laugh?

I use VRS only for e-mails and blog posts. And sometimes when I chat. But I usually end up switching to typing because it simply cannot keep up with the pace of those conversations and I can’t stand all the delays as I try to get it to type the word I want or some proximity thereof. But mostly I don’t chat much anymore.

But I gave up almost straight away on using it to write novels. Here’s why:


1. The almost right word is the wrong word for fiction.

Near enough SIMPLY WILL NOT DO. I cannot keep banging my head against the stupid software getting it to understand that the word that I want is “wittering” NOT “withering.” THEY DO NOT MEAN THE SAME THING.

Recently it refused to recognise the word “ashy.” Now, I could have said “grey.” But guess what? I did not mean “grey” I meant “ashy.”

The almost right word is fine for an e-mail. Won’t recognise how I say “fat”? Fine, I’ll say “rotund” or “corpulent” or whatever synonym I can come up with that VRS does recognise. “I’m going to eat a big, corpulent mango” works fine for an e-mail. However, it will not do for fiction.[2]

2. Flow is incredibly important.

Most of my first drafts are written in a gush of words as the characters and story come flowing out of me. Having to start and stop as I correct the VRS errors, and try to get it to write what I want it to write, interrupts my flow, throws me out of the story I’m trying to write, and makes me forget the gorgeously crafted sentence that was in my head ten seconds ago.

Now, yes, when I’m typing that gorgeously crafted sentence in my head it frequently turns out to not be so gorgeously crafted but, hey, that’s what rewriting is for. And when I’m typing the sentence it always has a resemblance to its platonic ideal. With VRS if I don’t check after every clause appears I wind up with sentences like this:

    Warm artichoke had an is at orange night light raining when come lit.

Rather than

    When Angel was able to emerge into the orange night Liam’s reign was complete.

Which is a terrible sentence but I can see what I was going for and I’ll be able to fix it. But that first sentence? Leave it for a few minutes and I’ll have no clue what I was trying to say.

However, checking what the VRS has produced after Every Single Clause slows me down and ruins the flow.

3. It’s too slow.

I am a medium-fast typist. I’ve been typing since I was fourteen. I can get words down way faster and more accurately than VRS.[3] Its slowness is very, very frustrating and is yet another factor that messes with my flow when writing.

Obviously, none of this is a huge problem for e-mail. I do persevere with it for blogging too despite the fact that means I am at most blogging once a month. Using VRS for those kinds of writings does save my arms. I’m grateful.

But for my novel writing? It’s a deal breaker. I can’t do it.

VRS is going to have to take giant strides to get to a point where it allows me to write fiction without grief and frustration and the hurling of head sets across the room.

Again, I’m really glad that it has helped so many of you. I have been hearing lots of wonderful stories about the ways VRS has changed lives since I started writing cranky posts about it. That’s all fabulous.

But for me? No, not yet.

Update: I should have also noted that every time I write one of these posts I get lots of people trying to help. That is very sweet of you and I totally get why. I have the same impulse. We all want to make things better.[4]

But, yes, it is also kind of annoying and overly helpy. This has been going on for years now. You can safely assume that unless you are suggesting a very recent breakthrough or a very left-field obscure idea—WEAR A ROTTEN WOMBAT ON YOUR HEAD—I have heard it all before and tried it all.[5]

So if you were wondering—everything suggested in the comments?—been there, done that.

Update the Second

Am getting many folks telling me that the error rate in the orange night example above is crazy high. You got me. I deliberately chose a super bad example because it’s funnier. My bad. Next time I rant about this I promise to choose a less crazy and amusing one, okay?

Funny thing, though, even the best VRS error rate I’ve ever managed is incredibly annoying and slows me down.

Update the Third

Thanks so much for all the lovely letters & comments of sympathy, support, me toos, and commiseration. Means the world to me.

  1. Very few humans are one hundred per cent accurate at the task. Even court reporters make occasional mistakes.
  2. Actually I’m now thinking of all sorts of ways in which it would work for fiction but you get my point, people.
  3. And, wow, am I not the world’s most accurate typist.
  4. Unless we have an evil streak a mile wide. Ha! VRS rendered “a mile wide” as “a mild way.” Bless.
  5. Well, not the wombat thing. But only because I can’t get past the smell of roadkill. And the fear of putrescence dripping down my face.

29 comments

  1. Douglas on #

    I just did a topical search of your posts and didn’t find anything that overtly addressed the type of keyboard you use or if you’ve tried different ones to combat your RSI. I apologize if you have mentioned keyboards; I’m truly not trying to be what Scalzi calls ‘helpy’.

    The screenwriter John August is quite fond of this thing: http://www.safetype.com/ On the chance you’d not seen this particular model in your efforts I thought I’d pass along.

    I hope things get better for you!

  2. sean williams on #

    Wonderful post, Justine. I feel (and share) your pain!

  3. Nicole Murphy on #

    This is why I’m trying really hard to fix the problems I’m having with my arms now so I don’t have to get to the point of using VRS. Apart from being incredibly crotchety for my age (‘hate change’ ‘get off my lawn’) I love how I work right now and apart from the weakness of my body it works for me and I don’t want to risk losing that with a new system.

    Best of luck to you.

  4. Trudi Canavan on #

    Yes, yes, and yes-but-for-the-opposite-reason. RSI had me buying voice recognition software late last year, and it confirmed what I’d suspected all along: I don’t write in a linear fashion. I swap clauses and sentences and paragraphs around as I go. It turned out it’s not typing that gives me RSI – it’s the constant up, down, left, right, page up, page down that has worn out my first and second finger tendons and right wrist. Which gets worse when I’m editing.

    I could have put up with the slowness and inaccuracy of the voice recognition software, but because it doesn’t have good enough command structure to get the cursor to move around the way I need it to, I gave up on it. Instead I do the cursor navigation with my left hand. (And I gave up hand knitting.)

  5. Cameron on #

    But on the bright side, how good is the bowling attack!

  6. Rafter on #

    Have you tried a different VRS software? Or microphone? The example you give seems pretty extreme and would prompt me to think something was wrong. And yes, I do use Dragon VRS.

  7. Jay on #

    I understand that interrupting your flow could be problematic, but why not just record yourself? Then, at your leisure, play it back to your VRS and correct the errors as they come w/o fear of losing your original train of thought.

  8. coyote on #

    Someone once told me that voice recognition on a computer is kind of like a dog riding a bicycle. Even if it’s lousy, it’s kind of amazing that it does it at all.

    That’s cold comfort, though.

    My issue when I use it – through different programs dating back maybe a dozen years, to the earliest days of VR on PCs – is that my speaking voice interrupts and drowns the flow of the voice inside my head. And I really need to hear that inner voice to be able to write properly.

  9. Don Lindsay on #

    How about just record your voice, and transcribe later?

    It doesn’t even have to be you who transcribes it, although that gives a better result.

  10. Jenny Davidson on #

    I am with you on this – v. fortunate not to be in position to NEED to either, but fairly certain that if I were to find myself there, I would have to develop some other way of going about it. (Like writing 2 sentences a day with a soft black pencil in huge letters and editing on that and then dictating into a file once it was polished that someone else would type up! Or developing some sort of weird sandbox where I could spell things in letters, photograph the sentences and ditto get someone else to deal with the keyboard part…)

  11. Jas on #

    Have you considered originating your work as an audiobook and having a human transcribe it? It surely wouldn’t *eliminate* typing, but it would cut it down to manageable proportions.

  12. Patti on #

    my crippled fingers hear you! but typing is so good for them, despite the pain. I originally took up computing because my handwriting is indecipherable, even by me.
    Sounds like you have a case for a little e-notetaker (we used to call them dictaphones or something like that; my daughter used one in lectures) …and a job for someone to transcribe your audio notes!! that could be fun. and expensive.

  13. John on #

    I have to agree. This has been my experience as well.
    (this is Dragon uncorrected.)

    Coyotes comment above, “the speaking voice interrupts and drowns out the voice in my head”, pretty much describes what happens to me. This, is only one of the many problems of speech recognition (which, don’t get me wrong I think is wonderful too).

    1. Accuracy. This has improved beyond measure in the last 5 years (for me). But it still doesn’t get near enough to 100% to be useful for creative writing. As you say, the 1st time you put a line down on paper it is an exploratory process. There is a feedback process happening as you do it as you try and find the correct choice of words. If the transcription of your 1st effort is not 100% accurate your already fighting a losing battle. I’m an illustrator as well, and it would be akin to every line you draw being in slightly the wrong place, and as you try to correct it THAT line is in slightly the wrong place, and on and on, recursively.

    2. The voice in the head problem. I’m not Barbara Cartland, or one of those authors that is simply reading off from the transcript in their head. If I was, voice recognition software would be the most amazing gift ever. I don’t know what I’m going to write until I start writing it (the texture of it, I mean). My brain just doesn’t work like that.

    3. Moving things around. So much of my writing style involves vast amounts of cutting and pasting; moving, refining, sifting words to try and find the joke or the meaning. This kind of thing is extremely awkward with voice recognition. The mechanics of selecting text, insertion points, copying, pasting, adding space, getting it wrong, trying it again, means you lose the thread of what you were attempting to do. Especially is, as me, that thread is extremely elusive.

    4. You still have to edit by hand. No matter how hard you try there is always something that will not work and you have to type to fix it. So you end up “oh well, I’ll just type it anyway”. And then you end up with really sore hands.

    5. Dialects or accents. These are bad enough when you’re trying to dictate (for instance, there is simply no way that my voice recognition software will ever be able to understand my pronunciation of the word “poor”) but move into an entirely new bracket of difficulty if one of your characters (or your entire book) is composed of characters with a variety of different accents.

    If anyone has any ideas on how to improve this process please let me know.

    I’m holding out hope for gigantic keyboards. If someone would make a virtual keyboard, say about 2 to 3 feet across using a laser and the Microsoft Kinect, then I could play the keyboards like a drum kit, using my arms rather than my fingers. That would work for me.

  14. Gukii on #

    Got the same problem with RSI and SpeechRecognition. If you got an iPad, give TypeWay a try. It allows you to position your fingers any way you want for typing and adjusts the iPad keyboard accordingly. Re-do once you feel fatigue, which will be fast on the iPad.

    Dragon on Mac sucks, works better on the PC. I found that macros work well to fill in difficult names. Still the “flow” part is holding me back with SpeechRecognition. It simply doesn’t feel the same. Left/Right brain work is just different that way.

    Still, work needs to get done. When inspiration hits it’s best to just go and dictate through without looking back, and save the error correction for non-inspirational times. theBoom microphones are great.

    Good luck.

  15. stoatsandwich on #

    Thank you so much for writing this. I have chronic pain issues that make typing very difficult from time to time, and I’ve found that VRS and/or dictating to a human typist work fine for email or nonfiction writing, but are completely impractical for the way I write fiction. But you’ve explained the situation far more elegantly than I could have, and I’m rather selfishly glad that I’m not the only one.

  16. Kyle H on #

    What I want is software that will let me record my notes, and then later on convert the notes to text while I’m listening to the recording, retrieving the verbal braindump.

    Dictation software is good for those times when I’m pacing around, working through the things I’m up against verbally, without a notepad. It sucks for pretty much anything else.

  17. Lordevin J Gould on #

    Thanks to my parents, I have a vaguely German accent when excited; VRS fails me when I’m in the zone.

    I’ve tried eye tracking input as well. It distorts my words. The best I’ve come up with is recording my voice and paying to have it transcribed (yay local university). But I notice the difference in tone between PC and iPhone, pens or pencils — the medium affects how I write. VRS disrupts the flow; the author doesn’t quite sound like me, though I’m the one who chose the words.

  18. Michael Brian Bentley on #

    The solution is to record what you say two ways in parallel. First, record everything you say into a digital recording system. Second, feed what you said into VRS.

    Find a combination of straight recording and VRS that works best. I would think that the best thing you can do is to record your voice straight up, moving what you say back and forth, inserting voice, overwriting, that sort of thing; and then go to VRS with the result. You can watch the conversion to text in real time, and see that the words are transcribed properly. This step can be easily done by someone else.

  19. Sergei Lewis on #

    “it’s the constant up, down, left, right, page up, page down that has worn out my first and second finger tendons and right wrist”

    Here’s a completely left-field suggestion: you might consider trying out Vim as a text editor. It’s one heck of a learning curve, but the general design idea is that once you’ve invested the time it only takes one or two key strokes to move the cursor anywhere you want it in your text and perform whatever sentence / paragraph editing operation you need to do, and the most common keys are on the home row; so it’s much, much easier on the hands than navigating around and manipulating text by repeatedly hitting arrow/pgup/pgdown/home/end.

  20. Emy Shin on #

    I have recently started dealing with RSI problems — and cannot for the life of me use VRS in place of typing. Perhaps it’s because I have a heavy foreign accent, but my error rate is often as high as the example you’ve used here. I am in total sympathy.

  21. JM on #

    I don’t know if you might’ve already tried some of these things but a friend just posted about how he’s dealing with RSI on a Mac:

    http://chrisltd.com/blog/2012/02/preventing-rsi-mac/

    I’m assuming you have one due to previous posts about Scrivener.

    When I worked in visual effects, I had friends that also swore by these arm brace things that attached to their desks.

    Best of luck,

    JM

  22. Steve on #

    Thanks. I thought it was just my problem. One day I guess it will be good enough, but it isn’t yet.

  23. AliceB on #

    How about a *live* wombat? The pain from the teeth and claws digging themselves in to keep its balance would totally distract you from any pain you might get from typing. (Although you’d probably have to figure out what to do about the blood dripping into your eyes.)

    Seriously, though. Sorry. It’s crap. If only our bodies would cooperate with everything we wanted them to do.

  24. piedoggie on #

    what a bunch of noobs. 🙂 I’ve been using speech recognition since 1994 because, yes, RSI. Many of the complaints people have had either have been solved or have decent workarounds.

    Problems with speech recognition:

    1) Nuance: yes, the wonderful creator and maintainer is our biggest problem. When it comes down to it, if you can’t make them money, they don’t want to hear from you. us cripps are nothing but a drag on the bottom line so, “screw us”.

    2) speed: usually, this is your own damn fault. The two most common problems are not enough memory and lousy audio. I’m running with 4 GB in my laptop and I could make use of 8 GB without trying hard. 16 GB would be good enough for most days this year. As a result, I’m always paging to disk and the system runs really slow. If I use Microsoft’s ready boost, it works better but still not as wonderful as a system full of RAM. I know, Nuance didn’t tell you about this. They wouldn’t because it’s not in their own interests to do so.

    terrible audio will make a machine sit there going “huh? What did you say?” on all the audio until it finally figures out what you said… maybe.

    3) editing: yeah it sucks. It mostly sucks because people try to do GUI and keyboard appointed tasks/operations with speech. ***Stop trying to speak the keyboard and mouse*** you have got to be out of your flocking mind to try new things the same old way with a tool that’s radically different.

    3a) editing environments: if you are dictating into a window that NaturallySpeaking doesn’t recognize and give you full select and say capability, you might as well take a stick with a bunch of rusty nails in it and beat yourself even harder. Nuance apparently has chosen to leave in bugs that make dictating into non-standard Windows (standard being their term) instead of something useful like it had been in the past (version 6).

    3b) editing least resistance: I’m a fine one to talk about this but if you come to know and love DragonPad and dictation box for editing chunks of text, you will become much happier. Second best is using speech adapted editors like WordPad and Microsoft Word. In theory, open office has been included in 11.5 but I haven’t tried it yet.

    3c) editing futures: http://blog.esjworks.com this is something I’m doing for code. I need someone who knows Windows edit controls who can enlighten me as to how to make the basic pieces work. I have someone who knows Emacs was willing to help in the editing mode.

    I’ve had some discussions with a couple of people about how to do the same thing for English how to make it easy to edit without wearing out your voice. Initial version unlike any other editor you seen the flow feels right.

    3d) editing strategies: as I said earlier, don’t try to speak the keyboard or mouse. Work in big chunks because that’s what speech is good for. If you can’t use Select-and-Say to alter something small, just say it all over again. NaturallySpeaking becomes more accurate/compliant when you have corrected some text and you say it again. Yes, it really is easier to say a paragraph over again and use Select-and-Say to correct misrecognition errors than it is to try and go back later and make you change the same all the way only using your voice.

    4) flow: I’m also an amateur fiction writer and flow is no longer a problem for me. The first thing to remember is that you need to change. When your hands went bad, the world said “fark you” and you have to adapt. For me the first major adaptation was eliminating old ways of thinking about computer use. Even Nuance reinforces speaking the keyboard with their macros. Don’t do it. It’s really really stupid so don’t speak the keyboard.

    the second major improvement to flow was doing what I call learning how to speak written speech. Sounds kind of silly but we communicate using spoken speech, yes, dialogue is also spoken speech but it’s not really.

    I became aware of this difference when about six months or year after starting using speech recognition, I could feel my mind changing and found I was speaking written speech. my flow of writing became easier.

    The second level of piece of flow came when I accepted that it was easier to dictate entire paragraphs over rather than try to rearrange them. Most word processors just don’t have the level of granularity you need to work with text.

    I should probably edit this except for the fact that speech recognition and browser windows really don’t get along.

  25. piedoggie on #

    and one more thing. It’s not voice recognition software. It’s speech recognition software. Recognizing your voice means a computer knows who you are, recognizing speech means the computer knows what you said.

  26. RB on #

    Wow!

    Great insight here. Is there a line (e.g. a short children’s story compared to a lengthier YA story) where VRS is not worth the hassle? Or is it primarily based on user’s experiences, disruption to creative flow (regardless of length) and/or preferences?

  27. Mitja on #

    This tip might be new: get a foot switch and use your feet for the most stressful keys. Mine are backspace, Shift (right) and enter.

  28. piedoggie on #

    @rb

    actually, a short message in a Web comment areais sometimes not worth the hassle. From the air in the previous line, try to figure out what caused it. Hint, it was a single click to start the sequence of failure.

    the scope of what corrections are easy two-way voice really depend on the environment. If you are writing in DragonPad or dictation box, corrections of even a single word are easy albeit tedious. If using any other environment, you’re better off replacing whole sentences or even paragraphs when rewriting.

    The stories below I’ve written entirely using speech recognition, and I would say about three quarters of the revisions were also entered using speech recognition. Lots of small errors have been corrected by hand because the editing environment does not support NaturallySpeaking.

    For what it’s worth, these are stories I wrote a few years ago but serious life crap (eight major life changing events in 10 years) got in the way. I’m hoping that things will improve in the next few days (touch wood) and I will be able to more effectively participate in my writers group.

    you need to use flash to view these works. I’m going to change it over to the opera paged reader real soon now.

    https://acrobat.com/#d=Oz0ik6r6sqpDSimSSMCbgQ
    https://acrobat.com/#d=GZs1YOugW8QsmWz0ewTdSg

  29. RB on #

    @ Piedoggie – Many thanks!
