There’s a myth running around that, pretty soon, human transcriptionists will be replaced with speech recognition technology.
But there are a number of issues with that assumption, not least that software isn’t yet geared towards oral or speech-based accuracy.
In other words, until machines can capture the finer subtleties of human language, including colloquialisms, inflections and tone, transcriptionists will be required to capture and parse what is being said as well as, at times, what is intended.
Case in point: a professional transcriptionist, Mary Ellen, ran 15 minutes of a podcast through four separate pieces of audio transcription software. Comparing overall quality and the number of errors, she found that the transcriptions were “good enough” but missed many filler words.
Her verdict? Even for two of the “three star tools” she tested, the transcribed document would still need a second review and further cleanup and polishing by a transcriptionist.
Why is transcription so tricky for software? What is it about a human transcriber that can bring value and clarity to the entire transcription process?
Background Noise
Poor audio quality and background noise often present a problem for transcriptionists. They’ll have to rewind, play back and go through the recording, piece by piece, to decipher parts that are particularly unclear or garbled.
Human transcriptionists can make out voices and words over background noise. They can automatically catch pieces of speech that may otherwise get lost in translation, so to speak.
Accuracy in transcription relies on more than the conversion of spoken words into text.
It also requires someone to filter out the extraneous noises and make a judgement call when it comes to capturing words that may trail off due to background noise or audio quality.
Distinguishing Accents and Dialects
Even when we are not familiar with a particular accent or dialect, our natural ability to search for and land on human speech makes us expert communicators.
We have a tendency to move towards a common language, and this allows us to reach for understanding and comprehension even when dealing with unfamiliar accents or dialects, as long as the base language is shared by all.
This means that even if speakers in an audio recording have a heavy accent, human transcriptionists can use their natural ability to filter words, coupled with their professional skills, to capture meaning and convey it through transcription.
Fact Checking in Content
While transcription is usually about converting recorded speech directly into text, there are times when a speaker says something nearly indistinguishable, or says something that seems out of place from the rest of the audio.
In these cases, a human transcriptionist can go the extra mile and either check the facts behind the content in the recording, or use the content within the recording to fill in the blanks.
This is especially useful in the case of content that is confusing or trails off. Once the transcriptionist has gained some context or knowledge about the topic, they can infer and better forecast what is being discussed and what the speaker may be about to say.
Gaining just a bit of background knowledge can actually help them anticipate, listen and transcribe with greater clarity.
Technical and Business-Specific Content
Human transcriptionists can have their own areas of expertise, niches where they have in-depth background knowledge on a topic. For example, a transcriptionist used to legal jargon can better render acronyms and citations for bylaws, marking these down in a natural way.
If a transcriptionist works for medical-based businesses or does a lot of IT work, they will be familiar with the lexicon that is specific to these industries. Transcriptionists have memory, and that memory accumulates knowledge the more they encounter specialized terms.
This knowledge is invaluable when it comes to creating a comprehensive transcription document. The same is true for business-specific content — if a transcriptionist is working with industry jargon, they can accurately convert key terms or internal company language that would be otherwise foreign to voice-recognition software.
Multiple Speakers and Overlapping Content
The value of working with a human transcriptionist is not just about the ability to filter out language or the natural tendency we have to intuit meaning from pure communication. As humans, we have the ability to keep up with several threads of conversation in one recording, dialogues that move back and forth between multiple participants.
Let’s say, for example, that you have a recording of a conference or a speakers’ panel for your business. While the recording is clear, there is more than one speaker and they’re all interrupting each other, engaged as they are in a professional debate.
Where audio transcription software might get confused, mixing one individual’s voice and information with another, human transcriptionists can identify various speakers on an audio recording through tone, voice, accent and other speech cues — such as the tendency to repeat a particularly idiosyncratic turn of phrase or word.
We can make out overlapping content and properly separate speakers, giving each their own role — and their own lines — in a transcription document.
The value that a human transcriptionist brings to audio recording work is about more than simply being able to note the difference between “too,” “two” and “to.” A transcriptionist can and will go above and beyond in their line of work, creating a document that is not only well transcribed but also infused with meaning and flow.
I am much faster at transcription than I am at editing a speech recognition document. I wish things were like they used to be. Instead, companies use speech recognition and pay their transcriptionists less money for having to edit it, when we could have typed it faster in the first place and made fewer errors.