
Editing Auto-Generated Captions

Media has become a foundational tool for communication. However, the media we produce must include accurate closed captions to be useful and accessible to our audiences. Even if we know captions are important, how do we get started adding them? This guide will walk you through one method to produce high-quality captions: beginning with auto-generated captions provided by a video hosting platform and editing them for accuracy.




Who is this guide for?

This guide is designed for people new to providing closed captions on video. It will help you make your media accessible, even if you don’t have the budget for professional captioning or transcription.

About auto-generated captions

Auto-generated captions, also known as automatic captions or Automatic Speech Recognition (ASR) captions, use machine learning to turn audio into written text. These captions can be fairly accurate, but on their own they are not accurate enough to meet accessibility guidelines or users’ needs. Auto-generated captions need a human editor to fix mistakes and to add elements like speaker names or sound effects for full accessibility.

When to add

When producing a video, build in time towards the end of your project for editing captions. Be sure that you’ve made all needed edits to your media, then upload it to your video hosting platform of choice (for example, YouTube). If you edit your captions and then need to change your media, you will have to start over. Your video platform will take some time to generate captions. Once they are complete, you can edit them as a final step before publishing.

Platform-specific resources

This guide is intended to help you know what to edit and add to auto-generated captions, regardless of the platform you use. For details on how to find and edit auto-generated captions on a specific platform, visit the following resources.


Example: Before and after editing

Compare the closed captions on the two videos below. The first video uses captions automatically generated by YouTube without any human editing. The second video shows the same set of captions after they have been edited and enhanced. Try watching the videos without sound. What changes between the two caption tracks make the captions more usable?

Video with unedited captions

Text of unedited captions

that whole process of galaxies forming and evolving over 13 plus billion years uh we've learned a lot about that but we're really missing a key piece of the puzzle which is how galaxies got their start so that's the piece that we haven't seen yet and that's the piece the James web Space Telescope will allow us to see for the very first time so the first stars in galaxies are really the big mystery for us we don't know how that happened we don't know when it happened we have a pretty good idea that they were very much larger than the Sun and that they would burn out in a tremendous burst of Glory in just a few million years which is really very short but they would also prepare the way for further generations of stars like the sun to be formed so those first Stars would produce the chemical elements of Life carbon and oxygen and nitrogen and iron and sulfur and calcium and all the things that were made of would have been produced in those first generations of stars that then explode and Liberate the material back out into space so the next generation of stars could form with planets with solid bodies and possibly have life

You can also view this video on YouTube: Unedited captions example: Evolution of galaxies

Video with edited captions

Text of edited captions

(Dr. Straughn)
That whole process of galaxies
forming and evolving over 13+ billion years,

we've learned a lot about that,
but we're really missing

a key piece of the puzzle,
which is how galaxies got their start.

So that's the piece that we haven't seen yet,
and that's the piece

the James Webb Space Telescope
will allow us to see for the very first time.

(Dr. Mather)
So the first stars in galaxies
are really the big mystery for us.

We don't know how that happened.

We don't know when it happened.

We have a pretty good idea that
they were very much larger than the Sun,

and that they would burn out
in a tremendous burst of glory

in just a few million years,
which is really very short.

But, they would also prepare the way
for further generations of stars

like the Sun to be formed.

So those first stars would produce
the chemical elements of life,

carbon and oxygen and nitrogen
and iron and sulfur and calcium

and all the things that
we're made of,

would have been produced
in those first generations of stars

that then explode and liberate
their material back out into space.

So, the next generation of stars
could form with planets,

with solid bodies,
and possibly have life.

You can also view this video on YouTube: Edited captions example: Evolution of galaxies


Guiding principles

When editing captions, best practices and guidelines can help you craft useful and accessible captions. However, all caption editors will have to make judgement calls about what to include and how to describe something. When making decisions, keep these principles in mind.

  • Accurate: Errorless captions are the goal for each production.
  • Consistent: Uniformity in style and presentation of all captioning features is crucial for viewer understanding.
  • Clear: A complete textual representation of the audio, including speaker identification and non-speech information, provides clarity.
  • Readable: Captions are displayed with enough time to be read completely, are in synchronization with the audio, and are not obscured by (nor do they obscure) the visual content.
  • Equal: Equal access requires that the meaning and intention of the material is completely preserved.

Elements of Quality Captioning from the DCMP

Tip: Don’t let perfect be the enemy of good

Writing captions for the first time may feel overwhelming. As you begin, don’t focus on memorizing all the specific guidelines and rules. Rather, let your audience and these five principles guide your edits, and you’ll learn the nuances over time.


Proofreading

Punctuation

Depending on the platform used, the auto-generated captions may or may not have punctuation added. Captions should be punctuated to promote readability and understanding. If your captions do not have punctuation, add it as you edit. If your captions have punctuation, make sure that the placement is correct. Occasionally, the caption generator will misinterpret pauses and add punctuation that does not reflect the message being communicated.

We do not always speak in fully formed, grammatically correct sentences. Use your best judgement in punctuating captions. In general, simple punctuation and multiple shorter sentences are better.

Tip: Think about display

Remember, only a portion of the sentence will appear on screen at a time. Use parentheses, colons, and semicolons sparingly. Your edits should support understanding, not distract.

Spelling

Check that words are spelled correctly and that the auto-generated captions use the correct word. Some caption editors have spell check built in, but don’t rely on it alone.

Common errors

Watch for these common errors in your captions.

Tip: Search for repeated mistakes

If you notice a repeated mistake, check whether your caption editor has a find and replace function; many do. For example, if a name is misspelled the same way throughout the captions, you can use Find or Find and Replace to quickly identify and fix every occurrence.
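
If your caption editor has no find and replace function, or you are working with an exported caption file, the same idea can be sketched in a few lines of Python. This is a minimal sketch, not part of any platform's tooling: the function name `replace_whole_phrase` is hypothetical, and the second sample line is made up for illustration. The first sample line comes from the unedited captions earlier in this guide, where the generator wrote "James web".

```python
import re


def replace_whole_phrase(caption_text: str, wrong: str, right: str) -> tuple[str, int]:
    """Replace every whole-word occurrence of `wrong` with `right`.

    Returns the corrected text and the number of replacements made.
    """
    # \b keeps the match from firing inside a longer word (e.g. "webs").
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    return pattern.subn(right, caption_text)


captions = (
    "that's the piece the James web Space Telescope will allow us to see\n"
    "the James web telescope is an example line made up for this sketch"
)

fixed, count = replace_whole_phrase(captions, "James web", "James Webb")
print(count)  # → 2
print(fixed)
```

Returning the replacement count gives you a quick check that the change touched as many places as you expected. Still review each change, since a phrase that is wrong in one place may be correct in another.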

What if you can’t tell what a word/sound is?

Sometimes the audio quality or background noises make it difficult to tell what was said. Make a best effort to figure out what the missing word is. Consult a second person (or the speaker) if possible. If the word is still not identified, add [inaudible]. Do not guess.

False starts, filler words, repeated words, and more

Many speakers use filler words (um, uh) or repeat the same word multiple times in a row while they speak. They may also make a false start in their speech, then start over. These elements do not need to be captured in captions.

Captioning conventions vary on how they approach these situations. Make editorial choices that prioritize understandability and readability of the captions.
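
As a rough illustration of this kind of cleanup, here is a minimal Python sketch that removes the filler words named above and collapses a word repeated back to back. The function name `tidy_speech` and the `FILLERS` list are assumptions made for this example, and the function is meant to be run on one caption line at a time. It cannot tell an accidental repetition from an intentional one ("that that", "very very"), so treat its output as a suggestion to review, not a final edit.

```python
import re

FILLERS = ("um", "uh")  # the filler words named in this guide; extend as needed


def tidy_speech(line: str) -> str:
    """Remove filler words and back-to-back repeated words from one caption line."""
    # Match a filler only when it stands alone, so "uh-huh" is left untouched.
    # An optional comma and trailing spaces are removed along with it.
    filler = r"(?<![\w'-])(?:" + "|".join(FILLERS) + r")(?![\w'-]),?[ \t]*"
    line = re.sub(filler, "", line, flags=re.IGNORECASE)
    # Collapse a word repeated in a row ("we we" becomes "we").
    line = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", line, flags=re.IGNORECASE)
    # Tidy any doubled spaces left behind.
    return re.sub(r"[ \t]{2,}", " ", line).strip()


print(tidy_speech("over 13 plus billion years uh we've learned a lot about that"))
# → over 13 plus billion years we've learned a lot about that
```

Capitalization and punctuation are not repaired here, so a line that began with "Um," will need its first word capitalized by hand.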

General practices

Example of quoted material

Text of captions with a quotation

(Jonny Kim)
And my favorite quote when I think
about purpose-driven service

is from the late Dr. Martin Luther King Jr.,
who said that,

"Everyone has the power for greatness.

"Not for fame, but greatness,

"because greatness
is determined by service."

And I love this quote
because Dr. King got it right.

Is that our lives are fulfilled,
and the lives of the people we love

are fulfilled when we have
a purpose-driven life.

You can also view this video on YouTube: Captions with quotations example


Additions

Beyond fixing errors in the auto-generated captions, you may need to add information to make the captions useful.

Speaker names

If you have more than one voice speaking, you should capture who is speaking. To do so, add the speaker’s name before the caption segment each time the speaker switches. There are several standard ways to denote a speaker:

(Speaker Name) 
Captioned line of text

or

Speaker Name: Captioned line of text

Whichever method you choose, keep the following points in mind.

Crosstalk

Crosstalk, when two or more speakers talk at the same time, can be another challenge to capture in captions. There are several approaches.

Example of capturing speaker names and crosstalk

Text of captions with multiple speakers
Unedited captions: Multiple speakers

so crew doesn't have to actually place a
camera the ground can just fly up the crew

can take all of the Selfies that they want
that's right yeah there's actually there is

a touchcreen on the front also so that crew
can also interact with the robot oh cool

Edited captions: Multiple speakers

(Maria)
So, crew doesn't have
to actually place a camera.

The ground controllers can just fly up--

(Host #1)
So the crew can take all of the selfies that they want?

[Laughter]
[Crosstalk]

(Host #2)
That's actually the goal here,
I think.

(Maria)
There is a touch screen
on the front also

so that crew can also interact
with the robot.

(Host #2)
Oh cool!

You can also view this video on YouTube: Captions with multiple people speaking example

Added sounds

Auto-generated captions will generally only capture speech. For captions to be equal, non-speech sounds such as music or sound effects should be added to captions in brackets, for example [door slamming]. There are several ways you might capture music and sounds in captions.

Example of music in captions

Text of captions with music

(Florence Tan)
To commemorate SAM's birthday
and Curiosity's birthday on Mars,

we decided to play a little song.

If there's anyone listening on Mars
on this special occasion,

you will hear this.

[Computer tones playing Happy Birthday song]

(Florence Tan)
It's really neat,
and it's exciting.

This is a first for NASA,
and for the world.

And music brings us all together,
so this is fun.

You can also view this video on YouTube: Capturing music in captions example

Do I need to capture every sound?

Use your best judgement when capturing sounds. Your goal is to provide an equivalent experience for someone reading the captions and someone hearing the audio. If too many sounds are captured, there may not be enough time to read the captions. For example, if an audience member coughs during a speech, that cough may not be important to capture. However, if the speaker stops and reacts to the cough, the cough would be useful information to capture.


Display and formatting

When editing auto-generated captions, many timing and formatting decisions will already be made for you. As you add to and edit captions, keep the following guidelines in mind.

Length

Each caption segment should be one to two lines. Information like the speaker name may be placed on a third line. A caption segment should display for one to six seconds.
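
If you have your captions in a form you can process with a script, these length guidelines are easy to check automatically. The sketch below is one possible approach in Python, under stated assumptions: the `Segment` class and its field names are invented for this example and are not any platform's format, speaker names are assumed to use the "(Speaker Name)" convention on their own line, and the timings in the sample data are made up.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One caption segment. Field names are illustrative, not a platform format."""
    start: float       # start time in seconds
    end: float         # end time in seconds
    lines: list[str]   # lines of text shown on screen


def is_speaker_label(line: str) -> bool:
    # Assumes the "(Speaker Name)" convention on its own line.
    stripped = line.strip()
    return stripped.startswith("(") and stripped.endswith(")")


def check_segment(seg: Segment) -> list[str]:
    """Return a list of length problems; an empty list means none were found."""
    problems = []
    # A speaker label may sit on a third line, so it is not counted as text.
    text_lines = [ln for ln in seg.lines if ln.strip() and not is_speaker_label(ln)]
    if not 1 <= len(text_lines) <= 2:
        problems.append(f"{len(text_lines)} lines of text (expected 1 to 2)")
    duration = seg.end - seg.start
    if not 1.0 <= duration <= 6.0:
        problems.append(f"displays for {duration:.1f} seconds (expected 1 to 6)")
    return problems


ok = Segment(0.0, 3.5, ["(Dr. Straughn)", "That whole process of galaxies",
                        "forming and evolving over 13+ billion years,"])
too_long = Segment(10.0, 10.4, ["So the crew can take", "all of the selfies",
                                "that they want?"])

print(check_segment(ok))        # → []
print(check_segment(too_long))  # reports two problems: three lines, under one second
```

A check like this only flags segments for a second look. Whether to split, merge, or retime a flagged segment is still an editorial decision.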

Line breaks

When breaking up lines within a caption segment or between caption segments, put breaks in logical places.

Timing

The caption generator should set up the caption timings for you. You can manually adjust the start and end times as needed. If the timing is off for the entire video, it is better to start again.

Position

Many of the caption generators mentioned in this guide do not have settings for the position of the captions. If you can change this, make sure captions don’t cover up other text. Keep them in a consistent spot.

Tip: Plan for captions

When producing your video, find out in advance where captions will be placed in the video platforms you will use. Make sure nothing that captions would cover up, such as a title card, appears in this space. Alternatively, leave black space below the video for captions to display.


Other approaches to captioning

This approach to editing captions is just one method. You may wish to consider alternative approaches based on your budget and workflow.

Resources

This guide is based on the Described and Captioned Media Program’s Captioning Key. While this guide will get you started in editing quality captions, the list of recommendations is not exhaustive. Consult the Captioning Key for more details.

Credits

Videos used for examples are published by NASA and are in the public domain. Specific videos include:



Site created by Kaia Sievert. This work is licensed under a CC BY-NC 4.0 license, which means you are free to reuse the content for non-commercial uses with attribution to the original project.