A museum installation titled 'Electrostatic Bell Choir' featuring a large wall of stacked vintage cathode-ray tube televisions from the mid-20th century. Each TV screen displays a small suspended pendulum or bell mechanism, suggesting the sets generate electrostatic fields to animate the bells. The televisions vary in size, color, and style — ranging from cream and yellow to dark wood-toned models — and are arranged in a roughly pyramidal formation against a dark wall. A rope barrier in the foreground indicates this is a gallery exhibit.

Common Conundrums

Answers to common questions at my weekly Office Hours.


Why your digital interview archive isn't accessible (and how to fix it)

By Chris Pandza on February 18, 2026

A central challenge while building the Obama Presidency Oral History's digital archive was producing interactive transcripts and high-quality captions for over 1,000 hours of video interviews.

To do so, I needed to synchronize our video with our human-written transcripts—a process that can produce caption files, power interactive transcripts, and allow users to navigate interviews by reading and clicking rather than guessing their way through a recording. Interviews without synchronized transcripts and captions create a clunky experience for all visitors and preclude equivalent access for many.

When I looked to see how other large oral history archives were handling captioning, I found a surprising answer: most weren't. Oral history has long privileged transcripts as the interview's authoritative record, but a static transcript, however carefully produced, does not help a user navigate and understand time-based media. A recording captures far more than is encoded in a transcript (including pacing, gesture, laughter, and expression). WCAG—the international standard underpinning digital accessibility law in the United States and many other jurisdictions—agrees.

Manually synchronizing interviews is often prohibitively laborious, and automated captions are often inaccurate and non-compliant. As updated accessibility standards go into effect in the United States, some archives are choosing to take their media offline to stay compliant—but that doesn't have to be the case.

For many archives, the fix is closer than it appears.



A screenshot of an oral history interview interface from the Obama Presidency Oral History project. On the left, a video player shows Bill T. Jones, a dancer and choreographer, speaking during a video call. He is an older Black man with short gray hair and glasses, seated in a wood-paneled room. The video is paused at 18 minutes and 22 seconds, with a subtitle reading 'You asked me, "What did it feel like?" It felt like I.' On the right, the transcript panel is open, showing Jones reflecting on his experience at a White House arts ceremony — describing feeling proud but also 'sticky and a little soiled' after being told to be more disciplined and 'be like Mr. Obama.' The interview was conducted by Terrell D. Frazier.

The Obama Presidency Oral History digital archive allows users to navigate video by clicking on an interactive transcript (and vice-versa).




If your interviews are already transcribed, you're most of the way there.

If your archive already has high-quality transcripts, you are nearly there. The biggest challenge is producing the transcript itself; getting that text to "talk" to your media is much easier.

Forced alignment is a computational technique that takes an existing transcript and an audio or video recording and automatically synchronizes them, producing a timestamped, word-level alignment that can be exported as a standard caption file (SRT, VTT, or similar formats). Until recently, producing this kind of alignment was expensive and labor-intensive. Gone are those days!
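To make the output concrete, here is a minimal sketch of the last step: taking word-level timestamps (of the kind a forced aligner produces) and grouping them into WebVTT caption cues. The word timings below are hypothetical, standing in for real aligner output.

```python
# Sketch: converting word-level alignment output into a WebVTT caption file.
# The aligned word timings below are hypothetical; in practice they would
# come from a forced aligner such as the Montreal Forced Aligner.

def fmt(seconds):
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def words_to_vtt(words, max_words=8):
    """Group aligned (word, start, end) tuples into WebVTT cues."""
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        lines.append(f"{fmt(start)} --> {fmt(end)}")
        lines.append(" ".join(w for w, _, _ in chunk))
        lines.append("")
    return "\n".join(lines)

# Hypothetical word-level timestamps (seconds from the start of the video)
aligned = [("You", 1100.0, 1100.3), ("asked", 1100.3, 1100.6),
           ("me,", 1100.6, 1101.0), ('"What', 1101.2, 1101.5),
           ("did", 1101.5, 1101.7), ("it", 1101.7, 1101.8),
           ("feel", 1101.8, 1102.1), ('like?"', 1102.1, 1102.6)]
print(words_to_vtt(aligned))
```

The same timestamped word list can just as easily be emitted as SRT or fed to an interactive-transcript component; the alignment data, not the file format, is the hard-won part.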

For those without experience coding in Python, the most accessible path is using a consumer product like Descript. Descript transcribes audio and video automatically, but—crucially—it also has a "replace transcript" feature that allows you to import an existing human-produced transcript and align it to your media without any coding. Once aligned, you can export captions and timestamped transcripts with ease.

For archives with more technical capacity, open-source tools offer options that are powerful and free. The Montreal Forced Aligner (MFA) is a free, widely used tool that produces high-quality word-level alignments and can be run in batch across large collections. It requires some comfort with the command line but is well-documented and actively maintained.
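For a sense of what that looks like in practice, here is a sketch of a batch MFA run; the directory paths are illustrative, and `english_us_arpa` is one of MFA's pretrained English models (a different dictionary and acoustic model would be needed for other languages).

```shell
# One-time setup: download a pretrained dictionary and acoustic model
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa

# The corpus directory pairs each recording with its transcript,
# e.g. interview_042.wav alongside interview_042.txt
# Align the whole collection in one batch; output is one TextGrid per file
mfa align ~/oral_history/corpus english_us_arpa english_us_arpa ~/oral_history/aligned
```

The resulting TextGrid files contain word- and phone-level timestamps that can then be converted to caption formats or transcript-sync data.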

Forced alignment is a well-understood problem in computational linguistics, and there may be people at your institution or in your community who have worked with these tools. A small collaboration (even a student research assistant with the right background) can dramatically reduce the time and cost of processing a large backlog.

Of course, there are limitations. Forced alignment tools can struggle with accented speech, poor audio quality, less-common languages and dialects, and overlapping speakers—common realities in interview collections. But for most archives, results are reliable enough to serve as a strong starting point for review and correction.

Without transcripts, how should you proceed?

If you don't have existing transcripts to work from, creating captions is a heavier lift—though it's never been lighter.

I don't recommend relying on automatic transcription tools like Otter or Whisper alone, as they rarely yield transcripts or captions of sufficient quality on their own. However, these tools can create great first drafts that can be brought up to standard with some focused auditing and editing. Some products, like Descript, can be used to generate an initial transcription, edit text and media, and export synchronized transcripts and captions (making the workflow seamless).
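As one example of the first-draft step, the open-source Whisper command-line tool can produce a timestamped caption file directly (the filename here is illustrative):

```shell
# Sketch: generate a draft VTT caption file with OpenAI's open-source
# Whisper CLI. The output is a starting point for human review and
# correction, not a finished caption track.
whisper interview_042.mp3 --model small --output_format vtt
```

Larger models (`medium`, `large`) are slower but generally more accurate, which matters for archival audio of uneven quality.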

Compliance doesn't have to mean taking content down

The instinct to remove non-compliant content from public access is understandable, but it's not an ideal outcome for the stories archives hold nor the communities archives serve. An interview that has been offline for years because of a missing caption file serves no one.

Captions and transcripts are not the be-all and end-all of accessibility in interview archives, but they are usually the most significant barriers to access. Audio descriptions, keyboard-navigable interfaces, and sufficient color contrast are other avenues to explore when trying to build a digital archive that works better for everyone.

For most archives with existing transcripts, a path to compliance exists, and the tools to get there are more accessible than they were even a few years ago. Synchronized transcripts and captions don't just satisfy a compliance requirement—they make your archive more navigable for all users.

Psst—every week I hold free office hours for nonprofit organizations trying to do more with their archives.

Pictured in header: Electrostatic Bell Choir (2013) by Darsha Hewitt. Photo by Joanne Clifford, April 7, 2019.