MARC Misconceptions: Which Language of Sound Track? Use of the 041$a

This post continues a series on MARC Misconceptions, exploring the mismatch between what we want from our data and how we’ve designed it. These may be caused by failings of MARC as a data standard, the content standards and practices we’ve developed over time, or our own assumptions. The first post in the series was about the “Incorrect ISSN” subfield, 022$y.

In this post, I’ll be looking at something I’ve talked about before, the use of the 041$a on bib records for DVDs (also Blu-rays and streaming video which offers multiple audio tracks, I’m just shorthanding here) and implications for its indexing. I’ll add on the implications for its use in assessment, because this has come up over the past year.

Technical Definition of the 041$a

The 041 “Language Code” contains:

Codes for languages associated with an item when the language code in field 008/35-37 of the record is insufficient to convey full information…

So to backpedal briefly, the 008 field contains 3 bytes, 35-37, which can be used to record the language of the work. It might be “eng” or “mul” for multiple languages or “zxx” for No linguistic content.1
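As a rough illustration of those three bytes, here is a minimal Python sketch that pulls positions 35-37 out of an 008 string. The function name and the sample 008 are made up for this example, and real records would normally be read with a MARC library rather than as bare strings.

```python
from typing import Optional


def language_from_008(field_008: str) -> Optional[str]:
    """Return character positions 35-37 of an 008 field, or None if it is too short."""
    # Positions are counted from 0, so 35-37 corresponds to the slice [35:38].
    if len(field_008) < 38:
        return None
    return field_008[35:38]


# Made-up 40-character 008 for a video recording, assembled in pieces so the
# positions are easy to check: 00-20, then 12 blanks for 21-32, then 33-39.
sample_008 = "210101s2021    cau155" + " " * 12 + "vleng d"

print(language_from_008(sample_008))  # eng
```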

The 041 contains numerous subfields for recording things like language of subtitles or language of sung/spoken text or language of captions.2 For audiovisual materials, it should be used when:

The sound track has different language versions
The accompanying sound (discs, tapes, etc.) has different language versions
The overprinted titles (subtitles) or separate titles for silent films are in different languages
The sound accompanying a work is in one language and the same text is printed on the work in the form of overprinted titles in another language
The accompanying printed script for works with no sound or, if with sound, no narration, is in different languages
The medium of communication includes sign language.

The 041$a, specifically, is for: Language code of text/sound track or separate title. The first instance would reflect the 008/35-37.

Use of the 041$a in DVDs

The data standard is complemented by the various content standards for use. So let’s look at what’s in the OLAC CAPC Video Language Coding Best Practices. Again, I’ve chosen the oldest readily available through an authoritative source because I’m interested in practice over time.

Let’s look at this example:

    English language film with English, French, or Spanish soundtracks; closed-captioned in English;
    optional subtitles in English, French, Spanish, Portuguese, Chinese, or Thai. English packaging
    and menus

    008/lang eng
    041 1# $a eng $a fre $a spa $j eng $j chi $j fre $j por $j spa $j tha $h eng
    546 ## Closed-captioned; English or dubbed French or Spanish soundtrack; optional
    English, French, Spanish, Portuguese, Chinese, or Thai subtitles

Most important to us are the first two lines, the first showing that it’s originally in English and the second showing what languages it can also be watched (heard) in. This includes English but also French or Spanish, the most common alternative dubs for DVDs sold in the US.

When taken in the context of the whole record, it is easy to determine the following:

  • The primary language of the work is English
  • The work can be viewed in English, French, or Spanish
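To make the structure of that 041 concrete, here is a small Python sketch that groups the field's language codes by subfield. It works on the display form of the field as plain text, with the subtitle languages coded as $j. The function name is hypothetical, and a production workflow would more likely read subfields through a MARC library.

```python
from collections import defaultdict


def parse_subfields(field_text: str) -> dict:
    """Group the values of a displayed MARC field by subfield code."""
    groups = defaultdict(list)
    # The text before the first "$" holds the tag and indicators, so skip it.
    for chunk in field_text.split("$")[1:]:
        code, _, value = chunk.strip().partition(" ")
        groups[code].append(value.strip())
    return dict(groups)


example_041 = (
    "041 1# $a eng $a fre $a spa "
    "$j eng $j chi $j fre $j por $j spa $j tha $h eng"
)

parsed = parse_subfields(example_041)
print(parsed["a"])  # ['eng', 'fre', 'spa']  (sound tracks)
print(parsed["h"])  # ['eng']                (original language)
```

Nothing in the $a list marks which sound track is the primary one and which are optional dubs.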

But They’re Optional

Unlike the 022$y, this is not a case where I think cataloging practice standards absolutely screwed up. I think the errors arise from assumptions made by people making use of the field.

However, I do want to take a moment to note the fundamental difference between two kinds of dubs.

Quite a few Gundam Wing VHSes passed through my hands when I was a teenager. These were all English dubs.3 So for these works, the 041$a would have been simply “eng.” 041$h didn’t exist in the 90s, but it’d be “jpn” if they were cataloged now. The only anime I’ve owned as an adult is the Princess Tutu DVD box set. It’s got a special place in my heart.4 These DVDs have both English and Japanese tracks. I’ve watched them once in Japanese, but mostly watch in English. That’s why I’m including them here, because they’re the only thing I personally watch in a dub vs. with subtitles. For them, the code should be: $a jpn $a eng.

I have no idea what the conversations were when the first DVDs were released and folks looked at this practice, but I am assuming that they focused on past practice for dubs and the fact that the language of these materials could be any of the dubs. But it would’ve been really, really, really nice for those of us trying to use this data if someone had come up with a subfield to record something like “Language code of optional dubbing.” 041 is busy but there are some letters available! That would’ve been thoughtful and created more granular data.

Implications for Its Use

Language codes, as opposed to longer free-text fields, are perfect for building advanced search and faceting indexing. However, a misunderstanding of what they mean can lead to search results that are likely not what the average user desired.

As I asked in my MARC Records vs. the Catalog presentation: why did a search for Spanish language DVDs in someone else’s catalog bring back the 2021 Dune?

[Slide: a search for Spanish language DVDs in someone else’s catalog brought back the 2021 Dune; the record’s 041 contained repeating $a of eng, fre, and spa.]

There are two questions that a patron may be asking when faceting a film search to Spanish:

  • What Spanish-language films do you have?
  • What films do you have that I can watch in Spanish?

These are two pretty different questions. One is a cultural search and one is an access need search.
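One way to see the difference is to write the two questions as two separate tests. The Python sketch below uses a simplified record (a plain dict with hypothetical keys "lang_008" and "audio_041a"), and it assumes the Dune record has "eng" in the 008/35-37, which the slide does not show.

```python
def is_spanish_language_film(record: dict) -> bool:
    """Cultural question: is Spanish the primary language of the work?"""
    return record["lang_008"] == "spa"


def can_be_watched_in_spanish(record: dict) -> bool:
    """Access question: is there any Spanish sound track at all?"""
    return record["lang_008"] == "spa" or "spa" in record.get("audio_041a", [])


# Simplified stand-in for the Dune record; the 008 value is an assumption.
dune = {
    "title": "Dune (2021)",
    "lang_008": "eng",
    "audio_041a": ["eng", "fre", "spa"],
}

print(is_spanish_language_film(dune))   # False
print(can_be_watched_in_spanish(dune))  # True
```

A facet built only on the 041$a answers the second question while looking as if it answers the first.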

This is based on speculation vs. user testing, but I believe that most people who are actually applying a facet or using an advanced search field when looking for something to watch are asking the former. Someone asking the latter seems more likely to look up a film they wanted to watch and check whether they can do so in Spanish. Fortunately the 546 field provides that nice textual version of the information so, upon finding that we have a DVD, a person could quickly check whether it has Spanish dialogue. I put that field pretty high in our display, just under availability, because it could be an important point of assessment.

The other area where this field sometimes gets used is in collection assessment. How many of our titles are in which languages? I encountered this during the past year, when leading a BTAA group assessing the Colorado Alliance of Research Libraries’ Gold Rush collection assessment tool. In addition to chairing the committee, I led the MARC field assessment subgroup and we flagged this as something which needed to be changed.

For most materials, the 041$a works fine! But while a patron might have any of a number of motivations, if the library itself is assessing language across its collections, we are always thinking about the primary (not necessarily the original) language of the material, not DVD dubs.

Using the 041$a for assessment could lead us to believe that we have, say, a much better Spanish film collection than we actually do. In the unnamed catalog above, a full 18% of the DVDs and Blu-rays are apparently in Spanish. 14% are in English AND Spanish. This leads me to believe that about 4% might be considered Spanish language materials for the most common purposes of collection assessment. But who knows, maybe another 5% are originally in Spanish and contain English dubs! Either way, it’s impossible to tell and would throw off our numbers for the whole collection.
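The arithmetic behind those percentages can be sketched in a few lines of Python. The toy data below is invented so that it mirrors the 18% and 14% figures on a collection of 100 titles; it is not drawn from any real catalog, and the function name is hypothetical.

```python
def spanish_counts(records_041a: list) -> dict:
    """Count records whose 041$a includes 'spa', with and without 'eng'."""
    with_spa = sum(1 for codes in records_041a if "spa" in codes)
    with_eng_and_spa = sum(
        1 for codes in records_041a if "spa" in codes and "eng" in codes
    )
    return {
        "spa": with_spa,
        "eng_and_spa": with_eng_and_spa,
        "spa_without_eng": with_spa - with_eng_and_spa,
    }


# Invented collection of 100 titles, each represented only by its 041$a codes.
toy_collection = (
    [["eng", "fre", "spa"]] * 14  # English-language films with optional dubs
    + [["spa"]] * 4               # Spanish only
    + [["eng"]] * 82              # English only
)

print(spanish_counts(toy_collection))
# {'spa': 18, 'eng_and_spa': 14, 'spa_without_eng': 4}
```

Even the 4 in the last bucket is only a guess at the primary language, since the 041$a alone cannot say which of the 14 are Spanish-language films with an English dub.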

Recommendations for Using Language Codes

So, the reason that the practice of putting optional dub languages in the 041$a is a frustration but not a disaster is that we’ve still got the 008/35-37.

If you have access to conditional logic, I recommend the following:

  • Index the 008/35-37.
  • If it is “mul” or blank or invalid, check whether the record has an 041$a and index it instead.

For many materials, the 041 isn’t even used because the work and any supplementary materials are entirely in one language. While its existence implies that there should be more to index than what’s in the 008, as we’ve seen above, there may not be anything that’s useful to us. When “mul” exists, we want to get that data. And if the 008’s data is somehow unusable, there may not be an 041 but it’s probably worth using if there is.
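The conditional logic above might look something like the following Python sketch. It is not the configuration of any particular discovery layer. The function name is hypothetical, VALID_CODES is a tiny illustrative subset rather than the full MARC language code list, and keeping “mul” when there is no 041$a to fall back on is my own assumption.

```python
# Tiny illustrative subset; a real index would check the full MARC code list.
VALID_CODES = {"eng", "fre", "spa", "jpn", "por", "chi", "tha", "mul", "zxx"}


def language_codes_to_index(lang_008: str, subfield_a_codes: list) -> list:
    """Prefer 008/35-37; fall back to 041$a when it is 'mul', blank, or invalid."""
    code = (lang_008 or "").strip().lower()
    if code in VALID_CODES and code != "mul":
        return [code]
    # The 008 was "mul", blank, or invalid: use the 041$a if the record has one.
    fallback = [c.strip().lower() for c in subfield_a_codes if c.strip()]
    if fallback:
        return fallback
    # Assumption: with no 041$a, keep "mul" rather than indexing nothing.
    return [code] if code == "mul" else []


print(language_codes_to_index("eng", ["eng", "fre", "spa"]))  # ['eng']
print(language_codes_to_index("mul", ["eng", "spa"]))         # ['eng', 'spa']
print(language_codes_to_index("   ", ["jpn", "eng"]))         # ['jpn', 'eng']
```

With this ordering, an English-language DVD with optional French and Spanish dubs is indexed only as English.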

If you’re trying to get a report directly out of your system (vs. querying an external database or catalog or tool like Gold Rush), it may be a little more complicated. I’d run my desired report twice:

  • On the first pass, I’d get results with the 008/35-37 values and exclude “mul” (and ideally anything invalid).
  • On the second, I’d query for anything with 008/35-37 with “mul” and then add the 041$a to my report.

… and then I’d merge the two using whatever tools seemed appropriate (OpenRefine? Python? Just reformatting the second one to paste below the first or vice-versa?). Gold Rush is based on Blacklight software, so I was simply able to propose the solution we use here.
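If Python were the tool of choice, the merge could be as simple as the sketch below. It assumes each report has already been loaded as a list of dicts. The keys ("id", "lang_008", "041a") and the record IDs are hypothetical, since every system exports reports differently.

```python
def merge_reports(first_pass: list, second_pass: list) -> list:
    """Combine the two passes into one list of {id, languages} rows."""
    merged = []
    # First pass: records whose 008/35-37 was usable and not "mul".
    for row in first_pass:
        merged.append({"id": row["id"], "languages": [row["lang_008"]]})
    # Second pass: "mul" records, reported with their 041$a codes if present.
    for row in second_pass:
        codes = row.get("041a") or ["mul"]
        merged.append({"id": row["id"], "languages": list(codes)})
    return merged


# Hypothetical report rows for illustration only.
first_pass = [
    {"id": "rec1", "lang_008": "eng"},
    {"id": "rec2", "lang_008": "spa"},
]
second_pass = [
    {"id": "rec3", "lang_008": "mul", "041a": ["eng", "spa"]},
    {"id": "rec4", "lang_008": "mul", "041a": []},
]

for row in merge_reports(first_pass, second_pass):
    print(row["id"], row["languages"])
```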

Is There a Place for the 041$h?

The 041 includes subfield $h for Language code of original (R). Could we use this anywhere?

The short answer is: not for this particular concern.

$h was added in 2011, long after DVDs came into our catalogs, so coverage will not be complete for those kinds of collections. One could, in theory, test all a/v media to see if the $h is also present in the $a and then assume that field to be the primary language of the work. One would probably be correct. But the 008/35-37 is already there to answer the question about the primary language of the work in hand (vs the original). If a record is missing that, it doesn’t seem likely to me that it’ll have anything as added-value as the 041$h.
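For anyone who wants to try the $h test anyway, a minimal Python sketch might look like this. The function name is hypothetical, the inputs are plain lists of codes already extracted from the 041, and the result is only a guess in the sense described above.

```python
from typing import Optional


def guess_primary_from_h(subfield_a: list, subfield_h: list) -> Optional[str]:
    """Return the single $h code that also appears in $a, if there is exactly one."""
    overlap = [code for code in subfield_h if code in subfield_a]
    return overlap[0] if len(overlap) == 1 else None


# The OLAC example: English original with optional French and Spanish dubs.
print(guess_primary_from_h(["eng", "fre", "spa"], ["eng"]))  # eng

# An English-only dub of a Japanese original: $h is not among the $a codes.
print(guess_primary_from_h(["eng"], ["jpn"]))  # None
```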


  1. Most often used for music without words but could be used for something that was entirely printed art with no linguistic content. ↩︎

  2. Which are a different Thing than subtitles—at least we got this one right. ↩︎

  3. It’s strange to have the memory of being really into Gundam, briefly, but not have strong memories of what or why like I do of Deep Space Nine, which I’ve been a fan of ever since. ↩︎

  4. Recommend extra hard for people who used to dance ballet or who like folk stories. They start off kind of straightforward and then go absolutely off the rails (delightfully and surreally). It’s the kind of story where everyone gets saved, even if we don’t like them up front, and those are good for the heart sometimes. ↩︎