What Is a Controlled Vocabulary?

Jul 9

A controlled vocabulary is a foundational tool in metadata creation and information organization. According to the Society of American Archivists, a controlled vocabulary is an enumerated list of terms preselected from natural language and chiefly used to aid discovery in information retrieval systems. These vocabularies may be locally managed or widely shared, depending on the needs and authority of the institution.

Controlled Vocabularies and Taxonomies

Controlled vocabularies are closely related to taxonomies, which structure terms into hierarchies or relationships. Controlled vocabularies also work hand in hand with metadata schemas, which provide the framework or categories for describing an object. Together, controlled vocabularies and schemas help standardize description, improve discoverability, and connect collections across institutions.

Why Use a Controlled Vocabulary?

A controlled vocabulary brings standardization to your metadata practices. Instead of allowing several variations of a term or name, it enforces a single consistent form, making it easier for users to search, discover, and connect related items. Controlled vocabularies also facilitate interoperability between systems and institutions. For example, using Library of Congress Subject Headings (LCSH) or the Getty Art & Architecture Thesaurus (AAT) can make your materials more discoverable through union catalogs like WorldCat.

Where We See Controlled Vocabularies

Controlled vocabularies show up in places you might not expect. They are embedded in library catalogs, finding aids, and cooperative databases like the Social Networks and Archival Context (SNAC) Cooperative. They even power Wikipedia authority control tables. These vocabularies also surface in the descriptive text of exhibits and item records, whether on websites, in spreadsheets, or in cataloging systems.

How Controlled Vocabularies Work with Metadata Schemas

To understand how controlled vocabularies interact with metadata schemas, picture a metadata schema as a series of compartments or fields. Each compartment holds a different category of information: for example, the title, date, creator, or subject of an item. Controlled vocabularies define what values are valid for those fields. Think of cataloging a garden: one schema might include fields for flower variety name, petal color, and garden location. The schema dictates the fields, while the controlled vocabulary provides the approved color names.
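The division of labor between schema and vocabulary can be sketched in a few lines of Python. This is a minimal illustration, not a real cataloging tool: the field names follow the garden example above, and the list of approved colors is a hypothetical vocabulary invented for the sketch.

```python
# Hypothetical schema: the fields a garden record may contain.
SCHEMA_FIELDS = ("variety_name", "petal_color", "garden_location")

# Hypothetical controlled vocabulary: the approved color names.
PETAL_COLORS = {"red", "pink", "white", "yellow", "purple"}

# Only fields listed here are restricted; the others accept free text.
CONTROLLED_FIELDS = {"petal_color": PETAL_COLORS}


def validate(record):
    """Return a list of problems found in a record; an empty list means valid."""
    problems = []
    # The schema decides which fields are allowed.
    for field_name in record:
        if field_name not in SCHEMA_FIELDS:
            problems.append(f"unknown field: {field_name}")
    # The vocabulary decides which values are allowed in a controlled field.
    for field_name, vocabulary in CONTROLLED_FIELDS.items():
        value = record.get(field_name)
        if value is not None and value not in vocabulary:
            problems.append(f"{field_name}: {value!r} is not an approved term")
    return problems


good = {"variety_name": "Peace rose", "petal_color": "yellow", "garden_location": "north bed"}
bad = {"variety_name": "Peace rose", "petal_color": "yellowish", "garden_location": "north bed"}

print(validate(good))
print(validate(bad))
```

The first record passes with no problems, while the second is flagged because "yellowish" is not in the approved list. Adding a new field to the schema would mean adding a name to SCHEMA_FIELDS; the color vocabulary itself would not need to change.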

Even if your metadata schema changes and includes more fields, such as stamen or leaf vein color, the same controlled vocabulary can still apply. You can also use multiple vocabularies within one metadata record. For example, a museum cataloging a painting might use LCSH for subjects, AAT for object types, and the Union List of Artist Names (ULAN) for the creator.

Metadata Schemas in Practice

Different metadata schemas exist for different contexts. Dublin Core is a simple schema with fifteen fields, intended for broad use. VRA Core, which builds on Dublin Core, has more detailed fields and is particularly suited to describing visual materials like artworks and architecture. For instance, VRA Core distinguishes between the date an object was created and the date it was photographed. This level of specificity is dictated not by the controlled vocabulary but by the schema structure.

Common Metadata Challenges

Controlled vocabularies help solve common metadata problems. Names are a perfect example. A person may be referred to in multiple ways: with a middle name, initial, maiden name, or abbreviation. Similarly, descriptions of concepts can vary wildly: “photograph,” “b and w photograph,” “black-and-white photo,” and “bw photo” might all describe the same item. Controlled vocabularies unify these variations, improving search accuracy and reducing redundancy.
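One way to picture this unification is a lookup table that maps every known variant to a single preferred term. The sketch below uses the four variants named above; the preferred term chosen here is a hypothetical example, not a heading taken from any published vocabulary.

```python
import re

# Hypothetical preferred term; a real project would take it from its chosen vocabulary.
PREFERRED = "black-and-white photograph"

# The variants named in the paragraph above.
VARIANTS = ["photograph", "b and w photograph", "black-and-white photo", "bw photo"]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return re.sub(r"\s+", " ", text.strip().lower())


# Map every normalized variant, and the preferred term itself, to the preferred term.
LOOKUP = {normalize(variant): PREFERRED for variant in VARIANTS}
LOOKUP[normalize(PREFERRED)] = PREFERRED


def preferred_term(value):
    """Return the preferred term, or None if the value needs a cataloger's review."""
    return LOOKUP.get(normalize(value))


for raw in ["BW photo", "  b and w   photograph", "daguerreotype"]:
    print(repr(raw), "->", preferred_term(raw))
```

Values the table does not recognize come back as None rather than being guessed at, which leaves the decision with a person.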

Examples of Controlled Vocabularies

Examples of well-known controlled vocabularies include LCSH and the Library of Congress Name Authority File (LCNAF), the AAT, ULAN, the Getty Thesaurus of Geographic Names (TGN), and the Medical Subject Headings (MeSH). Each has its own focus and strengths. No single vocabulary will cover all your metadata fields, so you will likely mix and match.

Finding and Using Vocabulary Terms

These vocabularies are accessible online. Most have their own websites or are available through linked data repositories like the Linked Open Vocabularies dataset. In your own institution, it can be helpful to include your chosen vocabularies in a processing guide or cataloging manual. You might even narrow down to preferred terms for geographic locations or institutional departments.

Hierarchies and Depth

Many vocabularies are structured hierarchically. AAT, for instance, allows you to use broad terms like furniture or very specific ones like side chair. This hierarchy supports both general and granular description, and can deepen catalogers' understanding of the domain in which they work.
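A hierarchy like this can be modeled as a mapping from each term to its broader term. The sketch below uses a deliberately simplified, hypothetical chain between "side chair" and "furniture"; it is not the actual AAT structure, which has more levels and different labels.

```python
# Hypothetical, simplified hierarchy: each term points to its broader term.
BROADER = {
    "side chair": "chair",
    "chair": "seating furniture",
    "seating furniture": "furniture",
    "table": "furniture",
}


def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def matches(record_term, search_term):
    """True if the record's term is the search term or falls underneath it."""
    return record_term == search_term or search_term in ancestors(record_term)


print(ancestors("side chair"))
print(matches("side chair", "furniture"))
print(matches("table", "chair"))
```

With this structure, a search for the broad term "furniture" can also find an item cataloged with the specific term "side chair", while a search for "chair" does not return tables.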

Mixing Vocabularies

In practice, metadata creators often use more than one vocabulary in a single field. This helps improve discoverability and reflects the multidimensional nature of cultural heritage materials. For instance, a finding aid might include both AAT and LCSH terms for subject headings, or draw names from both LCNAF and ULAN for biographical entries.
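When one field draws on several vocabularies, it helps to record where each term came from. A minimal sketch, assuming each subject is stored as a pair of source vocabulary and label; the labels are illustrative and have not been checked against the current authority files.

```python
from collections import defaultdict

# Each subject value records which vocabulary it came from.
# The labels are illustrative only.
subjects = [
    ("AAT", "photographs"),
    ("LCSH", "Photography"),
    ("LCSH", "Gardens"),
]


def by_source(terms):
    """Group term labels under the vocabulary they were drawn from."""
    grouped = defaultdict(list)
    for source, label in terms:
        grouped[source].append(label)
    return dict(grouped)


print(by_source(subjects))
```

Keeping the source alongside the term means a later cleanup or export can treat the AAT and LCSH terms separately, even though they share a field.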

Developing Local Taxonomies

Sometimes your institution needs to create its own taxonomy. Think about common inconsistencies you've encountered in your metadata: building names, library names, or event titles that are described differently across records. Establishing a local taxonomy helps resolve these issues. For example, standardizing the name of your library from Day Library to Dorothy Day Memorial Library and defining variants can help streamline your records.

To develop a local taxonomy, start by identifying the terms that need to be standardized. Produce a guide for catalogers and include it in your documentation. Your collections platform may support dropdown lists for these values. Consult with staff, volunteers, and community members to determine the most accurate and inclusive terminology. Even something as simple as asking alumni for their preferred names can make a difference. Define your terms clearly, list variants, and update your taxonomy regularly as your collections and users evolve.
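These steps can be sketched as a small data structure plus two helpers: one that produces the values for a dropdown list, and one that counts nonstandard values in existing records so you can see what needs attention. Everything here is hypothetical, including the sample field values and the placeholder definition; only the library name comes from the example above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TaxonomyEntry:
    preferred: str  # the standardized form catalogers should use
    definition: str  # a clear statement of what the term covers
    variants: list = field(default_factory=list)  # known alternate forms


# Hypothetical local taxonomy with a single entry.
TAXONOMY = [
    TaxonomyEntry(
        preferred="Dorothy Day Memorial Library",
        definition="Placeholder; a real entry would describe the building.",
        variants=["Day Library"],
    ),
]


def dropdown_options(taxonomy):
    """Preferred terms in alphabetical order, ready for a dropdown list."""
    return sorted(entry.preferred for entry in taxonomy)


def audit(values, taxonomy):
    """Count values that are not preferred terms, so they can be reviewed."""
    preferred = {entry.preferred for entry in taxonomy}
    return Counter(value for value in values if value not in preferred)


# Hypothetical values pulled from an existing "location" field.
existing = ["Day Library", "Dorothy Day Memorial Library", "Day Library", "Day Lib."]

print(dropdown_options(TAXONOMY))
print(audit(existing, TAXONOMY))
```

In this sample, the audit surfaces "Day Library" twice and "Day Lib." once. The second form is not yet listed as a variant, which is the kind of gap a regular taxonomy update would close.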

Putting It All Together

Ultimately, controlled vocabularies help standardize language within and beyond your institution. They are essential tools for cataloging, discovery, and collaboration. Using them teaches domain knowledge, improves access, and creates a more organized and consistent description environment. Local knowledge matters too, and developing your own taxonomies ensures that your metadata reflects your institution's unique context.

Watch the Webinar

Want to learn more about how controlled vocabularies work in real-world metadata projects? Watch our recorded webinar on controlled vocabularies, where we cover examples, tools, and tips for implementing these systems in archives, libraries, and museums.

Genna Duplisea

Genna Duplisea is an archivist, writer, and historian attuned to the challenges facing small cultural heritage organizations and the value of these organizations to their communities.

After working in her college’s archives as an undergrad, she worked in higher education for a few years and then earned her Master of Science in Library and Information Science with a concentration in archival management and her Master of Arts in history at Simmons College (now Simmons University) in Boston. For a decade she has been a “lone arranger,” first managing a university archives as a solo archivist, and now working as part of a collections team in a museum. She specializes in project management, policy and workflow development, archival processing, digitization, and training students as the next generation of cultural heritage workers.

She currently serves on the Rhode Island Historical Records Advisory Board (RIHRAB), and previously was the president of New England Archivists. Additionally, she is a member of the 2017 Archives Leadership Institute cohort.

Her professional and research interests center on archives labor, women’s and environmental history, and archives in Gothic fiction. As a founding member of Archivists Responding to Climate Change (Project ARCC), she is also interested in the intersection of archives, human rights, and climate change.

As part of the Backlog team, Genna contributes to our archival needs assessments, often designing workflows and making recommendations on archival organization and processing, collections care, and metadata standards. She has presented over a dozen webinars for Backlog, including the following:

Encoded Archival Description

Digitization Projects

Revolutions in 19th-Century Handwriting

Deciphering Handwriting and Print

Dublin Core for Omeka

https://www.linkedin.com/in/gennaduplisea/