Metadata 101: Structure, Standards, and Why It Shapes Access
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. That definition, from the National Information Standards Organization, is widely cited because it captures both the technical and practical dimensions of metadata. It is not simply description. It is structured description, created intentionally, within a defined framework.
Metadata does not occur naturally. It is constructed. It is created by people who decide what information to record, how to record it, and how that information should function within a system. It is constructive in that it serves a purpose. And it is actionable, meaning it can be processed, searched, filtered, and transformed by both humans and machines.
Understanding metadata begins with understanding what it describes.
Data Versus Metadata
Metadata is often summarized as data about data. The distinction becomes clearer with examples.
If you have a photograph, the image itself is the data. The file name, date created, photographer, location, and format are metadata. If you have a book, the text is the data. The author, title, publisher, publication date, and subject headings in the catalog record are metadata. If you have a digital audio file, the recording is the data. The duration, file format, creator, and rights information are metadata.
Metadata defines the boundaries of the thing being described. Are you describing the physical object, the digital surrogate, or the intellectual content? Each choice produces different metadata. A photograph of a building is not the same as the building itself. A scan of a letter is not the same as the original letter. Clarity at this level shapes everything that follows.
The Core Functions of Metadata
Metadata supports four primary functions in information environments: discoverability, consistency, interoperability, and long term preservation.
Discoverability is the most visible function. Titles, creators, subjects, and dates allow users to search and retrieve items. Without descriptive metadata, collections remain effectively hidden.
Consistency supports both staff workflows and user experience. When names, dates, and formats are recorded consistently, search results behave predictably. Inconsistent metadata creates ambiguity and undermines trust in the system.
Interoperability allows metadata created in one system to function in another. Shared repositories, collaborative platforms, and aggregated search systems depend on standardized metadata practices. When institutions follow shared standards, their records can be combined, compared, and harvested without loss of meaning.
Long term preservation depends on metadata that documents context, rights, and technical characteristics. File format information, creation dates, rights statements, and administrative history provide the information necessary to manage digital materials over time. Without preservation metadata, digital objects quickly lose essential context.
Types of Metadata
Metadata is commonly grouped into three categories: descriptive, structural, and administrative.
Descriptive metadata identifies and characterizes a resource. It includes fields such as title, creator, subject, abstract, and date of creation. Its primary function is to support discovery and identification.
Structural metadata describes how a resource is organized or how its components relate to one another. For a book, this may include page count, chapter structure, or volume information. For a digital object, it may include file relationships or sequence. In archival contexts, structural metadata supports hierarchical relationships between collections, series, folders, and items.
Administrative metadata supports management and control. It includes technical information such as file format and size, rights information, preservation notes, and metadata creation dates. Administrative metadata often overlaps with preservation metadata, which documents the information required to maintain and protect digital objects over time.
These categories are conceptual tools. In practice, a single metadata field may serve more than one function.
Standards and Schemas
Metadata is not created in a vacuum. It is structured according to standards and schemas.
A standard, often called a schema, defines the fields used to describe a resource and establishes rules for how those fields function. Different domains use different schemas because their descriptive needs vary.
Dublin Core is a general purpose schema widely used in digital collections. It includes core elements such as title, creator, subject, description, publisher, date, format, and identifier. MARC is a bibliographic standard developed for library catalogs. MODS builds on bibliographic description in a more flexible format. Darwin Core is designed for scientific data. The Federal Geographic Data Committee standard supports geospatial information. The Visual Resources Association standard addresses images and visual culture. Describing Archives: A Content Standard provides rules for archival description.
Selecting a schema is not about choosing the most comprehensive option. It is about choosing a framework that aligns with the materials, the users, and the level of detail required.
Registries and Controlled Vocabularies
A metadata registry documents schemas and defines the meaning of each element. If a field is labeled date, the registry clarifies whether that date refers to creation, publication, modification, or something else. Registries also specify formatting guidance, such as recommended date standards.
Controlled vocabularies further enhance consistency and interoperability. They establish standardized terms for subjects, names, and other fields. By selecting terms from a controlled vocabulary, institutions reduce ambiguity and improve search performance. Controlled vocabularies also support data exchange between systems by reducing semantic variation.
Application Profiles and Local Practice
Even when institutions adopt established schemas, they must determine how to implement them locally. An application profile documents how an organization uses a standard in practice. It defines required and optional fields, establishes formatting rules, and clarifies how certain elements will be interpreted.
For example, an application profile may specify that every item must include a title and a creator, that dates follow a specific international standard, and that subject terms be drawn from a designated controlled vocabulary. This documentation supports internal consistency and provides continuity as staff change.
When institutions use multiple systems, crosswalks map fields between schemas. A crosswalk aligns terminology and structure across platforms, enabling data to move between systems without losing meaning. Crosswalks are essential in environments where digital asset management systems, archival management systems, and public discovery layers coexist.
Repositories and Interoperability
A repository stores both data and metadata. In digital environments, repositories often ingest content from multiple sources. Interoperability depends on the ability of those sources to provide metadata structured according to shared expectations.
Large collaborative repositories illustrate this clearly. When multiple institutions contribute digitized materials, their metadata must align with agreed standards. If one institution omits language information while others include it, or if date formats differ significantly, search and filtering functions become unreliable.
Interoperability is achieved through shared schemas, consistent formatting, and clearly defined application profiles. Without these elements, aggregation becomes fragmented.
Syntax, Semantics, and Machine Readability
Metadata operates at two interconnected levels: semantics and syntax.
Semantics concerns meaning. It defines what a field represents and how its content should be interpreted. Syntax concerns structure and format. It determines how metadata is encoded and transmitted.
Extensible Markup Language, commonly known as XML, is a widely used syntax for representing metadata in machine-readable form. XML structures metadata elements within clearly defined tags, allowing systems to exchange information in a standardized format. Protocols such as the Open Archives Initiative Protocol for Metadata Harvesting rely on structured, machine-readable metadata to collect and aggregate records across repositories.
The semantic rules established by schemas and the syntactic structure provided by encoding standards work together. Meaning without structure cannot travel between systems. Structure without meaning produces empty records.
Metadata as Ongoing Work
Metadata is not a one-time activity. It evolves as standards change, technologies advance, and user expectations shift. Administrative metadata records when metadata itself was created or modified, acknowledging that descriptive work has a history.
At its most effective, metadata provides clarity. It defines what an item is, how it is organized, how it may be used, and how it connects to other information. It allows institutions to manage complexity at scale. It supports access today and preservation tomorrow.
In every archive, library, and digital collection, metadata is infrastructure. It shapes what can be found, how it is understood, and whether it will remain accessible in the future.
Want to learn more? Watch our webinar on metadata here: