Enhance Newspaper Digitization with METS/ALTO Standards

December 20, 2024

We list the benefits of conforming to recognized metadata standards such as METS and ALTO, which are already used by many existing, well-established digitization projects.

Recap: What is METS/ALTO?

  • METS (Metadata Encoding and Transmission Standard): This is an XML schema maintained by the Library of Congress. It provides a framework for describing the structure of a digital object and for showing how different types of metadata (technical, administrative, descriptive) relate to the files within it. For digitized newspapers, METS defines the hierarchy (issue → page → article) and links each level to the corresponding image and text files.

  • ALTO (Analyzed Layout and Text Object): This is also an XML schema maintained by the Library of Congress. It encodes the layout and content of digitized text, particularly the output of OCR (Optical Character Recognition). ALTO records details such as the coordinates of each word on the page, font styles, and the recognized text itself. This makes it essential for online search, since the word coordinates let an interface locate and highlight search hits on the page image.
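To make the ALTO description concrete, here is a minimal sketch in Python that reads each recognized word and its bounding box from an ALTO file using only the standard library. The sample page, the `Word` class, and the `extract_words` and `find_hits` helpers are hypothetical illustrations, not part of any project's tooling. The sample assumes the ALTO v4 namespace, but the `{*}` wildcard (Python 3.8+) matches `<String>` elements from any ALTO version. Real files also carry fonts, styles, hyphenation, and OCR confidence values that this sketch ignores.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical single-line ALTO page (ALTO v4 namespace assumed).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String ID="S1" CONTENT="Harbour" HPOS="120" VPOS="340" WIDTH="410" HEIGHT="60"/>
            <SP/>
            <String ID="S2" CONTENT="Bridge" HPOS="560" VPOS="340" WIDTH="350" HEIGHT="60"/>
            <SP/>
            <String ID="S3" CONTENT="opens" HPOS="940" VPOS="342" WIDTH="300" HEIGHT="58"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


@dataclass
class Word:
    """One recognized word and its bounding box on the page (hypothetical helper type)."""
    text: str
    hpos: float    # left edge
    vpos: float    # top edge
    width: float
    height: float


def extract_words(alto_xml: str) -> list[Word]:
    """Return every <String> element as a Word, in document order."""
    root = ET.fromstring(alto_xml)
    words = []
    # "{*}String" matches <String> in any namespace, so ALTO v2, v3 and v4 all work.
    for s in root.iterfind(".//{*}String"):
        words.append(
            Word(
                text=s.get("CONTENT", ""),
                hpos=float(s.get("HPOS", 0)),
                vpos=float(s.get("VPOS", 0)),
                width=float(s.get("WIDTH", 0)),
                height=float(s.get("HEIGHT", 0)),
            )
        )
    return words


def find_hits(words: list[Word], term: str) -> list[Word]:
    """Case-insensitive exact match of a single search term against each word."""
    needle = term.casefold()
    return [w for w in words if w.text.casefold() == needle]


if __name__ == "__main__":
    words = extract_words(SAMPLE_ALTO)
    print(" ".join(w.text for w in words))  # Harbour Bridge opens
    for hit in find_hits(words, "bridge"):
        print(hit)
```

A search interface could use the `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` of each hit to draw a highlight box over the word on the page image. Depending on the file's declared `MeasurementUnit`, the coordinates may first need to be scaled to image pixels.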


The benefits of using METS/ALTO

  • The long-term sustainability of your digital objects is greatly enhanced. If METS/ALTO ever becomes obsolete, a suitable migration path will almost certainly be developed, because many hundreds of millions of pages have already been digitized in this format by large projects at the Library of Congress and elsewhere.

  • When projects use the same open standards, sharing content between them is straightforward.

  • Projects using METS/ALTO benefit from the knowledge and tools created by and for other projects that use the same standards.


In addition to the above, METS/ALTO is simply a better, richer format for capturing digitized newspapers than any currently available alternative.

  • ALTO stores not only the full text of each page, down to individual words, but also structural information such as column, line, and word locations.

  • METS supports article segmentation, so articles, headlines, bylines, and other article-level metadata can be captured.

  • Together, METS and ALTO capture very rich data (full text, word-level layout, and logical article structure), which makes it possible to build innovative discovery and delivery interfaces.

  • Both METS and ALTO are open XML standards — no proprietary software is required to read or transform the digitized objects.
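The points about article segmentation and open XML can be illustrated with a short sketch that reads a METS file using nothing but Python's standard library. It assumes a simplified, hypothetical METS document: the `TYPE` values `issue`, `page`, and `article`, the file names, and the ALTO block ID `TB1` are illustrative only, and real projects follow their own METS profiles (some, for example, wrap `<area>` elements in `<seq>`). The physical structMap links each page to its image and ALTO files, while the logical structMap points each article at the ALTO text blocks it spans; the helper functions `file_locations`, `pages`, and `articles` are likewise hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Prefix map for ElementPath queries; XLINK_HREF is the expanded attribute name.
NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical one-page issue with one segmented article. The div TYPE values
# ("issue", "page", "article") are assumptions, not a specific project profile.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="image">
      <file ID="IMG1" MIMETYPE="image/jp2"><FLocat LOCTYPE="URL" xlink:href="page0001.jp2"/></file>
    </fileGrp>
    <fileGrp USE="text">
      <file ID="ALTO1" MIMETYPE="text/xml"><FLocat LOCTYPE="URL" xlink:href="page0001.xml"/></file>
    </fileGrp>
  </fileSec>
  <structMap TYPE="PHYSICAL">
    <div TYPE="issue">
      <div TYPE="page" ORDER="1">
        <fptr FILEID="IMG1"/>
        <fptr FILEID="ALTO1"/>
      </div>
    </div>
  </structMap>
  <structMap TYPE="LOGICAL">
    <div TYPE="issue">
      <div TYPE="article" LABEL="Harbour Bridge opens">
        <fptr><area FILEID="ALTO1" BEGIN="TB1" BETYPE="IDREF"/></fptr>
      </div>
    </div>
  </structMap>
</mets>"""


def file_locations(root: ET.Element) -> dict[str, str]:
    """Map each <file> ID in the fileSec to the href of its <FLocat>."""
    locations = {}
    for f in root.iterfind(".//mets:fileSec//mets:file", NS):
        flocat = f.find("mets:FLocat", NS)
        if flocat is not None:
            locations[f.get("ID")] = flocat.get(XLINK_HREF)
    return locations


def pages(root: ET.Element) -> list[list[str]]:
    """For each page div in the PHYSICAL structMap, list the files it links to."""
    locs = file_locations(root)
    path = ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']"
    return [
        [locs[fp.get("FILEID")] for fp in page.iterfind("mets:fptr", NS)]
        for page in root.iterfind(path, NS)
    ]


def articles(root: ET.Element) -> dict[str, list[tuple[str, str]]]:
    """Map each article LABEL to the (ALTO file, block ID) areas it covers."""
    locs = file_locations(root)
    path = ".//mets:structMap[@TYPE='LOGICAL']//mets:div[@TYPE='article']"
    return {
        art.get("LABEL"): [
            (locs[a.get("FILEID")], a.get("BEGIN"))
            for a in art.iterfind(".//mets:area", NS)
        ]
        for art in root.iterfind(path, NS)
    }


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_METS)
    print(pages(root))     # [['page0001.jp2', 'page0001.xml']]
    print(articles(root))  # {'Harbour Bridge opens': [('page0001.xml', 'TB1')]}
```

Following each `(file, block ID)` pair back into the referenced ALTO file is what lets an interface display or search a single article rather than a whole page.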
