Why data conversion matters

High-quality scanning is only the first step. The true value of a digital collection comes from the data behind it.

Data conversion transforms digital images into structured information that can be searched, organised, filtered, and explored by users. It also creates the foundation for metadata, interoperability, and long-term collection growth.

For more than 20 years, Veridian has worked with libraries, archives, and cultural heritage organisations to convert complex historical materials into standards-based outputs that support a wide range of digital collections.

 

What data conversion delivers

High-resolution images (typically uncompressed TIFFs from the scanning phase) are converted into structured, standards-based formats that prepare collections to be accessed online.

Depending on project requirements, outputs can include METS/ALTO metadata, PDFs, and JPEG 2000 images.

Standards that support discovery and long-term access

Data quality directly affects how usable and sustainable a digital collection will be. Where appropriate, we encourage the use of METS and ALTO XML standards, maintained by the Library of Congress. These widely adopted standards help collections remain interoperable, flexible, and easier to manage over time.

METS/ALTO helps collections:

  • Store searchable text at page and word level
  • Capture physical layout, including blocks, lines, and word positions
  • Preserve logical structures such as articles, headlines, and bylines

The result is richer, more reliable digital collections that are easier to search, explore, and grow over time.
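
As a rough illustration of what word-level text and positions look like in practice, the sketch below reads a small ALTO fragment using Python's standard library. The fragment, its coordinate values, and the function name extract_words are assumptions made for this example and do not describe Veridian's own tooling. The element and attribute names (TextBlock, TextLine, String, CONTENT, HPOS, VPOS, WIDTH, HEIGHT) follow the ALTO schema. The namespace shown is the ALTO version 3 one, so files in another version would need a different value, and the fragment is simplified rather than guaranteed to validate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO v3 fragment; all values are invented.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
            <String CONTENT="Harbour" HPOS="100" VPOS="200" WIDTH="210" HEIGHT="60"/>
            <SP WIDTH="20"/>
            <String CONTENT="News" HPOS="330" VPOS="200" WIDTH="140" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

# Assumption: the file uses the ALTO v3 namespace.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}


def extract_words(alto_xml):
    """Return each word with the ID of its text block and its position."""
    root = ET.fromstring(alto_xml)
    words = []
    for block in root.iterfind(".//alto:TextBlock", NS):
        for string in block.iterfind(".//alto:String", NS):
            words.append({
                "block": block.get("ID"),
                "text": string.get("CONTENT"),
                "hpos": float(string.get("HPOS")),
                "vpos": float(string.get("VPOS")),
                "width": float(string.get("WIDTH")),
                "height": float(string.get("HEIGHT")),
            })
    return words


for word in extract_words(ALTO_SAMPLE):
    print(word["block"], word["text"], word["hpos"], word["vpos"])
```

Because every word carries its own coordinates, a search interface can highlight a matching term directly on the page image.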

Data conversion options

We offer three data conversion approaches to suit different collection types, content structures, and user needs — from page-based publications to more complex, article-level experiences.

Automated Page-Level METS/ALTO

This fully automated option produces page-level METS/ALTO, making content full-text searchable and standards-compliant. Being automated, it is the most cost-effective choice for large collections where maintaining the article-level structure is not required.

Estimated cost: US$0.15 per page

Best suited for:

  • Page-based, text-heavy materials
  • Newspapers, magazines, journals, and reports
  • Collections with good-quality scans or microfilm

Page-Level METS/ALTO with Text Block Auditing

Text blocks are manually audited to reduce common OCR issues such as incorrect block ordering or mis-captured content, improving usability while maintaining a page-based structure.

Estimated cost: US$0.28 per page

Best suited for:

  • Page-based materials with more complex layouts
  • Newspapers and magazines with varied columns or dense content
  • Books and journals where improved reading order is important

METS/ALTO with Article Segmentation & Headline Cleanup

This option enhances the logical structure of the content, improving how users navigate and understand complex, multi-article pages. It requires human review and is therefore more resource-intensive.

Estimated cost: US$0.71 per page

Best suited for:

  • Newspapers and periodicals where content is organised into distinct articles
  • Magazines or journals where section-level navigation improves usability
  • Projects prioritising user experience and structured browsing

Quality assurance 

Quality assurance is a core part of our data conversion process. Before files are delivered, we validate outputs to identify any issues that could affect search performance, usability, or data integrity.

Our quality checks focus on structure, consistency, and completeness, helping ensure converted data meets agreed specifications and performs as expected in discovery and access environments.

This final review reduces downstream issues and helps ensure collections perform as expected once published online.
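
The specific checks are not listed here, so the following is only a hypothetical example of a completeness check rather than a description of the actual process. It assumes ALTO version 3 files and flags any word element that lacks text or position attributes, since such gaps would affect search and on-page highlighting. The function name find_incomplete_strings, the list of required attributes, and the sample data are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the ALTO v3 namespace.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Assumption: these attributes are what a project specification requires.
REQUIRED_ATTRIBUTES = ("CONTENT", "HPOS", "VPOS", "WIDTH", "HEIGHT")

# Hypothetical fragment: the second word is missing its HPOS attribute.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Shipping" HPOS="100" VPOS="200" WIDTH="230" HEIGHT="60"/>
    <String CONTENT="Notices" VPOS="200" WIDTH="200" HEIGHT="60"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def find_incomplete_strings(alto_xml):
    """Return (index, missing attribute names) for each incomplete word."""
    root = ET.fromstring(alto_xml)
    problems = []
    for index, string in enumerate(root.iterfind(".//alto:String", NS)):
        # An attribute counts as missing if it is absent or empty.
        missing = [name for name in REQUIRED_ATTRIBUTES if not string.get(name)]
        if missing:
            problems.append((index, missing))
    return problems


for index, missing in find_incomplete_strings(SAMPLE):
    print(f"String {index} is missing: {', '.join(missing)}")
```

A real validation pass would typically also check files against the published schemas and the agreed project specification.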

Cost of data conversion

Data conversion is typically priced per page and varies depending on content structure, the level of automation required, and the conversion approach selected.

Simpler page-level outputs are generally more cost-effective, while enhanced structural processing — such as article segmentation or audited text blocks — requires additional human review and therefore greater investment.
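
To make the per-page arithmetic concrete, the sketch below multiplies a page count by the estimated rates quoted for each option above. The option keys and the function name estimate_cost are invented for this example, and the figures are estimates only, so an actual quote may differ.

```python
from decimal import Decimal

# Estimated per-page rates in US dollars, taken from the options above.
# The dictionary keys are hypothetical labels for this sketch.
RATES_USD_PER_PAGE = {
    "automated_page_level": Decimal("0.15"),
    "text_block_auditing": Decimal("0.28"),
    "article_segmentation": Decimal("0.71"),
}


def estimate_cost(pages, option):
    """Return the estimated cost in US dollars for a number of pages."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return RATES_USD_PER_PAGE[option] * pages


# Example: compare the three options for a 10,000-page collection.
for option in RATES_USD_PER_PAGE:
    print(f"{option}: US${estimate_cost(10_000, option):,.2f}")
```

For a 10,000-page collection this gives roughly US$1,500, US$2,800, and US$7,100 respectively, which shows how the level of human review drives the overall budget.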

We'll help you balance cost, structure, and usability to determine the most appropriate approach for your collection and access goals.

Need help with data conversion for your collection?