Data Conversion Service
Why data conversion matters
High-quality scanning is only the first step. The true value of a digital collection comes from the data behind it.
Data conversion transforms digital images into structured information that can be searched, organised, filtered, and explored by users. It also creates the foundation for metadata, interoperability, and long-term collection growth.
For more than 20 years, Veridian has worked with libraries, archives, and cultural heritage organisations to convert complex historical materials into standards-based outputs that support a wide range of digital collections.
What data conversion delivers
High-resolution images (typically uncompressed TIFFs from the scanning phase) are converted into structured, standards-based formats that prepare collections to be accessed online.
Depending on project requirements, outputs can include METS/ALTO metadata, PDFs, and JPEG 2000 images.
Standards that support discovery and long-term access
Data quality directly affects how usable and sustainable a digital collection will be. Where appropriate, we encourage the use of METS and ALTO XML standards, maintained by the Library of Congress. These widely adopted standards help collections remain interoperable, flexible, and easier to manage over time.
METS/ALTO helps collections:
- Store searchable text at page and word level
- Capture physical layout, including blocks, lines, and word positions
- Preserve logical structures such as articles, headlines, and bylines
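
As a rough illustration of what word-level text and layout data look like in practice, the Python sketch below reads a small, hand-written ALTO fragment and lists each word with its block, line, and position. The sample XML, the `extract_words` function name, and the choice of the ALTO v4 namespace are assumptions made for this example. Real project files carry more detail and may use a different ALTO version.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace. Older files use ns-v2# or ns-v3# instead.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Simplified, hand-written sample (not real project output).
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
            <String CONTENT="Harbour" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="60"/>
            <SP WIDTH="20"/>
            <String CONTENT="News" HPOS="420" VPOS="200" WIDTH="200" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def extract_words(alto_xml):
    """Return one dict per word, in document order, with its block, line, and position."""
    root = ET.fromstring(alto_xml)
    words = []
    for block in root.iterfind(".//alto:TextBlock", NS):
        for line in block.iterfind("alto:TextLine", NS):
            for string in line.iterfind("alto:String", NS):
                words.append({
                    "block": block.get("ID"),
                    "line": line.get("ID"),
                    "text": string.get("CONTENT"),
                    "hpos": float(string.get("HPOS")),
                    "vpos": float(string.get("VPOS")),
                })
    return words


for word in extract_words(SAMPLE_ALTO):
    print(word["block"], word["line"], word["text"], word["hpos"], word["vpos"])
```

Because every word carries its own coordinates, a viewer can highlight search hits directly on the page image.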

Data conversion options
We offer three data conversion approaches to suit different collection types, content structures, and user needs — from page-based publications to more complex, article-level experiences.
Automated Page-Level METS/ALTO
This fully automated option produces page-level METS/ALTO, making content full-text searchable and standards-compliant. Being automated, it is the most cost-effective choice for large collections where maintaining the article-level structure is not required.
Estimated cost: US$0.15 per page
Best suited for:
- Page-based, text-heavy materials
- Newspapers, magazines, journals, and reports
- Collections with good-quality scans or microfilm
Page-Level METS/ALTO with Text Block Auditing
Text blocks are manually audited to reduce common OCR issues such as incorrect block ordering or mis-captured content, improving usability while maintaining a page-based structure.
Estimated cost: US$0.28 per page
Best suited for:
- Page-based materials with more complex layouts
- Newspapers and magazines with varied columns or dense content
- Books and journals where improved reading order is important
METS/ALTO with Article Segmentation & Headline Cleanup
This option enhances the logical structure of the content, improving how users navigate and understand complex, multi-article pages. It requires human review and is therefore more resource-intensive.
Estimated cost: US$0.71 per page
Best suited for:
- Newspapers and periodicals where content is organised into distinct articles
- Magazines or journals where section-level navigation improves usability
- Projects prioritising user experience and structured browsing
Quality assurance
Quality assurance is a core part of our data conversion process. Before files are delivered, we validate outputs to identify any issues that could affect search performance, usability, or data integrity.
Our quality checks focus on structure, consistency, and completeness, helping ensure converted data meets agreed specifications and performs as expected in discovery and access environments.
This final review reduces downstream issues once collections are published online.
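
To give a sense of what a completeness check can look like, the Python sketch below flags words in an ALTO file that lack text content or position data. It is an illustration only and does not describe our actual QA tooling. The `find_incomplete_strings` function, the sample XML, and the list of required attributes are assumptions standing in for a hypothetical project specification.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace, written in ElementTree's {uri}tag form.
ALTO_NS = "{http://www.loc.gov/standards/alto/ns-v4#}"

# Hypothetical project rule: every word needs text and a full bounding box.
REQUIRED_ATTRIBUTES = ("CONTENT", "HPOS", "VPOS", "WIDTH", "HEIGHT")

# Hand-written sample in which the second word has no HPOS attribute.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Shipping" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="60"/>
            <String CONTENT="notices" VPOS="200" WIDTH="260" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def find_incomplete_strings(alto_xml):
    """Return (word index, missing attribute names) for each incomplete String element."""
    root = ET.fromstring(alto_xml)
    problems = []
    for index, string in enumerate(root.iter(ALTO_NS + "String")):
        missing = [name for name in REQUIRED_ATTRIBUTES if string.get(name) is None]
        if missing:
            problems.append((index, missing))
    return problems


for index, missing in find_incomplete_strings(SAMPLE_ALTO):
    print(f"String #{index} is missing: {', '.join(missing)}")
```

A word without coordinates can still be found by search but cannot be highlighted on the page image, which is the kind of issue worth catching before delivery.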
Cost of data conversion
Data conversion is typically priced per page and varies depending on content structure, the level of automation required, and the conversion approach selected.
Simpler page-level outputs are generally more cost-effective, while enhanced structural processing — such as article segmentation or audited text blocks — requires additional human review and therefore greater investment.
We'll help you balance cost, structure, and usability to determine the most appropriate approach for your collection and access goals.
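
For a quick budget comparison, the Python sketch below multiplies the estimated per-page rates listed for the three conversion options by a page count. The 100,000-page collection, the option keys, and the `estimate_cost` function are hypothetical. The rates are estimates, so an actual quote may differ.

```python
from decimal import Decimal

# Estimated per-page rates in US dollars, taken from the options described above.
# The dictionary keys are illustrative labels, not official product codes.
RATES_USD_PER_PAGE = {
    "automated_page_level": Decimal("0.15"),
    "text_block_auditing": Decimal("0.28"),
    "article_segmentation": Decimal("0.71"),
}


def estimate_cost(pages, option):
    """Return the estimated cost in US dollars for a page count and conversion option."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return RATES_USD_PER_PAGE[option] * pages


PAGES = 100_000  # hypothetical collection size

for option in RATES_USD_PER_PAGE:
    print(f"{option}: US${estimate_cost(PAGES, option):,.2f}")
# automated_page_level: US$15,000.00
# text_block_auditing: US$28,000.00
# article_segmentation: US$71,000.00
```

`Decimal` is used rather than floating-point numbers so that currency amounts are not affected by binary rounding.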