This article explains the three data conversion options available through the Veridian service.

We created this model so that digital collection owners can select the data conversion route that best suits both the material to be digitized and their budget.

Our service includes project management of the data conversion process, ensuring it runs smoothly and follows best practices. Good-quality data based on open standards will endure, even if you change content delivery systems in the future!

The final step in the process is ingesting the data into the Veridian Quality Assurance (QA) view. This critical step helps safeguard your data against errors that are often overlooked.

You can read more about each part of the Veridian data conversion service below.

Choose from three data conversion options 

Once your newspaper or other collection material is in digital format (i.e., uncompressed TIFF), we take care of the rest of the data conversion process to generate METS/ALTO and other associated files (PDF and JPEG2000). There are three main types of data conversion we can offer:

CONVERSION OPTION 1. Automated page-level METS/ALTO - $0.15 USD per page

This is a great option if you are starting with:

  • Good quality images
  • An archive of original newspapers
  • Images scanned from good quality microfilm

It’s easy and cost-effective for us to create a METS/ALTO collection similar to those found in Chronicling America. Compared to a PDF display, this is a better and more standards-compliant option.

Pictured: An automated page-level METS/ALTO collection

CONVERSION OPTION 2. Page-level METS/ALTO with TextBlocks audited - $0.28 USD per page

Unlike the fully automated process in Option 1, here our partner engineers manually audit the TextBlocks to catch errors such as “incorrect block ordering” and “incorrect capturing”. You can read more about these errors here.

Pictured: A page-level METS/ALTO collection with TextBlocks audited

CONVERSION OPTION 3. METS/ALTO with Article Segmentation and Article Headline Cleanup - $0.71 USD per page 

Because neither article segmentation nor headline cleanup can be fully automated, both require manual work, so this option costs more than page-level METS/ALTO. However, it does improve the document browsing structure for end users.

Pictured: A METS/ALTO collection with Article Segmentation and Article Headline Cleanup
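As a rough illustration of how the per-page pricing scales with collection size, the short Python sketch below estimates the total cost of each option for a given page count. The option keys and the function name estimate_cost are hypothetical, and the sketch assumes a flat per-page rate with no volume discounts, setup fees, or taxes, so it is not a substitute for an actual cost proposal.

```python
from decimal import Decimal

# Per-page prices (USD) as quoted in this article. The option keys are
# hypothetical labels; a flat rate with no discounts, fees, or taxes is assumed.
PRICES_PER_PAGE_USD = {
    "option_1_automated": Decimal("0.15"),
    "option_2_textblocks_audited": Decimal("0.28"),
    "option_3_article_segmentation": Decimal("0.71"),
}


def estimate_cost(page_count: int) -> dict[str, Decimal]:
    """Return the estimated total cost (USD) of each option for page_count pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Decimal avoids binary floating-point rounding errors in currency arithmetic.
    return {option: price * page_count for option, price in PRICES_PER_PAGE_USD.items()}


if __name__ == "__main__":
    for option, cost in estimate_cost(10_000).items():
        print(f"{option}: ${cost:,.2f}")
```

For a 10,000-page collection this prints $1,500.00, $2,800.00 and $7,100.00 for Options 1, 2 and 3 respectively.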

Data quality is key

The quality of data within a digital collection is incredibly important, so part of the Veridian service is ensuring your data is digitized and processed well from the outset. 

We encourage our clients to use the METS and ALTO XML standards maintained by the Library of Congress. Following industry best-practice standards allows your collection material to be preserved more comprehensively and helps ensure its long-term sustainability.

METS/ALTO can store the full-text content of each page and word. It captures physical structural information such as block, line, and word locations, and it can optionally support logical structure (i.e., article segmentation), so that articles, headlines, bylines, and other article-level metadata are recognized.
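To make the physical structure more concrete, here is a minimal Python sketch that reads a small ALTO fragment using only the standard library and lists each word with its TextBlock, TextLine, and bounding box. It assumes the common ALTO hierarchy of TextBlock, TextLine and String elements, with word text in the CONTENT attribute and coordinates in HPOS, VPOS, WIDTH and HEIGHT. The sample document and the function names are hypothetical and far simpler than a real converted newspaper page.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written ALTO fragment for illustration only (not real Veridian output).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="300">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="60">
            <String CONTENT="Local" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="60"/>
            <SP/>
            <String CONTENT="news" HPOS="420" VPOS="200" WIDTH="280" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so different ALTO versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml: str) -> list[dict]:
    """Return each word with its block ID, line ID, text and (x, y, width, height) box."""
    root = ET.fromstring(alto_xml)
    words = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        for line in block:
            if local_name(line.tag) != "TextLine":
                continue
            for string in line:
                # Skip spacing (SP) and other non-word children of the line.
                if local_name(string.tag) != "String":
                    continue
                words.append({
                    "block": block.get("ID"),
                    "line": line.get("ID"),
                    "text": string.get("CONTENT"),
                    # Coordinates may be non-integer, so read them as floats.
                    "box": tuple(float(string.get(k, "0"))
                                 for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")),
                })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word)
```

Because every word keeps its coordinates, a viewer can highlight search hits directly on the page image, which is one reason page-level ALTO is preferable to plain text.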

METS/ALTO collection content will also be full-text searchable and more accessible to your users. That is ultimately what we are after: rich, stable digital collections that a wide audience can engage with and enjoy well into the future.

Quality Assurance (QA)

As the final part of the Veridian digitization process, we ingest the output data into Veridian and use Veridian's QA mode to safeguard your data against many widespread errors that are easily overlooked if you are not an expert in this area (read more on ALTO hidden errors).

To avoid these “hidden errors”, our solution is to make them visible so they can be picked up during the data conversion and QA process. This approach has helped us identify many widespread quality issues and has been a very useful tool for raising the quality of page-level METS/ALTO.

The Veridian QA view (the ability to visualize ALTO XML) allows Veridian collection administrators to inspect the underlying ALTO data once it has been ingested into Veridian. This helps Veridian users communicate more effectively with their data conversion vendors and, in turn, obtain better-quality page-level METS/ALTO.
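The idea of making hidden errors visible can be sketched in a few lines. The Python example below is not how the Veridian QA view is implemented; it is a hypothetical, standard-library-only illustration that draws every TextBlock in an ALTO file as an outlined rectangle in an SVG image, numbered by its position in the file. The sketch assumes that file order is the intended reading order, so a block ordering error, like the one deliberately built into the sample, shows up as numbers that jump around the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two TextBlocks deliberately listed bottom-first,
# to mimic an "incorrect block ordering" error.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB_LOWER" HPOS="100" VPOS="1500" WIDTH="800" HEIGHT="400"/>
        <TextBlock ID="TB_UPPER" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="400"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so different ALTO versions are handled alike."""
    return tag.rsplit("}", 1)[-1]


def box(el: ET.Element) -> tuple[float, ...]:
    """Read an element's (HPOS, VPOS, WIDTH, HEIGHT) as floats; missing values become 0."""
    return tuple(float(el.get(k, "0")) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))


def blocks_to_svg(alto_xml: str) -> str:
    """Draw each TextBlock as a red outline labelled with its position in the file."""
    root = ET.fromstring(alto_xml)
    page = next((el for el in root.iter() if local_name(el.tag) == "Page"), None)
    width = float(page.get("WIDTH", "0")) if page is not None else 0.0
    height = float(page.get("HEIGHT", "0")) if page is not None else 0.0
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width:g} {height:g}">']
    blocks = [el for el in root.iter() if local_name(el.tag) == "TextBlock"]
    for order, block in enumerate(blocks, start=1):
        x, y, w, h = box(block)
        parts.append(f'<rect x="{x:g}" y="{y:g}" width="{w:g}" height="{h:g}" '
                     f'fill="none" stroke="red" stroke-width="5"/>')
        # The label is the block's order in the XML, assumed here to be reading order.
        parts.append(f'<text x="{x + 20:g}" y="{y + 100:g}" font-size="80" fill="red">{order}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(blocks_to_svg(SAMPLE_ALTO))
```

Saving the output to a .svg file and opening it in a browser shows block 1 below block 2, which is exactly the kind of problem that stays invisible in a plain full-text view.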

Questions?

Please feel free to contact us with any questions about the Veridian data conversion service. 

We can offer advice to suit the specific needs of your collection and, if needed, put together a cost proposal or demo collection site for you to share with your stakeholders.