In 1878 the Linderman Library was opened on the Lehigh University Campus in Bethlehem, Pennsylvania. This stunning building is a fitting home for Lehigh’s Special Collections which preserves and provides access to historical collections of rare books, manuscripts and the Lehigh University digital archives.
Amongst the digitized materials is a collection of 6,548 issues of Lehigh’s student newspaper, The Brown and White, which was founded in 1894.
In recent years the newspaper has been presented online via a self-hosted CONTENTdm instance.
Because OCLC was ceasing support for this version of the platform and charging significantly more for their hosted solution, the Lehigh digital archives team sought out a new solution.
From CONTENTdm to Veridian
In 2019 the decision was made to migrate The Brown and White newspaper collection to Veridian, which would refine the collection’s directory structure and also provide significant improvements to its appearance, usability and scalability.
It was very important to the University that the article segmentation they had previously invested in would be preserved during the migration project.
The investment in article segmentation is a big one, and the enhancements it can bring to a digital collection can be just as great. Identifying, segmenting, and categorizing individual newspaper articles as they appear on the page makes much richer search options available.
Lehigh’s Digital Archivist Alex Japha explains:
“Once the Lehigh Libraries decided to migrate away from our legacy CONTENTdm repository, it provided an opportunity for us to reassess how our student newspaper could be searched and used. Most important to us was the preservation of article segmentation. The metadata that enables this feature was time and labor intensive to create when the newspaper was first digitized over a decade ago. Very few software products allow for the searching and viewing of article level data, so this was a distinct advantage of Veridian.”
In the past we had migrated CONTENTdm collections to Veridian where the articles had not been defined (page-level segmentation only), but the Lehigh University project presented our first opportunity to convert an article-segmented newspaper collection from CONTENTdm.
The project scope
During the migration project, Veridian engineers were tasked with converting 60,827 items (208.11 GB) of CONTENTdm data and loading it onto a customized Veridian presentation system which would visually represent the Lehigh University brand.
We were also engaged to provide annual support and maintenance to keep the collection up to date, running smoothly and ready for future growth.
The data conversion process
The first step was for the university’s digital archivist to upload the CONTENTdm data to a server where our Veridian engineers could access it. Because of the very large volume of data, we worked on this same server to avoid duplication and keep the data safe.
Next, we wrote a script to convert the CONTENTdm data to METS/ALTO, the industry-standard format for newspaper digitization.
Using this automated process ensured all of the CONTENTdm data was transferred safely without any loss. Key elements such as article segmentation, word location, article headlines and so on were all maintained.
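To illustrate what "word location" means in ALTO, here is a minimal Python sketch that reads word coordinates from a small ALTO fragment. The sample XML is hand-written for illustration and the function name word_boxes is hypothetical; neither comes from the Lehigh project or from the Veridian conversion script.

```python
import xml.etree.ElementTree as ET

# Hand-written ALTO v2 fragment for illustration only (not project data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Brown" HPOS="100" VPOS="50" WIDTH="80" HEIGHT="20"/>
            <SP/>
            <String CONTENT="and" HPOS="190" VPOS="50" WIDTH="40" HEIGHT="20"/>
            <SP/>
            <String CONTENT="White" HPOS="240" VPOS="50" WIDTH="85" HEIGHT="20"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""

ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}


def word_boxes(alto_xml):
    """Return (word, hpos, vpos, width, height) for every ALTO String element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for s in root.iterfind(".//alto:String", ALTO_NS):
        boxes.append((
            s.get("CONTENT"),
            float(s.get("HPOS")),
            float(s.get("VPOS")),
            float(s.get("WIDTH")),
            float(s.get("HEIGHT")),
        ))
    return boxes
```

Each word carries its own position and size on the page, which is what allows a viewer to highlight search hits on the page image.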
The conversion process also organised the directory structure neatly by publication and publication date, an arrangement that was lacking in the data supplied. The new data can now be navigated much more easily within the file system, and its intuitive naming scheme is better for preservation.
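As a rough sketch of what a publication-and-date layout can look like, the Python function below builds a directory path for a single issue. The layout (publication code, then year, month and day), the code "BW" and the function name issue_directory are assumptions made for this example, not the actual naming scheme used in the project.

```python
from datetime import date
from pathlib import PurePosixPath


def issue_directory(root, publication_code, issue_date):
    """Build root/<publication>/<YYYY>/<MM>/<DD> for one newspaper issue.

    The layout is hypothetical; it only illustrates grouping by
    publication first and publication date second.
    """
    return (
        PurePosixPath(root)
        / publication_code
        / f"{issue_date.year:04d}"
        / f"{issue_date.month:02d}"
        / f"{issue_date.day:02d}"
    )


# Arbitrary example date, chosen only to show the zero-padded path.
example = issue_directory("data", "BW", date(1950, 3, 7))
```

Zero-padded date components keep directories in chronological order when sorted by name, which is one reason such a scheme is easy to browse.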
Then we set up a Veridian staging site and ingested the new data ready for QA (Quality Assurance) by the library.
A fresh new look
We then built a new online user interface for The Brown and White. The design was customized by integrating the Lehigh University logo and colours and by displaying a historic image of the campus buildings as the background.
To provide further context for the collection, the front page of the interface has an ‘About the Collection’ section, and a ‘History’ tab gives a more detailed background of the newspaper’s evolution.
New features to engage an audience
Searching and browsing
To enhance the usability of The Brown and White collection, we added Veridian features such as full-text searching.
Search result snippets give context to each result, displaying a few lines of surrounding text to help determine relevance.
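The idea behind a result snippet can be sketched in a few lines of Python. This is a simplified illustration, not Veridian's implementation; the function name snippet and the context_words parameter are hypothetical.

```python
def snippet(text, query, context_words=5):
    """Return a few words either side of the first match of `query`.

    Matching is case-insensitive and ignores surrounding punctuation.
    Returns None when the query word does not occur in the text.
    """
    words = text.split()
    target = query.lower()
    for i, word in enumerate(words):
        if word.strip(".,;:!?\"'()").lower() == target:
            start = max(0, i - context_words)
            end = min(len(words), i + context_words + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return None
```

For example, calling snippet on a long article with the query "game" returns only the handful of words around the first occurrence, so a reader can judge relevance without opening the page.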
Search facets and filters can further refine these results by category (such as article, page or advertisement) or by decade. Researchers may also browse the collection by date or by user-defined tags.
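Faceting amounts to counting results per category and then narrowing to the chosen value. The Python sketch below shows the idea on a handful of made-up search hits; the data, the function names and the decade labels are all assumptions for illustration and do not reflect how Veridian stores or computes facets.

```python
from collections import Counter

# Hypothetical search hits as (category, publication year) pairs.
HITS = [
    ("article", 1923),
    ("advertisement", 1927),
    ("article", 1958),
    ("page", 1958),
    ("article", 1921),
]


def decade_label(year):
    """Map a year to a decade label such as '1920s'."""
    return f"{year // 10 * 10}s"


def facet_counts(results):
    """Count results per category and per decade."""
    by_category = Counter(category for category, _ in results)
    by_decade = Counter(decade_label(year) for _, year in results)
    return by_category, by_decade


def filter_results(results, category=None, decade=None):
    """Keep only results matching the selected facet values."""
    return [
        (cat, year)
        for cat, year in results
        if (category is None or cat == category)
        and (decade is None or decade_label(year) == decade)
    ]
```

The counts are what a facet panel would display next to each option, and the filter is what runs when a researcher selects one.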
“The interface for desktop and mobile browsers offers our users an easy and powerful portal to a large, otherwise difficult to search collection. In particular, the text and image search previews allow for more efficient searches without requiring users to browse through irrelevant material,” says Alex.
Collection users are able to register with the collection and set up an account to manage their research. Once registered they can opt to record their search and viewing history, share content with others, or create a private list of material to come back to. Users can also comment on or tag items within the collection.
Veridian’s unique User Text Correction (UTC) module allows users to correct OCR errors as they come across them in the text. These corrections not only serve to improve search results, they also help to build a virtual community of engaged users.
We’ve learnt that the UTC feature increases traffic to a collection, as it encourages more frequent and longer visits.
Search Engine Optimization (SEO)
Finally, we set up SEO to maximize the collection’s exposure, and Google Analytics to help analyze visitor information and generate traffic reports.
Ready for collection growth
Because Veridian was initially designed for very large newspaper digitization projects, the ability to scale well is another key benefit of the platform.
The Brown and White digital collection currently contains 6,548 issues comprising 60,354 pages and 260,376 articles. On the Veridian platform this collection could safely grow 100 times bigger with no loss of performance.
Support and Maintenance
Now that The Brown and White digital collection is live, we will upgrade it regularly so that it will never become obsolete. Our team will ingest new data as it becomes available and provide troubleshooting and support whenever it is needed.
Lehigh’s migration project from CONTENTdm to Veridian took less than five months to complete. The Veridian team were delighted to help to secure the future preservation of such an important student archive.
We’ll give the final word to Lehigh’s Digital Archivist Alex Japha:
“While migration processes can often be complicated when moving from proprietary software, the Veridian team was able to rapidly convert the data and metadata into a standardized open format.
By transitioning to the METS/ALTO format, Lehigh is confident in the long-term preservation and usability of this important institutional history collection into the future. We are excited to introduce this new platform to students, faculty, staff, alumni, and beyond.”