A New Way to Search Digital Collections: Introducing RAG

In today’s world of Google, ChatGPT, and intelligent assistants, users expect smart, intuitive search experiences. Those searching the digital historic collections of libraries, cultural and heritage organizations and educational institutions are no exception.

That’s why here at Veridian we’ve been investing in developing a powerful new search capability: Retrieval-Augmented Generation (RAG). While the name may sound technical, it has huge potential to make digital collections more accessible, discoverable, and engaging.

RAG? Wait, what?

At its core, RAG is an artificial intelligence (AI) technique that combines two capabilities:

  • Retrieval: It uses semantic search to find and pull the most relevant content from large databases (such as your digitized newspaper or archive collections) based on a user’s search query.

  • Generation: It then uses a large language model (LLM) to generate a response—starting with links to the primary source documents it’s based on, followed by a human-like summary grounded in your collection’s content. Users can also ask follow-up questions to explore a topic further. This approach helps surface relevant material that might otherwise remain buried—guiding users to the content that matters most to them.

In short, RAG allows users to pose a question to a collection in plain English—and receive a clear, plain-English answer in return. Just as importantly, it points them directly to the source documents behind that summary—encouraging deeper discovery and interaction with the collection itself.
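For technically minded readers, the two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Veridian's implementation: the sample articles and URLs are invented, `embed` uses simple word counts as a stand-in for a real semantic embedding model, and `generate` is a placeholder where a call to an LLM would go.

```python
import math
import re
from collections import Counter

# Hypothetical mini-collection; ids, URLs, and text are invented for illustration.
DOCS = [
    {"id": "a1", "url": "https://example.org/a1",
     "text": "Influenza epidemic closes schools and theatres across the city"},
    {"id": "a2", "url": "https://example.org/a2",
     "text": "Harbour board approves new wharf construction"},
    {"id": "a3", "url": "https://example.org/a3",
     "text": "Hospitals overwhelmed as influenza cases rise"},
]

def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Retrieval step: return up to k documents most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(d["text"])), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]  # drop unrelated documents
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]

def generate(query, sources):
    """Generation step (placeholder): source links first, then a stub summary."""
    links = [f"- {d['url']}" for d in sources]
    summary = f"(LLM summary of {len(sources)} retrieved article(s) would appear here.)"
    return "\n".join(["Sources:"] + links + ["", summary])

def answer(query, docs=DOCS):
    """Full pipeline: retrieve relevant documents, then generate a response."""
    return generate(query, retrieve(query, docs))
```

Calling `answer("influenza epidemic")` lists the two influenza articles as sources, most similar first, and leaves out the unrelated wharf story.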


Read more: Keyword vs. Semantic Search

An important note on how RAG handles collection data

The LLM in RAG doesn’t learn from or train itself on content hosted in a collection. Instead, it works in real time:

  • It generates answers using only the specific content retrieved from a collection in response to a user’s query.

  • It ensures transparency and makes it easier for users to engage directly with the original source content (and the collection).

  • And it does not fall back on general internet knowledge (as ChatGPT might if asked the same question directly).

These distinctions matter: by relying only on retrieved content rather than training on it, RAG preserves the integrity, ownership, and copyright protections of a digital collection.
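One common way to keep answers tied to retrieved content is to build the LLM prompt from the retrieved passages alone. The sketch below is an assumption about how that might look, not a description of Veridian's system: the function name `build_prompt`, the instruction wording, and the passage format are all hypothetical.

```python
def build_prompt(question, passages):
    """Assemble a prompt that confines the model to the retrieved passages.

    `passages` is a list of (source_url, text) pairs retrieved for this query.
    Returns None when nothing was retrieved, so the caller can decline to
    answer instead of letting the model fall back on general knowledge.
    """
    if not passages:
        return None
    # Number each passage so the model can cite it and users can follow the link.
    context = "\n".join(
        f"[{i}] {url}\n{text}"
        for i, (url, text) in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the numbered passages below. "
        "Cite passages by number. If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Because the collection text is passed in at query time, nothing in this step requires training the model on the collection itself.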

What does RAG search look like? 

RAG can make a digital collection behave more like a knowledgeable guide than a static database. For example:

A genealogist can search “What was the impact of the 1918 New Zealand influenza pandemic?” and receive a concise, AI-generated summary based on multiple newspaper articles, along with links to view the original sources.

1. Relevant content is retrieved, with links back to the original sources.

Search results linked to source content

2. Human-like responses can be generated, helping users quickly make sense of the information retrieved and easily explore related topics (again with links back to the source content).

Generated human-like response

RAG isn’t just about summarizing information. It’s about guiding users to the original content that best matches their needs, especially in large or complex collections where valuable items can be hard to find. This makes digital collections more discoverable, more accessible, and easier to navigate, whether you’re a seasoned researcher or exploring for the first time.

Please contact us if you have any questions or would like to discuss this topic further. We’re here to help.