At Veridian, we're proud to offer complete services for digital collections: building collections, managing large digitisation projects, providing customised versions of our presentation software to serve web-based collections, and hosting. When we back up your hosted digital collections, our first priority is the safety and integrity of your data. We achieve this with a multi-faceted backup and recovery strategy that lets us handle everything from minor issues to significant crises, in the unlikely event they occur. This article gives an overview of the principles guiding our backup and recovery system.
Robust and redundant data preservation
Our data preservation strategy is comprehensive and geographically diverse. Not only do we maintain the original source data batches, typically consisting of images, METS/ALTO, and PDFs, but we also back up all user-generated data. This includes text corrections, comments, tags, and user account information. By regularly verifying the data and backing it up to redundant facilities in separate geographic regions, we keep your digital collections safe and accessible.
BagIt and checksums: Guarding data integrity
One crucial component of our backup system is BagIt. As previously explained in our article about BagIt's role in data preservation, we utilise this file packaging format to maintain data integrity and completeness. Every day, our system automatically validates all underlying source data batches using BagIt checksums. Additionally, we compare the live data and backup data daily, ensuring they remain identical.
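As an illustration of what checksum validation of a bag involves, here is a minimal sketch using only the Python standard library. It assumes a bag whose payload manifest is `manifest-sha256.txt` (the BagIt specification also permits other algorithms). The function names are hypothetical and this is not Veridian's actual tooling. It ignores tag manifests and the percent-encoding of unusual characters in file paths.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag_dir: Path) -> list[str]:
    """Check a bag's payload against manifest-sha256.txt.

    Returns a list of problems; an empty list means the bag validated.
    """
    problems = []
    listed = set()
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif file_sha256(target) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    # Completeness: every payload file on disk must appear in the manifest.
    data_dir = bag_dir / "data"
    if data_dir.is_dir():
        for path in data_dir.rglob("*"):
            rel = path.relative_to(bag_dir).as_posix()
            if path.is_file() and rel not in listed:
                problems.append(f"unlisted: {rel}")
    return problems


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }
```

Under the same assumptions, a live copy and its backup could be compared with `tree_digests(live_dir) == tree_digests(backup_dir)`, which is true only when both trees contain the same files with the same contents.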
Daily snapshots: The power to 'rewind'
All of this data - the source data batches, live machine volumes (including the installed Veridian software), and user data - is backed up daily via snapshots. These snapshots are retained for 90 days, giving us the power to 'rewind' the state of each hosted collection to any daily snapshot from the past 90 days if necessary.
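The retention logic described above can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, snapshots are represented simply by their dates, and it assumes a snapshot that is exactly 90 days old is still retained.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention window stated in this article


def expired_snapshots(snapshot_dates, today):
    """Return, oldest first, the snapshot dates that have aged out of retention."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(d for d in snapshot_dates if d < cutoff)


def snapshot_for_rewind(snapshot_dates, target):
    """Return the most recent snapshot taken on or before `target`, or None."""
    candidates = [d for d in snapshot_dates if d <= target]
    return max(candidates, default=None)


if __name__ == "__main__":
    today = date(2024, 6, 30)
    # One snapshot per day for the last 100 days (hypothetical data).
    snapshots = [today - timedelta(days=n) for n in range(100)]
    print(len(expired_snapshots(snapshots, today)))
    print(snapshot_for_rewind(snapshots, date(2024, 6, 1)))
```

With daily snapshots, a rewind restores the collection to the state captured by the latest snapshot on or before the requested date.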
Fast disaster recovery
In a disaster scenario, speed of recovery matters as well as safeguarding the data. If a live collection is lost completely, we expect to be able to restore it within 24 hours.
Geographic redundancy: Extra layer of safety
To further protect the safety and integrity of your data, live machines and data batches are hosted in one geographic location, while a complete off-site backup of all our hosted source data and live collection data is kept in an entirely separate geographic location. This geographic redundancy adds an extra layer of safety, helping to keep your data secure and accessible.
'11 nines of data durability'
Our cloud storage solution is designed for '11 nines of data durability'. A durability of 99.999999999% implies an extremely high level of reliability: on average, we would expect to lose only one object out of every 100 billion objects stored over a year.
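The durability figure translates into an expected loss rate by simple arithmetic. The sketch below assumes that 99.999999999% is an annual, per-object durability, which is a common convention for cloud storage but is not stated explicitly in this article. The names are illustrative, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Eleven nines: 99.999999999% expressed as an exact fraction.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)

# Assumed chance of losing any single object in one year.
ANNUAL_LOSS_RATE = 1 - DURABILITY


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given collection size."""
    return object_count * ANNUAL_LOSS_RATE


if __name__ == "__main__":
    print(ANNUAL_LOSS_RATE)                     # 1/100000000000
    print(expected_annual_losses(10_000_000))   # 1/10000
```

For a collection of 10 million objects, this works out to an expected loss of one object roughly every 10,000 years.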
The key to achieving this exceptional level of data durability is redundancy: data is replicated across multiple locations. Thus, even if one copy of the data fails, multiple other redundant copies ensure that data is not lost. In other words, for each data batch, multiple redundant copies exist in each separate geographic location.
The likelihood of all redundant copies of a batch failing in all geographic locations simultaneously is extraordinarily low, making this a robust safety measure.
When you host with Veridian, we use robust systems and processes to protect your data. With our redundancy and carefully designed backup verification processes, we work to keep your data safe and accessible at all times.