Due to the nature of our work, clients often need to transfer large numbers of files to our team, whether source images to be digitised or data to be ingested into a Veridian collection, such as METS/ALTO files or born-digital PDFs.
While clients transfer data to us in several ways, including Google Drive, Dropbox and other file transfer tools, we’ve found that the best way is via S3. S3 is a secure object storage service offered by Amazon Web Services (AWS), and an S3 bucket is a cloud storage space within S3.
Here we will explain how to upload data to an S3 bucket via DragonDisk (an S3-compatible client) on Windows using the following sample information:
- S3 Bucket: s3://upload.4225.dlconsulting.com
- Remote Path: /upload.4225.dlconsulting.com
- Username/Access Key ID: AccessKeyId (will be provided to you)
- Pass Phrase/Access Key Secret: SecretAccessKey (will be provided to you)
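To show how the sample bucket address relates to the bucket name and to folders inside it, here is a minimal Python sketch. The helper name split_s3_uri is hypothetical and is not part of DragonDisk; it simply assumes the usual s3://bucket/folder layout.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3:// address into (bucket name, folder prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 address: {uri!r}")
    # The bucket is the host part; the prefix is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://upload.4225.dlconsulting.com/Batch1-2022-10-06/")
print(bucket)  # upload.4225.dlconsulting.com
print(prefix)  # Batch1-2022-10-06/
```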
This guide describes two methods of upload: copy and synchronisation. While both work well, synchronisation has some important advantages:
- when uploading large amounts of data, there is no need to start from scratch if the connection is interrupted. Instead, synchronisation resumes the upload where it was interrupted, potentially saving a significant amount of time.
- synchronisation also lets you exclude certain files from the upload, e.g. large archival TIFFs.
Download and run the DragonDisk installer appropriate for the operating system on your computer.
Setting up an account in DragonDisk
Launch DragonDisk and from the 'File' menu, choose 'Accounts...'
From the 'Accounts' dialog, click on 'New' and enter the credentials:
Account Name: Veridian-Sample-S3bucket (any name that makes sense to you).
Access Key: Enter the AccessKeyId provided to you.
Secret Key: Enter the SecretAccessKey provided to you.
You should see the new account entry on the 'Accounts' dialog with a green light on the left-hand side. If this is the case, click 'Close' to return to the main DragonDisk window.
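If you would like to check the credentials outside DragonDisk, the following is a rough Python sketch. It assumes a client object that offers the head_bucket call found on boto3 S3 clients; the function name can_reach_bucket is hypothetical, and boto3 itself is an assumption here, not something DragonDisk requires.

```python
def can_reach_bucket(s3_client, bucket: str) -> bool:
    """Return True if the client's credentials can access the bucket."""
    try:
        # head_bucket succeeds only if the bucket exists and access is allowed.
        s3_client.head_bucket(Bucket=bucket)
    except Exception:
        # With boto3 this is typically botocore.exceptions.ClientError (403/404);
        # catching Exception keeps the sketch free of third-party imports.
        return False
    return True


# Hypothetical usage with boto3 (replace the placeholders with your credentials):
#   import boto3
#   client = boto3.client(
#       "s3",
#       aws_access_key_id="AccessKeyId",
#       aws_secret_access_key="SecretAccessKey",
#   )
#   print(can_reach_bucket(client, "upload.4225.dlconsulting.com"))
```

A True result plays roughly the same role as the green light in the 'Accounts' dialog, assuming your credentials permit this kind of check.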
Method 1: Copying data to the S3 bucket
Setting up the remote path: From the main DragonDisk interface, click on the right-hand side 'Root' dropdown and select the correct 'S3 bucket' (e.g. upload.4225.dlconsulting.com; the name of the bucket will be provided to you as part of your credentials).
You should now see the contents of the bucket, which may only contain a *.details file.
Setting up the local path: You now need to know the path to the local folder (on your computer) where the data you intend to upload is stored. Use a tool such as Windows File Explorer to determine the path to the correct local folder.
From the main DragonDisk interface, click on the left-hand side 'Root' dropdown and select the correct local drive (on your computer), then choose the correct folder containing the data to upload.
Select correct local drive: e.g. D:/
Select correct local folder: e.g. D:/Upload
Copy the data to the S3 bucket: Once you reach this point, simply drag the data folder over to the right pane and the upload should start. You can follow the upload progress in DragonDisk's bottom pane.
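For readers who prefer to script the copy, here is a minimal Python sketch of the same idea: every file in the local folder is uploaded under a key that starts with the folder's name. It is not how DragonDisk works internally. The function name upload_folder is hypothetical, and s3_client is assumed to be a boto3 S3 client (or any object with a compatible upload_file method).

```python
from pathlib import Path


def upload_folder(s3_client, local_dir: str, bucket: str) -> list[str]:
    """Upload every file under local_dir, using the folder name as the key prefix."""
    root = Path(local_dir).resolve()
    uploaded = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # folders themselves are not uploaded, only the files in them
        # S3 keys always use forward slashes, whatever the local OS uses.
        key = f"{root.name}/{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Hypothetical usage with a boto3 client:
#   upload_folder(client, "D:/Upload", "upload.4225.dlconsulting.com")
```

Note that this sketch, like a plain copy, starts again from the first file if it is interrupted.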
Method 2: Synchronising data to the S3 bucket
When uploading large amounts of data, using the synchronisation feature can prevent the need to restart a large upload from the beginning if the connection between your local computer and S3 is interrupted, potentially saving a lot of time.
Set up a synchronisation job: First you will need to create a folder to sync to on the S3 bucket. Create a folder in the S3 bucket by:
Right-clicking in the right pane and selecting 'Create folder', then naming the folder, e.g. 'Batch1-2022-10-06'.
From the 'Synchronization' menu, click 'Manage Sync Jobs'.
From the Manage Sync Jobs dialog box, click 'Add', then in the 'Synchronization job' dialog:
- Enter a name for the job: e.g. Batch 1 upload
- Select a source folder on your local computer: e.g. D:/Upload/Sample-Data/Batch1-2022-10-06
- Select the target folder you created: e.g. s3://upload.4225.dlconsulting.com/Batch1-2022-10-06/
- Under the 'Options' tab, ensure all options are unchecked.
- Optional: If you need to exclude certain files from being uploaded (e.g. archival TIFFs), you can accomplish that by adding an exclude filter (e.g. *.tif) via the 'Filters' tab -> 'Exclude' -> 'Add'.
- Click 'OK'.
- Click 'Close'.
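To illustrate what an exclude filter such as *.tif does, here is a small Python sketch using shell-style wildcard matching. The function name apply_exclude_filters is hypothetical, and the case-insensitive matching is an assumption; DragonDisk's own matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def apply_exclude_filters(relative_paths, exclude_patterns):
    """Keep the paths whose file name matches none of the exclude patterns."""
    kept = []
    for rel in relative_paths:
        name = PurePosixPath(rel).name
        # Compare in lower case so '*.tif' also catches 'SCAN.TIF' (an assumption).
        if any(fnmatch(name.lower(), pat.lower()) for pat in exclude_patterns):
            continue
        kept.append(rel)
    return kept


files = ["issue1/page1.tif", "issue1/page1.xml", "issue1/issue1.pdf"]
print(apply_exclude_filters(files, ["*.tif"]))
# ['issue1/page1.xml', 'issue1/issue1.pdf']
```

In this sketch a pattern must match the whole file name, so *.tif does not exclude files ending in .tiff; a second pattern would be needed for those.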
Now, to initiate the sync job, click the job you created (e.g. 'Batch 1 upload') from the 'Synchronization' menu.
A confirmation dialog will appear. Review the source and target paths and if they are correct, click 'Yes'.
Your synchronisation job will now run, uploading the data you specified.
You can view progress in the bottom pane.
If the internet connection between your computer and S3 is interrupted during this process, you can re-run the sync job, which will upload only the data that hasn't already been uploaded, potentially saving a lot of time.
Once you are satisfied that your files have been copied or synchronised, email your DL Consulting contact to let us know the transfer is complete, and we will continue processing the data at our end.
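The reason a re-run is quick can be sketched in a few lines of Python: the local files are compared with what is already in the bucket, and only the difference is uploaded. The function name pending_uploads is hypothetical, and comparing by file size is an assumption made for illustration; DragonDisk may compare files differently.

```python
def pending_uploads(local_sizes: dict[str, int], remote_sizes: dict[str, int]) -> list[str]:
    """Return the keys that are missing from the bucket or differ in size."""
    return sorted(
        key for key, size in local_sizes.items()
        if remote_sizes.get(key) != size
    )


# File sizes in bytes, keyed by the path each file has (or will have) in the bucket.
local = {"Batch1/a.tif": 100, "Batch1/b.xml": 20, "Batch1/c.pdf": 300}
remote = {"Batch1/a.tif": 100}  # only a.tif arrived before the connection dropped
print(pending_uploads(local, remote))
# ['Batch1/b.xml', 'Batch1/c.pdf']
```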