Long-term Storage and Archiving

The Bucket storage system and the Naruto tape system together form our secure long-term research storage infrastructure. It is used for live storage of research data, for off-site backups, and for long-term archiving of data sets that are no longer needed in day-to-day use.

Bucket

Bucket is our large-capacity live network file storage system. You can access it from the HPC clusters, from research instruments via HPAcquire, and from personal computers, either through SSH or by mounting it as a remote disk.

Bucket
  Total capacity:       12 PB
  Per-unit allocation:  50 TB (expandable)
  Backup:               weekly, to off-site tape storage

Each unit gets a 50 TB storage allocation by default, which can be increased if needed.

Incremental backups are made weekly to off-site tape storage. Data placed on Bucket can be recovered even if it is later deleted, as long as it was present during a weekly backup and that backup is still retained; backups are not kept forever.

For ways to access Bucket, please see our page on transferring data.

Naruto

Naruto is the OIST tape storage system. It is located off-site in Nago, so even if a disaster strikes the OIST datacenter the data on Naruto is still safe.

Naruto
  Total capacity:  ~7.5 PB
  Tapes:           ~1000 (number and capacity vary)

The tape system is an IBM TS3500 tape robot system with around 1000 tapes and a total capacity in excess of 7.5 PB. The system manages all tape operations, automatically verifies tapes on a regular basis, and manages data duplication and migration as needed. We add tapes to the system as needed to ensure there is sufficient capacity for both backups and archives.

Archiving

Our archiving service will securely store any data you no longer need access to. OIST research policies specify that research data belongs to OIST, and according to Japanese law such data must be archived for future reference.

If you have data sets that are no longer used, you are not allowed to throw them away. Instead, you can archive them to tape. This frees up storage for your unit and helps you reduce clutter.

The procedure is straightforward:

  1. Tell us which directories on Bucket you want to archive.
  2. We archive the data. Depending on the size of the data and how busy the tape system (and we) are, this can take several days or more.
  3. When we tell you the data has been archived, you can delete the directory and all the original data on Bucket to free up the space.
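
Since the time needed depends on the size of the data, it can help to check how large a directory is before you ask us to archive it. The following is a minimal Python sketch, assuming Bucket is mounted as an ordinary file system on the machine where you run it. The function name `summarize_directory` is hypothetical and not part of any OIST tool.

```python
import os


def summarize_directory(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so data stored elsewhere is not counted.
            if os.path.islink(path):
                continue
            try:
                total_bytes += os.path.getsize(path)
                file_count += 1
            except OSError:
                # The file vanished or is unreadable; this sketch ignores it.
                continue
    return file_count, total_bytes
```

Calling `summarize_directory` on the directory you plan to archive returns the number of files and the total size in bytes, which you can include in your request.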

Restoring Data

The purpose of the archive service is to permanently store data that will likely never be used again. If you think you may want to refer to it in the future, you should keep the data on Bucket.

In exceptional cases you may need to access the data again. You can ask us to load the archive for you, but this may take from several days to weeks, and you must first free up enough storage for the data to fit.
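
A quick check before requesting a restore is to compare the archive size with the space you have free. The sketch below assumes Bucket is mounted as an ordinary file system on your machine; `has_room`, `free_bytes_at`, and the 5% safety margin are illustrative choices, not part of any OIST tool. On shared storage the free space reported by the file system may not reflect your unit's allocation, so treat the result as a rough guide only.

```python
import shutil


def has_room(archive_bytes, free_bytes, margin=0.05):
    """Return True if the archive plus a safety margin fits in free_bytes."""
    return archive_bytes * (1 + margin) <= free_bytes


def free_bytes_at(path):
    """Return the free bytes reported by the file system that holds path."""
    return shutil.disk_usage(path).free
```

For example, `has_room(archive_bytes, free_bytes_at(path))` tells you whether an archive of `archive_bytes` bytes would fit on the file system that holds `path`, where `path` is a placeholder for your unit's directory.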

Use Cases

The most common use-cases for the archiving service are:

  • Archive former unit members' data. When a member leaves your unit, they often leave behind a trove of data: experimental data, program source code, analysis pipelines, paper drafts and so on. As this data is owned by OIST, it can't be deleted; by archiving it you can free up the space.

  • Archive former projects. A research project is not forever. Once the results have all been analysed and the papers all written, you rarely need to access that data again. When you archive it on tape, you can clear it away.

Archiving or Backup?

As we wrote earlier, Bucket is fully backed up. All data stored there is saved on tape already. What is the benefit of archiving the data, rather than just letting it be backed up?

The main difference is that backups are not kept forever. If you delete something on Bucket it may eventually (in months to years) be gone from the backups as well.

In addition, your archived data is organized into a distinct archive with all related data collected in one place. Should the data ever be needed again, you can ask for that specific archive, though it may take a long time to actually restore it.

Backed-up data, on the other hand, is not meant to be restored piecemeal. Backups are primarily a safeguard against catastrophic loss of the entire storage system. If you want to restore data from a backup, you need to know the time and the actual file name and path of every file and directory you want to restore. The restoration process may take many days, and if the data was deleted a long time ago it might no longer be available.
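
Because a restore from backup requires exact paths and times, keeping your own listing of a directory can make such a request easier. The sketch below is one possible way to do this in Python, assuming Bucket is mounted as an ordinary file system. The function name `write_manifest` and the tab-separated layout are illustrative assumptions, not a format we require.

```python
import csv
import os
from datetime import datetime, timezone


def write_manifest(root, out_path):
    """Write one row per file under root: absolute path, size, mtime (UTC).

    Returns the number of files listed. Keep out_path outside root,
    otherwise the manifest file lists itself.
    """
    rows = 0
    # backslashreplace keeps unusual file names from aborting the listing.
    with open(out_path, "w", newline="", encoding="utf-8",
              errors="backslashreplace") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["path", "bytes", "modified_utc"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # do not follow symbolic links
                except OSError:
                    continue  # file vanished or is unreadable
                mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
                writer.writerow([os.path.abspath(path), st.st_size,
                                 mtime.isoformat(timespec="seconds")])
                rows += 1
    return rows
```

Storing such a listing somewhere other than the directory it describes means you still have the file names and paths after the data itself has been deleted.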

If you are unsure how to manage your data, please contact us for a discussion and advice.