Archiving Research Data
Why Archive?
Long-term archiving of research data is a prerequisite for the transparency and traceability of scientific results. Both the DFG and TU Dresden, in their respective guidelines on safeguarding good scientific practice, require research data to be stored for at least 10 years. Publishing the data also allows it to be reused and examined in a new scientific context.
Archived data, including several versions of a dataset, is stored safely and remains available for a defined retention period. The original files can then be deleted.
Archiving at ZIH
ZIH currently offers two systems for archiving:
- The intermediate archive, part of the high performance computing (HPC) systems, holds data that needs to be kept for the duration of a project.
- The long-term archive is used for long-term retention of data (e.g. to comply with TU Dresden's 10-year retention policy). It is available to all members of the university.
A long-term archive with an attached repository is in preparation; it will allow archived data to be published and searched via metadata. A list of specialized data repositories is available at re3data.org.
Intermediate Archive
The intermediate archive is a file system on the HPC systems. It is intended for data that is kept and archived during the lifetime of a project. It is available as "/archiv" on the system taurusexport. Every user with HRSK resources has a personal directory ("/archiv/[login]") into which data to be archived can simply be copied. Files placed there are automatically written to magnetic tape and stored as multiple copies at separate locations.
Within the HPC systems, the datamover tools can be used to copy data into this directory. Files from outside the HPC systems can be transferred to taurusexport via SFTP; on Linux, for example, with "sftp login@taurusexport.hrsk.tu-dresden.de:/archiv/login/" followed by "put meinedaten.tar.gz" at the sftp prompt. On Windows, WinSCP can be used, for example. In both cases an HPC login is required. For further guidance on using the intermediate archive, please see below.
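For repeated uploads from a Linux machine, the SFTP transfer can be scripted. The Bash function below is a minimal sketch under these assumptions: `archive_upload` is a hypothetical name, your archive directory is `/archiv/<login>/` as described above, and SSH key authentication to taurusexport is already set up, because sftp's batch mode (`-b`) cannot prompt for a password. Batch mode stops at the first failing command, and the final `ls -l` lists the uploaded file on the remote side as a quick check.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not a ZIH tool): upload one bundled file into
# /archiv/<login>/ on taurusexport via SFTP.
# Assumes SSH key authentication works, since batch mode cannot ask for a password.
archive_upload() {
    local file="$1" login="$2"
    local host="taurusexport.hrsk.tu-dresden.de"
    # "-b -" runs sftp in batch mode, reading commands from stdin;
    # any failing command aborts the whole transfer.
    sftp -b - "${login}@${host}" <<EOF
cd /archiv/${login}
put "${file}"
ls -l "$(basename "${file}")"
EOF
}
```

Usage, with your own login in place of `login`: `archive_upload meinedaten.tar.gz login`. Interactive `sftp` or WinSCP remain perfectly fine for one-off transfers.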
Long-Term Archive
In the long-term archive, data is likewise written to magnetic tape and stored as multiple copies at separate locations. If you need long-term archiving or are interested in it, please contact our Service Desk; we will gladly help you use the archives. For more information on using the long-term archive, please see below.
Important Instructions for Archiving
- Archiving large numbers of individual files is inefficient. Please bundle files that belong together into a single archive file, ideally compressed. Fewer, larger files can be retrieved more efficiently and keep the archive manageable. Common tools for bundling and compressing include "tar", "gzip", "bzip2" and "p7zip".
- As a rule, each bundled file should be no larger than 500 GB. This limit is based on operational experience with the archives and keeps retrieval from the archive efficient.
- The archives are intended for permanent storage of data. Files in the archive can be updated, but this should be avoided, so decide carefully which version of the data to archive before archiving it.
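The bundling and size guidance above can be sketched as a small Bash function. This is a minimal sketch, not a ZIH-provided tool: `bundle_subdirs` and its parameters are hypothetical, it assumes GNU tar, gzip and `sha256sum` (GNU coreutils) are available, and it reads "500 GB" as 500 × 10^9 bytes. It bundles each top-level subdirectory separately, so that retrieving one part later does not require restoring everything, writes a checksum per bundle for later verification, and only warns (rather than splits) when a bundle exceeds the limit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not a ZIH tool): bundle each subdirectory of SRC into
# its own gzip-compressed tar file in DEST, write a SHA-256 checksum per bundle,
# and warn about bundles above LIMIT bytes (default 500 GB = 500 * 10^9).
# DEST should lie outside SRC, otherwise it would be bundled as well.
bundle_subdirs() {
    local src="$1" dest="$2" limit="${3:-500000000000}"
    local dir name bundle size
    mkdir -p "$dest"
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue          # glob matched no subdirectory
        name="$(basename "$dir")"
        bundle="$dest/$name.tar.gz"
        # -C makes the bundle store relative paths such as "run1/..."
        tar -czf "$bundle" -C "$src" "$name"
        # Listing the bundle once checks that it can be read back completely.
        tar -tzf "$bundle" > /dev/null
        # The checksum file refers to the bare file name, so it can be checked
        # later with "sha256sum -c" from the directory holding the bundle.
        (cd "$dest" && sha256sum "$name.tar.gz") > "$bundle.sha256"
        size="$(wc -c < "$bundle")"
        if [ "$size" -gt "$limit" ]; then
            echo "WARNING: $bundle is $size bytes (limit $limit); split it further" >&2
        fi
    done
}
```

Usage, with hypothetical paths: `bundle_subdirs ~/project/results ~/to_archive`, then copy the resulting `.tar.gz` and `.sha256` files into the archive. Bundling per subdirectory is one possible split; grouping by experiment or by date may suit other projects better.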