DMP-7: Data preservation
Data will be protected from loss and preserved for future use; preservation planning will be for the long term and include guidelines for loss prevention, retention schedules, and disposal or transfer procedures.
Related terms: Archive, Digital Migration, Long Term, Long Term Preservation, Succession Plan.
Explanation of the principle
Data are a valuable asset for reuse and underpin the scholarly record. Preserving data in digital form requires specific actions, including preservation planning, scheduled transformation of file types to avoid obsolescence, and plans for transferring holdings in the event that the repository is obliged to close. These actions are detailed in the Reference Model for an Open Archival Information System (OAIS). Repositories that, through their mission, organisational setup, and business processes, can fulfill these actions sustainably qualify as Trusted Digital Repositories (TDRs). A TDR:
- Has an explicit mission to provide access to and preserve data in its domain or in accordance with a stated collection policy;
- Has a continuity plan ensuring ongoing access and preservation of holdings;
- Assumes responsibility for long-term preservation and manages this function in a planned and documented way; and
- Enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.
Guidance on Implementation, with Examples
Data contributed to GEOSS should be protected from loss and preserved for the long term in trusted digital repositories (TDRs). Each requirement above is accompanied by guidance text to help GEOSS data contributors conduct a self-assessment. This guidance is provided as part of the certification criteria for the Data Seal of Approval, the ICSU World Data System (WDS Certification criteria and guidance), or the joint DSA-WDS Criteria currently under development (DSA-WDS Partnership WG Catalogue of Common Requirements).
The guidance below indicates the types of evidence required to certify the trustworthiness of a data repository.
- TDRs are responsible for stewardship of digital objects, ensuring that they are stored in an appropriate environment for required durations and that the holdings are accessible and available, both currently and in the future. Depositors and users must understand that preservation of, and continued access to, the data is an explicit role of the repository;
- The repository, data depositors, and Designated Community need to understand the level of responsibility required for each deposited item in the repository. The repository must have the legal authority to carry out its responsibilities and must document procedures to ensure that they are carried out; and
- Repositories must ensure that data can be understood and used effectively into the future despite changes in technology. This requirement evaluates the measures taken to ensure that data remain reusable.
Metrics to measure level of adherence to the principle
Recommended compliance levels for each of the requirements in the section above:
0 -- Not applicable;
1 -- The repository has not considered this yet;
2 -- The repository has a theoretical concept;
3 -- The repository is in the implementation phase; and
4 -- The guideline has been fully implemented in the repository.
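The five compliance levels form an ordinal scale, so a self-assessment can be recorded and summarized programmatically. The Python sketch below is a minimal illustration only, not part of any DSA, WDS, or GEOSS tooling; the function name, the example scores, and the summary rule (report the lowest applicable level, so that gaps are not averaged away) are assumptions made for the example.

```python
from enum import IntEnum
from typing import Optional


class Compliance(IntEnum):
    """Recommended compliance levels for each requirement."""
    NOT_APPLICABLE = 0
    NOT_CONSIDERED = 1
    THEORETICAL_CONCEPT = 2
    IMPLEMENTATION_PHASE = 3
    FULLY_IMPLEMENTED = 4


def weakest_requirements(
    assessment: dict[str, Compliance],
) -> tuple[Optional[Compliance], list[str]]:
    """Return the lowest applicable level and the requirements at that level.

    Requirements scored NOT_APPLICABLE are ignored. Reporting the minimum
    (rather than an average) is an assumed summary rule for this sketch.
    """
    applicable = {
        name: level
        for name, level in assessment.items()
        if level != Compliance.NOT_APPLICABLE
    }
    if not applicable:
        return None, []
    lowest = min(applicable.values())
    return lowest, sorted(n for n, lvl in applicable.items() if lvl == lowest)


# Hypothetical self-assessment scores, for illustration only.
example = {
    "Mission": Compliance.FULLY_IMPLEMENTED,
    "Continuity of access": Compliance.THEORETICAL_CONCEPT,
    "Organizational infrastructure": Compliance.IMPLEMENTATION_PHASE,
    "Documented storage procedures": Compliance.IMPLEMENTATION_PHASE,
    "Preservation plan": Compliance.IMPLEMENTATION_PHASE,
    "Data reuse": Compliance.THEORETICAL_CONCEPT,
}
level, gaps = weakest_requirements(example)
print(level.name, gaps)  # THEORETICAL_CONCEPT ['Continuity of access', 'Data reuse']
```

Using `IntEnum` keeps the levels comparable as integers while preserving their names for reporting.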
Recommended metrics for the evaluation of a trustworthy data repository:
* Mission:
- Explicit statements of the long-term preservation role within the organization’s mission, approved by the governing authority.
* Continuity of access:
- The level of responsibility undertaken for data holdings, including any guaranteed preservation periods.
- Medium-term (3 to 5 years) and long-term (more than 5 years) plans ensure the continued availability and accessibility of the data. Contingency plans should describe responses to rapid changes of circumstance, and long-term plans should set out options for relocating or transitioning activities to another body or returning data holdings to their owners (i.e., data producers). For example, what will happen if funding ceases or is withdrawn, if funding for a time-limited project repository reaches its planned end, or if the host institution's interests shift?
* Organizational infrastructure:
- The repository is hosted by a recognized institution (ensuring long-term stability and sustainability) appropriate to its Designated Community; and
- The repository has sufficient funding, including staff resources, IT resources, and a budget for attending meetings when necessary. Ideally, this funding should cover a three- to five-year period.
- What is the repository’s approach if the metadata provided are insufficient for long-term preservation?
* Documented storage procedures:
- How is data storage addressed by the preservation policy?
- Does the repository have a strategy for redundant copies? If so, what is it?
- Are data recovery provisions in place? What are they?
- Are risk management techniques used to inform the strategy?
- What checks are in place to ensure consistency across archival copies?
- How is deterioration of storage media handled and monitored?
* Preservation plan:
- Is the ‘preservation level’ for each item understood? How is this defined?
- Does the contract between depositor and repository provide for all actions necessary to meet the responsibilities?
- Are the transfer of custody and the handover of responsibility clear to both the depositor and the repository?
- Does the repository have the rights to copy, transform, and store the items, as well as provide access to them?
- Is a preservation plan in place?
- Are actions relevant to preservation specified in documentation, including custody transfer, submission information standards, and archival information standards?
- Are there measures to ensure these actions are taken?
* Data reuse:
- Are plans related to future migrations in place?
- How does the repository ensure understandability of the data?
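Consistency across archival copies, one of the storage-procedure questions above, is commonly checked with fixity values such as cryptographic checksums. The sketch below assumes each copy is a locally readable file and uses SHA-256 from Python's standard `hashlib`. The function names and the majority-vote rule for choosing a reference checksum when none is recorded are illustrative assumptions, not a procedure prescribed by the certification criteria.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inconsistent_copies(copies: list[Path], expected: Optional[str] = None) -> list[Path]:
    """Return the archival copies whose checksum differs from the reference.

    If no expected checksum was recorded at ingest, the most common checksum
    among the copies is used as the reference (an assumed majority-vote rule).
    """
    checksums = {copy: sha256_of(copy) for copy in copies}
    if expected is None:
        expected = Counter(checksums.values()).most_common(1)[0][0]
    return [copy for copy, value in checksums.items() if value != expected]


# Hypothetical demonstration: three copies, one silently corrupted.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    copies = []
    for name in ("copy_a.dat", "copy_b.dat", "copy_c.dat"):
        path = root / name
        path.write_bytes(b"example observation record\n")
        copies.append(path)
    (root / "copy_b.dat").write_bytes(b"example observation recorD\n")
    damaged = [p.name for p in inconsistent_copies(copies)]
    print(damaged)  # ['copy_b.dat']
```

In practice, the checksum recorded at ingest is a stronger reference than a majority vote, because a vote cannot detect damage that affects most copies at once.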
Resource Implications of Implementation
The Common Requirements described above reflect the basic characteristics of trustworthy repositories and are based on the Catalogue of Common Requirements developed by the DSA-WDS Partnership Working Group on Repository Audit and Certification, a Working Group (WG) of the Research Data Alliance. The WG's goal is to create a set of harmonized common criteria for certifying repositories at the basic level, drawing on the requirements already put in place by the Data Seal of Approval (DSA) and the ICSU World Data System (WDS). The ultimate aim is to build a global framework for repository certification that progresses from the basic level through the extended level (nestorSEAL DIN 31644) to the formal level (ISO 16363).
As should be expected of a comprehensive accreditation process, gathering sufficient evidence takes some effort, and the time needed for the self-assessment depends on the maturity of the repository. Entities with existing business processes and records management procedures, or with experience of audits or certifications, should spend less time preparing the self-assessment. Very well-prepared repositories may need only a few person-days to complete the assessment, but the process usually takes two weeks to three months.
Several individuals may need to contribute to the assessment, which can require discussion with other data management and technical experts in the organization. Thus, it is difficult to estimate resource requirements for the self-assessment phase.
Text extracted from the Data Management Principles Implementation Guidelines